wwPDB 2014 News
Improved Representation of Large Structures in the PDB Archive
To meet the challenges of greater numbers of PDB depositions, involving ever larger and more complex structures, often determined using multiple methods, wwPDB has developed a new software system for structure deposition and annotation. This new system, currently in production for X-ray crystallographic entries, is based on the PDBx/mmCIF format and can thus handle structures of any size. Large structures have been processed and released intact by wwPDB since May 2013.
As announced previously, wwPDB has now combined the structures that were historically "split" across multiple entries into single PDBx/mmCIF files. Each combined file has been issued a new PDB identifier. This constitutes a significant improvement in the representation of large entries in the PDB archive.
The combined files will be made publicly available through the PDB archive on July 9, 2014. However, for a transition period of 6 months, these large PDBx/mmCIF files will be available only in a separate ftp area (see below) so they can be thoroughly reviewed and tested by the PDB user community. During this transition period, the original "split" PDB-format files will remain available from the active archive. The relationship between the split and combined entries will be described in the combined PDBx/mmCIF file in the item pdbx_database_related.
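As a rough illustration of how the split/combined relationship might be read from a combined file, the sketch below pulls the rows of the pdbx_database_related category out of PDBx/mmCIF text using only the Python standard library. The item names (db_name, db_id, content_type, details), the "split" value, and the entry IDs in the sample are assumptions made for the example, and the parser is deliberately simplified (one row per line, no multi-line values); a production workflow would normally rely on a full mmCIF parser.

```python
import shlex

# Hypothetical excerpt of a combined PDBx/mmCIF file; IDs and details are invented.
SAMPLE_CIF = """
data_EXAMPLE
#
loop_
_pdbx_database_related.db_name
_pdbx_database_related.db_id
_pdbx_database_related.content_type
_pdbx_database_related.details
PDB 1ABC split 'hypothetical part 1 of 2'
PDB 2ABC split 'hypothetical part 2 of 2'
#
"""


def read_loop_category(cif_text, category):
    """Return the rows of one loop_ category as a list of dicts.

    Simplified: assumes each row sits on a single line and that values
    are either bare tokens or quoted strings.
    """
    prefix = category + "."
    lines = [line.strip() for line in cif_text.splitlines()]
    i = 0
    while i < len(lines):
        if lines[i] != "loop_":
            i += 1
            continue
        i += 1
        # Collect the item names that follow the loop_ keyword.
        names = []
        while i < len(lines) and lines[i].startswith("_"):
            names.append(lines[i])
            i += 1
        if not names or not all(n.startswith(prefix) for n in names):
            continue  # a loop for some other category
        rows = []
        while (i < len(lines) and lines[i]
               and not lines[i].startswith(("#", "_", "loop_", "data_"))):
            values = shlex.split(lines[i])
            rows.append({n[len(prefix):]: v for n, v in zip(names, values)})
            i += 1
        return rows
    return []


rows = read_loop_category(SAMPLE_CIF, "_pdbx_database_related")
split_ids = [row["db_id"] for row in rows if row["content_type"] == "split"]
print(split_ids)
```

The list of related IDs could then be used to map a legacy split identifier to the combined entry in local tooling.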
After the transition period, the combined PDBx/mmCIF files will be moved from the separate large_structures directory into the main PDB FTP archive. At that time, the multiple "split" entries will be taken out of the active archive, and searches of the archive at all wwPDB member sites using the PDB identifier of a split entry will return the new "combined" entry that represents the full structure.
Developers and users should note that after the transition period, PDBx/mmCIF will be the definitive format used for archive distribution. Because of its limitations, the old PDB format cannot represent large structures; PDB-format files will therefore be provided only on a best-effort basis, outside of the active archive. Developers and users who process the entire archive should make sure that their software supports PDBx/mmCIF by the end of 2014. Resources for this are described at http://mmcif.wwpdb.org.
The wwPDB has convened a Working Group for PDBx/mmCIF Data Deposition, chaired by Paul Adams, that includes representatives from the major providers of X-ray structure-refinement software. To ease the transition from the PDB file format to PDBx/mmCIF, the Working Group has made recommendations about essential format extensions required for large structures that have been used in creating the consolidated data files. These include:
PDBx/mmCIF files suitable for deposition can be created with recent versions of the CCP4 (REFMAC 5.8) and Phenix (1.8.2) software packages. Both packages support the above extensions for large structures.
Users and developers with questions about the new deposition system or the procedures for handling large structures should contact firstname.lastname@example.org.
During the transition period, the combined large-structure files will be available from:
PDB Reaches a New Milestone: 100,000+ Entries
With this week's update, the PDB archive contains a record 100,147 entries.
Established in 1971, this central, public archive has reached this critical milestone thanks to the efforts of structural biologists throughout the world who contribute their experimentally-determined protein and nucleic acid structure data.
Four wwPDB data centers support online access to three-dimensional structures of biological macromolecules that help researchers understand many facets of biomedicine, agriculture, and ecology, from protein synthesis to health and disease to biological energy. The archive is quite large, containing more than 1,000,000 files related to these PDB entries that require more than 249 GB of storage.
Function follows form
In the 1950s, scientists had their first direct look at the structures of proteins and DNA at the atomic level. Determination of these early three-dimensional structures by X-ray crystallography ushered in a new era in biology, one driven by the intimate link between form and biological function. As the value of archiving and sharing these data was quickly recognized by the scientific community, the Protein Data Bank (PDB) was established in 1971 by an international collaboration, with data centers located in the US and the UK, as the first open access digital resource in all of biology.
Among the first structures deposited in the PDB were those of myoglobin and hemoglobin, two oxygen-binding molecules whose structures were elucidated by Chemistry Nobel Laureates John Kendrew and Max Perutz. With this week's regular update, the PDB welcomes 219 new structures into the archive. These structures join others vital to drug discovery, bioinformatics and education.
The PDB is growing rapidly, doubling in size since 2008, and releasing around 200 new structures to the scientific community every week. The resource is accessed hundreds of millions of times annually by researchers, students, and educators intent on exploring how different proteins are related to one another, clarifying fundamental biological mechanisms, and discovering new medicines.
"The PDB is a critical resource for the international community of working scientists which includes everyone from geneticists to pharmaceutical companies interested in drug targets," said Nobel laureate Venki Ramakrishnan of the MRC Laboratory of Molecular Biology in Cambridge, UK.
A growing community
Since its inception, the PDB has been a community-driven enterprise, evolving into a mission-critical international resource for biological research. Since 2003 the Worldwide PDB (wwPDB) organization, a collaboration involving four PDB data centers in the US, UK, and Japan, has ensured that these valuable data are securely stored, expertly managed, and made freely available for the benefit of scientists and educators around the globe. wwPDB data centers work closely with community experts to define deposition and annotation policies, resolve data representation issues, and implement community validation standards. In addition, the wwPDB works to raise the profile of structural biology with increasingly broad audiences.
Each structure submitted to the archive is carefully curated by wwPDB staff before release. New depositions are checked and enhanced with value-added annotations and linked with other important biological data to ensure that PDB structures are discoverable and interpretable by users with a wide range of backgrounds and interests.
Future challenges
The scientific community eagerly awaits the next 100,000 structures and the invaluable knowledge these new data will bring. However, the increasing number, size and complexity of biological data being deposited in the PDB and the emergence of hybrid structure determination methods, which use a variety of biophysical, biochemical, and modelling techniques to determine the shapes of biologically relevant molecules, constitute major challenges for the management and representation of structural data. wwPDB will continue to work with the community to meet these challenges and ensure that the archive maintains the highest possible standards of quality, integrity, and consistency.
Email email@example.com to contact the wwPDB.
The Road to 100,000 Entries: Launching Tools for the Next Generation
Many structures deposited to the PDB today require special tools and processes for data capture and annotation in order to ensure the best representation in the archive. To meet the challenges posed by large structures, complex chemistry, and use of multiple experimental methods, the wwPDB is launching a new Deposition and Annotation System that will allow the partners to meet the evolving needs of the scientific community over the next decade.
Since its initial launch at the end of January, more than 750 X-ray crystallographic structures from 30 countries have already been deposited using the new system.
For data depositors, new or enhanced features include the generation of X-ray validation reports (following the recommendations of the wwPDB X-ray Validation Task Force), improved capture and review of ligand information, the ability to replace coordinate and/or experimental data files pre- and post-submission, an improved communication process between depositors and wwPDB curators, and the ability to preview and download the PDBx/mmCIF entry file prior to submission.
For data users, the new annotation system greatly improves the efficiency and consistency of data processing.
The system will replace all current deposition and annotation systems in use at the wwPDB deposition centers. It can be accessed at http://deposit.wwpdb.org/deposition/.
The Road to 100,000 Entries: The Early Structures
Scientists first began to decipher the 3D structure of proteins at the level of individual atoms using X-ray crystallography in the 1950s. These views of the structures of myoglobin [1,2], hemoglobin [3], lysozyme [4,5], and ribonuclease [6,7] provided unexpected insights into the regularities and similarities of proteins, and relationships between sequence, structure and function, and evolution. The enormous potential for scientific discovery and understanding was recognized early and rewarded with several Nobel prizes [8]. These early structures also inspired a new field of scientific endeavor: molecular structural biology.
wwPDB's 2014 calendar highlights the first entries deposited to the PDB archive.
Beginning with just seven entries (carboxypeptidase, chymotrypsin, cytochrome, lamprey hemoglobin, lactate dehydrogenase, subtilisin, and trypsin inhibitor) [9-15], the PDB archive was established in 1971 to provide both a home and an access point to these information-rich structures. It is a testament to the vision and foresight of the pioneers in the field that they understood the value and potential of archiving and sharing data in an era when computer networking was virtually non-existent.
Since its inception, the size of the archive has increased by a factor of 10 roughly every 10-15 years: the PDB reached 100 released entries in 1982, 1,000 entries in 1993, and 10,000 in the year 2000. When the 100,000th entry is made available in May 2014, ~90% of the archive will have been released in the past fourteen years.
For a look at some of the milestone entries in the archive, see PDB Pioneers at RCSB PDB's Molecule of the Month; Biophysical Highlights from 54 Years of Macromolecular Crystallography in Biophysical Journal; Revealing Views of Structural Biology in Biopolymers; and A brief history of macromolecular crystallography, illustrated by a family tree and its Nobel fruits in FEBS Journal.
PDB structures are also regularly highlighted in PDBe's Quite Interesting PDB Structures (Quips) features, PDBj's Encyclopedia of Protein Structures (eProtS), and the Molecule of the Month series at RCSB PDB (also available in Japanese at PDBj).
1. J. C. Kendrew, R. E. Dickerson, B. E. Strandberg, et al. (1960) Structure of myoglobin: A three-dimensional Fourier synthesis at 2 Å resolution. Nature 185: 422-427.
2. J. C. Kendrew, G. Bodo, H. M. Dintzis, et al. (1958) A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181: 662-666.
3. M. F. Perutz, M. G. Rossmann, A. F. Cullis, et al. (1960) Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5 Å resolution, obtained by X-ray analysis. Nature 185: 416-422.
4. C. C. F. Blake, L. N. Johnson, G. A. Mair, et al. (1967) Crystallographic studies of the activity of hen egg-white lysozyme. Proc. R. Soc. London Ser. B 167: 378-388.
5. C. C. F. Blake, D. F. Koenig, G. A. Mair, et al. (1965) Structure of hen egg-white lysozyme. A three dimensional Fourier synthesis at 2 Å resolution. Nature 206: 757-761.
6. G. Kartha, J. Bello, D. Harker. (1967) Tertiary structure of ribonuclease. Nature 213: 862-865.
7. H. W. Wyckoff, K. D. Hardman, N. M. Allewell, et al. (1967) The structure of ribonuclease-S at 6 Å resolution. J. Biol. Chem. 242: 3749-3753.
8. M. Jaskolski, Z. Dauter, A. Wlodawer. (2014) A brief history of macromolecular crystallography, illustrated by a family tree and its Nobel fruits. FEBS Journal in press.
9. D. W. Christianson, W. N. Lipscomb. (1986) X-ray crystallographic investigation of substrate binding to carboxypeptidase A at subzero temperature. Proc Natl Acad Sci U S A 83: 7568-7572.
10. J. J. Birktoft, D. M. Blow. (1972) Structure of crystalline alpha-chymotrypsin. V. The atomic structure of tosyl-alpha-chymotrypsin at 2 Å resolution. J Mol Biol 68: 187-240.
11. R. C. Durley, F. S. Mathews. (1996) Refinement and structural analysis of bovine cytochrome b5 at 1.5 Å resolution. Acta Crystallogr D Biol Crystallogr 52: 65-76.
12. W. A. Hendrickson, W. E. Love, J. Karle. (1973) Crystal structure analysis of sea lamprey hemoglobin at 2 Å resolution. J Mol Biol 74: 331-361.
13. C. Abad-Zapatero, J. P. Griffith, J. L. Sussman, et al. (1987) Refined crystal structure of dogfish M4 apo-lactate dehydrogenase. J Mol Biol 198: 445-467.
14. R. A. Alden, J. J. Birktoft, J. Kraut, et al. (1971) Atomic coordinates for subtilisin BPN' (or Novo). Biochem Biophys Res Commun 45: 337-344.
15. M. Marquart, J. Walter, J. Deisenhofer, W. Bode, R. Huber. (1983) The geometry of the reactive site and of the peptide groups in trypsin, trypsinogen and its complexes with inhibitors. Acta Crystallogr B 39: 480.
The Road to 100,000 Entries: Building a Community Resource
With this week's update, the PDB archive contains 99,624 entries and will soon pass the milestone of 100,000. In the weeks leading up to this event, wwPDB is looking back at other PDB milestones.
Through lively conversations, debates, and planning for the future, a community of structural biologists banded together to establish the PDB in 1971 at Brookhaven National Laboratory [1] as an archive for the experimentally-determined 3D structures of biological macromolecules. Today, the PDB archive is managed by the Worldwide Protein Data Bank [2], a unique collaboration of organizations that act as deposition, curation and distribution centers for PDB data [3]. The wwPDB's mission is to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community [4,5].
In support of this mission, wwPDB works closely with the various communities that rely on the archive. wwPDB has convened Task Forces for X-ray, NMR, 3DEM and Small Angle Scattering that bring together acknowledged experts in these fields to advise wwPDB on issues of method-specific validation, deposition and annotation. Their recommendations are implemented in wwPDB validation pipelines that are an integral part of the new Deposition and Annotation system. Validation reports produced by this pipeline have recently been released for all X-ray structures in the current archive; reports for NMR and EM structures will follow later.
Through a dedicated PDBx/mmCIF Working Group, wwPDB works with software developers from the major macromolecular crystallographic software packages on the representation of large structures, complex chemistry, and new and hybrid experimental methods in the PDB. Recommendations about essential extensions to PDBx/mmCIF have been developed to accommodate large structures, and CCP4 and Phenix can now produce PDBx/mmCIF files suitable for deposition.
An international advisory committee, made up of experts in X-ray crystallography, 3DEM, NMR, and bioinformatics, advises the wwPDB and meets annually.
(Photo by Constance Brukin)
In addition, wwPDB reaches out to the wider community through symposia and publications. One notable event was the 2011 symposium celebrating the 40th anniversary of the Protein Data Bank (PDB40) held at Cold Spring Harbor Laboratory, the intellectual birthplace of the PDB. Many distinguished speakers described structural biology's past, present and future. Selected presentations from this event are available online.
Many publications describe the development and future of the PDB archive and wwPDB organization, including: How community has shaped the Protein Data Bank (Structure, 2013), The future of the Protein Data Bank (Biopolymers, 2013), and Creating a Community Resource for Protein Science (Protein Science, 2012). A full list is available.
1. Protein Data Bank. (1971) Protein Data Bank. Nature New Biol. 233: 223.
2. H. M. Berman, K. Henrick, H. Nakamura. (2003) Announcing the worldwide Protein Data Bank. Nat Struct Biol 10: 980.
3. H. M. Berman, G. J. Kleywegt, H. Nakamura, J. L. Markley. (2013) The future of the Protein Data Bank. Biopolymers 99: 218-222.
4. H. M. Berman, G. J. Kleywegt, H. Nakamura, J. L. Markley. (2013) How community has shaped the Protein Data Bank. Structure 21: 1485-1491.
5. H. M. Berman, G. J. Kleywegt, H. Nakamura, J. L. Markley. (2012) The Protein Data Bank at 40: Reflecting on the Past to Prepare for the Future. Structure 20: 391-396.
wwPDB X-ray validation reports added to PDB archive
Validation reports for all X-ray crystal structures released in the PDB archive are now publicly available. The reports are accessible from the following FTP sites:
The reports were added to the FTP archives as part of the March 19, 2014 update, adding ~56 GB of data. Files are organized in directories following the 2-character hash code format. For example, files for entry 1ABC will be found in directory /pdb/validation_reports/ab/1abc/. All files are compressed with gzip (http://www.gzip.org/).
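The directory layout can be sketched in a few lines of Python. This is a minimal illustration, assuming that the 2-character hash is the middle two characters of the lowercased four-character PDB ID, which is consistent with the 1ABC example; the function name and the default root path are chosen for the example only.

```python
def validation_report_dir(pdb_id, root="/pdb/validation_reports"):
    """Build the validation-report directory for a four-character PDB ID.

    Assumes the hash directory is the middle two characters of the ID.
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not (pdb_id.isascii() and pdb_id.isalnum()):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{root}/{pdb_id[1:3]}/{pdb_id}/"


print(validation_report_dir("1ABC"))  # /pdb/validation_reports/ab/1abc/
```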
Five files will be included in the directory for each entry:
Instructions for downloading data from the PDB archive are available.
The validation reports contain an assessment of the quality of a structure and highlight specific concerns by considering the coordinates of the model, the experimental data and the fit between the two. Easily interpretable summary information that compares the quality of a model with that of other models in the archive will help users of PDB data to critically assess archived entries and to select the most appropriate structural models for their needs.
The reports implement recommendations of a large group of community experts on validation of X-ray crystal structures and have been developed in the context of a larger initiative, the new wwPDB Deposition and Annotation system, which was created to unify the annotation tools and practices used across all wwPDB deposition centers and for all common structure-determination methods.
The new X-ray structure-validation reports have been provided to depositors as part of the structure-annotation process since August 2013. More recently, a stand-alone wwPDB X-ray structure validation server was launched allowing crystallographers to generate reports on demand in order to check early, intermediate and near-final models and to help them to identify any potential problems that need addressing prior to structure analysis, publication and deposition.
Further information, including sample X-ray validation reports, is available. We welcome your feedback on the new validation reports. If you would like to send us your comments or questions, then please contact firstname.lastname@example.org.
Archiving Structures Derived from SAS Data
Small-angle scattering (SAS) methods are increasingly used to study the 3D structure of biomacromolecules, either by themselves or in conjunction with other techniques (e.g., crystallography, NMR, 3DEM). The wwPDB has convened an SAS Task Force made up of experts in X-ray and neutron scattering, as well as experts in crystallography, NMR, 3DEM, modeling and archiving.
The Task Force had its inaugural meeting in 2012, where it considered whether the archiving of SAS-based models would be of value to the structural biology community and, if so, what kinds of experimental data, meta-data and validation methods would be required. A report summarizing the Task Force recommendations was published in the journal Structure (Trewhella et al., 21, 875-881 (2013)). The Task Force strongly recommended that a global archive for SAS data and purely SAS-derived models be established, separate from (but federated with) the PDB archive.
There are a few dozen models in the current PDB archive that are based solely on SAS data. The Task Force recommends that these models also be transferred to the future SAS data and model archive, and at that stage be removed from the active PDB archive.
The Task Force also recommended that the models based solely on SAS data previously submitted to the PDB that are currently on-hold awaiting a policy decision should not be processed or archived in the PDB. Instead, they should be transferred to and processed by the future SAS data and model archive once this has been established. The wwPDB will follow this recommendation, and in addition, apply this to any similar structures deposited to the PDB archive. These SAS entries will be issued a PDB ID code, but not processed or released.
Questions may be sent to email@example.com.
New wwPDB Deposition System Now Available for X-ray Structures
The wwPDB partners are pleased to announce the launch of a new deposition system for structures determined using X-ray crystallography. The deposition system can be accessed at http://deposit.wwpdb.org/deposition/.
The new system was developed to allow the wwPDB partners to meet the evolving needs of the scientific community over the next decade, including support for very large systems, complex chemistry, and joint use of multiple experimental methods. The system replaces all current deposition and annotation systems in use at the wwPDB deposition centers, and will lead to improved efficiency and consistency.
New or enhanced features of the deposition system include the generation of X-ray validation reports (following the recommendations of the wwPDB X-ray Validation Task Force), improved capture and review of ligand information, the ability to replace coordinate and/or experimental data files pre- and post-submission, an improved communication process between depositors and wwPDB curators, and the ability to preview and download the PDBx/mmCIF entry file prior to submission.
Depositors will have the option to use the new system or one of the legacy deposition tools (ADIT, AutoDep) for most of 2014. After the transition to the new system, the legacy tools will be available for a limited period of time to complete any unfinished deposition sessions.
In the interest of a smooth transition, the number of X-ray structures that will be processed by the new system will increase gradually over the next few months. After this period, it will also become possible to deposit NMR and 3DEM structures with the new system.
Up-to-date information about the new system is available at http://wwpdb.org/system_info.html. Webinars to demonstrate the new system will be announced in the near future.
Coming soon: X-ray Validation Reports for Archived PDB Structures
The wwPDB partners are pleased to announce that validation reports for all X-ray crystal structures already in the PDB archive will be made publicly available in February.
These validation reports assess the quality of a structure and highlight specific concerns by considering the coordinates of the model, the experimental data and the fit between the two. Easily interpretable summary information that compares the quality of a model with that of other models in the archive will help users of PDB data to critically assess archived entries and to select the most appropriate structural models for their needs.
The reports implement recommendations of a large group of community experts on validation and have been developed in the context of a larger initiative, the new wwPDB Deposition and Annotation System, which was created to unify the annotation tools and practices used across all wwPDB deposition centres and for all common structure-determination methods.
Since August 2013 the new X-ray structure-validation reports have been provided to depositors as part of the structure-annotation process. More recently, a stand-alone wwPDB X-ray structure validation server was launched allowing crystallographers to generate reports on demand in order to check early, intermediate and near-final models and to help them to identify any potential problems that need addressing prior to structure analysis, publication and deposition.
Further information, including sample X-ray validation reports, is available. We welcome your feedback on the new validation reports. If you would like to send us your comments or questions, then please contact firstname.lastname@example.org.
wwPDB Partners Prepare to Launch New Deposition System
The wwPDB partners are pleased to announce that the new deposition system will be released at the end of January 2014 for structures determined using X-ray crystallography. The new web-based deposition tool combines many features of the existing deposition systems with enhanced data visualization and contextual communication with wwPDB data curators.
The deposition interface is a key part of the new wwPDB Deposition and Annotation System, which was created to unify the deposition and annotation tools and practices across all wwPDB deposition centers. Several structures have already been deposited, annotated, and released using this new system, which has been in beta testing since the summer. The new system will support all experimental methods currently archived by the wwPDB, with deposition tools for NMR and 3DEM made available later in 2014.
Depositors will have the option to use the new system or one of the legacy deposition tools (ADIT, AutoDep) through the end of 2014. At that time, the legacy tools will stop accepting new entries, and will only be available for a limited period of time to complete in-progress deposition sessions.
Information about the new system is available, and updated regularly. Webinar demonstrations will be announced in the near future.
Announcement: Standardization of Amino Acid Nomenclature
The wwPDB is transitioning to use the nomenclature for pyrrolysine and selenocysteine as recommended by the joint nomenclature committee of IUPAC/IUBMB.
Starting in January, PYL (for pyrrolysine) and SEC (for selenocysteine) will be used wherever three-letter amino acid codes appear in new releases, and the entries currently in the archive that are affected (<50 entries) will be updated accordingly.
In April, the letters O (for pyrrolysine) and U (for selenocysteine) will be used wherever one-letter amino acid codes appear in new releases, and affected entries currently in the archive will be updated accordingly.
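For software that converts residue names to sequence strings, the change amounts to two extra entries in the lookup table. The sketch below is an illustration only: the function name and the use of "X" for unrecognized residues are assumptions of the example, not wwPDB conventions.

```python
# Standard 20 amino acids plus the two codes covered by this announcement.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "PYL": "O",  # pyrrolysine
    "SEC": "U",  # selenocysteine
}


def to_one_letter(residues, unknown="X"):
    """Convert three-letter residue codes to a one-letter sequence string.

    Codes missing from the table map to `unknown` (an assumption here).
    """
    return "".join(THREE_TO_ONE.get(name.upper(), unknown) for name in residues)


print(to_one_letter(["MET", "SEC", "PYL", "GLY"]))  # MUOG
```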
Questions and comments should be sent to email@example.com.
Time-stamped Copies of the PDB Archive
A snapshot of the PDB archive (ftp://ftp.wwpdb.org) as of January 2, 2014 has been added to ftp://snapshots.wwpdb.org/. Snapshots have been archived annually since January 2005 to provide readily identifiable data sets for research on the PDB archive.
The directory 20140102 includes the 96,692 experimentally-determined coordinate files and related experimental data that were available at that time. Coordinate data are available in PDB, mmCIF, and XML formats. The date and time stamp of each file indicates the last time the file was modified.
The script at ftp://snapshots.wwpdb.org/rsyncSnapshots.sh may be used to make a local copy of a snapshot or sections of the snapshot.
Download the wwPDB calendar for IYCr:2014
2014 has been declared the International Year of Crystallography (IYCr2014) by the United Nations Educational, Scientific and Cultural Organization (UNESCO) and the International Union of Crystallography (IUCr).
IYCr2014 commemorates the centennial of X-ray diffraction and celebrates the important role of crystallography in the modern world.
In honor of 2014: The International Year of Crystallography, the wwPDB has created a 2014 calendar that illustrates how X-ray crystallography enables our understanding of biology at the atomic level. The calendar is available for download in PDF, PPT, and individual image formats.