As part of their series of Workshops on Open-Source Tools for Chemistry, the Chemical Information and Computer Applications Group of the Royal Society of Chemistry hosted two PDB50 celebrations on November 16 and 18, 2021. Videos of these presentations are available online.
After these workshops, attendees should be able to:
Throughout 2021, wwPDB has been celebrating the 50th anniversary of the PDB archive (wwpdb.org/pdb50).
The inaugural symposium hosted by the ASBMB and organized by the wwPDB Foundation was held virtually on May 4-5, 2021.
Videos of these presentations are now available from wwPDB.org. Watch the full symposium or access individual presentations.
The Biophysical Society hosted a virtual symposium on October 6, 2021, highlighting some of the high-impact applications of protein structural data, with a particular focus on the areas of structure prediction and membrane protein biophysics.
The recorded presentations from that day are available from the BPS Video Library.
Session I. Enabling Understanding of Protein Structure, Function, and Design
Session II. Molecular Biophysics of Membrane Proteins
As part of their series of Workshops on Open-Source Tools for Chemistry, the Chemical Information and Computer Applications Group of the Royal Society of Chemistry will be hosting two free virtual events in honor of PDB50.
Please register to attend these Open-Source Tools for Chemistry Workshops.
John D. Westbrook Jr. (1957-2021), Research Professor at Rutgers University and Data & Software Architect Lead for the RCSB PDB, passed away on October 18, 2021.
He was incredibly beloved and respected by his colleagues at Rutgers and throughout the world, known for his dry wit and endless enthusiasm for thinking about all aspects of data and data management.
John had a long and highly successful career developing ontologies, tools, and infrastructure in data acquisition, validation, standardization, and mining in the structural biology and life science domains. His work established the PDBx/mmCIF data dictionary and format as the foundation of the modern Protein Data Bank (PDB) archive (wwPDB.org).
More than twenty-five years ago, while still a graduate student, John recognized the importance of a well-defined data model for ensuring delivery of high quality and reliable structural information to data users. He was the principal architect of the mmCIF data representation for biological macromolecular data. Based on a simple, context-free grammar (without column width constraints), data are presented in either key-value or tabular form. All relationships between common data items (e.g., atom and residue identifiers) are explicitly documented within the PDBx Exchange Dictionary (mmcif.wwpdb.org). Use of the PDBx/mmCIF format enables software applications to evaluate and validate referential integrity within any PDB entry. A key strength of the mmCIF technology is the extensibility afforded by its rich collection of software-accessible metadata.
The current PDBx/mmCIF dictionary contains more than 6,200 definitions relating to experiments involved in macromolecular structure determination and descriptions of the structures themselves. The first implementation of this schema was used for the Nucleic Acid Database, a data resource of nucleic acid-containing X-ray crystallographic structures. Today, this dictionary underpins all data management of the PDB. Since 2014, it has served as the Master Format for the PDB archive. It also forms the basis of the Chemical Component Dictionary (wwpdb.org/data/ccd), which is used to maintain and distribute small molecule chemical reference data in the PDB.
In 2011, the Worldwide Protein Data Bank (wwPDB) PDBx/mmCIF Working Group was established to enable direct use of PDBx/mmCIF format files within major macromolecular crystallography software tools and to provide recommendations on format extensions required for deposition of larger macromolecule structures to the PDB. This was a key step in the evolution of the PDB archive, which enabled studies of macromolecular machines, such as the ribosome, as single PDB structures (instead of split entries with atomic coordinates distributed among different entry files). In 2019, mandatory submission of PDBx/mmCIF format files for deposition was announced (Adams et al. Acta Crystallographica D75, 451-454).
To ensure the success of the PDBx/mmCIF dictionary and format, John worked with a wide range of community experts to extend the framework to encompass descriptions of macromolecular X-ray crystallographic experiments, 3D cryo-electron microscopy experiments, NMR spectroscopy experiments, protein and nucleic acid structural features, diffraction image data, and protein production and crystallization protocols. Most recently, these efforts have been focused on developing compatible data representations for X-ray free electron (XFEL) methods, and for integrative or hybrid methods (I/HM). I/HM structures, currently stored in the prototype PDB-Dev archive (pdb-dev.wwpdb.org), presented new challenges for data exchange among rapidly evolving and heterogeneous experimental repositories. Proper management of I/HM structures in PDB-Dev also required extension of the PDBx/mmCIF data dictionary to include coarse-grained or multiscale models, which will be essential for studying macromolecular structures in situ using cryo-electron tomography and other bioimaging methods.
John contributed broadly to community data standards enabling interoperation and data integration within the biology and structural biology domains. His efforts have included (i) describing the increasing molecular complexity of macromolecular structure data, (ii) representing new experimental methodologies, including I/HM techniques, and (iii) expanding the biological context required to facilitate broader integration with a spectrum of biomedical resources. John’s work has been central to connecting crystallographic and related structural data for biological macromolecules to key resources across scientific disciplines. His efforts have been described in more than 120 peer-reviewed publications, one of which has been cited more than 21,000 times according to the Web of Science (Berman et al. Nucleic Acids Research 28, 235-242). Eight of his most influential published papers have appeared in the International Tables of Crystallography.
John also did yeoman service for the crystallographic community over many years and was recognized with the inaugural Biocuration Career Award from the International Society for Biocuration in 2016.
For the International Union of Crystallography, John served on the Commission for Maintenance of CIF Standard (COMCIFS), the Working Group on Data Diffraction Deposition (DDDWG), and the Committee on Data (CommDat). He also served as an Associate Editor for Acta Crystallographica Section F.
John was a long-standing member of the American Crystallographic Association, and served on the Data, Standards & Computing Committee. He also served on the Metadata Interest Group for the Research Data Alliance.
John is survived by his wife, Bonnie J. Wagner-Westbrook, Ed.D. and his devoted Mother-in-Law, Joan N. Wagner of Clinton Twp., NJ; many cousins including Chandler Turner (of Portsmouth, VA), Ann (Turner) Heyes (of Tasmania, Australia) and Louise (Turner) Brown (of Oakland CA).
Visitation will take place on Saturday, November 6, 2021 from 2-4pm with Memorial Service at 4pm. All at Scarponi-Bright Funeral Home, 26 Main Street, Lebanon, NJ. Interment will be private.
Memorials can be made to Capicats or an organization of choice in his honor.
Additional information is available at Scarponi-Bright.
John D. Westbrook Jr (1957–2021) Acta Cryst (2021) D77: 1475-1476 doi: 10.1107/S2059798321011402
The PDB was announced on October 20, 1971 in Crystallography: Protein Data Bank Nature New Biology 233: 223 (1971) doi: 10.1038/newbio233223b0.
Today, the PDB archive contains >180,000 structures of proteins, nucleic acids, and complex assemblies that help students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. It is managed by the Worldwide PDB (wwPDB) organization, which ensures that the PDB is freely and publicly available to the global community.
The wwPDB has been celebrating this golden anniversary with symposia and events throughout 2021.
Consider supporting 50 years of PDB's spirit of openness, cooperation, and education with a donation to the wwPDB Foundation. The wwPDB Foundation was established in 2010 to raise funds in support of the outreach activities of the wwPDB. The Foundation raised funds to help support PDB50 events, workshops, and educational publications.
The Foundation is chartered as a 501(c)(3) entity exclusively for scientific, literary, charitable, and educational purposes.
Congratulations to biocurators Dr. Sutapa Ghosh and Dr. Monica Sekharan on processing over 10,000 PDB depositions. They are the second and third biocurators to reach this milestone. Yumiko Kengaku reached this milestone in April 2021.
Dr. Ghosh received her PhD in structural biology from the University of Calcutta and joined PDB after working in industry in structure-based drug design. Dr. Sekharan received her PhD in Biological Chemistry from the University of Washington with expertise in NMR spectroscopy. During their 15-year careers at the PDB, many depositors have trusted their professional skills in accurate and comprehensive data analysis and representation. Their deep scientific knowledge, profound data curation expertise, and commitment to excellence have contributed to a high-quality data archive for the benefit of the scientific community. We congratulate Drs. Ghosh and Sekharan on this exciting accomplishment and look forward to their future successes.
wwPDB continues to support research, education, and drug discovery worldwide. Open access to PDB data has helped researchers in structure-guided discovery and development of anti-coronavirus drugs, vaccines, and neutralizing antibodies. When researchers analyze existing PDB structures, for example while working on a similar structure, they often need additional information that cannot be retrieved from the PDB entry file alone. In particular, it is not possible to obtain a point of contact in cases where there is no associated primary publication for an entry.
Following a recommendation from the IUCr Commission on Biological Macromolecules and the IUCr Committee on Data, wwPDB will make public the PI name, email address, and ORCiD ID for initial PDB depositions or re-submissions made starting September 24, 2021. This will enable contact with the authors of every PDB structure released as of that date. This release will also align the PDB with the standard practice of scientific journals, which provide corresponding author information.
The dated acceptance of these PDB Terms and Conditions described above will be captured within the OneDep system. The responsible depositor who creates the deposition should make entry PI(s) aware of the policy change to include PI name, email address, and ORCiD in public PDBx/mmCIF files.
The Biophysical Society will host a virtual symposium on October 6, 2021, highlighting some of the high-impact applications of protein structural data, with a particular focus on the areas of structure prediction and membrane protein biophysics.
Registration is free; however, space is limited. The registration deadline is October 4.
Individual Chemical Component Dictionary (CCD) and Biologically Interesting Molecule Reference Dictionary (BIRD) definitions are now accessible in a new FTP tree in the PDB archive. In response to user requests, these individual CCD and BIRD entry files can be found at /pdb/refdata/chem_comp/ and /pdb/refdata/bird/, respectively, in sub-directories named by the last character of the ID (a last-character hash).
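The last-character hash can be illustrated with a short Python sketch. It assumes the sub-directory is simply the final character of the ID, taken as given; the helper name hashed_dir is hypothetical, and the layout beneath the hashed sub-directory is not specified in this announcement, so only the directory is returned.

```python
from posixpath import join

# Roots quoted in the announcement.
CCD_ROOT = "/pdb/refdata/chem_comp/"
BIRD_ROOT = "/pdb/refdata/bird/"


def hashed_dir(root: str, entry_id: str) -> str:
    """Return the sub-directory for an ID, keyed on its last character.

    Assumption: the 'hash' is the final character of the ID, unchanged.
    """
    entry_id = entry_id.strip()
    if not entry_id:
        raise ValueError("empty ID")
    return join(root, entry_id[-1]) + "/"


if __name__ == "__main__":
    # "ATP" is a CCD ID; "PRD_000001" follows the BIRD ID pattern.
    print(hashed_dir(CCD_ROOT, "ATP"))
    print(hashed_dir(BIRD_ROOT, "PRD_000001"))
```

Hashing on the last character spreads entries across a small, fixed set of directories, which keeps any single FTP listing short.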
New inventory data files offer a quick overview of data in the archive. These files are in the extensible JSON format, and can be found under the new /pdb/holdings/ FTP tree.
The inventory lists provided include:
The inventory (index) files historically provided in /pdb/derived_data/ will continue to be updated for the time being; they will eventually be removed from the PDB archive. Users are encouraged to utilize these new inventory files.
EMBL will host a virtual symposium on October 20-22, 2021 celebrating 50 years of the PDB.
Registration deadline is September 29.
wwPDB manages the PDB Core Archives as a public good according to the FAIR Principles. In support of the FAIR objectives, wwPDB has replaced its historical data access license with a standard open source license from Creative Commons, the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
The new CC0 license provides the same open access as the prior license. PDB data remain freely available to all PDB Users including commercial users.
The 2021 wwPDB Charter Agreement and Usage Policy have been updated to reflect the new license.
Users of PDB data are encouraged to attribute the original authors of the PDB structure data where possible.
Celebrate PDB50 at the Fall 2021 ACS Meeting with a session on Understanding Enzyme Function in 3D: Celebrating 50 Years of the Protein Data Bank.
All times shown are listed in Eastern Daylight Time (EDT) on Wednesday August 25, 2021.
2:00 Introductory Remarks
2:05 The winding road from G-quadruplexes to telomerase, Juli Feigon (UCLA)
2:45 Enhanced exploration of small-molecule ligands bound to proteins and nucleic acids, Stephen K. Burley (Rutgers University and UCSD)
3:10 Mechanistic insights into the cleavage and polyadenylation machinery, Lori Passmore (University of Cambridge)
3:35 Vive la difference! The synergies and differences between the PDB and the CSD, Jason Cole (CCDC)
4:00 Break

4:30 Beyond static snapshots of protein structure: The role of dynamics in function, George Phillips Jr. (Rice University)
5:10 Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the COVID-19 pandemic, Sagar Khare (Rutgers)
5:35 Cracking the phosphatase code: Holoenzyme formation, regulatory protein binding and substrate dephosphorylation by the phosphoprotein phosphatase family, Rebecca Page (Brown)
6:00 Small molecules targeting COVID-19 in an evolving landscape of publishing and peer review, James Fraser (UCSF)
6:30 Break

7:00 Watching metalloenzymes at work, Amie Boal (Penn State)
7:25 Time travel to the past and future – evolution of energy landscapes for enzymes catalysis, Dorothee Kern (Brandeis)
7:50 Structure, mechanism, and inhibition of class IIb histone deacetylases, David Christianson (Harvard)
8:15 Truth sometimes triumphs: The history of structural enzymology, Gregory Petsko (Brandeis)
8:55 Closing Remarks
Visit ACS for registration information.
This symposium was organized by Carmen Nitsche (CCDC), Steven C. Almo (Albert Einstein College of Medicine), and Stephen K. Burley (RCSB PDB).
From February 9, 2022, the wwPDB EMDB Core Archive will exclusively support version 3 of its data model and retire version 1.9.6 header files from the archive. The switch will involve several changes regarding file provision by the archive and this article outlines these changes.
Since the inception of OneDep in 2015, the EMDB Core Archive (EMBL-EBI: https://ftp.ebi.ac.uk/pub/databases/emdb/, PDBj: https://ftp.pdbj.org/pub/emdb/, wwPDB mirror site: https://ftp.wwpdb.org/pub/emdb/) has maintained two versions of its data model in parallel and also two versions of the header file for each entry. Currently, the official EMDB data model is version 1.9.6, while version 3, which facilitates a richer representation of the metadata about EMDB entries, was introduced in 2015.
Version 3 has now been finalized, and EMDB will therefore change its official data model version from v1.9.6 to v3. This will involve three changes to the EMDB Core Archive as of February 9, 2022:
Currently, for an entry EMD-xxxxx in the EMDB core archive located at /structures/EMD-xxxxx/header/, the following header files are provided:
where emd-xxxxx.xml at present is a copy of the file emd-xxxxx-v19.xml, adhering to v1.9.6 of the EMDB data model.
From 9th February 2022 onward, for any entry, the following header files will be provided:
where emd-xxxxx.xml will be a copy of emd-xxxxx-v30.xml, supported by v3 of the EMDB data model.
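For scripts that fetch header files, the naming scheme described here can be sketched in Python as follows. This is a minimal illustration: the base URL and the /structures/EMD-xxxxx/header/ path come from this article, while the function name header_url, the assumption that xxxxx is purely numeric, and the entry number used in the example are hypothetical.

```python
# EMBL-EBI location of the EMDB Core Archive, as quoted in this article.
EMDB_BASE = "https://ftp.ebi.ac.uk/pub/databases/emdb"


def header_url(entry_number: str, versioned: bool = False) -> str:
    """Build the URL of an EMDB header file.

    versioned=False -> emd-xxxxx.xml (a copy of the v3 file from 9 Feb 2022)
    versioned=True  -> emd-xxxxx-v30.xml (explicitly the v3 data model)
    Assumption: the entry number consists of digits only.
    """
    if not entry_number.isdigit():
        raise ValueError(f"not a numeric entry number: {entry_number!r}")
    suffix = "-v30" if versioned else ""
    return (
        f"{EMDB_BASE}/structures/EMD-{entry_number}/header/"
        f"emd-{entry_number}{suffix}.xml"
    )


if __name__ == "__main__":
    # "12345" is a placeholder entry number, not a reference to a real entry.
    print(header_url("12345"))
    print(header_url("12345", versioned=True))
```

Requesting the explicitly versioned file name makes a script robust to any future change in what the unversioned emd-xxxxx.xml copy points to.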
For any further information please email email@example.com.
wwPDB, in collaboration with the PDBx/mmCIF Working Group, has set plans to extend the length of ID codes for PDB and Chemical Component Dictionary (CCD) ID entries in the future. These extended formats are not supported by the legacy PDB file format.
As announced previously, wwPDB has extended PDB ID length to eight characters prefixed by ‘PDB’, e.g., pdb_00001abc.
Each PDB ID is issued a corresponding Digital Object Identifier (DOI), often required for manuscript submission to journals and described in publications by the structure authors.
To help depositors provide information to journals, OneDep now displays the PDB ID and DOI on the submission confirmation page.
The extended PDB IDs and corresponding PDB DOIs, along with existing four character PDB IDs, are now included in the PDBx/mmCIF formatted files. Initially, this will only be available for updated and newly-released PDB entries, with an archive-wide update at a later date.
The additional accessions will be provided in the _database_2 PDBx/mmCIF category.
For example, PDB entry 1ABC will have the extended PDB ID (pdb_00001abc) and the corresponding PDB DOI (10.2210/pdb1abc/pdb).
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB   1abc         pdb_00001abc 10.2210/pdb1abc/pdb
WWPDB D_1xxxxxxxxx ?            ?
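The relationship between the three identifiers in this example can be sketched in Python. The mapping is inferred only from the 1ABC example (lowercase the ID, left-pad with zeros to eight characters, prefix with pdb_); the function names and the validation rule for four-character IDs are assumptions rather than wwPDB code.

```python
import re

# Assumption: a classic PDB ID is exactly four alphanumeric characters.
_CLASSIC_ID = re.compile(r"[A-Za-z0-9]{4}")


def extended_pdb_id(pdb_id: str) -> str:
    """Map a four-character PDB ID to the extended form, e.g. 1ABC -> pdb_00001abc."""
    if not _CLASSIC_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    # Lowercase, then left-pad with zeros to eight characters.
    return "pdb_" + pdb_id.lower().zfill(8)


def pdb_doi(pdb_id: str) -> str:
    """Build the DOI following the pattern shown in the example."""
    if not _CLASSIC_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


if __name__ == "__main__":
    print(extended_pdb_id("1ABC"))  # pdb_00001abc
    print(pdb_doi("1ABC"))          # 10.2210/pdb1abc/pdb
```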
Once all available four-character PDB IDs have been consumed, newly-deposited PDB entries will only be issued extended PDB ID codes. These entries will only be distributed in PDBx/mmCIF format.
wwPDB asks journals, users, and software developers to review code and remove related limitations.
The 2021 ACA Meeting Transactions Symposium Function Follows Form: Celebrating the 50th Anniversary of the Protein Data Bank celebrates this golden anniversary.
Friday July 30 Speakers
Saturday July 31 Speakers
Each day will end with a Panel Discussion: Leaning In – PDB in the Next 50 Years.
An article describing updates and improvements to Mol* is highlighted on the cover of the 2021 Nucleic Acids Research Web Server Issue. As the primary 3D structure viewer used by PDBe and RCSB PDB, Mol* enables 3D exploration of macromolecular coordinate and experimental data directly within the browser window. The project is an open collaboration started by PDBe, RCSB PDB, and CEITEC, and welcomes new contributors.
Two outstanding students, Áron Samuel Kovács and Sukolsak Sakshuwong, contributed new functionality to Mol*. Thanks to their work, Mol* now has greatly improved 3D rendering capabilities and can also export molecular scenes as 3D object files for use in external rendering programs. These features can currently be previewed on molstar.org before they are made available at PDBe and RCSB PDB.
Áron Samuel Kovács just finished his Master's thesis in computer graphics at Masaryk University in the group of Barbora Kozlíková. He greatly improved the 3D rendering capabilities of Mol*, including artifact-free transparency, improved darkening of crevices for better depth perception, and much cleaner outlines.
Sukolsak Sakshuwong just finished his PhD in Management Science and Engineering at Stanford University in the group of Ashish Goel. He added geometry exporters to Mol*, which allow users to extract 3D molecular scenes created in Mol* for use in 3D printing and other 3D graphic design. These scenes can be exported as glTF, an industry standard file format, as well as in STL and Wavefront (.obj) formats.
The Mol* toolkit is available open access on GitHub, allowing community contributions.
The Electron Microscopy Data Bank (EMDB), the public repository for electron cryo-microscopy maps and tomograms of macromolecular complexes and subcellular structures, is now an official partner in the Worldwide Protein Data Bank (wwPDB) collaboration under a formal agreement.
The wwPDB partners are organizations that act as deposition, data processing and distribution centers for the three core wwPDB archives – Biological Magnetic Resonance Data Bank (BMRB), EMDB, and the Protein Data Bank (PDB).
The founding members--Research Collaboratory for Structural Bioinformatics PDB (RCSB PDB, USA), PDBe (Europe), and PDBj (Japan)--established the wwPDB in 2003. BMRB (USA) joined in 2006.
This move formalizes a long-standing relationship between the EMDB and wwPDB. EMDB was established in 2002 at EMBL’s European Bioinformatics Institute (EMBL-EBI). Since then, wwPDB and EMDB have collaborated on a wide range of issues including data deposition, annotation, and validation.
The partnership marks an important milestone in the wwPDB’s mission to bring coherence to the public archiving, management and dissemination of structural biology data, and highlights its commitment to the FAIR Principles (Findability, Accessibility, Interoperability, Reusability), which are emblematic of responsible stewardship of public domain information.
Key benefits of the partnership for EMDB users include the streamlining and harmonisation of policies and practices with the other core wwPDB archives to facilitate deposition, as well as improvements to data validation, which will facilitate the reuse of EMDB data.
At the inaugural PDB50 meeting, ~275 posters were presented (Abstracts Day 1 | Day 2); 209 of these presentations were considered for poster prize awards.
Many thanks to the poster prize judges:
wwPDB is celebrating the 50th Anniversary of the PDB throughout 2021 with symposia, materials, and more.
wwPDB validation reports are now provided in PDBx/mmCIF format for all new depositions in OneDep. This change makes validation data more interoperable with the PDB archival format. Data are more logically and better organized in the PDBx/mmCIF reports, and therefore more “database-friendly” than the report in XML format. PDBx/mmCIF-format validation reports for newly released and modified entries will be distributed through the PDB and EMDB Core Archives.
The new PDBx/mmCIF reports are easier to interpret. They contain a high-level summary and offer easier access to residue-level information. Data are provided at multiple levels: entity, chain, and individual residue. For example, it is more straightforward to obtain the total number of clashes. The corresponding validation dictionary is available at mmcif.wwpdb.org/dictionaries/mmcif_pdbx_vrpt.dic/Index. Examples of PDBx/mmCIF validation reports for X-ray, 3DEM, and NMR are publicly available at GitHub.
PDBx/mmCIF validation reports will be provided for the full PDB and EMDB archives once archival validation recalculation is performed.
wwPDB strongly recommends all PDB users and software developers adopt this format for future applications.
In 2014, PDBx/mmCIF became the PDB’s archive format and the legacy PDB file format was frozen. In addition to PDBx/mmCIF files for all entries, wwPDB produces legacy PDB-format files for entries that can be represented in that format; entries that cannot (e.g., those with over 99,999 atoms or with multi-character chain IDs) are available only in PDBx/mmCIF.
As the size and complexity of PDB structures increase, additional limitations of the legacy PDB format are becoming apparent and need to be addressed (as announced previously).
Restrictions in the SHEET record fields in the legacy PDB file format do not allow for the generation of complex beta sheet topology. Complex beta sheet topologies include instances where beta strands are part of multiple beta sheets and other cases where the definition of the strands within a beta sheet cannot be presented in a linear description. For example, in PDB entry 5wln a large beta barrel structure is created from multiple copies of a single protein; within the beta sheet forming the barrel are instances of a single beta strand making contacts on one side with multiple other strands, even from different chains.
This limitation, however, is not an issue in the PDBx/mmCIF formatted file, where these complex beta sheet topologies can be captured in _struct_sheet, _struct_sheet_order, _struct_sheet_range, and _struct_sheet_hbond.
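As a rough illustration of the two limits quoted above, the following Python sketch checks whether an entry stays within them. It is not wwPDB's actual test: the function name is hypothetical, and other constraints of the legacy format (such as the SHEET topology restrictions) are deliberately not covered.

```python
from typing import Iterable

# Atom-count limit quoted in the text for the legacy PDB format.
MAX_LEGACY_ATOMS = 99_999


def within_quoted_legacy_limits(atom_count: int, chain_ids: Iterable[str]) -> bool:
    """Return True if the entry respects the two limits named in the text.

    Checks only: at most 99,999 atoms, and single-character chain IDs.
    """
    if atom_count > MAX_LEGACY_ATOMS:
        return False
    return all(len(chain_id) == 1 for chain_id in chain_ids)


if __name__ == "__main__":
    print(within_quoted_legacy_limits(5_000, ["A", "B"]))   # True
    print(within_quoted_legacy_limits(150_000, ["A"]))      # False: too many atoms
    print(within_quoted_legacy_limits(5_000, ["AA", "B"]))  # False: multi-character chain ID
```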
Starting June 8th 2021, legacy PDB format files will no longer be generated for PDB entries where the SHEET topology cannot be generated. For these structures, wwPDB will continue to provide secondary structure information with helix and sheet information in the PDBx/mmCIF formatted file.
wwPDB regularly reviews the software used during OneDep biocuration. The _struct_site and _struct_site_gen categories in PDBx/mmCIF (SITE records in the legacy PDB file format) are generated by in-house software and based purely upon distance calculations, and therefore may not reflect biological functional sites.
Starting in June 2021, the in-house legacy software which produces _struct_site and _struct_site_gen records will be retired and wwPDB will no longer generate these categories for newly-deposited PDB entries. Existing entries will be unaffected.
Journal of Biological Chemistry (JBC) has published a collection of reviews in celebration of PDB50.
This issue, edited by Lila Gierasch (JBC) and Helen Berman (wwPDB Foundation, RCSB PDB), contains 17 reviews highlighting the impact of the PDB archive across biological chemistry.
JBC was one of the first journals to require PDB deposition of structural data reported in accepted articles. In addition, more structures in the PDB have been published in JBC than in any other journal.
If the past 15 months have taught us anything about science, it’s that it is vital for researchers to work together to make progress on major challenges. Scientists from around the world will come together virtually to celebrate the 50th anniversary of a key piece of the infrastructure for sharing scientific knowledge: the Protein Data Bank (PDB). The event will be hosted by the American Society for Biochemistry and Molecular Biology on May 4–5, 2021.
Additional events and resources will be announced throughout the year at wwpdb.org/pdb50.
The PDB is the global archive for biological structures. From its inception, the PDB has embraced a culture of open access, leading to its widespread use by the research community and public alike. Millions of users access the PDB data exploring fundamental biology, energy and biomedicine.
Structural biology archived in the PDB opens windows into biology. Through their structures, scientists not only can understand how biological molecules work but can design many of our modern medicines.
Structural biology has been seminal in understanding SARS-CoV-2, the virus that causes COVID-19, and is the foundation of our understanding of protein folding. In fact, more structural biologists have been awarded Nobel Prizes than those in any other field.
In 1971, Helen Berman, a co-founder of the PDB and now a professor emerita at Rutgers University, and colleagues realized that the research community would benefit from sharing structural biology data. The PDB archive that they started has grown into a global database managed by the Worldwide Protein Data Bank consortium (wwPDB) of partner sites in Asia, Europe and America.
“The PDB plays a seminal role in structure-based drug design, a mainstay of many of our current therapeutics… (and) has given rise to the entire field of structural bioinformatics,” Berman said.
Most scientific journals require deposition of structural biology data in the PDB prior to publication. The PDB data are readily accessible to scientists, educators and nonscientists alike.
Leading structural biologists at the meeting from Caltech, Stanford University, Tsinghua University, Harvard Medical School and many other institutions will celebrate the history of the PDB archive. They will also present their current research on topics including SARS-CoV-2 replication, cancer therapies based on antibodies conjugated to small molecules, and immunity and antiviral drugs.
Thousands of scientists have contributed to the PDB archive and access it regularly. The Journal of Biological Chemistry recently released a special issue on this theme: scientific advances enabled by the PDB.
The importance of sharing structural biology data for systems biology, protein design and drug discovery will continue to open our world to the intricacies of biology.
Learn more about the upcoming May meeting and review the agenda at ASBMB. Registration ends May 1.
wwPDB, in collaboration with the PDBx/mmCIF Working Group, has set plans to extend the length of ID codes for PDB and Chemical Component Dictionary (CCD) ID entries in the future. Entries containing these extended IDs will not be supported by the legacy PDB file format.
CCD entries are currently identified by unique three-character alphanumeric codes. At current growth rates, we anticipate running out of available new codes in the next three to four years. At this point, the wwPDB will issue four-character alphanumeric codes for CCD IDs in the OneDep system. Due to constraints of the legacy PDB file format, entries containing these new, four character ID codes will only be distributed in PDBx/mmCIF format. The wwPDB will begin implementation of extended CCD ID codes in 2022.
In addition, wwPDB plans to extend PDB ID length to eight characters prefixed by ‘PDB’, e.g., pdb_00001abc. Each PDB ID has a corresponding Digital Object Identifier (DOI), often required for manuscript submission to journals and described in publications by the structure authors. Both extended PDB IDs and corresponding PDB DOIs, along with existing four-character PDB IDs, will be included in the PDBx/mmCIF formatted files for all new entries by Fall 2021.
For example, PDB entry 1ABC will also have the extended PDB ID (pdb_00001abc) and the corresponding PDB DOI (10.2210/pdb1abc/pdb) listed in the _database_2 PDBx/mmCIF category.
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB   1abc         pdb_00001abc 10.2210/pdb1abc/pdb
WWPDB D_1xxxxxxxxx ?            ?
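Software that reads this category needs to pair each value with its item name. The Python sketch below shows the idea for a simple loop such as the one in this example. It is a deliberately minimal reader, not a full mmCIF parser: it assumes whitespace-separated tokens with no quoted strings, multi-line values, or values that begin with an underscore, and the function name is hypothetical.

```python
def parse_simple_loop(text: str) -> list[dict[str, str]]:
    """Parse one whitespace-tokenized mmCIF loop_ block into row dictionaries."""
    tokens = text.split()
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected text starting with loop_")

    # Item names come first; each begins with an underscore.
    names = []
    index = 1
    while index < len(tokens) and tokens[index].startswith("_"):
        names.append(tokens[index])
        index += 1

    values = tokens[index:]
    if not names or len(values) % len(names) != 0:
        raise ValueError("value count is not a multiple of the item count")

    # Slice the flat value list into rows of len(names) values each.
    width = len(names)
    return [
        dict(zip(names, values[start:start + width]))
        for start in range(0, len(values), width)
    ]


if __name__ == "__main__":
    example = """
    loop_
    _database_2.database_id
    _database_2.database_code
    _database_2.pdbx_database_accession
    _database_2.pdbx_DOI
    PDB   1abc         pdb_00001abc 10.2210/pdb1abc/pdb
    WWPDB D_1xxxxxxxxx ?            ?
    """
    for row in parse_simple_loop(example):
        print(row["_database_2.database_id"], row["_database_2.pdbx_database_accession"])
```

For production use, an established mmCIF library is a better choice than hand-rolled tokenizing, since real files contain quoting and multi-line values that this sketch ignores.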
Once four-character PDB IDs are all consumed, newly-deposited PDB entries will only be issued extended PDB ID codes, and entries will only be distributed in PDBx/mmCIF format.
wwPDB is asking PDB users and related software developers to review code and begin to remove such limitations for the future.
Throughout 2021, the wwPDB will be celebrating the 50th anniversary of the PDB archive (wwpdb.org/pdb50).
The inaugural symposium will be held virtually on May 4-5, 2021.
The online sessions will take place between 11 a.m. and 4:30 p.m. ET each day. The event will be recorded and made available to registered participants after the meeting.
Students and postdoctoral fellows are especially encouraged to attend and will be eligible for poster awards.
Register by May 1 at https://www.asbmb.org/meetings-events/pdb50.
firstname.lastname@example.org is an open discussion forum for questions and discussions with the PDB user community about protein structure, analysis, and related topics. Messages sent to email@example.com will be sent to all subscribers. This bulletin board, which supports HTML formatting and images, replaces the previous forum at firstname.lastname@example.org. Messages, including messages migrated from the previous bulletin board are archived.
Existing subscribers do not need to resubscribe. New users should visit the email@example.com info page to subscribe.
Questions about the migration should be sent to firstname.lastname@example.org.
Questions about PDB structures should be sent to email@example.com.
Congratulations to wwPDB’s Ms. Yumiko Kengaku on processing 10,000 depositions. Yumiko began her career as a biocurator in 2000, as a member of the newly-formed PDBj team at her alma mater Osaka University. She is the first wwPDB biocurator to process more than 10,000 structures. Many Asian structural biologists know and trust Yumiko. For more than two decades, she has worked closely with depositors, expertly guiding them through the structure deposition process and ensuring timely release of high-quality data. A gift celebrating her long service to PDBj, the wwPDB, and the global scientific community was presented to Yumiko in April 2021. We look forward to celebrating the accomplishments of the next biocurator to reach the 10,000 deposition milestone.
Extensions to the PDBx/mmCIF dictionary for reflection data with anisotropic diffraction limits, for unmerged reflection data, and for quality metrics of anomalous diffraction data are now supported in OneDep.
In October 2020, a subgroup of the wwPDB PDBx/mmCIF Working Group was convened to develop a richer description of experimental data and associated data quality metrics. Members of this Data Collection and Processing Subgroup are all actively engaged in the development and support of diffraction data processing software. The Subgroup met virtually for several months, discussing, reviewing, and finalizing a new set of dictionary content extensions that were incorporated into the PDBx/mmCIF dictionary on February 16, 2021. A reference implementation of the new content extensions has been developed by Global Phasing Ltd.
These extensions facilitate the deposition and archiving of a broader range of diffraction data, as well as new quality metrics pertaining to these data. These extensions cover three main areas:
The new mmCIF data extensions describing anisotropic diffraction now enable archiving of the results of Global Phasing’s STARANISO program. Developers of other software can make use of them or extend the present definitions to suit their applications. Example files created by autoPROC, BUSTER (version 20210224) and Gemmi that are compliant with the new dictionary extensions are provided in a GitHub repository.
These example files, and similarly compliant files produced by other data processing and/or refinement programs, are suitable for direct uploading to the wwPDB OneDep system. Automatic recognition of that compliance, implemented by means of explicit dictionary versioning using the new pdbx_audit_conform record, will avoid unnecessary pre-processing at the time of deposition. This improved OneDep support will ensure a lossless round trip between data processing/refinement in the lab and deposition at the PDB.
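As a rough illustration of what recognizing such a record could look like on the user side, the following Python sketch collects the simple key-value items of a pdbx_audit_conform category from mmCIF text. It is not how OneDep does it. The function name, the item names (dict_name, dict_version), and the sample values used with it are assumptions made for illustration and have not been checked against the dictionary; loop_ blocks and multi-line values are not handled.

```python
def audit_conform_items(mmcif_text, category="_pdbx_audit_conform"):
    """Collect simple 'name value' items of one mmCIF category.

    Hypothetical helper: only single-line key-value pairs are read;
    loop_ constructs and semicolon-delimited text blocks are ignored.
    """
    prefix = category + "."
    items = {}
    for raw_line in mmcif_text.splitlines():
        line = raw_line.strip()
        if not line.startswith(prefix):
            continue
        parts = line.split(None, 1)  # item name, then the rest of the line
        if len(parts) != 2:
            continue  # value is on another line; out of scope for this sketch
        item_name = parts[0][len(prefix):]
        value = parts[1].strip().strip("'\"")  # drop simple surrounding quotes
        items[item_name] = value
    return items
```

An empty result would suggest that the file carries no such record, in which case a depositor might regenerate it with a current version of the processing software.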
wwPDB strongly encourages structural biologists to always use the latest versions of structure determination software packages to produce data files for PDB deposition. wwPDB also encourages crystallographers wishing to deposit new structures together with their associated diffraction data to use software that guarantees consistency between the data and the final model. This consistency is difficult to achieve when separate diffraction data files and model coordinate files are pieced together a posteriori by ad hoc means.
wwPDB also encourages depositors to make their raw diffraction images available from one of the public repositories to allow direct access to the original diffraction image data.
To improve the clarity of assembly definitions in curation, wwPDB now makes curated PDB assemblies available for depositors to view in OneDep using the Mol* viewer.
One of the important processes in curation of PDB entries is the definition of assemblies for each structure. This helps users of PDB data to understand the structure in the context of its complex formation in the specific experimental conditions.
To ensure that assemblies are curated correctly, they are reviewed by annotators at the time of curation before being reported back to the depositors after the curation process.
The deposition system in OneDep has now been enhanced so that after curation, the annotated assembly is displayed in the Mol* 3D viewer for depositors to review. This viewer is available in a new Review section in the deposition interface, which is present after curation of the entry. The Mol* viewer can display PDB structure data within the browser with minimal memory requirements, therefore making it quick and easy to visually display assembly information.
These changes will help improve the validation and reporting of curated assemblies during the deposition process.
In 2014, PDBx/mmCIF became the PDB’s archive format and the legacy PDB file format was frozen. In addition to PDBx/mmCIF files for all entries, wwPDB produces PDB-formatted files for entries that can be represented in this legacy file format. Entries that cannot (e.g., those with over 99,999 atoms or with multi-character chain IDs) are available only in PDBx/mmCIF.
As the size and complexity of PDB structures increases, additional limitations of the legacy PDB format are becoming apparent and need to be addressed.
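For developers reviewing their own code, the two limits named above can be expressed as a small check. The Python sketch below is only an illustration: the function name and constant are hypothetical, and it tests only the atom count and chain ID length mentioned in this announcement, not every constraint of the legacy format.

```python
MAX_LEGACY_ATOMS = 99_999  # atom limit named in the announcement above


def fits_legacy_pdb(atom_count, chain_ids):
    """Return True if neither of the two named limits is exceeded.

    Hypothetical helper: checks only the atom count and that every
    chain ID is a single character.
    """
    if atom_count > MAX_LEGACY_ATOMS:
        return False
    return all(len(chain_id) == 1 for chain_id in chain_ids)
```

Software that falls back to PDBx/mmCIF whenever such a check fails, or that reads PDBx/mmCIF in all cases, avoids depending on these limits.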
A new article in Structure describes new features, including branched representations and 2D SNFG images for carbohydrates, identification of ligands of interest, 3D views of electron density fit, and 2D images of small molecule geometry.
These enhancements and processes for validation of 3D small-molecular structures reflect recommendations from the wwPDB/CCDC/D3R Ligand Validation Workshop and the adoption of software through community collaborations.
This manuscript also highlights enhancements made since the initial implementation of Validation Reports as described in Validation of the Structures in the Protein Data Bank (2017) Structure 25: 1916-1927 doi: 10.1016/j.str.2017.10.009.
Enhanced Validation of Small-Molecule Ligands and Carbohydrates in the Protein Data Bank Zukang Feng, John D. Westbrook, Raul Sala, Oliver S. Smart, Gérard Bricogne, Masaaki Matsubara, Issaku Yamada, Shinichiro Tsuchiya, Kiyoko F. Aoki-Kinoshita, Jeffrey C. Hoch, Genji Kurisu, Sameer Velankar, Stephen K. Burley, and Jasmine Y. Young (2021) Structure doi: 10.1016/j.str.2021.02.004
Abstract submission and reduced registration rates end March 22. Register at https://www.asbmb.org/meetings-events/pdb50.
With this week's update, 1,018 SARS-CoV-2-related structures are now freely available from the Protein Data Bank.
The first SARS-CoV-2 structure, a high-resolution crystal structure of the coronavirus main protease (PDB 6lu7), was released early in the pandemic on February 5, 2020.
Since then, structural biologists have visualized most of the SARS-CoV-2 proteome, including the spike protein binding to its ACE2 receptor and neutralizing antibodies, and the main protease, the papain-like proteinase, and other promising drug discovery targets. All of the structures and related data are available for exploration from wwPDB partner websites: RCSB PDB, PDBe, PDBj, and BMRB.
Rapid public release of SARS-CoV-2 structure data has greatly increased our understanding of COVID-19, allowed direct visualization of emerging variants of the virus, and facilitated structure-guided drug discovery and reuse to combat infection. Open access to PDB structures has already enabled design of effective vaccines against SARS-CoV-2.
The response of the research community to the pandemic has highlighted the importance of open access to scientific data in real time. The wwPDB strives to ensure that 3D biological structure data remain freely accessible for all, while maintaining as comprehensive and accurate an archive as possible.
The impact of these 1,018 structures, and of the many more coronavirus protein structures to come, stands as a testament to the importance of open access to structural biology research data.
Throughout 2021, the wwPDB will be celebrating the 50th anniversary of the PDB archive.
The inaugural symposium will be held May 4-5, 2021 in an event hosted by the American Society for Biochemistry and Molecular Biology and organized by the wwPDB Foundation.
This celebration of the 50th anniversary of the founding of the Protein Data Bank as the first open access digital data resource in biology will include presentations from speakers from around the world who have made tremendous advances in structural biology and bioinformatics.
Attendees are encouraged to participate in the virtual poster session and exhibition hall. Students and postdoctoral fellows will be eligible for poster awards.
Register and submit abstracts by March 15th, 2021 for reduced rates.
Speakers will include:
The online sessions will take place between 11 a.m. and 4:30 p.m. ET each day. The event will be recorded and made available to registered participants after the meeting.
Sponsorship opportunities are available; please contact the wwPDB Foundation for more information.
The wwPDB archive has now been updated to include validation reports for every released set of EM model coordinates in the PDB and every released EMDB map entry. Validation reports provide quantitative and visual assessments of structure quality and enable archive-wide comparisons (https://www.wwpdb.org/validation/validation-reports).
wwPDB EM validation reports were first made available to OneDep depositors in 2019 (http://www.wwpdb.org/news/news?year=2019#5db841ceea7d0653b99c8839). The current reports are based on recommendations obtained from EM Validation Task Force (VTF) meetings in 2010 (Structure 20: 205-214, wwpdb.org/task/em) and 2020 (white paper in preparation), as well as EM Validation Challenge events (https://www.ncbi.nlm.nih.gov/pubmed/32002441, https://www.biorxiv.org/content/10.1101/2020.06.12.147033v1). Examples of recent improvements include images for deposited masks, improved map-model overlay images, visualization of an (approximate) raw map from two half-maps, and rotationally averaged power spectrum plots. The underlying methodology is continually improved based on community requirements, requests, and feedback.
The PDB Core Archive holds validation reports that assess each PDB model along with its associated experimental map/tomogram from EMDB. EM map+model reports can be downloaded at the following wwPDB mirrors:
The EMDB Core Archive holds validation reports that assess each EMDB map/tomogram entry. EM map-only reports can be downloaded at the following URLs:
Additional information about validation reports is available for EM map+model, EM map only, and EM tomograms.
If you have any questions about wwPDB validation, please contact us at firstname.lastname@example.org.
A snapshot of the PDB Core archive (ftp://ftp.wwpdb.org) as of January 5th, 2021 has been added to ftp://snapshots.wwpdb.org and ftp://snapshots.pdbj.org. Snapshots have been archived annually since 2005 to provide readily identifiable data sets for research on the PDB archive.
The directory 20210105 includes the structure and experimental data for the 173,005 PDB entries available at that time. Atomic coordinate and related metadata are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot of PDB Core archive is 822 GB.
A snapshot of the EMDB Core archive (ftp://ftp.ebi.ac.uk/pub/databases/emdb/) as of January 4, 2021 can be found in ftp://ftp.ebi.ac.uk/pub/databases/emdb_vault/20210101/ and ftp://snapshots.pdbj.org/20210101/. The snapshot of EMDB Core archive contains map files and their metadata within XML files for both released and obsoleted entries (13731 and 142, respectively) and is 2.9 TB in size.