If the past 15 months have taught us anything about science, it’s that it is vital for researchers to work together to make progress on major challenges. Scientists from around the world will come together virtually to celebrate the 50th anniversary of a key piece of the infrastructure for sharing scientific knowledge: the Protein Data Bank (PDB). The event will be hosted by the American Society for Biochemistry and Molecular Biology on May 4–5, 2021.
Additional events and resources will be announced throughout the year at wwpdb.org/pdb50.
The PDB is the global archive for biological structures. From its inception, the PDB has embraced a culture of open access, leading to its widespread use by the research community and public alike. Millions of users access PDB data to explore fundamental biology, energy and biomedicine.
Structures archived in the PDB open windows into biology. Through these structures, scientists can not only understand how biological molecules work but also design many of our modern medicines.
Structural biology has been seminal in understanding SARS-CoV-2, the virus that causes COVID-19, and is the foundation of our understanding of protein folding. In fact, more structural biologists have been awarded Nobel Prizes than scientists in any other field.
In 1971, Helen Berman, a co-founder of the PDB and now a professor emerita at Rutgers University, and colleagues realized that the research community would benefit from sharing structural biology data. The PDB archive that they started has grown into a global database managed by the Worldwide Protein Data Bank consortium (wwPDB) of partner sites in Asia, Europe and America.
“The PDB plays a seminal role in structure-based drug design, a mainstay of many of our current therapeutics… (and) has given rise to the entire field of structural bioinformatics,” Berman said.
Most scientific journals require deposition of structural biology data in the PDB prior to publication. The PDB data are readily accessible to scientists, educators and nonscientists alike.
Leading structural biologists from Caltech, Stanford University, Tsinghua University, Harvard Medical School and many other institutions will celebrate the history of the PDB archive at the meeting. They will also present their current research on topics including SARS-CoV-2 replication, cancer therapies based on antibodies conjugated to small molecules, immunity and antiviral drugs.
Thousands of scientists have contributed to the PDB archive and access it regularly. The Journal of Biological Chemistry recently released a special issue on this theme: scientific advances enabled by the PDB.
Sharing structural biology data for systems biology, protein design and drug discovery will continue to open our eyes to the intricacies of biology.
Learn more about the upcoming May meeting and review the agenda at ASBMB. Registration ends May 1.
wwPDB, in collaboration with the PDBx/mmCIF Working Group, plans to extend the length of ID codes for PDB entries and Chemical Component Dictionary (CCD) entries. Entries containing these extended IDs will not be supported by the legacy PDB file format.
CCD entries are currently identified by unique three-character alphanumeric codes. At current growth rates, we anticipate running out of available new codes in the next three to four years. At that point, the wwPDB will issue four-character alphanumeric CCD IDs in the OneDep system. Due to constraints of the legacy PDB file format, entries containing these new four-character ID codes will only be distributed in PDBx/mmCIF format. The wwPDB will begin implementing extended CCD ID codes in 2022.
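The arithmetic behind the move from three to four characters can be sketched in a few lines of Python. This assumes CCD IDs are drawn from 36 symbols (uppercase letters A to Z and digits 0 to 9); the result is an upper bound, and the number of codes actually usable may be smaller. The function name is hypothetical.

```python
import string

# Assumption: CCD IDs use uppercase letters and digits only (36 symbols).
CCD_ALPHABET = string.ascii_uppercase + string.digits


def code_space(length: int, alphabet: str = CCD_ALPHABET) -> int:
    """Upper bound on the number of distinct IDs of the given length."""
    return len(alphabet) ** length


print(code_space(3), code_space(4), code_space(4) // code_space(3))  # 46656 1679616 36
```

Under this assumption, adding a fourth character multiplies the available code space by 36.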
In addition, wwPDB plans to extend PDB IDs to eight characters prefixed by ‘pdb_’, e.g., pdb_00001abc. Each PDB ID has a corresponding Digital Object Identifier (DOI), which is often required for manuscript submission to journals and is cited by structure authors in their publications. Both extended PDB IDs and the corresponding PDB DOIs, along with the existing four-character PDB IDs, will be included in the PDBx/mmCIF formatted files for all new entries by Fall 2021.
For example, PDB entry 1ABC will also have the extended PDB ID (pdb_00001abc) and the corresponding PDB DOI (10.2210/pdb1abc/pdb) listed in the _database_2 PDBx/mmCIF category.
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB   1abc         pdb_00001abc 10.2210/pdb1abc/pdb
WWPDB D_1xxxxxxxxx ?            ?
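For software that reads these files, here is a minimal Python sketch of extracting the extended PDB ID and DOI from a _database_2 loop like the one above. The function name parse_simple_loop is hypothetical, and the parser deliberately handles only simple, whitespace-delimited loops (no quoted values or multi-line text fields); production code should use a complete CIF parser such as Gemmi. In CIF, "?" marks an unknown value.

```python
from __future__ import annotations


def parse_simple_loop(cif_text: str) -> list[dict[str, str]]:
    """Parse one whitespace-delimited mmCIF loop_ into a list of row dicts.

    Simplification: values must not contain spaces or quotes, and the text
    must hold exactly one loop_ (tag names followed by their values).
    """
    tokens = cif_text.split()
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected text starting with loop_")
    tags: list[str] = []
    i = 1
    # Tag names start with '_' and precede all values in the loop.
    while i < len(tokens) and tokens[i].startswith("_"):
        tags.append(tokens[i].split(".", 1)[1])  # keep the item name only
        i += 1
    values = tokens[i:]
    if not tags or len(values) % len(tags):
        raise ValueError("number of values is not a multiple of the number of tags")
    width = len(tags)
    return [dict(zip(tags, values[j:j + width])) for j in range(0, len(values), width)]


DATABASE_2 = """
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB   1abc         pdb_00001abc 10.2210/pdb1abc/pdb
WWPDB D_1xxxxxxxxx ?            ?
"""

rows = parse_simple_loop(DATABASE_2)
pdb_row = next(r for r in rows if r["database_id"] == "PDB")
print(pdb_row["pdbx_database_accession"], pdb_row["pdbx_DOI"])  # pdb_00001abc 10.2210/pdb1abc/pdb
```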
Once all four-character PDB IDs have been consumed, newly deposited PDB entries will be issued only extended PDB ID codes, and those entries will be distributed only in PDBx/mmCIF format.
wwPDB asks PDB users and related software developers to review their code and begin removing limitations, such as hard-coded four-character PDB IDs or three-character CCD IDs, that will not hold in the future.
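As one example of removing such limitations, ID-handling code can accept both PDB ID forms and convert between them. The sketch assumes that classic IDs are a digit followed by three alphanumerics, that extended IDs are "pdb_" plus eight alphanumerics, and that a classic ID maps to its extended form by left-padding with zeros, as in the 1abc / pdb_00001abc example; the DOI pattern follows the 10.2210/pdb1abc/pdb example. The DOI format for entries that have only an extended ID is not stated in this announcement, so the sketch returns None for them. All function names are hypothetical.

```python
import re
from typing import Optional

# Assumed ID patterns (lowercased before matching):
#   classic:  a digit followed by three alphanumerics, e.g. "1abc"
#   extended: "pdb_" followed by eight alphanumerics, e.g. "pdb_00001abc"
CLASSIC_ID = re.compile(r"[0-9][a-z0-9]{3}")
EXTENDED_ID = re.compile(r"pdb_[a-z0-9]{8}")


def to_extended_id(pdb_id: str) -> str:
    """Return the extended form of a classic or extended PDB ID."""
    pid = pdb_id.strip().lower()
    if EXTENDED_ID.fullmatch(pid):
        return pid
    if CLASSIC_ID.fullmatch(pid):
        return "pdb_0000" + pid  # left-pad the 4-character ID to 8 characters
    raise ValueError(f"not a recognized PDB ID: {pdb_id!r}")


def to_classic_id(pdb_id: str) -> Optional[str]:
    """Return the 4-character ID, or None if the entry has only an extended ID."""
    body = to_extended_id(pdb_id)[len("pdb_"):]
    short = body[4:]
    if body.startswith("0000") and CLASSIC_ID.fullmatch(short):
        return short
    return None


def classic_doi(pdb_id: str) -> Optional[str]:
    """DOI following the 10.2210/pdb<id>/pdb pattern; None for extended-only IDs."""
    short = to_classic_id(pdb_id)
    return None if short is None else f"10.2210/pdb{short}/pdb"


print(to_extended_id("1ABC"), to_classic_id("pdb_00001abc"), classic_doi("1abc"))
# pdb_00001abc 1abc 10.2210/pdb1abc/pdb
```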
Throughout 2021, the wwPDB will be celebrating the 50th anniversary of the PDB archive (wwpdb.org/pdb50).
The inaugural symposium will be held virtually on May 4-5, 2021.
The online sessions will take place between 11 a.m. – 4:30 p.m. ET each day. The event will be recorded and made available to registered participants after the meeting.
Students and postdoctoral fellows are especially encouraged to attend and will be eligible for poster awards.
Register by May 1 at https://www.asbmb.org/meetings-events/pdb50.
firstname.lastname@example.org is an open discussion forum for questions and discussions with the PDB user community about protein structure, analysis, and related topics. Messages sent to email@example.com will be sent to all subscribers. This bulletin board, which supports HTML formatting and images, replaces the previous forum at firstname.lastname@example.org. Messages, including those migrated from the previous bulletin board, are archived.
Existing subscribers do not need to resubscribe. New users should visit the email@example.com info page to subscribe.
Questions about the migration should be sent to firstname.lastname@example.org.
Questions about PDB structures should be sent to email@example.com.
Congratulations to wwPDB’s Ms. Yumiko Kengaku on processing 10,000 depositions. Yumiko began her career as a biocurator in 2000 as a member of the newly formed PDBj team at her alma mater, Osaka University. She is the first wwPDB biocurator to process more than 10,000 structures. Many Asian structural biologists know and trust Yumiko. For more than two decades, she has worked closely with depositors, expertly guiding them through the structure deposition process and ensuring timely release of high-quality data. A gift celebrating her long service to PDBj, the wwPDB, and the global scientific community was presented to Yumiko in April 2021. We look forward to celebrating the accomplishments of the next biocurator to reach the 10,000-deposition milestone.
Extensions to the PDBx/mmCIF dictionary for reflection data with anisotropic diffraction limits, for unmerged reflection data, and for quality metrics of anomalous diffraction data are now supported in OneDep.
In October 2020, a subgroup of the wwPDB PDBx/mmCIF Working Group was convened to develop a richer description of experimental data and associated data quality metrics. All members of this Data Collection and Processing Subgroup are actively engaged in developing and supporting diffraction data processing software. The Subgroup met virtually over several months to discuss, review, and finalize a new set of dictionary content extensions, which were incorporated into the PDBx/mmCIF dictionary on February 16, 2021. Global Phasing Ltd. has developed a reference implementation of the new content extensions.
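These extensions facilitate the deposition and archiving of a broader range of diffraction data, as well as new quality metrics pertaining to these data. They cover three main areas: reflection data with anisotropic diffraction limits, unmerged reflection data, and quality metrics for anomalous diffraction data.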
The new mmCIF data extensions describing anisotropic diffraction now enable archiving of the results of Global Phasing’s STARANISO program. Developers of other software can make use of them or extend the present definitions to suit their applications. Example files created by autoPROC, BUSTER (version 20210224) and Gemmi that are compliant with the new dictionary extensions are provided in a GitHub repository.
These example files, and similarly compliant files produced by other data processing and/or refinement programs, are suitable for direct uploading to the wwPDB OneDep system. Automatic recognition of that compliance, implemented by means of explicit dictionary versioning using the new pdbx_audit_conform record, will avoid unnecessary pre-processing at the time of deposition. This improved OneDep support will ensure a lossless round trip between data processing/refinement in the lab and deposition at the PDB.
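Checking a file's declared dictionary version is simple to sketch. The example reads _pdbx_audit_conform items from a file that states them as single key-value pairs (not as a loop_) and compares the declared version against a minimum. The dictionary name mmcif_pdbx.dic and the version numbers are illustrative placeholders, not the values OneDep actually requires, and the function names are hypothetical.

```python
from __future__ import annotations

PREFIX = "_pdbx_audit_conform."


def read_audit_conform(cif_text: str) -> dict[str, str]:
    """Collect _pdbx_audit_conform.* key-value pairs (single-row form only)."""
    items: dict[str, str] = {}
    for line in cif_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith(PREFIX):
            items[parts[0][len(PREFIX):]] = parts[1].strip().strip("'\"")
    return items


def version_tuple(version: str) -> tuple[int, ...]:
    # "5.338" -> (5, 338), so components compare numerically rather than as text
    return tuple(int(part) for part in version.split("."))


def conforms_to_at_least(cif_text: str, dict_name: str, min_version: str) -> bool:
    """True if the file declares dict_name at min_version or later."""
    items = read_audit_conform(cif_text)
    if items.get("dict_name") != dict_name or "dict_version" not in items:
        return False
    return version_tuple(items["dict_version"]) >= version_tuple(min_version)


# Illustrative header; name and version are placeholders.
SAMPLE = """data_example
_pdbx_audit_conform.dict_name     mmcif_pdbx.dic
_pdbx_audit_conform.dict_version  5.338
"""
print(conforms_to_at_least(SAMPLE, "mmcif_pdbx.dic", "5.338"))  # True
```

Comparing integer tuples avoids the string-comparison trap in which "5.40" would sort after "5.338".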
wwPDB strongly encourages structural biologists to always use the latest versions of structure determination software packages to produce data files for PDB deposition. wwPDB also encourages crystallographers wishing to deposit new structures together with their associated diffraction data to use software that guarantees consistency between the data and the final model. This consistency is difficult to achieve when separate diffraction data files and model coordinate files are pieced together a posteriori by ad hoc means.
wwPDB also encourages depositors to make their raw diffraction images available from one of the public repositories to allow direct access to the original diffraction image data.
To improve the clarity of assembly definitions in curation, wwPDB now makes curated PDB assemblies available for depositors to view in OneDep using the Mol* viewer.
One of the important processes in curation of PDB entries is the definition of assemblies for each structure. This helps users of PDB data to understand the structure in the context of its complex formation in the specific experimental conditions.
To ensure that assemblies are curated correctly, annotators review them during curation, and the curated assemblies are then reported back to the depositors.
The deposition system in OneDep has now been enhanced so that after curation, the annotated assembly is displayed in the Mol* 3D viewer for depositors to review. This viewer is available in a new Review section in the deposition interface, which is present after curation of the entry. The Mol* viewer can display PDB structure data within the browser with minimal memory requirements, therefore making it quick and easy to visually display assembly information.
These changes will help improve the validation and reporting of curated assemblies during the deposition process.
In 2014, PDBx/mmCIF became the PDB’s archive format and the legacy PDB file format was frozen. In addition to PDBx/mmCIF files for all entries, wwPDB produces PDB-formatted files for entries that can be represented in this legacy file format (entries with more than 99,999 atoms or with multi-character chain IDs, for example, are only available in PDBx/mmCIF).
As the size and complexity of PDB structures increases, additional limitations of the legacy PDB format are becoming apparent and need to be addressed.
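The conditions that keep an entry out of the legacy format can be expressed as a short check. The limits below come from the fixed column widths of legacy ATOM/HETATM records (a five-column atom serial number, a one-column chain identifier, and a three-column residue name that holds the CCD ID). The data class and function names are hypothetical, and other blockers, such as sheet topologies that SHEET records cannot express, are not modeled.

```python
from __future__ import annotations
from dataclasses import dataclass

# Limits implied by fixed-width legacy ATOM/HETATM records:
MAX_LEGACY_ATOMS = 99_999       # atom serial number field is 5 columns wide
MAX_LEGACY_CHAIN_ID_LEN = 1     # chain identifier is a single column
MAX_LEGACY_RESNAME_LEN = 3      # residue name (CCD ID) field is 3 columns wide


@dataclass
class EntrySummary:  # hypothetical summary of one entry
    atom_count: int
    chain_ids: list[str]
    ccd_ids: list[str]


def legacy_pdb_blockers(entry: EntrySummary) -> list[str]:
    """Reasons the entry cannot be written in legacy PDB format (empty if none)."""
    reasons = []
    if entry.atom_count > MAX_LEGACY_ATOMS:
        reasons.append(f"{entry.atom_count} atoms exceeds {MAX_LEGACY_ATOMS}")
    long_chains = sorted({c for c in entry.chain_ids if len(c) > MAX_LEGACY_CHAIN_ID_LEN})
    if long_chains:
        reasons.append("multi-character chain IDs: " + ", ".join(long_chains))
    long_ccd = sorted({c for c in entry.ccd_ids if len(c) > MAX_LEGACY_RESNAME_LEN})
    if long_ccd:
        reasons.append("CCD IDs longer than 3 characters: " + ", ".join(long_ccd))
    return reasons


print(legacy_pdb_blockers(EntrySummary(1_000, ["A", "B"], ["HOH", "ATP"])))  # []
```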
Restrictions in the SHEET record fields of the legacy PDB file format do not allow complex beta sheet topologies to be represented. Complex beta sheet topologies include instances where beta strands are part of multiple beta sheets and other cases where the strands within a beta sheet cannot be described in a linear order. For example, in PDB entry 5wln a large beta barrel is formed from multiple copies of a single protein; within the beta sheet forming the barrel, a single beta strand makes contacts on one side with multiple other strands, even from different chains.
This limitation does not apply to PDBx/mmCIF formatted files, in which these complex beta sheet topologies can be captured in the _struct_sheet, _struct_sheet_order, _struct_sheet_range, and _struct_sheet_hbond categories.
Starting June 8, 2021, legacy PDB format files will no longer be generated for PDB entries whose sheet topology cannot be represented in SHEET records. For these structures, wwPDB will continue to provide helix and sheet secondary structure information in the PDBx/mmCIF formatted file.
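Whether a sheet can be written as a linear strand list can be checked with a small graph test. The sketch assumes strand adjacencies are available as (sheet_id, strand_a, strand_b) tuples, for example derived from _struct_sheet_order, with globally unique strand labels (strand IDs in _struct_sheet_range are keyed together with their sheet ID, so real code would have to build such labels, e.g. from chain and residue range). A sheet passes if it is connected, every strand has at most two neighbors, and no strand also belongs to another sheet; a connected graph with maximum degree two is either a simple path or a closed ring such as a barrel. This is a hypothetical illustration, not the wwPDB's actual criterion.

```python
from __future__ import annotations
from collections import defaultdict


def linear_sheets(pairs: list[tuple[str, str, str]]) -> dict[str, bool]:
    """Map each sheet ID to True if its strands can be listed in one linear order."""
    neighbors: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    sheets_of: dict[str, set[str]] = defaultdict(set)
    for sheet, a, b in pairs:
        neighbors[sheet][a].add(b)
        neighbors[sheet][b].add(a)
        sheets_of[a].add(sheet)
        sheets_of[b].add(sheet)

    result = {}
    for sheet, adj in neighbors.items():
        max_two = all(len(nbrs) <= 2 for nbrs in adj.values())  # no branching strand
        single_sheet = all(len(sheets_of[s]) == 1 for s in adj)  # no shared strands
        result[sheet] = max_two and single_sheet and _is_connected(adj)
    return result


def _is_connected(adj: dict[str, set[str]]) -> bool:
    """Depth-first search from an arbitrary strand; all strands must be reached."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)


print(linear_sheets([("S1", "A", "B"), ("S1", "B", "C"),
                     ("S2", "X", "Y"), ("S2", "X", "Z"), ("S2", "X", "W")]))
# {'S1': True, 'S2': False}
```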
wwPDB regularly reviews the software used during OneDep biocuration. The _struct_site and _struct_site_gen categories in PDBx/mmCIF (SITE records in the legacy PDB file format) are generated by in-house software and based purely upon distance calculations, and therefore may not reflect biological functional sites.
Starting in June 2021, the in-house legacy software which produces _struct_site and _struct_site_gen records will be retired and wwPDB will no longer generate these categories for newly-deposited PDB entries. Existing entries will be unaffected.
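A purely distance-based site definition can be sketched as follows: every residue with any atom within a cutoff of any ligand atom is reported. The 4.0 Å cutoff, the Atom class, and the function name are assumptions for illustration, not the parameters of the retired wwPDB software. The point is that such a list records proximity, not function, which is why these records may not reflect biological functional sites.

```python
from __future__ import annotations
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:  # hypothetical minimal atom record
    chain: str
    res_name: str
    res_seq: int
    xyz: tuple[float, float, float]


def residues_near_ligand(
    atoms: list[Atom], ligand_atoms: list[Atom], cutoff: float = 4.0
) -> list[tuple[str, str, int]]:
    """Residues with at least one atom within `cutoff` (in Å) of any ligand atom."""
    site = set()
    for atom in atoms:
        # Proximity is the only criterion: no chemistry, conservation, or function.
        if any(math.dist(atom.xyz, lig.xyz) <= cutoff for lig in ligand_atoms):
            site.add((atom.chain, atom.res_name, atom.res_seq))
    return sorted(site)


protein = [Atom("A", "SER", 10, (3.0, 0.0, 0.0)), Atom("A", "GLY", 11, (5.0, 0.0, 0.0))]
ligand = [Atom("A", "ATP", 501, (0.0, 0.0, 0.0))]
print(residues_near_ligand(protein, ligand))  # [('A', 'SER', 10)]
```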
A new article in Structure describes new features of the wwPDB validation reports, including branched representations and 2D SNFG (Symbol Nomenclature for Glycans) images for carbohydrates, identification of ligands of interest, 3D views of electron density fit, and 2D images of small-molecule geometry.
These enhancements and processes for validating 3D small-molecule structures reflect recommendations from the wwPDB/CCDC/D3R Ligand Validation Workshop and the adoption of software through community collaborations.
This manuscript also highlights enhancements made since the initial implementation of Validation Reports as described in Validation of the Structures in the Protein Data Bank (2017) Structure 25: 1916-1927 doi: 10.1016/j.str.2017.10.009.
Enhanced Validation of Small-Molecule Ligands and Carbohydrates in the Protein Data Bank. Zukang Feng, John D. Westbrook, Raul Sala, Oliver S. Smart, Gérard Bricogne, Masaaki Matsubara, Issaku Yamada, Shinichiro Tsuchiya, Kiyoko F. Aoki-Kinoshita, Jeffrey C. Hoch, Genji Kurisu, Sameer Velankar, Stephen K. Burley, and Jasmine Y. Young (2021) Structure doi: 10.1016/j.str.2021.02.004
Abstract submission and reduced registration rates end March 22. Register at https://www.asbmb.org/meetings-events/pdb50.
With this week's update, 1,018 SARS-CoV-2-related structures are now freely available from the Protein Data Bank.
The first SARS-CoV-2 structure, a high-resolution crystal structure of the coronavirus main protease (PDB 6lu7), was released early in the pandemic on February 5, 2020.
Since then, structural biologists have visualized most of the SARS-CoV-2 proteome, including the spike protein bound to its ACE2 receptor and to neutralizing antibodies, as well as the main protease, the papain-like proteinase, and other promising drug discovery targets. All of the structures and related data are available for exploration from the wwPDB partner websites: RCSB PDB, PDBe, PDBj, and BMRB.
Rapid public release of SARS-CoV-2 structure data has greatly increased our understanding of COVID-19, allowed direct visualization of emerging variants of the virus, and facilitated structure-guided drug discovery and drug repurposing to combat infection. Open access to PDB structures has already enabled the design of effective vaccines against SARS-CoV-2.
The response of the research community to the pandemic has highlighted the importance of open access to scientific data in real time. The wwPDB strives to ensure that 3D biological structure data remain freely accessible for all, while maintaining as comprehensive and accurate an archive as possible.
The impact of these 1,018 structures, and of the many more coronavirus protein structures to come, stands as a testament to the importance of open access to structural biology research data.
Throughout 2021, the wwPDB will be celebrating the 50th anniversary of the PDB archive.
The inaugural symposium will be held May 4-5, 2021 in an event hosted by the American Society for Biochemistry and Molecular Biology and organized by the wwPDB Foundation.
This celebration of the 50th anniversary of the founding of the Protein Data Bank as the first open access digital data resource in biology will include presentations from speakers from around the world who have made tremendous advances in structural biology and bioinformatics.
Attendees are encouraged to participate in the virtual poster session and exhibition hall. Students and postdoctoral fellows will be eligible for poster awards.
Register and submit abstracts by March 15th, 2021 for reduced rates.
Speakers will include:
The online sessions will take place between 11 a.m. and 4:30 p.m. ET each day. The event will be recorded and made available to registered participants after the meeting.
Sponsorship opportunities are available; please contact the wwPDB Foundation for more information.
The wwPDB archive has now been updated to include validation reports for every released set of EM model coordinates in the PDB and every released EMDB map entry. Validation reports provide quantitative and visual assessments of structure quality and enable archive-wide comparisons (https://www.wwpdb.org/validation/validation-reports).
wwPDB EM validation reports were first made available to OneDep depositors in 2019 (http://www.wwpdb.org/news/news?year=2019#5db841ceea7d0653b99c8839). The current reports are based on recommendations from the EM Validation Task Force (VTF) meetings in 2010 (Structure 20: 205-214, wwpdb.org/task/em) and 2020 (white paper in preparation), as well as from EM Validation Challenge events (https://www.ncbi.nlm.nih.gov/pubmed/32002441, https://www.biorxiv.org/content/10.1101/2020.06.12.147033v1). Recent improvements include images of deposited masks, improved map-model overlay images, visualization of an approximate raw map computed from the two half-maps, and rotationally averaged power spectrum plots. The underlying methodology is continually improved based on community requirements, requests and feedback.
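The two map-derived items mentioned in this paragraph can be sketched with NumPy. Two assumptions are made: the approximate raw map is taken as the voxel-wise mean of the two half-maps, and the rotationally averaged power spectrum is computed by binning |F|² of the map's 3D Fourier transform into integer-radius shells in index units (voxel size is ignored, so shells are not converted to reciprocal Ångströms). Function names are hypothetical, and this is not the code of the wwPDB validation pipeline.

```python
import numpy as np


def approximate_raw_map(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """Assumption: approximate the full map as the voxel-wise mean of two half-maps."""
    if half1.shape != half2.shape:
        raise ValueError("half-maps must have the same shape")
    return 0.5 * (half1 + half2)


def rotationally_averaged_power_spectrum(density: np.ndarray) -> np.ndarray:
    """Mean |F|^2 in integer-radius Fourier shells of a cubic 3D map.

    Element k averages the power of all Fourier coefficients whose frequency
    index rounds to radius k, for k = 0 .. n // 2.
    """
    if density.ndim != 3 or len(set(density.shape)) != 1:
        raise ValueError("expected a cubic 3D array")
    n = density.shape[0]
    power = np.abs(np.fft.fftn(density)) ** 2
    freq = np.fft.fftfreq(n) * n  # integer frequency indices in FFT order
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2 + 1
    inside = shell < n_shells  # drop corner coefficients beyond the inscribed sphere
    sums = np.bincount(shell[inside], weights=power[inside], minlength=n_shells)
    counts = np.bincount(shell[inside], minlength=n_shells)
    return sums / np.maximum(counts, 1)


rng = np.random.default_rng(0)
half_a, half_b = rng.normal(size=(16, 16, 16)), rng.normal(size=(16, 16, 16))
spectrum = rotationally_averaged_power_spectrum(approximate_raw_map(half_a, half_b))
print(spectrum.shape)  # (9,)
```

A validation plot would typically show such a spectrum on a logarithmic scale against spatial frequency.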
The PDB Core Archive holds validation reports that assess each PDB model along with its associated experimental map/tomogram from EMDB. EM map+model reports can be downloaded at the following wwPDB mirrors:
The EMDB Core Archive holds validation reports that assess each EMDB map/tomogram entry. EM map-only reports can be downloaded at the following URLs:
Additional information about validation reports is available for EM map+model, EM map only, and EM tomograms.
If you have any questions or queries about wwPDB validation, please contact us at firstname.lastname@example.org.
A snapshot of the PDB Core archive (ftp://ftp.wwpdb.org) as of January 5th, 2021 has been added to ftp://snapshots.wwpdb.org and ftp://snapshots.pdbj.org. Snapshots have been archived annually since 2005 to provide readily identifiable data sets for research on the PDB archive.
The directory 20210105 includes the structure and experimental data for the 173,005 PDB entries available at that time. Atomic coordinates and related metadata are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot of the PDB Core archive is 822 GB in size.
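For scripted access, the snapshot location for a given date can be composed from the YYYYMMDD directory naming used for 20210105. The sketch assumes that each dated directory sits at the top level of both snapshot hosts named in this announcement; verify the layout before relying on it. The function name is hypothetical.

```python
from __future__ import annotations
from datetime import date

# Assumption: dated snapshot directories (YYYYMMDD) sit at the top level of each host.
SNAPSHOT_HOSTS = ("ftp://snapshots.wwpdb.org", "ftp://snapshots.pdbj.org")


def snapshot_urls(snapshot_date: date) -> list[str]:
    """Candidate URLs for the snapshot directory taken on snapshot_date."""
    directory = snapshot_date.strftime("%Y%m%d")  # e.g. 2021-01-05 -> "20210105"
    return [f"{host}/{directory}/" for host in SNAPSHOT_HOSTS]


print(snapshot_urls(date(2021, 1, 5)))
# ['ftp://snapshots.wwpdb.org/20210105/', 'ftp://snapshots.pdbj.org/20210105/']
```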
A snapshot of the EMDB Core archive (ftp://ftp.ebi.ac.uk/pub/databases/emdb/) as of January 4, 2021 can be found in ftp://ftp.ebi.ac.uk/pub/databases/emdb_vault/20210101/ and ftp://snapshots.pdbj.org/20210101/. The snapshot of the EMDB Core archive contains map files and their metadata (in XML files) for both released and obsoleted entries (13,731 and 142, respectively) and is 2.9 TB in size.