In July 2020, the wwPDB will roll out updated PDB structures and reference data files with standardized representation of carbohydrate molecules, improving the Findability and Interoperability of PDB data. Detailed information about this work is available from the wwPDB website, including PDBx/mmCIF dictionary extensions and over 500 example files. We encourage developers of software packages that produce, access, or visualize PDB data to review this information and adapt their software.
Through collaboration with the glycoscience community, software tools were developed to standardize atom nomenclature of nearly 800 monosaccharides in the Chemical Component Dictionary (CCD) and to apply branched polymeric representation to oligo- and polysaccharides within the PDB archive, enabling easy translation to other representations commonly used by glycobiologists. To guarantee unambiguous chemical description of oligo-/polysaccharides in each of the nearly 12,000 affected PDB entries, we have included an explicit description of covalent linkage information between their monomeric units. To ensure continued Findability of common oligosaccharides (e.g., sucrose, Lewis X factor), we have expanded the Biologically Interesting molecule Reference Dictionary (BIRD), which will contain the covalent linkage information and common synonyms for such molecules.
wwPDB is also taking this opportunity to improve the organization of chemical synonyms in the CCD by introducing a new _pdbx_chem_comp_synonyms data category. This will enable more comprehensive capture of alternative names for small molecules in the PDB. To minimize disruption to users, there will be an initial transition period, where the legacy data item, _chem_comp.pdbx_synonyms, will be retained.
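During the transition period, software that reads synonyms may want to prefer the new category and fall back to the legacy item. The following is a minimal sketch of that fallback, not a description of any wwPDB tool. It assumes the component has already been parsed into a plain dictionary (a hypothetical layout), that rows of the new category expose a "name" item, and that the legacy item holds a semicolon-separated string; all three are assumptions to check against the published dictionary extensions.

```python
from typing import Dict, List


def collect_synonyms(component: Dict) -> List[str]:
    """Return synonyms for one chemical component.

    `component` is a hypothetical, already-parsed layout:
      {"pdbx_chem_comp_synonyms": [{"name": ...}, ...],   # new category (rows)
       "chem_comp": {"pdbx_synonyms": "A; B"}}            # legacy item
    The "name" key and the ';' delimiter are assumptions.
    """
    # Prefer the new category when it is present and non-empty.
    rows = component.get("pdbx_chem_comp_synonyms", [])
    names = [row["name"] for row in rows if row.get("name")]
    if names:
        return names

    # Fall back to the legacy item; '?' and '.' mark missing values in mmCIF.
    legacy = component.get("chem_comp", {}).get("pdbx_synonyms")
    if legacy in (None, "?", "."):
        return []
    return [part.strip() for part in legacy.split(";") if part.strip()]
```

Reading the new category first means the same code keeps working once the legacy item is eventually withdrawn.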
We are pleased to announce that from February 18, 2020 authors (PIs) of released PDB structures can update the model coordinates while retaining the same PDB accession code, thereby preserving the link with the original publication. In this second and final phase of the project we have extended the versioning functionality to structures deposited prior to the roll out of OneDep--the common wwPDB system for deposition, validation, and biocuration.
For entries deposited via OneDep, depositors should log into the corresponding session at deposit.wwpdb.org and submit the request via the OneDep communication panel. For entries deposited via legacy systems, requests should be initiated by sending an email to firstname.lastname@example.org and including the PDB code in the subject and body of the email. Once submitted, the revised model will be processed by wwPDB biocurators and the updated version released immediately upon depositor’s approval. Versioning of PDB entries will be limited to changes in the coordinate files, with no changes permitted to the deposited experimental data. To limit the impact on the wwPDB biocuration resources, PDB versioning is currently restricted to one replacement per PDB entry per year, and three entries per Principal Investigator per year. We will review this restriction in 2021.
The most recent version of the entry will be made available in the PDB archive FTP (ftp.wwpdb.org). All major versions of a PDB structure will be retained in the versioned FTP archive (ftp-versioned.wwpdb.org); more information can be found on the wwPDB website. The versioned FTP archive is structured to allow for future extension of the PDB code format, so PDB entry 1abc is found in the folder pdb_00001abc.
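The folder naming can be sketched as a small helper. The only mapping confirmed here is 1abc to pdb_00001abc; zero-padding every four-character code to eight characters is inferred from that single example, and the function name is hypothetical.

```python
import re


def versioned_folder_name(pdb_code: str) -> str:
    """Map a four-character PDB code to its versioned-archive folder name.

    Assumption: the code is lower-cased and left-padded with zeros to eight
    characters, as suggested by the example 1abc -> pdb_00001abc.
    """
    code = pdb_code.strip().lower()
    # A classic PDB code is one digit followed by three alphanumerics.
    if not re.fullmatch(r"[0-9][a-z0-9]{3}", code):
        raise ValueError(f"not a four-character PDB code: {pdb_code!r}")
    return "pdb_" + code.zfill(8)
```

For example, `versioned_folder_name("1abc")` gives `pdb_00001abc`, matching the folder named above.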
Changes made to entries during versioning are considered to be either major or minor. Updates to atomic coordinates, polymer sequence, or chemical description trigger a major version increment, while changes to any other categories are classified as minor. Changes introduced are recorded in the PDBx/mmCIF audit categories.
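The major/minor rule can be illustrated with a short sketch. The labels used for kinds of change are hypothetical placeholders rather than PDBx/mmCIF category names, and resetting the minor number to zero after a major increment is an assumption, not something stated in this announcement.

```python
from typing import Iterable, Tuple

# Kinds of change that trigger a major version increment (per the text above).
# The string labels themselves are hypothetical.
MAJOR_CHANGE_KINDS = frozenset(
    {"atomic_coordinates", "polymer_sequence", "chemical_description"}
)


def classify_update(changed: Iterable[str]) -> str:
    """Return 'major' if any changed kind is major, otherwise 'minor'."""
    changed_set = set(changed)
    if not changed_set:
        raise ValueError("no changes supplied")
    return "major" if changed_set & MAJOR_CHANGE_KINDS else "minor"


def next_version(major: int, minor: int, changed: Iterable[str]) -> Tuple[int, int]:
    """Compute the next (major, minor) version pair for an update."""
    if classify_update(changed) == "major":
        # Assumption: the minor counter restarts after a major increment.
        return (major + 1, 0)
    return (major, minor + 1)
```

An update that touches both coordinates and a citation is still major, since a single major-level change is enough to trigger the increment.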
If you have any further queries regarding the process of PDB versioning, please contact the wwPDB at email@example.com.
PDB data provide a starting point for structure-guided drug discovery
A high-resolution crystal structure of COVID-19 (2019-nCoV) coronavirus 3CL hydrolase (Mpro) has been determined by Zihe Rao and Haitao Yang's research team at ShanghaiTech University. Rapid public release of this structure of the main protease of the virus (PDB 6lu7) will enable research on this newly-recognized human pathogen.
Recent emergence of the COVID-19 coronavirus has resulted in a WHO-declared public health emergency of international concern. Research efforts around the world are working towards establishing a greater understanding of this particular virus and developing treatments and vaccines to prevent further spread.
While PDB entry 6lu7 is currently the only public-domain 3D structure from this specific coronavirus, the PDB contains structures of the corresponding enzyme from other coronaviruses. The 2003 outbreak of the closely-related Severe Acute Respiratory Syndrome-related coronavirus (SARS) led to the first 3D structures, and today there are more than 200 PDB structures of SARS proteins. Structural information from these related proteins could be vital in furthering our understanding of coronaviruses and in discovery and development of new treatments and vaccines to contain the current outbreak.
The coronavirus 3CL hydrolase (Mpro) enzyme, also known as the main protease, is essential for proteolytic maturation of the virus. It is thought to be a promising target for discovery of small-molecule drugs that would inhibit cleavage of the viral polyprotein and prevent spread of the infection.
Comparison of the protein sequence of the COVID-19 coronavirus 3CL hydrolase (Mpro) against the PDB archive identified 95 PDB proteins with at least 90% sequence identity. Furthermore, these related protein structures contain approximately 30 distinct small molecule inhibitors, which could guide discovery of new drugs. Of particular significance for drug discovery is the very high amino acid sequence identity (96%) between the COVID-19 coronavirus 3CL hydrolase (Mpro) and the SARS virus main protease (PDB 1q2w). Summary data about these closely-related PDB structures are available (CSV) to help researchers more easily find this information. In addition, the PDB houses 3D structure data for more than 20 unique SARS proteins represented in more than 200 PDB structures, including a second viral protease, the RNA polymerase, the viral spike protein, a viral RNA, and other proteins (CSV).
Public release of the COVID-19 coronavirus 3CL hydrolase (Mpro), at a time when this information can prove most vital and valuable, highlights the importance of open and timely availability of scientific data. The wwPDB strives to ensure that 3D biological structure data remain freely accessible for all, while maintaining as comprehensive and accurate an archive as possible. We hope that this new structure, and those from related viruses, will help researchers and clinicians address the COVID-19 coronavirus global public health emergency.
The wwPDB partners are pleased to announce that as of March 2020 the OneDep system will begin accepting upload of NMR experimental data as a single file, either in NMR-STAR or NEF format. This will start the transition from the current practice where distinct types of NMR data such as assigned chemical shifts, restraints, and peak lists are uploaded separately.
NMR-STAR is the official wwPDB format for storing NMR data, supported by an extensive dictionary [GitHub; Ulrich, E. L. et al. (2019) NMR-STAR: comprehensive ontology for representing, archiving and exchanging data from nuclear magnetic resonance spectroscopic experiments Journal of Biomolecular NMR, 73: 5–9. doi: 10.1007/s10858-018-0220-3], while NEF (NMR exchange format; Gutmanas et al. (2015) NMR Exchange Format: a unified and open standard for representation of NMR restraint data Nature Structural & Molecular Biology 22: 433–434 doi: 10.1038/nsmb.3041) is a light-weight format and dictionary, supported by the leading software in NMR structure determination. The use of these two interconvertible standard formats as single data files will simplify the process of deposition, as well as the storage and distribution of this data.
For newly deposited entries accompanied by such a unified data file, the NMR data will be distributed in the PDB FTP area as single files in the NMR-STAR format. A best-effort conversion to the NEF format will also be provided. These unified NMR data files will be added to a new FTP directory, “nmr_data”, in parallel to the existing directories, “nmr_restraints” and “nmr_chemical_shifts”. In addition, to support existing users, these unified files, which contain both restraints and chemical shift data, will be copied to the existing directories “nmr_restraints” and “nmr_chemical_shifts”.
A standardized naming convention for unified NMR data will also be developed to simplify access to the relevant NMR data. File names will start with the PDB accession code, followed by nmr_data and a format-specific extension, for example ‘2lcb_nmr_data.nef’ or ‘2lcb_nmr_data.str’.
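The naming pattern can be expressed as a small helper. This is a sketch based only on the two example file names given above; the function name, the format labels it accepts, and the lower-casing of the accession code are assumptions, and the final convention may differ.

```python
# Extensions taken from the examples '2lcb_nmr_data.nef' and '2lcb_nmr_data.str'.
# The dictionary keys are hypothetical labels for the two formats.
NMR_EXTENSIONS = {"nef": "nef", "nmr-star": "str"}


def nmr_data_filename(pdb_code: str, fmt: str) -> str:
    """Build a unified NMR data file name for a PDB accession code."""
    key = fmt.strip().lower()
    if key not in NMR_EXTENSIONS:
        raise ValueError(f"unsupported NMR data format: {fmt!r}")
    # Assumption: accession codes appear in lower case in file names.
    return f"{pdb_code.strip().lower()}_nmr_data.{NMR_EXTENSIONS[key]}"
```

Calling `nmr_data_filename("2lcb", "nef")` reproduces the first example name, and passing `"NMR-STAR"` yields the `.str` variant.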
We plan to begin accepting and distributing NMR data as unified files from March 2020.
A comprehensive policy and procedures document for the EMDB archive has been drawn up by the EMDB team in order to ensure consistent and coherent rules for its data. The document is now available to view on the EMDB website.
Since its foundation in 2002, the Electron Microscopy Data Bank (EMDB; https://emdb-empiar.org/) has archived publicly available three-dimensional (3D) electron cryo-microscopy (cryo-EM) maps and tomograms of biomacromolecules, their complexes and cellular structures. Following the release of the first eight EMDB entries in 2002, the EMDB archive has grown steadily and currently stands at almost 10,000 released maps. Since 2016, EMDB entries have been deposited and processed through the wwPDB OneDep system, with the biocuration workload shared geographically by the EMDB, PDBe, RCSB PDB and PDBj teams.
The policy outlines the requirements for data deposition, accepted formats, entry modifications and release. For example, the policy recommends that single-particle depositions include a primary map (as shown in the accompanying publication), a raw map (unmasked, unfiltered, unsharpened) and unmasked half-maps, as well as any auxiliary files such as Fourier Shell Correlation (FSC) data. In the document, EMDB also advises that the official wwPDB validation report generated after biocuration, which now includes EM map/tomogram validation and, if applicable, map-model validation, be provided to journal editors and referees as part of the manuscript submission and review process.
In response to the cryo-EM community’s increasing demand to make all data publicly available, EMDB strongly encourages the deposition of all atomic models to the Protein Data Bank (PDB), all 3D EM reconstructions to EMDB and all raw data (including tilt series for tomograms) to the Electron Microscopy Public Image Archive (EMPIAR). Related entries in these archives reference one another, making the deposited data easily discoverable and accessible to the community.
The EMDB team and its umbrella organisation, wwPDB, welcome feedback from EMDB users and depositors on the policies and procedures through firstname.lastname@example.org.
A snapshot of the PDB Core archive (ftp://ftp.wwpdb.org) as of January 1, 2020 has been added to ftp://snapshots.wwpdb.org and ftp://snapshots.pdbj.org. Snapshots have been archived annually since 2005 to provide readily identifiable data sets for research on the PDB archive.
The directory 20200101 includes the 159,140 experimentally-determined structures and experimental data available at that time. Atomic coordinates and related metadata are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot of the PDB Core archive is 575 GB.
A snapshot of the EMDB Core archive (ftp://ftp.ebi.ac.uk/pub/databases/emdb/) as of January 1, 2020 can be found in ftp://ftp.ebi.ac.uk/pub/databases/emdb_vault/20200101/ and ftp://snapshots.pdbj.org/20200101/. The snapshot of EMDB Core archive contains map files and their metadata within XML files for both released and obsoleted entries (10370 and 130, respectively) and is 1.7 TB in size.