wwPDB 2020 News

Contents

01/14/2020 Distribution of NMR data in a unified format at the PDB archive
01/10/2020 EMDB policy and procedures document now available
01/08/2020 Time-stamped Copies of PDB and EMDB Archives

01/14/2020

Distribution of NMR data in a unified format at the PDB archive

The wwPDB partners are pleased to announce that, as of March 2020, the OneDep system will begin accepting uploads of NMR experimental data as a single file in either NMR-STAR or NEF format. This marks the start of a transition away from the current practice, in which distinct types of NMR data, such as assigned chemical shifts, restraints, and peak lists, are uploaded separately.

NMR-STAR is the official wwPDB format for storing NMR data and is supported by an extensive dictionary [GitHub; Ulrich, E. L. et al. (2019) NMR-STAR: comprehensive ontology for representing, archiving and exchanging data from nuclear magnetic resonance spectroscopic experiments. Journal of Biomolecular NMR 73: 5–9. doi: 10.1007/s10858-018-0220-3]. NEF (NMR Exchange Format; Gutmanas et al. (2015) NMR Exchange Format: a unified and open standard for representation of NMR restraint data. Nature Structural & Molecular Biology 22: 433–434. doi: 10.1038/nsmb.3041) is a lightweight format and dictionary supported by the leading software for NMR structure determination. Using these two interconvertible standard formats as single data files will simplify deposition, as well as the storage and distribution of these data.
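NEF and NMR-STAR both use STAR syntax. As a rough illustration of how the two can be told apart by content, the sketch below assumes only that NEF tag names carry the `_nef_` prefix (for example `_nef_chemical_shift.value`) while NMR-STAR tags use other category names (for example `_Atom_chem_shift.Val`). `guess_nmr_format` is a hypothetical helper, not part of OneDep or any wwPDB software, and it neither parses nor validates the file.

```python
def guess_nmr_format(text: str) -> str:
    """Return 'nef', 'nmr-star' or 'unknown' for STAR-syntax NMR text.

    Heuristic only (an assumption, not a wwPDB rule): any tag starting with
    '_nef_' marks the file as NEF; otherwise a data block with at least one
    tag is treated as NMR-STAR. Tag-like lines inside multi-line text values
    are not distinguished from real tags.
    """
    has_data_block = False
    has_tag = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("data_"):
            has_data_block = True
        elif line.startswith("_"):
            has_tag = True
            if line.lower().startswith("_nef_"):
                return "nef"
    if has_data_block and has_tag:
        return "nmr-star"
    return "unknown"


# Usage with two tiny, made-up fragments:
nef_text = "data_demo\nsave_nef_nmr_meta_data\n_nef_nmr_meta_data.sf_category nef_nmr_meta_data\nsave_\n"
star_text = "data_demo\nsave_entry_information\n_Entry.ID demo\nsave_\n"
print(guess_nmr_format(nef_text))   # nef
print(guess_nmr_format(star_text))  # nmr-star
```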

For newly deposited entries accompanied by such a unified data file, the NMR data will be distributed in the PDB FTP area as single files in NMR-STAR format, and a best-effort conversion to NEF format will also be provided. These unified NMR data files will be placed in a new FTP directory, “nmr_data”, alongside the existing directories “nmr_restraints” and “nmr_chemical_shifts”. In addition, to support existing users, these unified files containing both restraints and chemical shift data will also be copied to the existing “nmr_restraints” and “nmr_chemical_shifts” directories.
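The distribution rule described above can be summarised as a small function. This is a sketch under stated assumptions: it models directory names only, since the announcement gives no full FTP paths, and it interprets the last sentence as meaning that a unified file is copied to the legacy directories when it holds both restraints and chemical shifts. `distribution_targets` is a hypothetical name.

```python
from typing import List

# Directory names from the announcement; full FTP paths are not modelled.
UNIFIED_DIR = "nmr_data"
LEGACY_DIRS = ("nmr_restraints", "nmr_chemical_shifts")


def distribution_targets(has_restraints: bool, has_chemical_shifts: bool) -> List[str]:
    """Directories that would receive a unified NMR-STAR file (assumed rule).

    Every unified file goes to the new 'nmr_data' directory; a copy goes to the
    two legacy directories only when the file holds both restraints and
    chemical shifts (an interpretation of the announcement, not an official spec).
    """
    targets = [UNIFIED_DIR]
    if has_restraints and has_chemical_shifts:
        targets.extend(LEGACY_DIRS)
    return targets


print(distribution_targets(True, True))
# ['nmr_data', 'nmr_restraints', 'nmr_chemical_shifts']
```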

A standardized naming convention for unified NMR data files will also be developed to simplify access to the relevant NMR data. File names will start with the PDB accession code, followed by “nmr_data” and a format-type extension, for example ‘2lcb_nmr_data.nef’ or ‘2lcb_nmr_data.str’.
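The naming convention can be expressed in a few lines. This sketch assumes lower-case, four-character PDB accession codes, as in the ‘2lcb’ example, and maps NMR-STAR to the `.str` extension and NEF to `.nef`; `unified_nmr_filename` and its format labels are hypothetical.

```python
import re

# Extensions taken from the examples: '.str' for NMR-STAR, '.nef' for NEF.
EXTENSIONS = {"nmr-star": "str", "nef": "nef"}


def unified_nmr_filename(pdb_id: str, fmt: str) -> str:
    """Build a name like '2lcb_nmr_data.str' from a PDB ID and a format label."""
    code = pdb_id.strip().lower()
    # Assumption: classic four-character IDs (a digit, then three alphanumerics).
    if not re.fullmatch(r"[0-9][a-z0-9]{3}", code):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    ext = EXTENSIONS.get(fmt.lower())
    if ext is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{code}_nmr_data.{ext}"


print(unified_nmr_filename("2LCB", "nef"))       # 2lcb_nmr_data.nef
print(unified_nmr_filename("2lcb", "NMR-STAR"))  # 2lcb_nmr_data.str
```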

We plan to begin accepting and distributing NMR data as unified files from March 2020.

01/10/2020

EMDB policy and procedures document now available

A comprehensive policy and procedures document for the EMDB archive has been drawn up by the EMDB team in order to ensure consistent and coherent rules for its data. The document is now available to view on the EMDB website.

Since its foundation in 2002, the Electron Microscopy Data Bank (EMDB; https://emdb-empiar.org/) has archived publicly available three-dimensional (3D) electron cryo-microscopy (cryo-EM) maps and tomograms of biomacromolecules, their complexes and cellular structures. Following the release of the first eight EMDB entries in 2002, the archive has grown steadily and now holds almost 10000 released maps. Since 2016, EMDB entries have been deposited and processed through the wwPDB OneDep system, with the biocuration workload shared geographically among the EMDB, PDBe, RCSB PDB and PDBj teams.

The policy outlines the requirements for data deposition, accepted formats, entry modifications and release. For example, it recommends that single-particle depositions include a primary map (as shown in the accompanying publication), a raw map (unmasked, unfiltered, unsharpened) and unmasked half-maps, as well as any auxiliary files such as Fourier Shell Correlation (FSC) data. The document also advises that the official wwPDB validation report generated after biocuration, which now includes EM map/tomogram validation and, where applicable, map-model validation, be provided to journal editors and referees as part of manuscript submission and review.
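As an illustration of the single-particle recommendation, the sketch below checks a list of provided components against the recommended set. The component keys are hypothetical shorthand rather than EMDB or OneDep identifiers, two half-maps are assumed, and auxiliary files such as FSC data are treated as optional and not checked.

```python
from typing import Iterable, List

# Hypothetical keys for the components the EMDB policy recommends for
# single-particle depositions (descriptions paraphrase the policy summary).
RECOMMENDED = {
    "primary_map": "primary map, as shown in the accompanying publication",
    "raw_map": "raw map (unmasked, unfiltered, unsharpened)",
    "half_map_1": "first unmasked half-map",
    "half_map_2": "second unmasked half-map",
}


def missing_components(provided: Iterable[str]) -> List[str]:
    """Return recommended component keys absent from `provided`, in policy order.

    Auxiliary files (e.g. FSC data) are optional, so extra keys are ignored.
    """
    have = set(provided)
    return [key for key in RECOMMENDED if key not in have]


for key in missing_components(["primary_map", "half_map_1", "fsc"]):
    print("missing:", RECOMMENDED[key])
```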

In response to the cryo-EM community’s increasing demand to make all data publicly available, EMDB strongly encourages the deposition of all atomic models to the Protein Data Bank (PDB), all 3D EM reconstructions to EMDB and all raw data (including tilt series for tomograms) to the Electron Microscopy Public Image Archive (EMPIAR). Related entries in these archives reference one another, making the deposited data easily discoverable and accessible to the community.

The EMDB team and its umbrella organisation, wwPDB, welcome feedback from EMDB users and depositors on the policies and procedures via emdbhelp@ebi.ac.uk.

01/08/2020

Time-stamped Copies of PDB and EMDB Archives

A snapshot of the PDB Core archive (ftp://ftp.wwpdb.org) as of January 1, 2020 has been added to ftp://snapshots.wwpdb.org and ftp://snapshots.pdbj.org. Snapshots have been archived annually since 2005 to provide readily identifiable data sets for research on the PDB archive.

The directory 20200101 includes the 159,140 experimentally determined structures and associated experimental data available at that time. Atomic coordinates and related metadata are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates when the file was last modified. The snapshot of the PDB Core archive is 575 GB in size.

A snapshot of the EMDB Core archive (ftp://ftp.ebi.ac.uk/pub/databases/emdb/) as of January 1, 2020 can be found at ftp://ftp.ebi.ac.uk/pub/databases/emdb_vault/20200101/ and ftp://snapshots.pdbj.org/20200101/. It contains map files and their metadata in XML files for both released and obsoleted entries (10,370 and 130, respectively) and is 1.7 TB in size.
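Because each snapshot sits in a directory named after its date stamp (YYYYMMDD, as in 20200101), the locations listed above can be generated for a given year. This sketch assumes the PDB snapshot directory sits directly under each snapshot host (for example ftp://snapshots.wwpdb.org/20200101/) and that snapshots keep the January 1 naming; it only builds URL strings and does not check that a snapshot exists. `snapshot_urls` is a hypothetical helper.

```python
from datetime import date
from typing import Dict, List

# Snapshot roots listed in the announcement. The PDB directory layout
# (<root>/<YYYYMMDD>/) is an assumption inferred from the EMDB paths.
PDB_ROOTS = ("ftp://snapshots.wwpdb.org", "ftp://snapshots.pdbj.org")
EMDB_ROOTS = ("ftp://ftp.ebi.ac.uk/pub/databases/emdb_vault", "ftp://snapshots.pdbj.org")


def snapshot_urls(year: int) -> Dict[str, List[str]]:
    """Build (but do not verify) the January 1 snapshot URLs for `year`."""
    stamp = date(year, 1, 1).strftime("%Y%m%d")
    return {
        "pdb": [f"{root}/{stamp}/" for root in PDB_ROOTS],
        "emdb": [f"{root}/{stamp}/" for root in EMDB_ROOTS],
    }


for archive, urls in snapshot_urls(2020).items():
    print(archive, urls)
```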