wwPDB 2017 News

Contents

08/19/2017 wwPDB Events at IUCr (August 21-28)
08/01/2017 Better Management of PDB Archive with File Versioning and Revision History
07/11/2017 Enriched Model Files Conforming to OneDep Data Standards Now Available in the PDB FTP Archive
06/06/2017 5 Easy Steps to PDB Deposition
05/17/2017 Revise Your Structure Without Changing the PDB Accession Code and Related Changes to the FTP Archive
05/02/2017 Enriched PDB Structure Entry Files Conforming to OneDep Data Standards Are Now Available for Testing
04/26/2017 Freezing the PDB Format
03/22/2017 Archival PDBx/mmCIF Version V4 to V5 Update in the Protein Data Bank
03/15/2017 Updated Validation Reports for Archived PDB Structures Now Available
03/14/2017 Data Management: A global coalition to sustain core data
02/09/2017 The paper describing wwPDB OneDep system is now available
01/09/2017 Time-stamped Copies of the PDB Archive Available

08/19/2017

wwPDB Events at IUCr (August 21-28)

Meet wwPDB members from around the globe at the 24th General Assembly and Congress of the International Union of Crystallography (IUCr, August 21-28, 2017) in Hyderabad.

Come to exhibition stand 75 to receive a special fan celebrating the wwPDB OneDep system.

Other wwPDB events include:

Tuesday August 22
- Oral Presentation at 10:30am in Hall 4: Small-molecule ligand/drug representation and validation in the Protein Data Bank presented by Genji Kurisu (PDBj)
- Poster 914: OneDep: wwPDB System for Deposition, Biocuration, Validation of Macromolecular Structures presented by Aleksandras Gutmanas (PDBe)

Thursday August 24
- Microsymposia-044 in Hall MR 2.02: Structural databases as teaching tools - Part A (macromolecules), chaired by Joel Sussman (Weizmann Institute) and Christine Zardecki (RCSB PDB)

  • 10:30 Enlightening macromolecular structure-function relationship with Proteopedia Jaime Prilusky (Weizmann Institute)
  • 11:00 Structural view of biology: Exploring new perspectives for deeper learning Shuchismita Dutta (RCSB PDB)
  • 11:30 Disease to therapeutics via 3D structures: stories from viral world Urmila Kulkarni-Kale (University of Pune)
  • 12:00 PDBe: Bringing structure to biology and beyond Sameer Velankar (PDBe)
  • 12:30 SASBDB and DARA as biological solution scattering teaching tools by Alexey Kikhney (EMBL-Hamburg)
  • 12:45 Play with 3D structure data of biomolecules by Hirofumi Suzuki (PDBj)
- Poster 824: LiteMol: Web-based 3D visualization of macromolecular structure data presented by Matthew Conroy (PDBe)
- Poster 1366: PDB-101: Educational Portal for Molecular Explorations Through Biology and Medicine presented by Christine Zardecki (RCSB PDB)

Friday August 25
- Poster 1315: Annotation of organic CoFactor molecules in PDB presented by Abhik Mukhopadhyay (PDBe)

Saturday August 26
- Poster 1396: RCSB PDB: Structural biology views for basic and applied research presented by John Westbrook (RCSB PDB)
- Poster 1430: BioSync: An Online Resource for X-ray Facilities Worldwide presented by Stephen K. Burley (RCSB PDB)

Sunday August 27
- Oral Presentation at 12:30pm in Hall MR 2.03-2.04: wwPDB OneDep Validation Services presented by John Westbrook (RCSB PDB)
- Oral Presentation at 4:30pm in Hall MR 2.02: PDBx/mmCIF: The Foundation for the wwPDB OneDep System presented by John Westbrook (RCSB PDB)

The wwPDB is also sponsoring a Poster Prize for students in the “structural biology” category.

08/01/2017

Better Management of PDB Archive with File Versioning and Revision History

As announced on May 17, 2017, wwPDB will introduce a file versioning system to retain depositor-initiated updates of previously released coordinate entries. A new FTP repository will host versioned files. Versions will be separated into major and minor updates. Updates to atomic coordinates, polymer sequence, or chemical description in a PDB coordinate file will trigger a major version increment. Other changes will be classified as minor. For each PDB structure, every major version will be retained in the new FTP archive in its latest minor version.

wwPDB will deliver versioned files in two phases:

  • Phase 1 (October 2017) will release the new versioned FTP archive at ftp://ftp-version.wwpdb.org for structural model files in PDBx/mmCIF and PDBML formats.
  • Phase 2 will be released in 2018 and will support depositor-initiated updates of coordinates in PDBx/mmCIF and PDBML formats.

File names in the versioned FTP archive will conform to a new naming scheme, which allows users to easily see the major and minor version number:

<PDB_ID>_<content_type>_v<major_version>-<minor_version>.<file_format_type>.<file_compression_type>

The familiar 4-character PDB accession code will be extended to 8 characters and will carry the prefix “pdb_”. Thus the PDB accession code for entry 1abc would become pdb_00001abc.

For example, the initial release of PDB entry 1abc would have the following name under the new file-naming scheme:

pdb_00001abc_xyz_v1-0.cif.gz

where xyz stands for coordinate content, cif indicates the file format, and gz indicates gzip compression.

The first minor revision of PDB entry 1abc would then have the following name:

pdb_00001abc_xyz_v1-1.cif.gz

If PDB entry 1abc then had a major update, it would have the following name: pdb_00001abc_xyz_v2-0.cif.gz (N.B.: The minor version number is reset to zero every time a new major update is made.)
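As a minimal sketch of the naming scheme above, the following Python snippet builds and parses versioned file names. The function names, the VersionedName type, and the regular expression are illustrative assumptions and not part of any wwPDB tool; the pattern assumes the extended accession code is "pdb_" followed by eight lowercase alphanumeric characters, as in the examples.

```python
import re
from typing import NamedTuple


class VersionedName(NamedTuple):
    """Fields of a versioned file name (hypothetical helper type)."""
    pdb_id: str
    content_type: str
    major: int
    minor: int
    file_format: str
    compression: str


# Assumed pattern, derived only from the examples in this announcement.
_NAME_RE = re.compile(
    r"^(?P<pdb_id>pdb_[0-9a-z]{8})_(?P<content>[a-z0-9]+)"
    r"_v(?P<major>\d+)-(?P<minor>\d+)"
    r"\.(?P<fmt>[a-z0-9]+)\.(?P<comp>[a-z0-9]+)$"
)


def build_name(pdb_id, content_type, major, minor,
               file_format="cif", compression="gz"):
    """Assemble <PDB_ID>_<content_type>_v<major>-<minor>.<format>.<compression>."""
    return f"{pdb_id}_{content_type}_v{major}-{minor}.{file_format}.{compression}"


def parse_name(name):
    """Split a versioned file name into its fields, or raise ValueError."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a versioned file name: {name!r}")
    return VersionedName(m["pdb_id"], m["content"], int(m["major"]),
                         int(m["minor"]), m["fmt"], m["comp"])


print(build_name("pdb_00001abc", "xyz", 1, 0))
print(parse_name("pdb_00001abc_xyz_v2-0.cif.gz"))
```

Parsing the version into integers rather than comparing strings keeps ordering correct once a version number reaches two digits.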

The versioned data files for a particular entry will be stored in a single directory, placed under a two-character hash taken from the second- and third-to-last characters of the PDB code (e.g., “ab” for 1abc):

../pub/pdb_versioned/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>

For example, the file for major version 1, minor version 2 of entry 1ABC would have the following path:

../pub/pdb_versioned/data/entries/ab/pdb_00001abc/pdb_00001abc_xyz_v1-2.cif.gz
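The directory layout above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and default arguments are hypothetical, and the two-character hash is taken as the second- and third-to-last characters of the accession code, which reproduces the "ab" in the example path.

```python
def extended_id(pdb_code: str) -> str:
    """Turn a four-character code such as '1ABC' into 'pdb_00001abc'."""
    code = pdb_code.strip().lower()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"expected a four-character PDB code, got {pdb_code!r}")
    return "pdb_" + code.rjust(8, "0")


def versioned_path(pdb_code, major, minor, content_type="xyz",
                   file_format="cif", compression="gz",
                   root="../pub/pdb_versioned/data/entries"):
    """Build the path of one versioned file (hypothetical helper)."""
    acc = extended_id(pdb_code)
    # Assumed hash rule: the two characters before the final character.
    two_letter_hash = acc[-3:-1]
    file_name = (f"{acc}_{content_type}_v{major}-{minor}"
                 f".{file_format}.{compression}")
    return f"{root}/{two_letter_hash}/{acc}/{file_name}"


print(versioned_path("1ABC", 1, 2))
```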

Different views of the repository will be provided for the most common use cases as a convenience for repository users. For Phase 1 in 2017, views by content type and format, similar to the current repository, will be introduced. These views include every major version in its latest minor version.

../pub/pdb_versioned/views/<content_type>/<file_format_type>/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>

For example, the coordinate files in mmCIF format for entry 1ABC will be made available at

../pub/pdb_versioned/views/coordinates/mmcif/ab/pdb_00001abc/pdb_00001abc_xyz_v1-2.cif.gz

../pub/pdb_versioned/views/coordinates/mmcif/ab/pdb_00001abc/pdb_00001abc_xyz_v2-0.cif.gz

Data files in the current archive location ftp://ftp.wwpdb.org/pub/pdb/data/structures/ will continue to use the familiar naming style and will contain only the latest version for every entry.
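To illustrate how the versioned views relate to the current archive location, the sketch below picks, from a list of versioned file names, the latest minor version of each major version and the single latest version overall. The helper names are hypothetical, and the selection rule is an assumption based on the example file names shown above.

```python
import re

# Matches the "_v<major>-<minor>." part of a versioned file name.
_VERSION_RE = re.compile(r"_v(\d+)-(\d+)\.")


def version_of(name):
    """Return (major, minor) as integers for a versioned file name."""
    m = _VERSION_RE.search(name)
    if m is None:
        raise ValueError(f"no version found in {name!r}")
    return int(m.group(1)), int(m.group(2))


def latest_per_major(names):
    """Keep, for each major version, the file with the highest minor version."""
    best = {}
    for name in names:
        major, minor = version_of(name)
        if major not in best or minor > version_of(best[major])[1]:
            best[major] = name
    return [best[major] for major in sorted(best)]


def latest_overall(names):
    """Return the file with the highest (major, minor) pair."""
    return max(names, key=version_of)


files = [
    "pdb_00001abc_xyz_v1-0.cif.gz",
    "pdb_00001abc_xyz_v1-1.cif.gz",
    "pdb_00001abc_xyz_v1-2.cif.gz",
    "pdb_00001abc_xyz_v2-0.cif.gz",
]
print(latest_per_major(files))
print(latest_overall(files))
```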

07/11/2017

Enriched Model Files Conforming to OneDep Data Standards Now Available in the PDB FTP Archive

The model files in the PDB FTP archive have been updated to V5.0 of the PDBx/mmCIF dictionary. Both mmCIF and XML formats have been updated. These files were provided previously for the community to review and test. PDB format files are unchanged because PDB is a legacy format; they therefore do not contain all of the remediated information.

The changes to V5.0 include:

  • Improved audit categories to capture details of changes to files down to the category level for entry revisions.
  • Better organized data content and much more extensive metadata in model files for electron microscopy derived models.
  • Corrected source organism and sequence references for each sequence fragment in chimeric proteins.
  • Standardized data in several categories, including software name, detector name and detector type.

The complete list of changes can be found at the wwPDB website.

Questions regarding V5.0 data should be sent to deposit-help@mail.wwpdb.org.

05/17/2017

Revise Your Structure Without Changing the PDB Accession Code and Related Changes to the FTP Archive

In 2017, the wwPDB plans to introduce a new procedure that lets the Depositor of Record (defined as the Principal Investigator for the entry) manage substantial revisions to previously released PDB archival entries.

At present, revised atomic coordinates for an existing released PDB entry are assigned a new accession code, and the prior entry is obsoleted. This long-standing wwPDB policy had the unintended consequence of breaking connections with publications and usage of the prior set of atomic coordinates, resulting in a non-trivial barrier to submission of atomic coordinate revisions by our Depositors of Record.

The wwPDB is introducing a file versioning system that allows Depositors of Record to update their own previously released entries. Please note that, in the first phase, file versioning will be applied only to atomic coordinates refined against unchanged experimental data.

Version numbers of each PDB archive entry will be designated using a #-# identifier. The first digit specifies the major version, and the second designates the minor version. The Structure of Record (i.e., the initial set of released atomic coordinates) is designated as Version 1-0. Thereafter, the major version digit is incremented with each substantial revision of a given entry (e.g., Version 2-0, when the atomic coordinates are replaced for the first time by the Depositor of Record). “Major version changes” are defined as updates to the atomic coordinates, polymer sequence(s), and/or chemical identity of a ligand. All other changes are defined as “minor changes”. When a major change is made, the minor version number is reset to 0 (e.g., 1-0 to 1-1 to 2-0). For the avoidance of doubt, the wwPDB will retain all major versions with the latest minor versions of an entry within the PDB archive.
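The version-numbering rule above can be expressed as a short Python sketch. The EntryVersion class and the change labels ("atomic_coordinates", "polymer_sequence", "ligand_chemical_identity", "citation") are hypothetical names chosen for illustration; only the increment and reset behaviour comes from the description.

```python
from dataclasses import dataclass

# Hypothetical labels for the three kinds of change defined as major.
MAJOR_CHANGES = frozenset(
    {"atomic_coordinates", "polymer_sequence", "ligand_chemical_identity"}
)


@dataclass(frozen=True)
class EntryVersion:
    """A #-# version identifier; the Structure of Record is 1-0."""
    major: int = 1
    minor: int = 0

    def bump(self, changed):
        """Return the next version for a revision touching the given items."""
        if MAJOR_CHANGES & set(changed):
            # Major change: increment major, reset minor to 0.
            return EntryVersion(self.major + 1, 0)
        # Any other change counts as minor.
        return EntryVersion(self.major, self.minor + 1)

    def __str__(self):
        return f"{self.major}-{self.minor}"


v = EntryVersion()
v = v.bump({"citation"})
print(v)
v = v.bump({"atomic_coordinates"})
print(v)
```

This follows the sequence given in the text: 1-0, then 1-1 after a minor change, then 2-0 after replacement coordinates.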

Current wwPDB policies governing the deposition of independently refined structures based on the data generated by a research group or laboratory separate from that of the Depositor of Record remain unchanged. Versioning of atomic coordinates will be strictly limited to substitutions made by the Depositor of Record.

Upon introduction of the file versioning system, the wwPDB will revise each PDB accession code by extending its length and prepending “pdb_” (e.g., "1abc" will become "pdb_00001abc"). This process will enable text mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files. For example, the atomic coordinates for the second major version of PDB entry 1abc would have the following form under the new file-naming schema:

pdb_00001abc_xyz_v2-0.cif.gz

The wwPDB is mindful of the importance of continuity in providing services and supporting User activities. For as long as practicable, the wwPDB will continue assigning PDB codes that can be truncated losslessly to the current four-character style. In the same spirit, initial implementation of entry file versioning will appear in a new, parallel branch of the PDB archive FTP tree. More details on the new FTP tree organization and accessibility of version information will be forthcoming. Data files in the current archive location ftp://ftp.wwpdb.org/pub/pdb/data/structures/ will continue to use the familiar naming style and will contain the latest version in the corresponding versioned archive.
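The conversion between the two accession code styles, including the lossless truncation mentioned above, might look like the following Python sketch. The function names are hypothetical, and the rule that an extended code truncates losslessly only when its first four characters are zeros is an assumption inferred from the "pdb_00001abc" example.

```python
def to_extended(short_code: str) -> str:
    """Convert a four-character code ('1abc') to the extended form."""
    code = short_code.strip().lower()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"expected a four-character PDB code, got {short_code!r}")
    return "pdb_" + code.rjust(8, "0")


def to_short(extended_code: str) -> str:
    """Truncate an extended code to four characters, if that loses nothing."""
    prefix, _, body = extended_code.strip().lower().partition("_")
    if prefix != "pdb" or len(body) != 8:
        raise ValueError(f"not an extended PDB code: {extended_code!r}")
    if not body.startswith("0000"):
        # Assumed rule: only zero-padded codes map back to four characters.
        raise ValueError(f"{extended_code!r} cannot be truncated losslessly")
    return body[4:]


print(to_extended("1abc"))
print(to_short("pdb_00001abc"))
```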

05/02/2017

Enriched PDB Structure Entry Files Conforming to OneDep Data Standards Are Now Available for Testing

On July 12, 2017, the wwPDB partners plan to update the PDB FTP archive with PDB structure entry files conforming to V5.0 of the PDBx/mmCIF dictionary, which already supports the global wwPDB system for Deposition, Biocuration, and Validation of PDB data - OneDep.

In preparation for this update, to allow the community ample time to test the planned update and to provide feedback, the wwPDB is now delivering PDBx/mmCIF and XML structure entry files for all entries in the PDB archive conforming to the new data standards via a new FTP repository (ftp://ftp-beta.wwpdb.org/). This collection of test files will be updated in concert with regular weekly updates of the PDB archive.

Complete lists of changes can be found at the wwPDB website (https://www.wwpdb.org/documentation/remediation).

The wwPDB strongly encourages the community to review and test the updated files.

Users should report V5.0 data issues to deposit-help@mail.wwpdb.org

We will attempt to address the reported issues incrementally as User feedback is received in advance of the rollout on July 12, 2017.

Other derived data and experimental data files will be delivered incrementally to the ftp-beta tree between May 3 and July 12, 2017.

The test FTP area (ftp://ftp.wwpdb.org/pub/pdb/test_data/EM/) containing updated 3DEM model files (made available in December 2016) will be retired effective May 3, 2017.

04/26/2017

Freezing the PDB Format

Since 2014 the wwPDB master distribution format has been PDBx/mmCIF and the PDB file format has been frozen with no further developments. As the PDBx/mmCIF format continues to evolve, PDB format files will become outdated. Further details can be found at http://www.wwpdb.org/documentation/file-formats-and-the-pdb

03/22/2017

Archival PDBx/mmCIF Version V4 to V5 Update in the Protein Data Bank

The wwPDB is preparing to update the PDBx/mmCIF model files for all entries in the PDB archive to version V5 of the PDBx/mmCIF dictionary. When completed, all PDB model files will have better organized content and will conform to the revised data model used within the wwPDB OneDep System. A list of changes will be available at the wwPDB website (https://www.wwpdb.org/documentation/remediation). Since January 2016, the OneDep system (https://www.wwpdb.org/deposition/system-information) has supported Deposition, Biocuration, and Validation of structures determined by experimental methods currently accepted by the PDB.

The updated model files for all experimental methods will be made available on a new PDB FTP server (ftp://ftp-beta.wwpdb.org/pub/pdb/data/structures/), and the corresponding PDBx/mmCIF dictionary will be released in May 2017. The test FTP area (ftp://ftp.wwpdb.org/pub/pdb/test_data/EM/) containing updated 3DEM model files (made available in December 2016) will be retired at the same time.

The current PDB FTP archive will be updated with new files corresponding to the V5 PDBx/mmCIF dictionary in July 2017. Users are strongly encouraged to review and test the updated data files.

03/15/2017

Updated Validation Reports for Archived PDB Structures Now Available

The wwPDB partners are pleased to announce that updated validation reports for all X-ray, NMR, and 3DEM structures deposited in the PDB archive are available as of March 15, 2017.

The updates include new percentile statistics reflecting the state of the PDB archive on December 31, 2016 and updated versions of the Mogul software (2017) and CSD archive (as538be).

The updated reports are accessible from the following FTP sites:

  • ftp://ftp.wwpdb.org/pub/pdb/validation_reports/ (wwPDB)
  • ftp://ftp.rcsb.org/pub/pdb/validation_reports/ (RCSB PDB)
  • ftp://ftp.ebi.ac.uk/pub/databases/pdb/validation_reports/ (PDBe)
  • ftp://ftp.pdbj.org/pub/pdb/validation_reports/ (PDBj)

A copy of the previous version is archived at RCSB PDB and PDBj.

These updated wwPDB validation reports provide an assessment of structure quality using widely accepted standards and criteria, recommended by community experts serving in the Validation Task Force. The wwPDB partners strongly encourage journal editors and referees to request them from authors as part of the manuscript submission and review process. The reports are date-stamped and display the wwPDB logo, and contain the same information, regardless of which wwPDB site processed the entry. Provision of wwPDB validation reports is already required by Nature, eLife, The Journal of Biological Chemistry, the International Union of Crystallography (IUCr) journals, FEBS journals, Journal of Immunology and Angew Chem Int Ed Engl as part of their manuscript-submission process.

Validation reports are also provided to depositors through OneDep, the wwPDB portal for validation, deposition, and biocuration of structure data. The wwPDB partners encourage the use of the stand-alone validation server and the webservice API at any time prior to data deposition. Depositors are required to review and accept the reports as part of the data submission process. Validation reports will continue to be developed and improved as we receive recommendations from the expert Validation Task Forces (VTF) for X-ray, NMR, EM, and ligand validation, and as we collect feedback from depositors and users.

Further information and sample validation reports are available.

Your feedback, comments, and questions are welcome at validation@mail.wwpdb.org.

03/14/2017

Data Management: A global coalition to sustain core data

On November 18-19, 2016, the Human Frontier Science Program Organization (HFSPO) hosted a meeting of senior managers of key data resources (including members of the Worldwide Protein Data Bank) and leaders of several major funding organizations to discuss the challenges associated with sustaining biological and biomedical (i.e., life sciences) data resources and associated infrastructure.

A strong consensus emerged from the group that core data resources for the life sciences should be supported through a coordinated international effort(s) that better ensure long-term sustainability and that appropriately align funding with scientific impact. Ideally, funding for such data resources should allow for access at no charge, as is presently the usual (and preferred) mechanism.

The Global Life Sciences Data Resources (GLSDR) Working Group has published a letter in Nature and preprint in bioRxiv on Data Management: A global coalition to sustain core data:

  • W. Anderson, R. Apweiler, A. Bateman, G.A. Bauer, H. Berman, J.A. Blake, N. Blomberg, S.K. Burley, G. Cochrane, V. Di Francesco, T. Donohue, C. Durinx, A. Game, E.D. Green, T. Gojobori, P. Goodhand, A. Hamosh, H. Hermjakob, M. Kanehisa, R. Kiley, J. McEntyre, R. McKibbin, S. Miyano, B. Pauly, N. Perrimon, M.A. Ragan, G. Richards, Y-Y. Teo, M. Westerfield, E. Westhof, P.F. Lasko (2017) Data management: A global coalition to sustain core data Nature 543: 179 doi: 10.1038/543179a
  • W. Anderson, R. Apweiler, A. Bateman, G.A. Bauer, H. Berman, J.A. Blake, N. Blomberg, S.K. Burley, G. Cochrane, V. Di Francesco, T. Donohue, C. Durinx, A. Game, E.D. Green, T. Gojobori, P. Goodhand, A. Hamosh, H. Hermjakob, M. Kanehisa, R. Kiley, J. McEntyre, R. McKibbin, S. Miyano, B. Pauly, N. Perrimon, M.A. Ragan, G. Richards, Y-Y. Teo, M. Westerfield, E. Westhof, P.F. Lasko (2017) Towards coordinated international support of core data resources for the life sciences bioRxiv doi: 10.1101/110825

02/09/2017

The paper describing wwPDB OneDep system is now available

The paper describing the wwPDB OneDep system is now available. The wwPDB has deployed a unified system for deposition, biocuration, and validation of macromolecular structures globally across all wwPDB, EMDB, and BMRB deposition sites to meet the evolving requirements of the scientific community to archive structural data over the coming decades.

The OneDep system provides a user-friendly deposition interface and improved structure validation with the benefit of recommendations from expert task forces representing the respective methodological communities. The processing efficiency in biocuration is improved as OneDep supports a more automated workflow.

As Milka Kostic, Senior Editor at Structure and Cell Chemical Biology, described, OneDep is a step in the right direction and offers a single point of entry into the atomic coordinate deposition process, as well as improving processes of structure validation and data biocuration.

01/09/2017

Time-stamped Copies of the PDB Archive Available

A snapshot of the PDB archive (ftp://ftp.wwpdb.org) as of January 1, 2017 has been added to ftp://snapshots.wwpdb.org/. Snapshots have been archived annually since January 2005 to provide readily identifiable data sets for research on the PDB archive.

The directory 20170101 includes the 125,463 experimentally-determined coordinate files and related experimental data available at that time. Coordinate data are available in PDBx/mmCIF, PDB, and XML file formats. The date and time stamp of each file indicates the last time the file was modified. The snapshot is 757 GB.