PDB NextGen Archive

Since February 1, 2023, the wwPDB has enriched PDB entries with additional annotation and distributed the latest version of each entry through the next-generation archive (NextGen), accessible at https://files-nextgen.wwpdb.org and its mirrors in the USA, UK, and Japan.

PDB NextGen Repository

The PDB NextGen Repositories are updated on the first Wednesday of every month at 00:00 UTC.

wwPDB: https://files-nextgen.wwpdb.org, rsync://rsync-nextgen.wwpdb.org
RCSB PDB (USA): https://files-nextgen.rcsb.org, rsync://rsync-nextgen.rcsb.org
PDBe (UK): https://ftp.ebi.ac.uk/pub/databases/pdb_nextgen/
PDBj (Japan): https://files-nextgen.pdbj.org, rsync://rsync-nextgen.pdbj.org

What is Enriched in the PDB Entry

This enriched PDB archive provides, in the entry metadata, annotation from external database resources in addition to the content of the structure model files in the PDB main archive, https://files.wwpdb.org. For example, sequence annotation from external resources such as UniProt, SCOP2, and Pfam is provided at the atom, residue, and chain levels: _pdbx_sifts_unp_segments and _pdbx_sifts_xref_db_segments for each segment, _pdbx_sifts_xref_db at the residue level, and _atom_site at the atom level.
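As a rough illustration of how one might check which of these categories a given file carries, the sketch below scans mmCIF text for data names and reports the categories that appear. It is a crude line scan and not a real mmCIF parser (for instance, it does not handle multi-line text fields), and the function name categories_present is hypothetical.

```python
# Category names taken from the description above.
SIFTS_CATEGORIES = {
    "_pdbx_sifts_unp_segments",
    "_pdbx_sifts_xref_db_segments",
    "_pdbx_sifts_xref_db",
    "_atom_site",
}


def categories_present(cif_text, wanted=SIFTS_CATEGORIES):
    """Return the subset of `wanted` categories that occur in `cif_text`.

    mmCIF data names have the form `_category.item`, so the category is
    everything before the first dot of a token that starts a line with "_".
    """
    found = set()
    for line in cif_text.splitlines():
        if not line.startswith("_"):
            continue  # not a data-name line
        data_name = line.split()[0]
        category = data_name.split(".", 1)[0]
        if category in wanted:
            found.add(category)
    return found
```

Because the category is compared as a whole string, _pdbx_sifts_xref_db and _pdbx_sifts_xref_db_segments are reported separately and not confused by their shared prefix.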

More external annotation will be added as this archive grows.

Directory structure of the PDB NextGen archive

This archive uses PDB accession codes extended to 8 characters and prefixed with "pdb_" in file naming, e.g., "pdb_00001abc".
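A minimal sketch of converting a classic 4-character PDB code to the extended form is shown here. It assumes, based only on the "pdb_00001abc" example, that the code is lowercased and left-padded with zeros to 8 characters; the function name to_extended_pdb_id is hypothetical.

```python
import re


def to_extended_pdb_id(pdb_id):
    """Return the extended accession code, e.g. '1abc' -> 'pdb_00001abc'.

    Assumption: lowercase, zero-padded on the left to 8 characters,
    as suggested by the example in the text.
    """
    code = pdb_id.strip().lower()
    if code.startswith("pdb_"):
        code = code[len("pdb_"):]  # already extended; re-normalize
    if not re.fullmatch(r"[0-9a-z]{1,8}", code):
        raise ValueError(f"not a recognizable PDB code: {pdb_id!r}")
    return "pdb_" + code.zfill(8)
```

For instance, to_extended_pdb_id("1abc") returns "pdb_00001abc", and passing an already extended code returns it unchanged.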

Similar to the versioned PDB archive, in the NextGen tree all files for a particular entry are stored in a single directory (e.g., "pdb_00001abc").

These directories are grouped under a 2-character hash taken from the two characters immediately preceding the last character of the PDB code (the third-from-last and second-from-last characters). For example, the hash is "ab" for PDB entry "pdb_00001abc":

../pdb_nextgen/data/entries/divided/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>

Thus, all files for entry pdb_00001abc would be stored in the following directory:

../pdb_nextgen/data/entries/divided/ab/pdb_00001abc/
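The directory rule above can be sketched as follows. The function name nextgen_entry_dir is hypothetical, and the path is built relative to the archive root, since the leading part of the path is elided in the text.

```python
from pathlib import PurePosixPath


def nextgen_entry_dir(extended_id):
    """Return the relative directory for an extended PDB accession code.

    The 2-character hash is the third-from-last and second-from-last
    characters of the code, e.g. 'ab' for 'pdb_00001abc'.
    """
    if not extended_id.startswith("pdb_") or len(extended_id) != 12:
        raise ValueError(f"expected an extended code like 'pdb_00001abc': {extended_id!r}")
    two_letter_hash = extended_id[-3:-1]
    return PurePosixPath("pdb_nextgen/data/entries/divided") / two_letter_hash / extended_id


# str(nextgen_entry_dir("pdb_00001abc"))
# -> 'pdb_nextgen/data/entries/divided/ab/pdb_00001abc'
```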

File names in the NextGen archive

File names in the NextGen archive follow a naming scheme similar to that of the versioned archive, with the tag "enrich" identifying the archive resource:

<PDB_ID>_<content_type>-<archive resource>.<file_format_type>.<file_compression_type>

For example, the latest version of PDB entry 1abc would have the following form under the new file-naming scheme:

pdb_00001abc_xyz-enrich.cif.gz

pdb_00001abc_xyz-no-atom-enrich.xml.gz
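
To make the naming scheme concrete, here is a small sketch that splits a NextGen file name into its parts. It assumes, from the two examples only, that the archive resource tag is the last hyphen-separated token before the first dot, so that "xyz-no-atom" is treated as the content type; the function name parse_nextgen_filename and the dictionary keys are hypothetical.

```python
import re

# <PDB_ID>_<content_type>-<archive resource>.<file_format_type>.<file_compression_type>
_NEXTGEN_NAME = re.compile(
    r"^(?P<pdb_id>pdb_[0-9a-z]{8})"
    r"_(?P<content_type>.+)"           # greedy, so it may itself contain hyphens
    r"-(?P<archive_resource>[^-.]+)"   # assumed: last hyphen-separated token
    r"\.(?P<file_format>[^.]+)"
    r"\.(?P<compression>[^.]+)$"
)


def parse_nextgen_filename(name):
    """Split a NextGen file name into a dict of its components."""
    match = _NEXTGEN_NAME.match(name)
    if match is None:
        raise ValueError(f"not a NextGen-style file name: {name!r}")
    return match.groupdict()


# parse_nextgen_filename("pdb_00001abc_xyz-no-atom-enrich.xml.gz")
# -> {'pdb_id': 'pdb_00001abc', 'content_type': 'xyz-no-atom',
#     'archive_resource': 'enrich', 'file_format': 'xml', 'compression': 'gz'}
```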