PDB Beta Archive
Introduction
wwPDB anticipates that four-character PDB accession codes (PDB IDs) will be exhausted by 2028. To accommodate the continued growth of the PDB archive, wwPDB has revised the accession code format by extending its length and prepending "pdb_" (e.g., "1abc" will become "pdb_00001abc"). The new ID format will enable text-mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files.
The PDB Beta Archive is provided to help the community adopt the extended PDB ID and the PDBx/mmCIF format during the transition phase. All files in this archive are reorganized at the entry level using extended PDB IDs (in both file names and directory names), mirroring the data organization of the PDB Versioned Archive.
All data files for a particular entry are stored in a single directory, which sits under a two-character hash directory derived from the PDB ID, i.e., https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_file_names>.
The two-letter hash consists of the third-to-last and second-to-last characters of the ID. For example, PDB entry pdb_1abc5678 will be under /67/. This maintains consistency with the current PDB archive, where PDB entry 1abc is under /ab/.
File naming is standardized such that the file type is used for the extension. For example, file naming is changed from r116dsf.ent.gz to pdb_0000116d-sf.cif.gz for the structure factor file and from pdb318d.ent.gz to pdb_0000318d.pdb.gz for the legacy PDB formatted coordinate file.
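The ID extension and hashing rules above can be sketched in a few lines of Python. This is an illustrative sketch, not wwPDB code: the function names and the two regular expressions (a digit followed by three alphanumerics for legacy IDs, "pdb_" followed by eight alphanumerics for extended IDs) are assumptions made for the example.

```python
import re

# Assumed ID shapes for this sketch (not an official grammar).
LEGACY_ID = re.compile(r"^[0-9][0-9a-z]{3}$")
EXTENDED_ID = re.compile(r"^pdb_[0-9a-z]{8}$")

# Base path taken from the archive layout described above.
ENTRIES_BASE = "https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries"


def to_extended_id(pdb_id: str) -> str:
    """Return the extended form of a legacy or already-extended PDB ID."""
    pid = pdb_id.strip().lower()
    if EXTENDED_ID.match(pid):
        return pid
    if LEGACY_ID.match(pid):
        # Left-pad the legacy code with zeros to eight characters.
        return "pdb_" + pid.zfill(8)
    raise ValueError(f"not a recognizable PDB ID: {pdb_id!r}")


def two_letter_hash(extended_id: str) -> str:
    """Third-to-last and second-to-last characters of the ID."""
    return extended_id[-3:-1]


def entry_directory(pdb_id: str) -> str:
    """Build the entry-level directory URL for a PDB ID."""
    ext = to_extended_id(pdb_id)
    return f"{ENTRIES_BASE}/{two_letter_hash(ext)}/{ext}/"


if __name__ == "__main__":
    print(to_extended_id("1abc"))
    print(two_letter_hash("pdb_1abc5678"))
    print(entry_directory("116d"))
```

Slicing from the end of the ID keeps the hash stable whether the input was a legacy or an extended code, which is what keeps the new layout consistent with the current one.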
Once four-character PDB IDs are close to being exhausted, the PDB Beta Archive will replace the current PDB Archive. Entries issued with extended PDB IDs are not compatible with the legacy PDB format. wwPDB encourages scientific journals, the PDB community, and users to transition to the PDBx/mmCIF format and adopt the new PDB ID format as early as possible.
For more information, see FAQ.
File Download
The PDB Beta archive is updated every Wednesday at 00:00 UTC.
wwPDB: https://files-beta.wwpdb.org, rsync://rsync-beta.wwpdb.org
RCSB PDB (US): https://files-beta.rcsb.org, rsync://rsync-beta.rcsb.org (see the download protocol below)
PDBe (UK): https://ftp.ebi.ac.uk/pub/databases/wwpdb/
PDBj (Japan): ftp://ftp-beta.pdbj.org, https://files-beta.pdbj.org, rsync://rsync-beta.pdbj.org
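For mirroring jobs it can help to know when the next weekly update is due. The following sketch computes the next Wednesday 00:00 UTC from a given time; the function name is hypothetical, and it models only the stated schedule, not any actual propagation delay between sites.

```python
from datetime import datetime, timedelta, timezone

WEDNESDAY = 2  # datetime.weekday(): Monday is 0


def next_weekly_update(now: datetime) -> datetime:
    """Return the first Wednesday 00:00 UTC strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    days_ahead = (WEDNESDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Already past this week's update time: move to next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_weekly_update(datetime.now(timezone.utc)).isoformat())
```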
New Sequence and InChI
Every Saturday by 3:00 UTC, for every new entry the wwPDB website provides:
Data Structure and Content
Primary data (atomic coordinates and experimental data) are stored at entry level using a hash directory.
| Data types |
File formats |
Location |
Shortlink |
| Atomic coordinates |
PDBx/mmCIF, XML, and PDB |
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/
|
https://files-beta.wwpdb.org/download/[extended PDB ID].[extension].gz
E.g. (replace pdb_00001abc with real PDB ID):
- https://files-beta.wwpdb.org/download/pdb_00001abc.cif.gz
- https://files-beta.wwpdb.org/download/pdb_00001abc.cif
- https://files-beta.wwpdb.org/download/pdb_00001abc.xml
|
| X-ray data: Structure Factors |
PDBx/mmCIF |
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/
|
https://files-beta.wwpdb.org/download/[extended PDB ID]-sf.cif.gz
E.g. (replace pdb_00001abc with real PDB ID):
- https://files-beta.wwpdb.org/download/pdb_00001abc-sf.cif.gz
|
| NMR data: Restraints and chemical shifts |
NEF (/nmr_data), NMR-STAR, and native refinement program formats |
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/
|
https://files-beta.wwpdb.org/download/[extended PDB ID].[NMR data format].gz
E.g. (replace pdb_00001abc with real PDB ID):
- https://files-beta.wwpdb.org/download/pdb_00001abc.mr.gz
- https://files-beta.wwpdb.org/download/pdb_00001abc_cs.str.gz
- https://files-beta.wwpdb.org/download/pdb_00001abc_mr.str.gz
|
| Small molecule references:
CCD
BIRD
Other derived data:
CCD holdings
SMILES, InChI, InChIKey
Variants
|
PDBx/mmCIF
JSON
SDF, smi, inch
|
https://files-beta.wwpdb.org/pub/wwpdb/refdata/
CCD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/chem_comp/[last-character-hash]/[CCD ID]/
BIRD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/bird
e.g. https://files-beta.wwpdb.org/pub/wwpdb/refdata/bird/prd/[last-character-hash]
*Note that paths are case-sensitive -- use capital letters for hash and IDs.
Other derived data: https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/
CCD holdings: A list of released chemical reference entries, their content types (e.g., Chemical Component, BIRD), and the most recent modification date of the reference file.
https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/refdata_id_list.json.gz
|
CCD: https://files-beta.wwpdb.org/ligands/download/[CCD ID].cif
BIRD: https://files-beta.wwpdb.org/birds/download/[BIRD ID].cif
E.g.:
- https://files-beta.wwpdb.org/ligands/download/ATP.cif
- https://files-beta.wwpdb.org/birds/download/PRD_000006.cif
|
| Assemblies |
PDBx/mmCIF |
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/assemblies/
|
https://files-beta.wwpdb.org/download/[extended PDB ID]-assembly[number].cif.gz
E.g. (replace pdb_00001abc with real PDB ID):
- https://files-beta.wwpdb.org/download/pdb_00001abc-assembly1.cif.gz
|
| Archive holdings |
JSON |
https://files-beta.wwpdb.org/pub/wwpdb/pdb/holdings/ |
|
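The shortlink patterns in the table above are regular enough to generate programmatically. Below is a minimal sketch; the function names are hypothetical, and only the URL patterns themselves come from the table.

```python
# Shortlink base taken from the table above.
DOWNLOAD_BASE = "https://files-beta.wwpdb.org/download"


def coordinate_url(extended_id: str, extension: str = "cif",
                   compressed: bool = True) -> str:
    """Shortlink for a coordinate file, e.g. extension 'cif' or 'xml'."""
    suffix = ".gz" if compressed else ""
    return f"{DOWNLOAD_BASE}/{extended_id}.{extension}{suffix}"


def structure_factor_url(extended_id: str) -> str:
    """Shortlink for the structure factor file of an entry."""
    return f"{DOWNLOAD_BASE}/{extended_id}-sf.cif.gz"


def assembly_url(extended_id: str, number: int) -> str:
    """Shortlink for a numbered assembly file of an entry."""
    return f"{DOWNLOAD_BASE}/{extended_id}-assembly{number}.cif.gz"


if __name__ == "__main__":
    # Replace pdb_00001abc with a real PDB ID before downloading.
    print(coordinate_url("pdb_00001abc"))
    print(structure_factor_url("pdb_00001abc"))
    print(assembly_url("pdb_00001abc", 1))
```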
List of archive holdings
These inventory data files at /pub/wwpdb/pdb/holdings/ offer a quick overview of data in the archive.
| current_file_holdings.json.gz | a list of released PDB entries and the file types present for each in the PDB Core Archive (e.g. coordinate data, experimental data, validation report). |
| released_structures_last_modified_dates.json.gz | a list of released PDB entries with the most recent modification date of the PDBx/mmCIF file. |
| released_experimental_data_last_modified_dates.json.gz | a list of released experimental data files with the most recent modification date. |
| obsolete_structures_last_modified_dates.json.gz | a list of obsoleted PDB entries with the most recent modification date of the PDBx/mmCIF file. |
| obsolete_experimental_data_last_modified_dates.json.gz | a list of obsoleted experimental data files with the most recent modification date. |
| all_removed_entries.json.gz | a list of obsoleted PDB entries including information for entry authors, entry title, release date, obsolete date, and superseding PDB ID, if any. |
| unreleased_entries.json.gz | a list of on-hold PDB entries, their entry status, deposition date, and pre-release sequence information, where available. |
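The holdings files are gzip-compressed JSON, so they can be read with the Python standard library alone. The sketch below is deliberately generic: the internal schema of each file is not described here, so the code only decompresses, parses, and reports the top-level shape. The function names are hypothetical.

```python
import gzip
import json
import urllib.request

# Holdings location taken from the table above.
HOLDINGS_BASE = "https://files-beta.wwpdb.org/pub/wwpdb/pdb/holdings"


def parse_holdings(raw: bytes):
    """Decompress a *.json.gz payload and parse it as JSON."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def summarize(data) -> tuple:
    """Return (top-level kind, item count) without assuming a schema."""
    if isinstance(data, dict):
        return ("object", len(data))
    if isinstance(data, list):
        return ("array", len(data))
    return (type(data).__name__, 1)


def fetch_holdings(file_name: str):
    """Download and parse one holdings file (performs a network request)."""
    with urllib.request.urlopen(f"{HOLDINGS_BASE}/{file_name}") as resp:
        return parse_holdings(resp.read())


if __name__ == "__main__":
    data = fetch_holdings("current_file_holdings.json.gz")
    print(summarize(data))
```

Inspecting the top-level shape first is a cheap way to confirm the structure before writing code that depends on particular keys.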
Download Protocols
Every Wednesday from 00:00 UTC, all new and modified data entries will be updated at each of the wwPDB repository sites. The PDB archive is quite large, requiring over 1TB of storage, and continues to grow with each weekly update.
All files mentioned above are available via three protocols: ftp, https, and rsync. For individual file downloads we recommend https; the ftp protocol will be gradually phased out. For bulk file downloads we recommend rsync (see the rsync instructions below).
Batch download script and instruction:
A script for batch download of all PDB entries of a file type or format can be found here.
Instruction:
This script is for downloading all released PDB entries (of a single file type/format) from the PDB Beta Archive.
It uses the aiohttp library to download multiple files concurrently when performing bulk downloads.
It requires Python 3.8 or higher and the aiofiles and aiohttp packages, which can be installed
with the following commands:
pip install aiofiles
pip install aiohttp
The script requires two input arguments to run. The following example command line downloads all mmCIF files and stores
the downloaded files under the directory, `/home/my_user_id/download`:
python BetaArchiveBatchDownloader.py --file_type mmcif --output_dir /home/my_user_id/download
(Run the following command lines to see all supported download file types:
python BetaArchiveBatchDownloader.py
or
python BetaArchiveBatchDownloader.py -h
or
python BetaArchiveBatchDownloader.py --help
It shows:
--file_type FILE_TYPE
The supported file types for downloading are listed in left column.
The corresponding file naming conventions are listed in right column.
mmcif : pdb_xxxxxxxx.cif.gz
pdb : pdb_xxxxxxxx.pdb.gz
assemblies : pdb_xxxxxxxx-assembly#.cif.gz
XML : pdb_xxxxxxxx.xml.gz
XML-extatom : pdb_xxxxxxxx-extatom.xml.gz
XML-noatom : pdb_xxxxxxxx-noatom.xml.gz
structure_factors : pdb_xxxxxxxx-sf.cif.gz
nmr_data_str : pdb_xxxxxxxx_nmr-data.str.gz
nmr_data_nef : pdb_xxxxxxxx_nmr-data.nef.gz
nmr_chemical_shifts : pdb_xxxxxxxx_cs.str.gz
nmr_restraints : pdb_xxxxxxxx.mr.gz
nmr_restraints_v2 : pdb_xxxxxxxx_mr.str.gz
validation_cif : pdb_xxxxxxxx_validation.cif.gz
validation_xml : pdb_xxxxxxxx_validation.xml.gz
validation_pdf : pdb_xxxxxxxx_validation.pdf.gz
full_validation_pdf : pdb_xxxxxxxx_full_validation.pdf.gz
)
How the downloaded files are stored:
Since the current archive holds more than 246,000 entries, it is not desirable to store a quarter of a million files in a single
directory.
The script first creates a top-level subdirectory named after the file type (/home/my_user_id/download/mmcif),
then creates hash directories based on the PDB IDs and stores the downloaded files in them.
For the above example command, the downloaded files are stored as follows:
/home/my_user_id/download/mmcif/00/pdb_0000100d.cif.gz
/home/my_user_id/download/mmcif/00/pdb_0000200d.cif.gz
/home/my_user_id/download/mmcif/00/pdb_0000200l.cif.gz
/home/my_user_id/download/mmcif/00/pdb_0000300d.cif.gz
/home/my_user_id/download/mmcif/00/pdb_0000400d.cif.gz
/home/my_user_id/download/mmcif/01/pdb_0000101d.cif.gz
/home/my_user_id/download/mmcif/01/pdb_0000101m.cif.gz
/home/my_user_id/download/mmcif/01/pdb_0000201d.cif.gz
/home/my_user_id/download/mmcif/01/pdb_0000201l.cif.gz
/home/my_user_id/download/mmcif/01/pdb_0000301d.cif.gz
/home/my_user_id/download/mmcif/01/pdb_0000401d.cif.gz
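The storage layout listed above can be reproduced with a small helper. This is not the batch script's actual code: it is a sketch that infers the rule from the example paths (file type directory, then the two-character hash taken from the PDB ID, then the file name). The function name and the ID regular expression are assumptions.

```python
import re
from pathlib import PurePosixPath

# Assumed shape of the extended ID at the start of every file name.
ID_PREFIX = re.compile(r"^(pdb_[0-9a-z]{8})")


def local_path(output_dir: str, file_type: str, file_name: str) -> PurePosixPath:
    """Return <output_dir>/<file_type>/<hash>/<file_name>."""
    match = ID_PREFIX.match(file_name)
    if match is None:
        raise ValueError(f"file name does not start with an extended PDB ID: {file_name!r}")
    pdb_id = match.group(1)
    # Hash: third-to-last and second-to-last characters of the ID.
    return PurePosixPath(output_dir) / file_type / pdb_id[-3:-1] / file_name


if __name__ == "__main__":
    for name in ("pdb_0000100d.cif.gz", "pdb_0000201l.cif.gz"):
        print(local_path("/home/my_user_id/download", "mmcif", name))
```

Taking the ID from the start of the file name, rather than stripping extensions, keeps the helper valid for suffixed names such as assembly or structure factor files.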
Download protocols and instructions:
RCSB PDB:
Using https protocol:
Download coordinate files in PDBx/mmCIF Format:
https://files-beta.wwpdb.org/download/
For example, https://files-beta.wwpdb.org/download/pdb_00001abc.cif.gz
Download coordinate files in PDBML format:
https://files-beta.wwpdb.org/download/
For example, https://files-beta.wwpdb.org/download/pdb_00001abc.xml.gz
Download the experimental data files:
https://files-beta.wwpdb.org/download/
For example, https://files-beta.wwpdb.org/download/pdb_00001abc-sf.cif.gz (for structure factors)
Download the assembly files:
https://files-beta.wwpdb.org/download/
For example, https://files-beta.wwpdb.org/download/pdb_00001abc-assembly1.cif.gz (for assembly 1 in cif format)
Download the validation report files:
https://files-beta.wwpdb.org/validation/download/
For example, https://files-beta.wwpdb.org/validation/download/pdb_00001abc_validation.pdf.gz
Download CCD files:
https://files-beta.wwpdb.org/ligands/download/
For example, https://files-beta.wwpdb.org/ligands/download/ATP.cif
Download BIRD files:
https://files-beta.wwpdb.org/birds/download/
For example, https://files-beta.wwpdb.org/birds/download/PRD_000006.cif
Download EMDB data files:
https://files.rcsb.org/pub/emdb/structures
Using rsync protocol:
rsync --port=33444 rsync-beta.wwpdb.org::
wwpdb Top level of wwPDB ( /pub/wwpdb )
pdb Top level of PDB tree ( /pub/wwpdb/pdb )
pdb_data Data directory within PDB archive ( /pub/wwpdb/pdb/data )
pdb_refdata Small molecule data directory within wwPDB archive ( /pub/wwpdb/refdata )
pdb_ihm Top level of the PDB-IHM tree ( /pub/wwpdb/pdb_ihm )
Download coordinate files in PDBx/mmCIF Format:
Maintain the complete archive directory hierarchy, but only copy mmCIF files:
rsync -rlpt -v --delete --port=33444 --include '*/' --include '*/*/structures/*.cif.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./mmCIF/
Maintain the complete archive directory hierarchy, remove empty directories, and only copy mmCIF files:
rsync -rlpt -v --delete --port=33444 --prune-empty-dirs --include '*/' --include '*/*/structures/*.cif.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./mmCIF/
Download coordinate files in PDBML Format (xml):
Maintain the complete archive directory hierarchy, but only copy coordinate XML files:
rsync -rlpt -v --delete --port=33444 --include '*/' --exclude '*/*/structures/*-extatom.xml.gz' --exclude '*/*/structures/*-noatom.xml.gz' --include '*/*/structures/*.xml.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./XML/
Maintain the complete archive directory hierarchy, remove empty directories, and only copy coordinate XML files:
rsync -rlpt -v --delete --port=33444 --prune-empty-dirs --include '*/' --exclude '*/*/structures/*-extatom.xml.gz' --exclude '*/*/structures/*-noatom.xml.gz' --include '*/*/structures/*.xml.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./XML/
Download chemical component (CCD) files:
rsync -rlpt -v -z --delete --port=33444 rsync-beta.wwpdb.org::pdb_refdata/chem_comp/ ./CCD/
Download the validation report files:
rsync -rlpt -v --delete --port=33444 --prune-empty-dirs --include '*/' --include='*/validation_reports/***' --exclude='*' rsync-beta.wwpdb.org::pdb_data/entries/ ./validation/
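The rsync commands above share one filter structure: include every directory, optionally exclude unwanted variants, include the wanted file pattern, then exclude everything else. A sketch that assembles such a command as an argument list follows; the function name and parameters are hypothetical, and only flags already shown above are used.

```python
import shlex

HOST = "rsync-beta.wwpdb.org"  # host and port as given above
PORT = 33444


def build_rsync_command(include_patterns, dest, exclude_patterns=(),
                        prune_empty_dirs=False,
                        module_path="pdb_data/entries/"):
    """Assemble an rsync argument list with ordered filter rules."""
    cmd = ["rsync", "-rlpt", "-v", "--delete", f"--port={PORT}"]
    if prune_empty_dirs:
        cmd.append("--prune-empty-dirs")
    # rsync applies the first matching rule, so order matters:
    cmd += ["--include", "*/"]          # descend into all directories
    for pattern in exclude_patterns:    # drop unwanted variants first
        cmd += ["--exclude", pattern]
    for pattern in include_patterns:    # then keep the wanted files
        cmd += ["--include", pattern]
    cmd += ["--exclude", "*"]           # and skip everything else
    cmd += [f"{HOST}::{module_path}", dest]
    return cmd


if __name__ == "__main__":
    cmd = build_rsync_command(["*/*/structures/*.cif.gz"], "./mmCIF/")
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess avoids shell quoting problems with the wildcard patterns; shlex.join is used here only to display the command.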
Need further help with the US site? Please contact info@rcsb.org if you have any problems with file download.
PDBe:
Using https protocol:
Download coordinate files in PDB Exchange Format (mmCIF):
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz
Download coordinate files in PDBML format:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz
Download coordinate files in PDB format:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz
Access the full PDB ftp tree:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/
Download EMDB data files:
https://ftp.ebi.ac.uk/pub/databases/emdb/structures
Download the validation report files:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports
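The bracketed placeholders in these URLs can be filled in from the extended PDB ID alone. The sketch below does this for the mirrors listed in this document; the dictionary keys and the function name are hypothetical, while the base URLs are the ones given in the text.

```python
# Mirror roots as listed in this document.
MIRRORS = {
    "wwpdb": "https://files-beta.wwpdb.org/pub/wwpdb",
    "pdbe": "https://ftp.ebi.ac.uk/pub/databases/wwpdb",
    "pdbj": "https://files-beta.pdbj.org/pub/wwpdb",
}


def structure_file_url(mirror: str, extended_id: str,
                       suffix: str = ".cif.gz") -> str:
    """Build the full archive URL of a file in an entry's structures/ directory.

    `suffix` is everything after the ID, e.g. '.cif.gz', '.xml.gz', '.pdb.gz'.
    """
    two_letter_hash = extended_id[-3:-1]  # third- and second-to-last characters
    return (
        f"{MIRRORS[mirror]}/pdb/data/entries/"
        f"{two_letter_hash}/{extended_id}/structures/{extended_id}{suffix}"
    )


if __name__ == "__main__":
    # Replace pdb_00001abc with a real PDB ID before downloading.
    print(structure_file_url("pdbe", "pdb_00001abc"))
    print(structure_file_url("pdbe", "pdb_00001abc", ".pdb.gz"))
```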
Using rsync protocol:
rsync rsync.ebi.ac.uk::
pub ftp.ebi.ac.uk /pub area
Download coordinate files in PDB Exchange Format (mmCIF):
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz \
./mmCIF
Download coordinate files in PDBML Format (xml):
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz \
./XML
Download coordinate files in PDB Format:
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz \
./pdb
Download EMDB map metadata header files (xml):
rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-*/header/" ./header
Download directories/files for EMDB entry EMD-1003:
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-1003/ ./EMD-1003
Download the validation report files:
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ \
./validation_reports
Using ftp protocol:
ftp ftp.ebi.ac.uk
This connects to an anonymous ftp server containing the remediated wwPDB repository. Use the user name 'anonymous' when prompted. Alternatively, use lftp as shown below:
lftp http://ftp.ebi.ac.uk
The archive files are available in pub/databases/wwpdb
cd pub/databases/wwpdb
Need further help with the PDBe site? Please contact PDBe (http://www.ebi.ac.uk/pdbe/about/contact or e-mail pdbehelp@ebi.ac.uk) if you have any problems connecting to the PDBe site.
PDBj:
Using https protocol:
Download coordinate files in PDB Exchange Format (mmCIF):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz
Download coordinate files in PDBML format (all):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz
Download coordinate files in PDBML format (no-atom site information):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-noatom.xml.gz
Download coordinate files in PDBML format (atom site information only):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-extatom.xml.gz
Download coordinate files in PDB format:
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz
Download EMDB data files:
https://files.pdbj.org/pub/emdb/structures
Download the validation report files:
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/
Using rsync protocol:
rsync rsync-beta.pdbj.org::
wwpdb Top level of wwPDB ( /pub/wwpdb )
pdb Top level of PDB tree ( /pub/wwpdb/pdb )
pdb_data Data directory within PDB archive ( /pub/wwpdb/pdb/data )
pdb_refdata Small molecule data directory within wwPDB archive ( /pub/wwpdb/refdata )
pdb_ihm Top level of the PDB-IHM tree ( /pub/wwpdb/pdb_ihm )
Download coordinate files in PDB Exchange Format (mmCIF):
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz ./mmCIF
Download coordinate files in PDBML Format (xml):
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz ./XML
Download chemical component (CCD) files:
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_refdata/chem_comp/ ./chem_comp
Download EMDB map metadata header files (xml):
rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync.pdbj.org::emdb/structures/EMD-*/header/" ./header
Download directories/files for EMDB entry EMD-5001:
rsync -rlpt -v -z --delete \
rsync.pdbj.org::emdb/structures/EMD-5001/ ./EMD-5001
Download the validation report files:
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ ./validation_reports
Using ftp protocol:
ftp ftp-beta.pdbj.org
will connect to an anonymous ftp server at PDBj containing the remediated wwPDB repository.
Need further help with the PDBj site? Please contact PDBj (https://pdbj.org/contact) if you have any problems with file download.
Archive Snapshots
The annual archive snapshots provide the data in the archive at the start of each year or at selected milestone moments. These data may be used to provide a stable set of entries for analysis and allow users to see changes introduced due to remediation efforts by wwPDB.
Access to these snapshots is available through HTTP, rsync, FTP, and AWS sync protocols.
HTTP Protocol
RCSB PDB (US/AWS): AWS S3 Explorer
PDBj (Japan): PDB Snapshot Archive
RSYNC Protocol
PDBj (Japan): rsync -avz snapshots.pdbj.org:: .
FTP Protocol
PDBj (Japan): ftp://snapshots.pdbj.org
AWS SYNC Protocol
RCSB PDB (US/AWS): s3://pdbsnapshots/
AWS SYNC Instruction:
- Install AWS CLI tool