PDB Beta Archive

Introduction

wwPDB anticipates that four-character PDB accession codes (PDB IDs) will be consumed by 2028. With the continuous growth of the PDB archive, wwPDB has revised the PDB accession code by extending its length and prepending "PDB" (e.g., "1abc" will become "pdb_00001abc"). The new ID format will enable text-mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files.
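
As a rough illustration of the format change, the sketch below converts a four-character ID to the extended form and scans free text for extended IDs. It assumes that an extended ID is the prefix "pdb_" followed by eight alphanumeric characters, left-padded with zeros, which is inferred only from the "1abc" to "pdb_00001abc" example above. The function names are hypothetical and not part of any wwPDB tool.

```python
import re

# Assumption: an extended ID is "pdb_" followed by 8 alphanumeric characters.
EXTENDED_ID = re.compile(r"\bpdb_[0-9a-z]{8}\b", re.IGNORECASE)


def to_extended_id(legacy_id: str) -> str:
    """Convert a four-character PDB ID such as '1abc' to 'pdb_00001abc'."""
    legacy_id = legacy_id.strip().lower()
    if len(legacy_id) != 4 or not legacy_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {legacy_id!r}")
    # Left-pad with zeros to 8 characters, then add the prefix.
    return "pdb_" + legacy_id.rjust(8, "0")


def find_extended_ids(text: str) -> list:
    """Return every extended PDB ID found in the text, lowercased."""
    return [match.group(0).lower() for match in EXTENDED_ID.finditer(text)]


print(to_extended_id("1abc"))
print(find_extended_ids("Coordinates were taken from pdb_00001abc."))
```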

The PDB Beta Archive is provided to help the community adopt the extended PDB ID and the PDBx/mmCIF format during the transition phase. All files in this archive are reorganized at the entry level using the extended PDB ID (in both file names and directories), mirroring the data organization of the PDB Versioned Archive.

All data files for a particular entry are stored in a single directory, which sits under a two-character hash directory derived from the third-from-last and second-from-last characters of the PDB code, i.e., https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>.

The two-letter hash consists of the third-from-last and second-from-last characters of the PDB ID. For example, PDB entry PDB_1abc5678 will be under /67/. This maintains consistency with the current PDB archive, where PDB entry 1abc is under /ab/.
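
The hash rule can be sketched in a few lines of Python. This is a minimal illustration, not wwPDB code: the function names are hypothetical, and the base URL is the entries location listed under "Data Structure and Content".

```python
# Entries location as listed in the "Data Structure and Content" section.
ENTRIES_BASE = "https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries"


def two_letter_hash(pdb_id: str) -> str:
    """Return the third-from-last and second-from-last characters of an ID."""
    if len(pdb_id) < 4:
        raise ValueError(f"PDB ID too short: {pdb_id!r}")
    return pdb_id[-3:-1]


def entry_directory_url(extended_id: str) -> str:
    """Build the directory URL that holds all files of one entry."""
    return f"{ENTRIES_BASE}/{two_letter_hash(extended_id)}/{extended_id}/"


print(two_letter_hash("pdb_1abc5678"))  # 67
print(two_letter_hash("1abc"))          # ab
print(entry_directory_url("pdb_00001abc"))
```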

File naming is standardized such that the file type is used for the extension. For example, file naming is changed from r116dsf.ent.gz to pdb_0000116d-sf.cif.gz for the structure factor file and from pdb318d.ent.gz to pdb_0000318d.pdb.gz for the legacy PDB formatted coordinate file.

When the four-character PDB IDs are about to be consumed, the PDB Beta Archive will replace the current PDB Archive. Entries issued with extended PDB IDs are not compatible with the legacy PDB format. wwPDB encourages scientific journals, the PDB community, and users to transition to the PDBx/mmCIF format and adopt the new PDB ID format as early as possible.

For more information, see FAQ.

File Download

The PDB Beta archive is updated every Wednesday at 00:00 UTC.

wwPDB: https://files-beta.wwpdb.org, rsync://rsync-beta.wwpdb.org
RCSB PDB (US): https://files-beta.rcsb.org, rsync://rsync-beta.rcsb.org (see the download protocol below)
PDBe (UK): https://ftp.ebi.ac.uk/pub/databases/wwpdb/
PDBj (Japan): ftp://ftp-beta.pdbj.org, https://files-beta.pdbj.org, rsync://rsync-beta.pdbj.org

New Sequence and InChI

Every Saturday by 3:00 UTC, for every new entry the wwPDB website provides:

Data Structure and Content

Primary data (atomic coordinates and experimental data) are stored at entry level using a hash directory.

Data types File formats Location Shortlink
Atomic coordinates
  File formats: PDBx/mmCIF, XML, and PDB
  Location:
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/
  Shortlink:
    https://files-beta.wwpdb.org/download/[extended PDB ID].[extension].gz
    E.g. (replace pdb_00001abc with a real PDB ID):
      • https://files-beta.wwpdb.org/download/pdb_00001abc.cif.gz
      • https://files-beta.wwpdb.org/download/pdb_00001abc.cif
      • https://files-beta.wwpdb.org/download/pdb_00001abc.xml

X-ray data: Structure Factors
  File formats: PDBx/mmCIF
  Location:
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/
  Shortlink:
    https://files-beta.wwpdb.org/download/[extended PDB ID]-sf.cif.gz
    E.g. (replace pdb_00001abc with a real PDB ID):
      • https://files-beta.wwpdb.org/download/pdb_00001abc-sf.cif.gz

NMR data: Restraints and chemical shifts
  File formats: NEF (/nmr_data), NMR-STAR, and native refinement program formats
  Location:
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/
  Shortlink:
    https://files-beta.wwpdb.org/download/[extended PDB ID].[NMR data format].gz
    E.g. (replace pdb_00001abc with a real PDB ID):
      • https://files-beta.wwpdb.org/download/pdb_00001abc.mr.gz
      • https://files-beta.wwpdb.org/download/pdb_00001abc_cs.str.gz
      • https://files-beta.wwpdb.org/download/pdb_00001abc_mr.str.gz

Small molecules references: CCD, BIRD
Other derived data: CCD holdings; SMILES, InChI, InChIKey; Variants
  File formats: PDBx/mmCIF; JSON; SDF, smi, inch
  Location:
    https://files-beta.wwpdb.org/pub/wwpdb/refdata/
    CCD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/chem_comp/[last-character-hash]/[CCD ID]/
    BIRD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/bird
      e.g. https://files-beta.wwpdb.org/pub/wwpdb/refdata/bird/prd/[last-character-hash]
    *Note that paths are case-sensitive -- use capital letters for hash and IDs.
    Other derived data: https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/
    CCD holdings: A list of released chemical reference entries, their content types (e.g., Chemical Component, BIRD), and the most recent modification date of the reference file. https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/refdata_id_list.json.gz
  Shortlink:
    CCD: https://files-beta.wwpdb.org/ligands/download/[CCD ID].cif
    BIRD: https://files-beta.wwpdb.org/birds/download/[BIRD ID].cif
    E.g.:
      • https://files-beta.wwpdb.org/ligands/download/ATP.cif
      • https://files-beta.wwpdb.org/birds/download/PRD_000006.cif

Assemblies
  File formats: PDBx/mmCIF
  Location:
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/
    https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/assemblies/
  Shortlink:
    https://files-beta.wwpdb.org/download/[extended PDB ID]-assembly[number].cif.gz
    E.g. (replace pdb_00001abc with a real PDB ID):
      • https://files-beta.wwpdb.org/download/pdb_00001abc-assembly1.cif.gz

Archive holdings
  File formats: JSON
  Location: https://files-beta.wwpdb.org/pub/wwpdb/pdb/holdings/
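
The shortlink patterns listed above can be assembled with simple string formatting. The sketch below is illustrative only: the helper names are hypothetical, and the CCD directory helper assumes that "[last-character-hash]" means the last character of the upper-cased CCD ID, following the case-sensitivity note.

```python
FILES_BASE = "https://files-beta.wwpdb.org"


def coordinate_shortlink(extended_id: str, extension: str = "cif.gz") -> str:
    """Shortlink for a coordinate file, e.g. extension 'cif.gz' or 'xml'."""
    return f"{FILES_BASE}/download/{extended_id}.{extension}"


def structure_factor_shortlink(extended_id: str) -> str:
    """Shortlink for the structure factor file of an entry."""
    return f"{FILES_BASE}/download/{extended_id}-sf.cif.gz"


def assembly_shortlink(extended_id: str, number: int) -> str:
    """Shortlink for one assembly file of an entry."""
    return f"{FILES_BASE}/download/{extended_id}-assembly{number}.cif.gz"


def ccd_shortlink(ccd_id: str) -> str:
    """Shortlink for a chemical component definition file."""
    return f"{FILES_BASE}/ligands/download/{ccd_id}.cif"


def ccd_directory_url(ccd_id: str) -> str:
    """Archive directory of a CCD entry (assumed last-character hash)."""
    ccd_id = ccd_id.upper()  # paths are case-sensitive; IDs use capitals
    return f"{FILES_BASE}/pub/wwpdb/refdata/chem_comp/{ccd_id[-1]}/{ccd_id}/"


# pdb_00001abc is the placeholder ID used throughout this page.
print(coordinate_shortlink("pdb_00001abc"))
print(assembly_shortlink("pdb_00001abc", 1))
print(ccd_directory_url("ATP"))
```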

List of archive holdings

These inventory data files at /pub/wwpdb/pdb/holdings/ offer a quick overview of data in the archive.

current_file_holdings.json.gz: a list of released PDB entries and the file types present for each in the PDB Core Archive (e.g. coordinate data, experimental data, validation report).
released_structures_last_modified_dates.json.gz: a list of released PDB entries with the most recent modification date of the PDBx/mmCIF file.
released_experimental_data_last_modified_dates.json.gz: a list of released experimental data files with the most recent modification date.
obsolete_structures_last_modified_dates.json.gz: a list of obsoleted PDB entries with the most recent modification date of the PDBx/mmCIF file.
obsolete_experimental_data_last_modified_dates.json.gz: a list of obsoleted experimental data files with the most recent modification date.
all_removed_entries.json.gz: a list of obsoleted PDB entries including information for entry authors, entry title, release date, obsolete date, and superseding PDB ID, if any.
unreleased_entries.json.gz: a list of on-hold PDB entries, their entry status, deposition date, and pre-release sequence information, where available.
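
Because the holdings files are gzip-compressed JSON, they can be read with the Python standard library alone. The sketch below deliberately makes no assumption about the internal schema of these files, which is not described here; it only loads a downloaded file and reports the shape of the top-level object. The function names are hypothetical.

```python
import gzip
import json


def load_holdings(path):
    """Read a gzip-compressed JSON holdings file into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def describe_top_level(data) -> str:
    """Summarize the top-level JSON value without assuming a schema."""
    if isinstance(data, dict):
        return f"dict with {len(data)} keys"
    if isinstance(data, list):
        return f"list with {len(data)} items"
    return type(data).__name__
```

For example, after downloading current_file_holdings.json.gz to the working directory, `describe_top_level(load_holdings("current_file_holdings.json.gz"))` gives a first idea of how the file is organized before writing any entry-specific code.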

Download Protocols

Every Wednesday from 00:00 UTC, all new and modified data entries will be updated at each of the wwPDB repository sites. The PDB archive is quite large, requiring over 1TB of storage, and continues to grow with each weekly update.

All files mentioned above are available via three protocols: ftp, https, and rsync. For individual file downloads we recommend https; the ftp protocol will be gradually phased out. For bulk file downloads we recommend rsync (see the rsync instructions below).
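
For a single file over https, a few lines of standard-library Python are enough. This is a minimal sketch under simple assumptions (no retries, no checksum verification); the function name is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest_dir: str) -> Path:
    """Download one file and save it under dest_dir using its remote name."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of holding it in memory.
    with urllib.request.urlopen(url, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

A call such as `download_file("https://files-beta.wwpdb.org/download/pdb_00001abc.cif.gz", "./downloads")` would save the file as ./downloads/pdb_00001abc.cif.gz, with pdb_00001abc replaced by a real PDB ID.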

Batch download script and instruction:

A script for batch download of all PDB entries of a file type or format can be found here.

Instruction:
This script is for downloading all released PDB entries (of a single file type/format) from the PDB Beta Archive.

It uses the aiohttp library to download multiple files asynchronously when performing bulk downloads.

It requires Python 3.8 or higher and the aiofiles and aiohttp packages, which can be installed
    with the following commands:

        pip install aiofiles
        pip install aiohttp

The script requires two input arguments to run. The following example command line downloads all mmCIF files and stores
    the downloaded files under the directory, `/home/my_user_id/download`:

        python BetaArchiveBatchDownloader.py --file_type mmcif --output_dir /home/my_user_id/download

(Run the following command lines to see all supported download file types:

    python BetaArchiveBatchDownloader.py
or
    python BetaArchiveBatchDownloader.py -h
or
    python BetaArchiveBatchDownloader.py --help

It shows:

    --file_type FILE_TYPE

                        The supported file types for downloading are listed in left column.
                        The corresponding file naming conventions are listed in right column.

                        mmcif               :   pdb_xxxxxxxx.cif.gz
                        pdb                 :   pdb_xxxxxxxx.pdb.gz
                        assemblies          :   pdb_xxxxxxxx-assembly#.cif.gz
                        XML                 :   pdb_xxxxxxxx.xml.gz
                        XML-extatom         :   pdb_xxxxxxxx-extatom.xml.gz
                        XML-noatom          :   pdb_xxxxxxxx-noatom.xml.gz
                        structure_factors   :   pdb_xxxxxxxx-sf.cif.gz
                        nmr_data_str        :   pdb_xxxxxxxx_nmr-data.str.gz
                        nmr_data_nef        :   pdb_xxxxxxxx_nmr-data.nef.gz
                        nmr_chemical_shifts :   pdb_xxxxxxxx_cs.str.gz
                        nmr_restraints      :   pdb_xxxxxxxx.mr.gz
                        nmr_restraints_v2   :   pdb_xxxxxxxx_mr.str.gz
                        validation_cif      :   pdb_xxxxxxxx_validation.cif.gz
                        validation_xml      :   pdb_xxxxxxxx_validation.xml.gz
                        validation_pdf      :   pdb_xxxxxxxx_validation.pdf.gz
                        full_validation_pdf :   pdb_xxxxxxxx_full_validation.pdf.gz

)
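
The naming conventions in the help text above can be captured in a lookup table, which is handy for checking that a download contains the expected file names. The sketch below copies the suffixes from that listing; the function name is hypothetical, and the code is not taken from BetaArchiveBatchDownloader.py.

```python
# Suffixes copied from the --file_type help text; "{n}" stands for the
# assembly number shown as "#" in the listing.
FILE_TYPE_SUFFIXES = {
    "mmcif": ".cif.gz",
    "pdb": ".pdb.gz",
    "assemblies": "-assembly{n}.cif.gz",
    "XML": ".xml.gz",
    "XML-extatom": "-extatom.xml.gz",
    "XML-noatom": "-noatom.xml.gz",
    "structure_factors": "-sf.cif.gz",
    "nmr_data_str": "_nmr-data.str.gz",
    "nmr_data_nef": "_nmr-data.nef.gz",
    "nmr_chemical_shifts": "_cs.str.gz",
    "nmr_restraints": ".mr.gz",
    "nmr_restraints_v2": "_mr.str.gz",
    "validation_cif": "_validation.cif.gz",
    "validation_xml": "_validation.xml.gz",
    "validation_pdf": "_validation.pdf.gz",
    "full_validation_pdf": "_full_validation.pdf.gz",
}


def expected_filename(file_type: str, extended_id: str, assembly_number: int = 1) -> str:
    """Return the file name expected for one entry and file type."""
    if file_type not in FILE_TYPE_SUFFIXES:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return extended_id + FILE_TYPE_SUFFIXES[file_type].format(n=assembly_number)


print(expected_filename("structure_factors", "pdb_0000116d"))
print(expected_filename("assemblies", "pdb_00001abc", 2))
```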

How the downloaded files are stored:

    Since the current archive has more than 246,000 entries, it is not desirable to store a quarter of a million files under a single
    directory.

    The script first creates a top-level subdirectory named after the file type (/home/my_user_id/download/mmcif),
    then creates hash directories based on the PDB IDs and stores the downloaded files in them.

    For the above example command, the downloaded files are stored as follows:

        /home/my_user_id/download/mmcif/00/pdb_0000100d.cif.gz
        /home/my_user_id/download/mmcif/00/pdb_0000200d.cif.gz
        /home/my_user_id/download/mmcif/00/pdb_0000200l.cif.gz
        /home/my_user_id/download/mmcif/00/pdb_0000300d.cif.gz
        /home/my_user_id/download/mmcif/00/pdb_0000400d.cif.gz

        /home/my_user_id/download/mmcif/01/pdb_0000101d.cif.gz
        /home/my_user_id/download/mmcif/01/pdb_0000101m.cif.gz
        /home/my_user_id/download/mmcif/01/pdb_0000201d.cif.gz
        /home/my_user_id/download/mmcif/01/pdb_0000201l.cif.gz
        /home/my_user_id/download/mmcif/01/pdb_0000301d.cif.gz
        /home/my_user_id/download/mmcif/01/pdb_0000401d.cif.gz
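
The storage layout shown above can be reproduced with a short path helper. This sketch only mirrors the described layout and is not taken from the script itself; it assumes every file name starts with a 12-character extended ID ("pdb_" plus eight characters), and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def local_path(output_dir: str, file_type: str, filename: str) -> PurePosixPath:
    """Return <output_dir>/<file_type>/<hash>/<filename> for a downloaded file."""
    extended_id = filename[:12]  # assumed: "pdb_" + 8 characters
    if not extended_id.startswith("pdb_") or len(extended_id) != 12:
        raise ValueError(f"file name does not start with an extended ID: {filename!r}")
    hash_dir = extended_id[-3:-1]  # third-from-last and second-from-last characters
    return PurePosixPath(output_dir) / file_type / hash_dir / filename


print(local_path("/home/my_user_id/download", "mmcif", "pdb_0000100d.cif.gz"))
print(local_path("/home/my_user_id/download", "mmcif", "pdb_0000101m.cif.gz"))
```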

Download protocols and instructions:

RCSB PDB:

Using http protocol:

Download coordinate files in PDBx/mmCIF Format:

https://files-beta.wwpdb.org/download/

          For example, https://files-beta.wwpdb.org/download/pdb_00001abc.cif.gz

Download coordinate files in PDBML format:

https://files-beta.wwpdb.org/download/

          For example, https://files-beta.wwpdb.org/download/pdb_00001abc.xml.gz

Download the experimental data files:

https://files-beta.wwpdb.org/download/

        For example, https://files-beta.wwpdb.org/download/pdb_00001abc-sf.cif.gz (for structure factors)

Download the assembly files:

https://files-beta.wwpdb.org/download/

        For example, https://files-beta.wwpdb.org/download/pdb_00001abc-assembly1.cif.gz (for assembly 1 in cif format)

Download the validation report files:

https://files-beta.wwpdb.org/validation/download/

        For example, https://files-beta.wwpdb.org/validation/download/pdb_00001abc_validation.pdf.gz

Download CCD files:

https://files-beta.wwpdb.org/ligands/download/

        For example, https://files-beta.wwpdb.org/ligands/download/ATP.cif

Download BIRD files:

https://files-beta.wwpdb.org/birds/download/

        For example, https://files-beta.wwpdb.org/birds/download/PRD_000006.cif

Download EMDB data files:

https://files.rcsb.org/pub/emdb/structures


Using rsync protocol:

rsync --port=33444 rsync-beta.wwpdb.org::
wwpdb           Top level of wwPDB ( /pub/wwpdb )
pdb             Top level of PDB tree ( /pub/wwpdb/pdb )
pdb_data        Data directory within PDB archive ( /pub/wwpdb/pdb/data )
pdb_refdata     Small molecule data directory within wwPDB archive ( /pub/wwpdb/refdata )
pdb_ihm         Top level of the PDB-IHM tree ( /pub/wwpdb/pdb_ihm )

Download coordinate files in PDBx/mmCIF Format:

Maintain the complete archive directory hierarchy, but only copy mmCIF files:
rsync -rlpt -v --delete --port=33444 --include '*/' --include '*/*/structures/*.cif.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./mmCIF/

Maintain the complete archive directory hierarchy, remove empty directories, and only copy mmCIF files:
rsync -rlpt -v --delete --port=33444 --prune-empty-dirs --include '*/' --include '*/*/structures/*.cif.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./mmCIF/

Download coordinate files in PDBML Format (xml):

Maintain the complete archive directory hierarchy, but only copy coordinate XML files:
rsync -rlpt -v --delete --port=33444 --include '*/' --exclude '*/*/structures/*-extatom.xml.gz' --exclude '*/*/structures/*-noatom.xml.gz' --include '*/*/structures/*.xml.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./XML/

Maintain the complete archive directory hierarchy, remove empty directories, and only copy coordinate XML files:
rsync -rlpt -v --delete --port=33444 --prune-empty-dirs --include '*/' --exclude '*/*/structures/*-extatom.xml.gz' --exclude '*/*/structures/*-noatom.xml.gz' --include '*/*/structures/*.xml.gz' --exclude '*' rsync-beta.wwpdb.org::pdb_data/entries/ ./XML/

Download chemical component (CCD) files:

rsync -rlpt -v -z --delete --port=33444 rsync-beta.wwpdb.org::pdb_refdata/chem_comp/ ./CCD/

Download the validation report files:

rsync -rlpt -v --delete --port=33444 --prune-empty-dirs --include '*/' --include='*/validation_reports/***' --exclude='*' rsync-beta.wwpdb.org::pdb_data/entries/ ./validation/
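
The include/exclude pattern used in the mmCIF rsync commands above can also be assembled programmatically, for instance to run the transfer from a scheduled job. The sketch below builds the argument list for the single-include case only; the function name and its parameters are hypothetical, while the options, port, and module path are taken from the commands above.

```python
import shlex


def rsync_entries_command(include_pattern: str, dest: str,
                          prune_empty_dirs: bool = False,
                          host: str = "rsync-beta.wwpdb.org",
                          port: int = 33444) -> list:
    """Build an rsync argument list that copies only files matching one pattern."""
    cmd = ["rsync", "-rlpt", "-v", "--delete", f"--port={port}"]
    if prune_empty_dirs:
        cmd.append("--prune-empty-dirs")
    # Keep every directory, keep matching files, drop everything else.
    cmd += ["--include", "*/", "--include", include_pattern, "--exclude", "*"]
    cmd += [f"{host}::pdb_data/entries/", dest]
    return cmd


command = rsync_entries_command("*/*/structures/*.cif.gz", "./mmCIF/")
print(shlex.join(command))
```

Passing the returned list to `subprocess.run(command, check=True)` would start the transfer without going through a shell, so the wildcard patterns need no extra quoting.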

Need further help with the US site: Please contact info@rcsb.org if you have any problems with file download.



PDBe:

Using http protocol:

Download coordinate files in PDB Exchange Format (mmCIF):

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz

Download coordinate files in PDBML format:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz

Download coordinate files in PDB format:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz

Access the full PDB ftp tree:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/

Download EMDB data files:

https://ftp.ebi.ac.uk/pub/databases/emdb/structures

Download the validation report files:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports

Using rsync protocol:

rsync rsync.ebi.ac.uk::
pub             ftp.ebi.ac.uk /pub area

Download coordinate files in PDB Exchange Format (mmCIF):

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz \
./mmCIF

Download coordinate files in PDBML Format (xml):

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz \
./XML

Download coordinate files in PDB Format:

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz \
./pdb

Download EMDB map metadata header files (xml):

rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-*/header/" ./header

Download directories/files for EMDB entry EMD-1003:

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-1003/ ./EMD-1003

Download the validation report files:

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ \
./validation_reports

Using ftp protocol:

ftp ftp.ebi.ac.uk

will connect to an anonymous ftp server containing the remediated wwPDB repository. Use the user 'anonymous' when prompted. Alternatively, use lftp as below

lftp http://ftp.ebi.ac.uk

The archive files are available in pub/databases/wwpdb

cd pub/databases/wwpdb



Need further help with the PDBe site: Please contact PDBe (http://www.ebi.ac.uk/pdbe/about/contact or e-mail pdbehelp@ebi.ac.uk) if you have any problems connecting to the PDBe download site.



PDBj:

Using http protocol:

Download coordinate files in PDB Exchange Format (mmCIF):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz

Download coordinate files in PDBML format (all):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz

Download coordinate files in PDBML format (no-atom site information):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-noatom.xml.gz

Download coordinate files in PDBML format (atom site information only):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-extatom.xml.gz

Download coordinate files in PDB format:

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz

Download EMDB data files:

https://files.pdbj.org/pub/emdb/structures

Download the validation report files:

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/
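
The per-site HTTP paths for entry files differ only in their base URL, so one helper can cover all three sites. This is a sketch under that assumption; the dictionary keys and the function name are hypothetical labels, and the base URLs are the ones listed in the HTTP sections of this page.

```python
# Base URLs of the entries tree at each site, as listed on this page.
ENTRY_BASES = {
    "wwpdb": "https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries",
    "pdbe": "https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries",
    "pdbj": "https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries",
}


def structure_url(site: str, extended_id: str, suffix: str = ".cif.gz") -> str:
    """Build the URL of a file in an entry's structures/ directory."""
    base = ENTRY_BASES[site]
    hash_dir = extended_id[-3:-1]  # third-from-last and second-from-last characters
    return f"{base}/{hash_dir}/{extended_id}/structures/{extended_id}{suffix}"


# pdb_00001abc is a placeholder; substitute a real PDB ID.
print(structure_url("pdbe", "pdb_00001abc"))
print(structure_url("pdbj", "pdb_00001abc", "-noatom.xml.gz"))
```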

Using rsync protocol:

rsync rsync-beta.pdbj.org::
wwpdb           Top level of wwPDB ( /pub/wwpdb )
pdb             Top level of PDB tree ( /pub/wwpdb/pdb )
pdb_data        Data directory within PDB archive ( /pub/wwpdb/pdb/data )
pdb_refdata     Small molecule data directory within wwPDB archive ( /pub/wwpdb/refdata )
pdb_ihm         Top level of the PDB-IHM tree ( /pub/wwpdb/pdb_ihm )

Download coordinate files in PDB Exchange Format (mmCIF):

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz ./mmCIF

Download coordinate files in PDBML Format (xml):

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz ./XML

Download chemical component (CCD) files:

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_refdata/chem_comp/ ./chem_comp

Download EMDB map metadata header files (xml):

rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync.pdbj.org::emdb/structures/EMD-*/header/" ./header

Download directories/files for EMDB entry EMD-5001:

rsync -rlpt -v -z --delete \
rsync.pdbj.org::emdb/structures/EMD-5001/ ./EMD-5001

Download the validation report files:

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::pdb_data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ ./validation_reports

Using ftp protocol:

ftp ftp-beta.pdbj.org

will connect to an anonymous ftp server at PDBj containing the remediated wwPDB repository.

Need further help with the PDBj site: Please contact PDBj https://pdbj.org/contact if you have any problems with file download.

Archive Snapshots

The annual archive snapshots provide the data in the archive at the start of each year or at selected milestone moments. These data may be used to provide a stable set of entries for analysis and allow users to see changes introduced due to remediation efforts by wwPDB.

Access to these snapshots is available through HTTP, rsync, FTP, and AWS sync protocols.

HTTP Protocol

RCSB PDB (US/AWS): AWS S3 Explorer

PDBj (Japan): PDB Snapshot Archive

RSYNC Protocol

PDBj (Japan): rsync -avz snapshots.pdbj.org:: .

FTP Protocol

PDBj (Japan): ftp://snapshots.pdbj.org

AWS SYNC Protocol

RCSB PDB (US/AWS): s3://pdbsnapshots/

AWS SYNC Instruction:

  1. Install the AWS CLI tool.
  2. List all PDB Snapshot objects:

       aws s3 ls s3://pdbsnapshots/ --no-sign-request

     The result contains lines like:

       PRE 20250101/

     • PRE: This indicates that the listed item is a "prefix" or "folder" in the S3 bucket, not a file object. This is how S3 organizes files logically into folders.
     • 20250101/: This is the prefix (or "folder") in the S3 bucket. It's not an actual folder in the traditional sense, but rather a common prefix used to group objects.
  3. Sync PDB Snapshots:
     • All PDB Snapshot objects:

         aws s3 sync s3://pdbsnapshots/ ./local-directory/ --no-sign-request

     • Specific PDB snapshot object (e.g., /20250101):

         aws s3 sync s3://pdbsnapshots/20250101/ ./local-directory/20250101 --no-sign-request
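
When the listing step is scripted, the snapshot names can be pulled out of the `aws s3 ls` output by keeping the lines marked PRE. The sketch below is a minimal illustration: it assumes the listing text has already been captured (for example with subprocess), and both function names are hypothetical.

```python
def snapshot_prefixes(listing: str) -> list:
    """Extract prefix names from the text output of 'aws s3 ls'."""
    prefixes = []
    for line in listing.splitlines():
        parts = line.split()
        # Prefix lines look like "PRE 20250101/"; object lines have other fields.
        if len(parts) == 2 and parts[0] == "PRE":
            prefixes.append(parts[1].rstrip("/"))
    return prefixes


def sync_command(snapshot: str, local_dir: str) -> list:
    """Build the argument list for syncing one snapshot to a local directory."""
    return [
        "aws", "s3", "sync",
        f"s3://pdbsnapshots/{snapshot}/",
        f"{local_dir}/{snapshot}",
        "--no-sign-request",
    ]


print(snapshot_prefixes("                           PRE 20250101/\n"))
print(" ".join(sync_command("20250101", "./local-directory")))
```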