Chemical Component Dictionary

The Chemical Component Dictionary (formerly the HET Group Dictionary) is an external reference file describing all residue and small molecule components found in PDB entries. This dictionary contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands, and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, aromatic bond assignments, idealized coordinates, chemical descriptors (SMILES & InChI), and systematic chemical names.

The Chemical Component Dictionary is organized by the 3-character alphanumeric code that PDB assigns to each chemical component. New chemical component definitions appear in the dictionary as the entries in which they are observed are released in the PDB archive; consequently, the dictionary is updated with each weekly PDB release. Users can search and browse the Chemical Component Dictionary using resources such as PDBeChem and Ligand Expo.

Recently, redundant definitions were removed, small modifying functional groups were absorbed into complete components, and definitions with ambiguous chemical descriptions were removed. Beyond ensuring that atom names begin with their type symbol, no attempt was made to extend systematic nomenclature to non-polymer chemical components. Where possible, single atoms and small groups have been replaced by complete single-compound entries. For example, the ethyl group (ETH) has been obsoleted, and new definitions have been created that combine ethyl groups with neighboring residues. Obsoleted components remain in the dictionary, marked with status OBS.

The entire Chemical Component Dictionary and the companion dictionary of amino acid protonation variants can be downloaded from the wwPDB FTP site:

Chemical Component Dictionary: mmCIF (plain text) | mmCIF (gz)

Please note that these files are large and may take a while to download.

The dictionary of protonation variants provides additional nomenclature information for the protonation states of standard amino acids in N-terminal, C-terminal, and free forms, and includes common side chain protonation states. This extension dictionary uses longer identifier codes to distinguish the various protonation forms of the standard amino acids. For instance, the identifier code ARG_LFOH_DHH12 identifies the arginine variant with a neutral peptide unit and side chain protonated at NH1. The extended identifier codes are not compatible with the 3-character format restriction on the residue identifier in the PDB format, so these codes do not currently appear in PDB files. In PDB entries, protonated residues are identified by the 3-character code of their parent amino acid; however, the atom nomenclature for protonated forms is taken from the variant dictionary definitions.

Prior to development of the Chemical Component Dictionary, PDB chemical information was available solely in the form of connection tables. This older representation, called the PDB HET dictionary, is still made available on the wwPDB FTP site (download). PDB HET format dictionary entries for individual components are available at ftp://ftp.wwpdb.org/pub/pdb/data/monomers/. Descriptions of chemical components in mmCIF and PDB formats are provided below.
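As a rough illustration of why the extended variant codes cannot be used as PDB residue identifiers, the sketch below splits a code such as ARG_LFOH_DHH12 and checks it against the 3-character limit. It assumes that the segments are separated by underscores and that the first segment is the parent amino acid code; that layout is inferred from the single example above, not taken from a specification. The function names are hypothetical.

```python
def parent_residue_code(variant_id: str) -> str:
    """Return the leading segment of an extended variant identifier.

    Assumption: segments are separated by underscores and the first
    segment is the parent amino acid code (as in ARG_LFOH_DHH12).
    """
    return variant_id.split("_", 1)[0]


def fits_pdb_residue_field(code: str) -> bool:
    """Return True if the code fits the 3-character PDB residue identifier."""
    return 1 <= len(code) <= 3


variant = "ARG_LFOH_DHH12"
parent = parent_residue_code(variant)

print(parent)                           # ARG
print(fits_pdb_residue_field(variant))  # False
print(fits_pdb_residue_field(parent))   # True
```

This mirrors the behavior described above: a PDB file carries only the parent code, while the variant dictionary supplies the atom nomenclature.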
PDBeChem [1] offers a wide range of possibilities for searching and exploring the dictionary:
Users can also search by references in macromolecules, molecule classification, and atom energy type. A generic browsing interface lets users follow the links available from every record to navigate through the relationships of the dictionary. For example, a user can follow a relationship link to view the atoms of a ligand and then, for a particular atom, its bonds and energy types. For more information, please see
Ligand Expo, formerly Ligand Depot [2], can be used to navigate the Chemical Component Dictionary. It integrates databases, services, tools and methods related to small molecules, and allows users to:
Ligand Expo provides the Chemical Component Dictionary and the individual chemical components within PDB entries for download in a variety of formats and packaging at http://ligand-expo.rcsb.org/ld-download.html.

Chemical Components in mmCIF Format

The mmCIF format combines collections of related data items (tokens) into categories. A category is essentially a table in which each token represents a column in the table. The question mark (?) is used to mark an item value as missing. A period (.) may be used to indicate that there is no appropriate value for the item or that a value has been intentionally omitted. Vectors and tables of data may be encoded in mmCIF using a loop_ directive. To build a table, the data item names corresponding to the table columns are preceded by the loop_ directive and followed by the corresponding rows of data. A detailed description of the mmCIF syntax and logic structure is available.

In the Chemical Component Dictionary, each chemical component is defined by sets of tokens in the five categories:
In a PDB entry, the mmCIF category chem_comp is used to describe the chemical components in the file. The chemical name is described in chem_comp.name, chemical formula in chem_comp.formula, and molecular weight in chem_comp.formula_weight. For example, the mmCIF file for PDB entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174): Further information describing this residue (174) is then provided in the Chemical Component Dictionary (See the Example).
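The loop_ layout and the two placeholder characters described above can be sketched in a few lines of Python. This is a minimal reader under stated assumptions, not a full mmCIF parser: it handles a single loop_ block, assumes each value fits on one line, and ignores multi-line text fields. The sample table is hand-written for illustration (the second row, ZZZ, is hypothetical) and is not copied from the 1t5d file or from the dictionary. The names `parse_loop`, `MISSING`, and `INAPPLICABLE` are invented for this sketch.

```python
import re

# Illustrative data only: a hand-written loop_ table, not copied from the
# dictionary. The second row (ZZZ) is hypothetical.
SAMPLE = '''\
loop_
_chem_comp.id
_chem_comp.name
_chem_comp.formula
_chem_comp.formula_weight
174 "4-CHLORO-BENZOIC ACID" "C7 H5 Cl O2" 156.568
ZZZ ? . ?
'''

MISSING = None           # '?' marks a missing value
INAPPLICABLE = object()  # '.' marks "no appropriate value" or omitted on purpose

# A token is a double-quoted string, a single-quoted string, or a bare word.
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|\S+''')


def decode(token):
    """Map the placeholder characters and strip surrounding quotes."""
    if token == "?":
        return MISSING
    if token == ".":
        return INAPPLICABLE
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]
    return token  # numbers are kept as strings in this sketch


def parse_loop(text):
    """Parse one simple loop_ block into a list of {item name: value} rows."""
    names, values = [], []
    in_loop = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_") and not values:
            names.append(line)        # column (data item) names
        elif in_loop:
            values.extend(decode(t) for t in TOKEN.findall(line))
    width = len(names)
    return [dict(zip(names, values[i:i + width]))
            for i in range(0, len(values), width)]


for row in parse_loop(SAMPLE):
    print(row["_chem_comp.id"], row["_chem_comp.name"])
```

Keeping `?` and `.` as distinct sentinels preserves the difference between a value that is unknown and one that does not apply.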
Chemical Components in PDB Format

The heterogen section of a PDB coordinate file describes ligands in the entry. The chemical name of the ligand is given in the HETNAM record and the chemical formula is given in the FORMUL record. Any synonyms for the chemical name are given in the HETSYN records. For example, the PDB format file for PDB entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174): Further information describing this residue (174) is then provided in the Chemical Component Dictionary (See the Example). Please refer to the PDB File Format Guide for further description.
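The following sketch shows one way the HETNAM, HETSYN, and FORMUL records might be collected per ligand. The column positions used here (component ID in columns 12-14 for HETNAM and HETSYN, columns 13-15 for FORMUL, with text starting at columns 16 and 20 respectively) are an assumption based on a reading of the PDB File Format Guide and should be checked against it. The sample records are hand-written for illustration and are not copied from the 1t5d file; `parse_het_records` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative records, hand-written to follow the assumed fixed-column
# layout; they are not copied from the actual 1t5d file.
SAMPLE = [
    "HETNAM     174 4-CHLORO-BENZOIC ACID",
    "HETSYN     174 P-CHLOROBENZOIC ACID",
    "FORMUL   2  174    C7 H5 CL O2",
]

# record name -> (slice holding the component ID, slice holding the text, key)
# Assumed layout; verify against the PDB File Format Guide.
LAYOUT = {
    "HETNAM": (slice(11, 14), slice(15, 70), "name"),
    "HETSYN": (slice(11, 14), slice(15, 70), "synonyms"),
    "FORMUL": (slice(12, 15), slice(19, 70), "formula"),
}


def parse_het_records(lines):
    """Collect name, synonyms, and formula for each component ID."""
    info = defaultdict(lambda: {"name": "", "synonyms": "", "formula": ""})
    for line in lines:
        layout = LAYOUT.get(line[:6].strip())
        if layout is None:
            continue  # not a record this sketch handles
        id_cols, text_cols, key = layout
        entry = info[line[id_cols].strip()]
        # Continuation lines for the same component are joined with a space.
        entry[key] = (entry[key] + " " + line[text_cols].strip()).strip()
    return dict(info)


ligands = parse_het_records(SAMPLE)
print(ligands["174"]["name"])
print(ligands["174"]["formula"])
```

Because PDB records are fixed-width, the sketch slices by column rather than splitting on whitespace, which would break on names and formulas that contain spaces.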
[1] D. Dimitropoulos, J. Ionides, K. Henrick (2006) UNIT 14.3: Using MSDchem to search the PDB ligand dictionary. In Current Protocols in Bioinformatics (A.D. Baxevanis, R.D.M. Page, G.A. Petsko, L.D. Stein, and G.D. Stormo, eds.), pp 14.3.1-14.3.3. John Wiley & Sons, Hoboken, NJ.
[2] Z. Feng, L. Chen, H. Maddula, O. Akcan, R. Oughtred, H.M. Berman, J. Westbrook (2004) Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics 20(13):2153-2155.

© 2010 wwPDB