PDBest

PDBest Help Center

Overview
Input Files
- Online Query
- Local Files
Configurations
- Filtering Standards
- Output
Process

What is PDBest?

PDBest (PDB Enhanced Structures Toolkit) is a user-friendly, freely available platform for acquiring, manipulating and normalizing protein structures in a high-throughput and seamless fashion. Its intuitive graphical interface allows researchers and students with no programming background to download and manipulate their files without using the command line. The platform can also save protocols, so users can easily share their PDB search and filtering settings, which improves the reproducibility of subsequent analyses.

The platform was developed in C++ on the Qt framework and runs with high performance on all major operating systems: Windows, Linux, and Mac OS X.

Input Files

In PDBest, users can provide input files (1) from a local repository, (2) by downloading them from the RCSB Protein Data Bank mirror through an online query built from their own search parameters, or (3) through a combination of both.

Online Query

Using the "Online Query" option, users can search for biomolecules with all the parameters available at the RCSB Protein Data Bank. Search criteria can be combined with the logical operators "and"/"or" in any combination, which allows very specific and sophisticated queries.
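As a rough illustration of how criteria combined with "and"/"or" select entries, the sketch below evaluates a nested query against a few local dictionaries. It does not call the RCSB service and is not PDBest's implementation; the helper names (criterion, all_of, any_of), the field names and the sample entries are all made up for the example.

```python
def criterion(field, predicate):
    """Build a test that applies `predicate` to one field of an entry."""
    return lambda entry: field in entry and predicate(entry[field])


def all_of(*tests):
    """Logical "and": every sub-test must accept the entry."""
    return lambda entry: all(test(entry) for test in tests)


def any_of(*tests):
    """Logical "or": at least one sub-test must accept the entry."""
    return lambda entry: any(test(entry) for test in tests)


# Made-up entries with hypothetical field names, for illustration only.
entries = [
    {"id": "ID01", "method": "X-RAY DIFFRACTION", "resolution": 1.8},
    {"id": "ID02", "method": "X-RAY DIFFRACTION", "resolution": 3.2},
    {"id": "ID03", "method": "SOLUTION NMR"},
]

# (method is X-ray AND resolution <= 2.0) OR (method is NMR)
query = any_of(
    all_of(
        criterion("method", lambda m: m == "X-RAY DIFFRACTION"),
        criterion("resolution", lambda r: r <= 2.0),
    ),
    criterion("method", lambda m: m == "SOLUTION NMR"),
)

matches = [entry["id"] for entry in entries if query(entry)]
```

Because each combinator returns an ordinary test, "and" and "or" groups can be nested to any depth, which mirrors the "any combination" of operators described above.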

Furthermore, it is possible to remove files at different sequence similarity thresholds using the "Remove Similar Sequence at" list box.

Users compose a query from the box menu, add it with the "Insert" button, and can then add or modify its parameters. A query option can be removed with the "x" icon in the window, and the whole query can be reset with the "clear" button. The "Submit" button sends the query to the RCSB database, and the list of matching PDB files is shown so that it can be reviewed before the processing step. The query can be changed or refined at any time before processing. The PDB identifiers can be displayed with the "Show" button, and the list can be saved.

The "Add" button can be used to manually include PDB identifiers.

There is no limit on the number of molecules that PDBest can acquire and process.

Local Files

Input PDB files can be provided from a local repository through three option buttons:

Files: PDB files are selected manually from a folder.

Directory: Includes the files of a whole directory by looking for files with the PDB (.pdb) or mmCIF (.cif) extension. Note that this option does not include files in sub-directories.

List of files: A file listing the files to be included (full paths must be provided).

PDB files must have unique names, even if they are in different directories; otherwise only one instance will be considered. The files shown in the list box are sent to the processing section.
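The Directory option and the unique-name rule can be pictured with the following minimal sketch. It is not PDBest's code: the function names are hypothetical, matching extensions case-insensitively is an assumption, and so is keeping the first file seen for a repeated name (the text above only says that one instance is considered).

```python
from pathlib import Path

STRUCTURE_EXTENSIONS = {".pdb", ".cif"}


def collect_from_directory(directory):
    """Return structure files directly inside `directory` (no sub-directories)."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in STRUCTURE_EXTENSIONS
    )


def keep_unique_names(paths):
    """Keep one file per file name; assumption: the first one seen wins."""
    kept, skipped = {}, []
    for path in map(Path, paths):
        if path.name in kept:
            skipped.append(path)  # same name as an earlier file
        else:
            kept[path.name] = path
    return list(kept.values()), skipped
```

Returning the skipped paths separately makes it easy to warn about files that share a name with one already in the list.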

Configurations

The "Configurations" section includes two main groups of options, "Filtering Standards" and "Output". After loading the files, users can choose among many filtering options to apply, and decide where to store the filtered files and which naming convention to use.

Filtering Standards

PDBest manipulates PDB files by applying a series of filters or processing parameters, which allow users to select the information relevant to their analyses. A new file is created with the selected records, and the original file is preserved.

The PDB file format definition establishes a set of standards to be followed during structure deposition, and its sections are extensively described at the wwPDB. The PDBest filtering criteria are divided into the following sections. The user can choose to keep or discard records by selecting the corresponding check boxes:

General: The general section allows file format conversion, adding (or removing) hydrogens at a given pH, and splitting files by chain, among other common tasks.

Title: The title section processes the records used to describe the experimental conditions and the biological macromolecules present in the entry. It includes the records HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, and REMARK.

Primary Structure: The primary structure section of a PDB file contains the sequence of residues in each chain of the macromolecule. The records used to define the sequence of residues are DBREF, DBREF1, DBREF2, SEQADV, SEQRES and MODRES.

Heterogen: The heterogen section of a PDB file contains the complete description of non-standard residues in the entry. The records associated with non-standard residues are HET, HETNAM, HETSYN and FORMUL.

Secondary Structure: The secondary structure section of a PDB file describes helices, sheets, and turns found in the protein or polypeptide. The records that describe the secondary structures are HELIX, TURN and SHEET. The user can filter residues based on their presence (or absence) in these secondary structures.

Connectivity Annotation: The connectivity annotation section specifies the existence and location of disulfide bonds and other linkages. The records defining connectivity annotation are SSBOND, LINK and CISPEP. It is necessary to select the check box to keep these records.

Miscellaneous Features: The miscellaneous features section describes features in the molecule such as environments surrounding a non-standard residue or an active site.

Crystallographic and Coordinate Transformation: The crystallographic section describes the geometry of the crystallographic experiment and the coordinate system transformations. The records that contain this information are CRYST1, ORIGXn, SCALEn, MTRIXn and TVECT.

Coordinate: The coordinate section contains the collection of atomic coordinates as well as the MODEL and ENDMDL records. The GUI has three tabs, one for each component (atoms, residues and others). On the atom component tab, users can choose filtering options by atom name (e.g., filtering backbone atoms: CA, C, N, O) and select renumbering options.

On the residue component tab, users can select or remove specific residues based on their 3-letter code. It is also possible to filter atoms with multiple occupancies (greater or lower occupancy, or keep atoms without occupancy) and to renumber residues.

On the "other informations" tab, it is possible to filter models (for instance, in structures determined by NMR), to remove solvent molecules, and to filter TER, ANISOU and HETATM records.
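To make the coordinate filters more concrete, here is a minimal sketch of two of them applied to the fixed-column ATOM/HETATM lines of a PDB file: keeping atoms by name (optionally dropping solvent) and resolving atoms with multiple occupancies. It is an illustration under stated assumptions, not PDBest's implementation: the function names are hypothetical, solvent is taken to mean water (HOH) only, and the occupancy filter assumes a single-model file in which every atom line has an occupancy value.

```python
BACKBONE = {"N", "CA", "C", "O"}
SOLVENT = {"HOH"}  # assumption: only water counts as solvent here
ATOM_RECORDS = ("ATOM", "HETATM")


def filter_coordinates(lines, atom_names=None, drop_solvent=False):
    """Keep non-coordinate lines and the atom lines that pass the filters."""
    kept = []
    for line in lines:
        if line[:6].strip() in ATOM_RECORDS:
            atom_name = line[12:16].strip()  # columns 13-16
            res_name = line[17:20].strip()   # columns 18-20
            if drop_solvent and res_name in SOLVENT:
                continue
            if atom_names is not None and atom_name not in atom_names:
                continue
        kept.append(line)
    return kept


def select_by_occupancy(lines, prefer="highest"):
    """For atoms listed more than once, keep the highest (or lowest) occupancy."""
    best = {}  # (chain, residue number + insertion code, atom name) -> (occ, index)
    for index, line in enumerate(lines):
        if line[:6].strip() in ATOM_RECORDS:
            key = (line[21], line[22:27], line[12:16])
            occupancy = float(line[54:60])  # columns 55-60
            current = best.get(key)
            better = current is None or (
                occupancy > current[0] if prefer == "highest"
                else occupancy < current[0]
            )
            if better:  # ties keep the copy that appears first
                best[key] = (occupancy, index)
    winners = {index for _, index in best.values()}
    return [
        line
        for index, line in enumerate(lines)
        if line[:6].strip() not in ATOM_RECORDS or index in winners
    ]
```

For example, filter_coordinates(lines, atom_names=BACKBONE, drop_solvent=True) keeps only backbone atoms and removes water. Note that stripping the atom name makes the alpha carbon " CA " and a calcium ion "CA  " look identical, so a real tool would also need to check the element or residue name.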

Connectivity: The connectivity section provides information on chemical connectivity.

Bookkeeping: The bookkeeping section provides final information about the file itself. It is necessary to check the MASTER and END boxes to keep these records.
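The keep-or-discard choice per section can be sketched as a lookup from section name to record names, using only the records listed in the descriptions above. This is a simplified illustration, not PDBest's code: the dictionary keys and the function name are hypothetical, ORIGXn, SCALEn and MTRIXn are expanded to n = 1, 2, 3, and records of sections for which no record names are given above pass through unchanged.

```python
SECTIONS = {
    "title": {"HEADER", "OBSLTE", "TITLE", "CAVEAT", "COMPND", "SOURCE", "KEYWDS",
              "EXPDTA", "AUTHOR", "REVDAT", "SPRSDE", "JRNL", "REMARK"},
    "primary structure": {"DBREF", "DBREF1", "DBREF2", "SEQADV", "SEQRES", "MODRES"},
    "heterogen": {"HET", "HETNAM", "HETSYN", "FORMUL"},
    "secondary structure": {"HELIX", "TURN", "SHEET"},
    "connectivity annotation": {"SSBOND", "LINK", "CISPEP"},
    "crystallographic": {"CRYST1", "ORIGX1", "ORIGX2", "ORIGX3", "SCALE1", "SCALE2",
                         "SCALE3", "MTRIX1", "MTRIX2", "MTRIX3", "TVECT"},
    "bookkeeping": {"MASTER", "END"},
}


def drop_sections(lines, discarded):
    """Return the lines whose record name is not in any discarded section."""
    unwanted = set().union(*(SECTIONS[name] for name in discarded))
    # The record name occupies the first six columns of every PDB line.
    return [line for line in lines if line[:6].strip() not in unwanted]
```

Because the function returns a new list, the original lines stay untouched, in the same spirit as PDBest writing a new file and keeping the original.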

Output

The Output configuration lets users select the directory where PDB files will be downloaded from the PDB online repository, if online queries are performed, and where processed files will be stored after the filters are applied. The user may also include an expression and choose the output format. For example, with the expression ".filtered", every processed file name will contain this term, as in 4HHB.filtered.pdb.
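The naming convention in the example above can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes the expression is inserted between the original file stem and the extension of the chosen output format.

```python
from pathlib import Path


def output_path(input_file, output_dir, expression=".filtered", extension=".pdb"):
    """Build the processed file path: <output_dir>/<stem><expression><extension>."""
    stem = Path(input_file).stem  # "4HHB.pdb" -> "4HHB"
    return Path(output_dir) / f"{stem}{expression}{extension}"


processed = output_path("4HHB.pdb", "results")
```

Here processed.name is "4HHB.filtered.pdb"; passing extension=".cif" would name the file for mmCIF output instead.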

Process

After submitting queries to the RCSB Protein Data Bank and/or loading local files, and choosing the filtering criteria, the files are listed in a grid in the Process section. Up to a hundred file identifiers can be seen at once. The grid shows the origin of each file (online/local) and its status (to be downloaded/ready). Files with the "To be downloaded" status are downloaded by pressing the "Start Download" button.

Files can be selected and removed from the grid before processing.

Users can check for inconsistencies in the file format with the Verify Inconsistencies button. Files with errors are marked on the grid. The issues that PDBest checks for in the structures are missing atoms or residues, atoms with multiple occupancy and non-standard residues.

The Open Detailed file button opens a window with a complete report on the inconsistencies found. The report can also be saved to a file.
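One way such checks could be approximated on a PDB-format file is sketched below. How PDBest actually detects these issues is not described here, so everything in the sketch is an assumption: it treats REMARK 465 and REMARK 470 (the standard remarks for missing residues and missing atoms) as the evidence for missing parts, a non-blank alternate location indicator as multiple occupancy, and any residue other than the 20 standard amino acids or water as non-standard. The function name is hypothetical.

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def find_inconsistencies(lines):
    """Collect simple indicators of the issue types listed above."""
    report = {
        "missing_residues": False,
        "missing_atoms": False,
        "multiple_occupancy": set(),     # (chain, residue number, residue name)
        "non_standard_residues": set(),  # residue names
    }
    for line in lines:
        record = line[:6].strip()
        if record == "REMARK":
            number = line[7:10].strip()  # remark number, columns 8-10
            if number == "465":
                report["missing_residues"] = True
            elif number == "470":
                report["missing_atoms"] = True
        elif record in ("ATOM", "HETATM"):
            res_name = line[17:20].strip()
            if line[16:17].strip():      # alternate location indicator, column 17
                report["multiple_occupancy"].add(
                    (line[21:22], line[22:26].strip(), res_name)
                )
            if res_name not in STANDARD_RESIDUES and res_name != "HOH":
                report["non_standard_residues"].add(res_name)
    return report
```

A file would be flagged when any entry of the report is True or non-empty. Nucleic acid residues would also be reported as non-standard by this protein-only sketch.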

The "Process" button applies the selected filters, generating the new files.