mk_prepare_receptor.py#

A command-line script for receptor preparation from a PDB/CIF file, setting up various properties (flexible/reactive residues, box specifications, etc.), and writing useful input files (PDBQT, GPF, box configuration file for Vina) for docking.

Basic usage#

mk_prepare_receptor.py -i examples/system.pdb --write_pdbqt prepared.pdbqt

This is equivalent to:

mk_prepare_receptor.py -i examples/system.pdb -o prepared -p

Read more about the syntax in Write flags and Options.

About#

The input structure is matched against templates to guarantee chemical correctness and identify problems with the input structures. This allows the user to identify and fix problems, resulting in a molecular model that is correct with respect to heavy atoms, protonation state, connectivity, bond orders, and formal charges.

The matching algorithm uses the connectivity and elements, but not bond orders or atom names. Hydrogens are optional. This makes it compatible with input files from various sources.

Templates are matched on a per residue basis. Each residue is represented as an instance of a PolymerResidue object, which contains:

an RDKit molecule that represents the actual state

a padded RDKit molecule containing a few atoms from the adjacent residues

parameters such as partial charges

The positions are set by the input, and the connectivity and formal charges are defined by the templates. Heavy atoms must match exactly. If heavy atoms are missing or in excess, the templates will fail to match.

Missing hydrogens are added by RDKit, but are not subjected to minimization with a force field. Thus, their bond lengths are not super accurate.

Different states of the same residue are stored as different templates, for example different protonation states of HIS, N-term, LYN/LYS, etc. Residue name is primary key unless user overrides.

Currently not supported: capped residues from charmm-gui.

Templates#

The templates contain SMILES strings that are used to create the RDKit molecules that constitute every residue in the processed model. In this way, the chemistry of the processed model is fully defined by the templates, and the only thing that is preserved from the input are the atom positions and the connectivity between residues.

The SMILES strings contain all atoms that exist in the final model, and none that do not exist. This also applies to hydrogens, meaning that the SMILES are expected to have real hydrogens. Note that real hydrogens are different from explicit hydrogens. Real hydrogens will be represented as an actual atom in an RDKit molecule, while explicit hydrogens are a just property of heavy atoms. In the SMILES, real hydrogens are defined with square brackets “[H]” and explicit hydrogens without, e.g. “[nH]” to set the number of explicit hydrogens on an aromatic nitrogen to one.

Residues that are part of a polymer, which is often all of them, will have bonds to adjacent residues. The heavy atoms involved in the bonds will miss a real hydrogen and have an implicit (or explicit) one instead. As an example, consider modeling an alkyl chain as a polymer, in which the monomer is a single carbon atom. Our template SMILES would be “[H]C[H]”. The RDKit molecule will have three atoms and the carbon will have two implicit hydrogens. The implicit hydrogens correspond to bonds to adjacent residues in the processed polymer.

Write flags#

The option flags starting with --write in mk_prepare_receptor can be used both with an argument to specify the outpuf filename:

--write_pdbqt myenzyme.pdbqt --write_json myenzyme.json

and without the filename argument as long as a default basename is provided:

--output_basename myenzyme --write_pdbqt --write_json

It is also possible to combine the two types of usage:

--output_basename myenzyme --write_pdbqt --write_json --write_vina_box box_for_myenzyme.txt

in which case the specified filenames have priority over the default basename.

Residue selection and assignment language#

Meeko uses the chain ID and residue number to identify a residue. The arguments involving selection of residues:

-d, --delete_residues <residues>

-f, --flexres <residues>

-r, --reactive_flexres <residues>

use the compact selection language that specify residues efficiently. The chain ID and the residue number(s) are separated by a colon (:) delimiter. Each residue number is combined with the most recent chain ID that precedes it, resulting in an expanded list of chain-residue pairs.

For an input like A:5,7,BB:12C, this selection language represents: residues (number) 5 and 7 in Chain A and residue (number) 12C in Chain BB.

The arguments involving assignment of residues to properties:

-n, --set_template <template>

-b, --blunt_ends <positions>

--wanted_altloc <location>

-s, --reactive_name_specific <residue:atom>

use the residue selection lanaguge described above, followed by an equal sign (=) as the delimiter and the assigned value, which could be the name of a residue template, the atom index for the blunt end, the wanted altloc ID, or the atom name of the reactive atom. Each residue selection is comibned with the most recent assignment that precedes it, resulting in a further expanded list of residue-assignment pairs.

For an input like "A:5,7=CYX,A:19A,B:17=HID, this assignment language represents: residues (number) 5 in Chain A are set to (template name) CYX and residue (number) 19 A in Chain A, and residue (number) 17 in Chain B are set to (template name) HID.

Options#

Input/Output Options#

--read_pdb <PDB_FILENAME>#: Read a PDB file using the PDB parser in RDKit.

-i, --read_with_prody <MACROMOL_FILENAME>#: Read a PDB or mmCIF file using ProDy (if installed). ProDy can be installed from PyPI or conda-forge.

-o, --output_basename <basename>#: Specify a default basename for output files created by –write options when no filename is specified.

-p, --write_pdbqt <PDBQT_FILENAME> [*]#: Output PDBQT files with _rigid or _flex suffixes for flexible residues. Defaults to –output_basename if no filename is provided.

-j, --write_json <JSON_FILENAME> [*]#: Save the receptor’s parameterized configuration to JSON format. Defaults to –output_basename if unspecified.

-g, --write_gpf <GPF_FILENAME> [*]#: Output an AutoGrid input file (GPF). Defaults to –output_basename if not specified.

-v, --write_vina_box <VINA_BOX_FILENAME> [*]#: Generate a configuration file for Vina with grid box dimensions. Defaults to –output_basename if not specified.

--write_pdb <PDB_FILENAME> [*]#: Save the prepared receptor in PDB format. Must specify the filename.

Receptor Perception Options#

-n, --set_template <template>#: Assign templates to residues, e.g., A:5,7=CYX,B:17=HID.

-d, --delete_residues <residues>#: Specify residues to delete, e.g., A:350,B:15,16,17.

-b, --blunt_ends <positions>#: Blunt end definitions, e.g., A:123,200=2,A:1=0.

--add_templates <JSON_FILENAME> [*]#: Load additional templates from one or more JSON files.

-a, --allow_bad_res#: (Flag) Ignore residues with missing atoms instead of raising an error.

--default_altloc <location>#: Define a default alternate location for residues, overridden by –wanted_altloc.

--wanted_altloc <location>#: Specify alternate locations for particular residues, e.g., :5=B,B:17=A.

--mk_config <JSON_FILENAME>#: Specify a JSON configuration file for receptor preparation.

Grid Box Options#

--box_size <X Y Z>#: Set the size of the grid box in Angstroms (x, y, z).

--box_center <X Y Z>#: Define the center of the grid box in Angstroms (x, y, z).

--box_center_off_reactive_res#: (Flag) Shift the grid box center 5 Å along the CA-CB bond from CB. Applicable only when there is one reactive flexible residue.

--box_enveloping <FILENAME>#: Adjust the grid box to enclose atoms in the specified file. Supported formats: .sdf, .mol, .mol2, .pdb, .pdbqt.

--padding <value>#: Set padding around atoms specified in –box_enveloping (in Å).

Flexible and/or Reactive Options#

-f, --flexres <residues>#: Define flexible residues by chain ID and residue number, e.g., -f “:42,B:23”.

-r, --reactive_flexres <residues>#: Define reactive flexible residues by chain ID and residue number, e.g., -r “:42,B:23”. Maximum of 8 reactive residues.

--reactive_name <residue:atom>#: Specify the reactive atom name for a residue type, e.g., –reactive_name “TRP:NE1”. Can be repeated for multiple assignments.

-s, --reactive_name_specific <residue:atom>#: Specify the reactive atom for an individual residue by residue ID, e.g., -s “A:42=NE2”. The residue becomes reactive.

--r_eq_12 <value>#: Set the equilibrium distance (r_eq) for reactive atoms in 1-2 interactions. Default is 1.8 Å.

--eps_12 <value>#: Set the epsilon value for reactive atoms in 1-2 interactions. Default is 2.5.

--r_eq_13_scaling <factor>#: Scale r_eq for 1-3 interactions across reactive atoms. Default is 0.5.

--r_eq_14_scaling <factor>#: Scale r_eq for 1-4 interactions across reactive atoms. Default is 0.5.

mk_prepare_receptor.py

Contents