Sunday, March 27, 2016

Generating protonation states and conformations

We're working on pKa prediction using semiempirical methods and need to generate coordinates for all possible protonation states of a large number of amines and perform conformer search.  Here are my current thoughts so far.

We have the amine names so we can use cactus to generate SMILES strings and then Babel to generate coordinates and do the conformer search.  However, Babel only (1) generates neutral protonation states, (2) does not consider C-NH2 bonds in primary amines rotateable, (3) does not consider conformers relates to inversion of N.  Babel also has some additional problems that I'll describe at the end of the post.

Some easy fixes
Most of the problems can be addressed fairly easily. It was easy to write a program that reads a mol2 file (where atom typing is mostly done) and protonates all appropriate N atoms.

Similarly, it also seems easy to write a program that generates all possible protonation states by removing select protons from the fully protonated coordinate file.  Furthermore, by removing all protons on primary and secondary amines one at a time one will help solve the problems with C-NH2 rotation in primary amines and the inversion of secondary amines.

The main problem is the inversion of tertiary amines
Actually, it is not heard to generate a crappy guess at the inverted structure simply by placing the H atom on the opposite side of the tertiary N.  When do this manually in Avogadro by moving the H atom and optimising the structure it works fine.  But generating such coordinates in an xyz file and then passing them to Babel doesn't work because the atom typing is screwed up.  I tried generating mol2 files instead but for some reason Babel perform a conformer search starting from a mol2 file.

Another option could be to generate a z-matrix (gzmat) and then write code that does the inversion by changing the sign of select dihedral angles.

Yet another option could be to do it directly from the SMILES string, e.g. C[N@H+](F)Cl vs C[N@@H+](F)Cl.  The challenge here for me would be to only do this for select N atoms. In general writing a SMILES parser seems like a big job.

I am open to suggestions here

Additional problems
Babel does not generate the correct stereoisomer for molecules with chiral centers involving several rings.  It does print out an error message, but it remains to be seen how many of the amines we are interesting in are affected by this.

Babel also doesn't consider C-OH bonds rotateable so we have to somehow generate the start conformers separately.  This also applies to COOH groups.

Babel generates -COOH groups, so we also have to generate the COO- form.

Other programs
Perhaps we should look at CDK, RDkit, or PubChemPy instead.



This work is licensed under a Creative Commons Attribution 4.0

No comments: