## Tuesday, May 3, 2016

### Enzyme design: My latest paper explained without the jargon

Our latest paper has just appeared in the open access journal PeerJ. It's ultimately related to making better enzymes, so first some background.

Background
Enzymes are proteins that make certain chemical reactions go faster and nearly every complex molecule in your body is made by, or broken down by, enzymes.  But people have also started using enzymes in commercial products, for example in washing powder to break down oily stains at lower temperatures. This saves money on heating the water and, being proteins, the enzymes are biodegradable. So there is a lot of interest in designing new enzymes that build or break down new molecules efficiently. For example, there is a rather large company (Novozymes) near Copenhagen that does nothing but design, produce, and sell enzymes on an industrial scale.

Designing new enzymes currently involves a lot of trial-and-error, so you have to pay a lot of smart scientists a lot of money for a long time to design new enzymes, a cost that is ultimately passed on to you and me as consumers. My long-term goal is to reduce the amount of trial-and-error by writing a computer program that can predict what changes you have to make to improve the enzyme before you ever make it in the lab.

Some years ago I had some modest success for one enzyme with a prototype program (you can find the papers here and here). But one of the many things we don't know is whether the method we base our approach on will work at all for other types of enzymes. The paper that just got published is a first small step in figuring this out.

The New Study
We've collected data for five other enzymes from published papers that we trust reasonably well and tested two methods that are fast enough to design enzymes: one is the same method we used a few years ago and the other is a newer one that wasn't available to us before now. The conclusion of our study is that the methods seem to work well enough for all but one system, and this system is different for the two methods. This suggests that we can't just base future work on one method; we have to have both ready in case one of them fails. We need to repeat the study for many other types of enzymes (I would say at least 15-20 more), and we need to improve the quality of the data so that we trust it completely, rather than "reasonably well". In the paper we have extended an open invitation to other scientists to contribute to this effort.

## Saturday, April 16, 2016

### Computing pKa values for molecules with several ionizable groups

We're working on pKa prediction using semiempirical methods and need to compute pKa values for molecules with several ionizable groups. Here are my current thoughts.

Background: one ionizable group
If there is only one titratable site
$$\mathrm{BH \rightleftharpoons B + H^+} \ \ \ K=\mathrm{\frac{[B][H^+]}{[BH]}}$$
then the fraction of $\mathrm{BH}$ molecules $f_{\mathrm{BH}}$ is $$\begin{split} f_{\mathrm{BH}} & =\mathrm{\frac{[BH]}{[B]+[BH]} } \\ & = \mathrm{\frac{[B]}{[B]}\frac{[BH]/[B]}{1+[BH]/[B]} } \\ & = \mathrm{\frac{[H^+]/K}{1+[H^+]/K} } \\ & = \mathrm{\frac{10^{p\textit{K}-pH}}{1+10^{p\textit{K}-pH}} } \end{split}$$ where $$\mathrm{pH = -log[H^+] \implies [H^+] = 10^{-pH}}$$
and similarly for $K$.

From this we can see that the pK value is the pH value for which $f_{\mathrm{BH}}$ = 1/2. So you compute the pK value from the standard free energy difference
$$\text{p}K = \frac{\Delta G^\circ(\mathrm{B})+\Delta G^\circ(\mathrm{H^+})- \Delta G^\circ(\mathrm{BH})}{RT\ln(10)}$$
and you're done.
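A minimal Python sketch of the two single-site formulas above. The function names are hypothetical, the free energies are assumed to be in kJ/mol, and the value of $\Delta G^\circ(\mathrm{H^+})$ is simply taken as an input, since how it is obtained is a separate modelling choice.

```python
import math

# Gas constant in kJ/(mol K); free energies are assumed to be given in kJ/mol
R = 8.314462618e-3


def fraction_protonated(pK: float, pH: float) -> float:
    """f_BH = 10**(pK - pH) / (1 + 10**(pK - pH)) for a single titratable site."""
    x = 10.0 ** (pK - pH)
    return x / (1.0 + x)


def pK_from_free_energies(G_B: float, G_H: float, G_BH: float, T: float = 298.15) -> float:
    """pK = (G(B) + G(H+) - G(BH)) / (RT ln 10)."""
    return (G_B + G_H - G_BH) / (R * T * math.log(10))


# The site is exactly half protonated when the pH equals the pK
print(fraction_protonated(4.0, 4.0))  # 0.5
# One pK unit corresponds to RT ln(10), roughly 5.7 kJ/mol at 298.15 K
print(R * 298.15 * math.log(10))
```

The second print is a handy sanity check: an error of about 5.7 kJ/mol in the reaction free energy shifts the computed pK by one unit.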

Two ionizable groups
For a molecule with two titratable groups ($\mathrm{HB_\alpha B_\beta H}$) and the following equilibrium constants
$$\mathrm{HBBH \rightleftharpoons BBH + H^+} \ \ K_{\alpha1}$$
$$\mathrm{HBBH \rightleftharpoons HBB + H^+} \ \ K_{\beta1}$$
$$\mathrm{HBB \rightleftharpoons BB + H^+} \ \ K_{\alpha0}$$
$$\mathrm{BBH \rightleftharpoons BB + H^+} \ \ K_{\beta0}$$
the probability of, for example, $\mathrm{BBH}$ is
$$f_{\mathrm{BBH}} =\mathrm{\frac{[BBH]}{[BB]+[BBH]+[HBB]+[HBBH]}= \frac{[BBH]}{\textit{P}}}$$
$f_{\mathrm{BBH}}$ can be rewritten in terms of pK values
$$f_{\mathrm{BBH}} = \mathrm{\frac{[BBH]/[BB]}{\textit{P}/[BB]} = \frac{10^{p\textit{K}_{\beta0}-pH}}{\textit{P}/[BB]}}$$
where
$$\mathrm{ \textit{P}/[BB] = 1+10^{p\textit{K}_{\alpha0}-pH}+10^{p\textit{K}_{\beta0}-pH}+ 10^{p\textit{K}_{\alpha0}+p\textit{K}_{\beta1}-2pH}}$$
Similarly,
$$f_{\mathrm{HBB}} = \mathrm{\frac{10^{p\textit{K}_{\alpha0}-pH}}{\textit{P}/[BB]}}$$
and
$$f_{\mathrm{HBBH}} = \mathrm{\frac{10^{p\textit{K}_{\alpha0}+p\textit{K}_{\beta1}-2pH}}{\textit{P}/[BB]}}$$
Only three of the four microscopic pK values are independent, because both routes from $\mathrm{HBBH}$ to $\mathrm{BB}$ must give the same overall equilibrium constant: $\mathrm{p}K_{\alpha1}+\mathrm{p}K_{\beta0} = \mathrm{p}K_{\alpha0}+\mathrm{p}K_{\beta1}$. That is why $\mathrm{p}K_{\alpha1}$ does not appear in $\mathrm{\textit{P}/[BB]}$.
The apparent pK value of the $\alpha$ group ($\mathrm{p}K_{\alpha}$) is the pH at which its protonation probability
$$f_\alpha =f_{\mathrm{HBB}} + f_{\mathrm{HBBH}}$$
is 1/2, and similarly for the $\beta$ group. So compute the microscopic pK values defined by the four equilibria above, then $f_\alpha$ and $f_\beta$ as a function of pH, and then $\mathrm{p}K_{\alpha}$ and $\mathrm{p}K_{\beta}$.
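A minimal Python sketch of that recipe, assuming the three microscopic pK values that appear in $P/[\mathrm{BB}]$ are already known (the names and example values below are hypothetical). The apparent pK values are found by bisection. This works because, for two sites, $f_\alpha$ and $f_\beta$ both decrease monotonically with pH. The search window from pH -10 to 30 is an arbitrary choice.

```python
def microstate_fractions(pH, pK_a0, pK_b0, pK_b1):
    """Fractions of BB, BBH, HBB and HBBH; each weight is [state]/[BB] as in the text."""
    w = {
        "BB": 1.0,
        "BBH": 10.0 ** (pK_b0 - pH),
        "HBB": 10.0 ** (pK_a0 - pH),
        "HBBH": 10.0 ** (pK_a0 + pK_b1 - 2.0 * pH),
    }
    P = sum(w.values())  # this is P/[BB]
    return {state: weight / P for state, weight in w.items()}


def f_alpha(pH, pK_a0, pK_b0, pK_b1):
    """Protonation probability of the alpha group: f_HBB + f_HBBH."""
    f = microstate_fractions(pH, pK_a0, pK_b0, pK_b1)
    return f["HBB"] + f["HBBH"]


def f_beta(pH, pK_a0, pK_b0, pK_b1):
    """Protonation probability of the beta group: f_BBH + f_HBBH."""
    f = microstate_fractions(pH, pK_a0, pK_b0, pK_b1)
    return f["BBH"] + f["HBBH"]


def apparent_pK(frac, lo=-10.0, hi=30.0, tol=1e-8):
    """Bisection for the pH where frac(pH) = 1/2.

    Assumes frac decreases with pH and crosses 1/2 somewhere in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frac(mid) > 0.5:
            lo = mid  # still more than half protonated: the pK lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical microscopic values: pK_alpha0, pK_beta0, pK_beta1 (so pK_alpha1 = 6 + 8 - 9 = 5)
micro = (6.0, 9.0, 8.0)
pK_alpha = apparent_pK(lambda pH: f_alpha(pH, *micro))
pK_beta = apparent_pK(lambda pH: f_beta(pH, *micro))
print(round(pK_alpha, 2), round(pK_beta, 2))
```

For non-interacting sites ($\mathrm{p}K_{\alpha0}=\mathrm{p}K_{\alpha1}$, $\mathrm{p}K_{\beta0}=\mathrm{p}K_{\beta1}$), the sketch returns the microscopic values themselves, which is a useful check.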

If one of the groups (say $\alpha$) titrates at a significantly lower pH than the other ($\mathrm{p}K_{\alpha1} << \mathrm{p}K_{\beta1}$) then $\mathrm{p}K_{\alpha}=\mathrm{p}K_{\alpha1}$ and $\mathrm{p}K_{\beta}=\mathrm{p}K_{\beta0}$ and it is not necessary to compute the free energy of $\mathrm{HBB}$, but it can be hard to determine this in advance.  Similarly, if there is no significant interaction between the sites then $\mathrm{p}K_{\alpha}=\mathrm{p}K_{\alpha1}=\mathrm{p}K_{\alpha0}$ and $\mathrm{p}K_{\beta}=\mathrm{p}K_{\beta1}=\mathrm{p}K_{\beta0}$ and one can skip one of the protonation states.

For $N$ ionizable groups there are $2^N$ protonation states, connected by $N2^{N-1}$ microscopic pKa values, so one has to compute the free energies of $2^N$ species. This quickly gets out of hand if one has to do a conformational search for each protonation state and the molecule is large.
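A short Python sketch of the bookkeeping, with hypothetical function names: each protonation state is a tuple of 0/1 flags, one per site, and every pair of states that differ by a single proton on one site defines one microscopic pK.

```python
from itertools import product


def protonation_states(n_sites):
    """All 2**n_sites protonation states as tuples of 0 (deprotonated) / 1 (protonated)."""
    return list(product((0, 1), repeat=n_sites))


def microscopic_equilibria(n_sites):
    """(protonated state, deprotonated state, site index) for every single-proton step."""
    steps = []
    for state in protonation_states(n_sites):
        for i, occupied in enumerate(state):
            if occupied:
                deprotonated = state[:i] + (0,) + state[i + 1:]
                steps.append((state, deprotonated, i))
    return steps


for n in (1, 2, 3, 10):
    print(n, len(protonation_states(n)), len(microscopic_equilibria(n)))
# prints:
# 1 2 1
# 2 4 4
# 3 8 12
# 10 1024 5120
```

Each state needs its own conformer search and free energy, so it is the $2^N$ count that sets the cost.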

Related post
Generating protonation states and conformations

## Wednesday, April 13, 2016

### Reviewing for PeerJ: it's the little (and the not so little) things

I just did my first review for PeerJ and it was a real pleasure because there are a lot of "little things" that make your reviewing life easier:

1. Figures/tables are in the text and, get this, the captions are immediately above/below the corresponding figure/table.  Some other journals also do this, but not enough.

2. I annotate the mss in a pdf reader, and usually this is a frustrating experience since the publisher-generated pdf has all sorts of "quirks" that make highlighting and copying text hit-and-miss. The previous pdf I reviewed turned every page with a figure into an image! Annotating/copying in the PeerJ pdf worked flawlessly.

3. The pdf contained three front pages with the due date, a summary of the review criteria, a link to the page with the supplementary material, and a link to the page where I should submit my review. No hunting around for the email with the link! I teared up a little bit when I saw that.

Other "little things" include stuff like not having to rank the perceived importance or impact of the work on some bogus 1-10 scale, a strict policy on making the raw data available, and a button to click to make my review non-anonymous.

## Thursday, April 7, 2016

### Why I use Twitter: one scientist's perspective

Tomorrow I am giving a short talk to public relations people at the University on why and how I use Twitter. I have 20 min plus 10 min for questions, but am aiming for a 10 min talk and 20 min of questions.

## Tuesday, April 5, 2016

### ACS Omega is too expensive

Disclaimer: I applied to be a co-editor of this journal and was not selected.

ACS Omega has just announced its APCs: \$1500 (\$2000) under the ACS Authors Choice (CC-BY or CC-BY-NC-ND) license for members, and an additional \$500 for non-members (since full membership costs \$162, that is the real additional cost for one publication a year).

The ACS Authors Choice license is not open access: under the ACS Authors Choice license you assign copyright to the ACS. While the license states that

> For non-commercial research and education purposes only, users may access, download, copy, display and redistribute articles as well as adapt, translate, text and data mine content contained in articles, ...

you still can't, for example, use a figure from such an article in a book chapter without the ACS's permission. Also, the ACS can, for example, sell your article or your figures.

So the cost to publish OA in ACS Omega is \$2000. That's more expensive than other impact-neutral OA journals: PLoS One (\$1495), Scientific Reports (\$1495), F1000Research (\$1000), Rio Journal (\$850), PeerJ (\$100/author, Bio-only), and Royal Society Open Science (\$0, for now).

Since all these journals are impact neutral and (AFAIK) provide quick review, price is the main consideration, and I see no reason to pay more to publish in ACS Omega.

## Thursday, March 31, 2016

### Reviews for Towards a barrier height benchmark set for biologically relevant systems

2016.04.06: our rebuttal can be found here

The reviews of our latest PeerJ submission (editor assigned February 25) came back last evening with the always welcome verdict "minor revision". The preprint has already received over 400 views and 85 downloads.

## Sunday, March 27, 2016

### Generating protonation states and conformations

We're working on pKa prediction using semiempirical methods and need to generate coordinates for all possible protonation states of a large number of amines and perform a conformer search. Here are my current thoughts.

We have the amine names, so we can use cactus to generate SMILES strings and then Babel to generate coordinates and do the conformer search. However, Babel (1) only generates neutral protonation states, (2) does not consider C-NH2 bonds in primary amines rotatable, and (3) does not consider conformers related to inversion of N. Babel also has some additional problems that I'll describe at the end of the post.

Some easy fixes
Most of the problems can be addressed fairly easily. It was easy to write a program that reads a mol2 file (where atom typing is mostly done) and protonates all appropriate N atoms.
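A minimal Python sketch of such a protonation step. This is not the program mentioned above. It assumes that the amine N atoms are typed `N.3` (Tripos sp3 N) and that hydrogens are explicit. It adds one H per `N.3` atom, placed opposite the sum of the unit vectors to the N's neighbours, which is roughly along the lone pair, and retypes that N as `N.4` (positively charged sp3 N). The function names and the 1.02 Å N-H length are illustrative choices, and writing the result back to a mol2 file is left out.

```python
import math


def parse_mol2(text):
    """Return (atoms, bonds) from the @<TRIPOS>ATOM and @<TRIPOS>BOND sections of mol2 text."""
    atoms, bonds, section = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            continue
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if section == "ATOM":
            atoms.append({"id": int(fields[0]), "name": fields[1],
                          "xyz": tuple(float(v) for v in fields[2:5]), "type": fields[5]})
        elif section == "BOND":
            bonds.append((int(fields[1]), int(fields[2])))
    return atoms, bonds


def protonate_amines(atoms, bonds, nh_length=1.02):
    """Add one H to every N.3 atom and retype that N as N.4; the inputs are not modified."""
    atoms = [dict(a) for a in atoms]
    bonds = list(bonds)
    by_id = {a["id"]: a for a in atoms}
    neighbours = {a["id"]: [] for a in atoms}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    next_id = max(by_id) + 1
    for a in list(atoms):  # snapshot, so newly added H atoms are not visited
        if a["type"] != "N.3":
            continue
        s = [0.0, 0.0, 0.0]  # sum of unit vectors from N to its bonded neighbours
        for j in neighbours[a["id"]]:
            v = [p - q for p, q in zip(by_id[j]["xyz"], a["xyz"])]
            length = math.sqrt(sum(c * c for c in v))
            s = [si + vi / length for si, vi in zip(s, v)]
        norm = math.sqrt(sum(c * c for c in s))
        if norm < 1e-6:
            continue  # no well-defined lone-pair direction (e.g. planar N)
        h_xyz = tuple(q - nh_length * c / norm for q, c in zip(a["xyz"], s))
        atoms.append({"id": next_id, "name": "H" + str(next_id), "xyz": h_xyz, "type": "H"})
        bonds.append((a["id"], next_id))
        a["type"] = "N.4"
        next_id += 1
    return atoms, bonds
```

Usage would be along the lines of `protonate_amines(*parse_mol2(open("amine.mol2").read()))`, where the file name is hypothetical.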

Similarly, it also seems easy to write a program that generates all possible protonation states by removing select protons from the fully protonated coordinate file.  Furthermore, by removing all protons on primary and secondary amines one at a time one will help solve the problems with C-NH2 rotation in primary amines and the inversion of secondary amines.
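A small Python sketch of the enumeration step, under the assumption that the fully protonated structure has already been reduced to a (hypothetical) mapping from each protonated N atom id to the ids of the H atoms on it. For every site, the sketch either keeps all protons or removes exactly one of them. For NH3+ and NH2+ sites, removing each H in turn gives the different starting geometries for C-N rotation or N inversion mentioned above.

```python
from itertools import product


def deprotonation_variants(site_protons):
    """Return every set of H atom ids to delete, removing at most one H per site.

    site_protons maps a protonated N atom id to the ids of the H atoms bonded to it.
    None in a combination means "keep all protons on this site"."""
    sites = sorted(site_protons)
    choices = [[None] + list(site_protons[s]) for s in sites]
    return [frozenset(h for h in combo if h is not None) for combo in product(*choices)]


# Hypothetical example: N 1 is a primary ammonium (H 10, 11, 12), N 2 a tertiary one (H 20)
variants = deprotonation_variants({1: [10, 11, 12], 2: [20]})
print(len(variants))  # 8 starting structures: (3 + 1) * (1 + 1)
```

Several variants share the same protonation state but differ in which H was removed, which is exactly what gives the conformer search different starting points.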

The main problem is the inversion of tertiary amines
Actually, it is not hard to generate a crappy guess at the inverted structure simply by placing the H atom on the opposite side of the tertiary N. When I do this manually in Avogadro by moving the H atom and optimising the structure, it works fine. But generating such coordinates in an xyz file and then passing them to Babel doesn't work because the atom typing is screwed up. I tried generating mol2 files instead, but for some reason Babel won't perform a conformer search starting from a mol2 file.
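A minimal Python sketch of generating that crude guess in code, as a slight variation on moving only the H atom. It mirrors both the N and its H through the plane of the three carbons bonded to N, which is the geometric motion of an inversion. It assumes coordinates are held in a (hypothetical) dict keyed by atom id, so they can be written back into a file whose atom types are already correct. The result still needs to be optimised.

```python
import math


def reflect_through_plane(p, a, b, c):
    """Mirror point p through the plane that contains points a, b and c."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]  # plane normal, u x v
    norm = math.sqrt(sum(x * x for x in n))
    n = [x / norm for x in n]
    d = sum((p[i] - a[i]) * n[i] for i in range(3))  # signed distance of p from the plane
    return tuple(p[i] - 2.0 * d * n[i] for i in range(3))


def invert_tertiary_amine(coords, n_id, h_id, carbon_ids):
    """Crude invertomer guess for R3NH+: mirror N and its H through the plane of
    the three carbons bonded to N; all other atoms keep their coordinates."""
    a, b, c = (coords[i] for i in carbon_ids)
    inverted = dict(coords)
    for atom_id in (n_id, h_id):
        inverted[atom_id] = reflect_through_plane(coords[atom_id], a, b, c)
    return inverted
```

Only N and H move here, so the substituents will be somewhat strained until the structure is optimised.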

Another option could be to generate a z-matrix (gzmat) and then write code that does the inversion by changing the sign of select dihedral angles.

Yet another option could be to do it directly from the SMILES string, e.g. C[N@H+](F)Cl vs C[N@@H+](F)Cl.  The challenge here for me would be to only do this for select N atoms. In general writing a SMILES parser seems like a big job.
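One way around a full SMILES parser is a minimal Python sketch like the one below, which only looks for bracketed N atoms that already carry a plain `@` or `@@` tag and swaps the tag on the ones selected. Atoms are selected by their 0-based position among such chiral bracketed N atoms, read left to right. This convention and the function name are hypothetical. Extended tags such as `@TH1` are not handled, and the input SMILES must already contain the N stereo tags.

```python
import re

# Bracketed nitrogen with a tetrahedral chirality tag, e.g. [N@H+] or [15N@@H+]
CHIRAL_N = re.compile(r"\[(\d*)N(@@|@)([^\]]*)\]")


def invert_selected_nitrogens(smiles, selected):
    """Swap @ <-> @@ on the chiral bracketed N atoms whose index is in `selected`."""
    count = -1

    def swap(match):
        nonlocal count
        count += 1
        if count not in selected:
            return match.group(0)  # leave unselected N atoms untouched
        isotope, tag, rest = match.groups()
        new_tag = "@" if tag == "@@" else "@@"
        return "[" + isotope + "N" + new_tag + rest + "]"

    return CHIRAL_N.sub(swap, smiles)


print(invert_selected_nitrogens("C[N@H+](F)Cl", {0}))  # C[N@@H+](F)Cl
```

Because the meaning of `@` depends on the order of the neighbours in the string, swapping the tag at one atom inverts that centre while leaving the rest of the molecule unchanged.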

I am open to suggestions here.