Saturday, November 24, 2012

Entropy and degeneracy: the equation no one tells you about but everyone uses

A useful equation: $S=R \ln(g)$
Open any P-Chem textbook and you'll find this expression for the entropy (and often a reference to the fact it is inscribed on Boltzmann's tombstone):$$S=k\ln(W)$$ $W$ is the multiplicity of the system, i.e. the number of (microscopic) arrangements producing the same (macroscopic) state, and is given by$$W=\frac{N!}{N_1!N_2!N_3!...N_g!}$$Here $N$ is the number of molecules and $N_i$ is the number of molecules with a particular microscopic arrangement $i$ of which there are $g$ different kinds.

Confused?  Believe me, you are not the only one, and most scientists never use this form of the equation anyway.  Instead they usually assume that all these microscopic arrangements have the same energy, i.e. that they are degenerate.  This means that each microscopic arrangement is equally likely and $N_1=N_2=...=N/g$.  This significantly simplifies the expressions for the multiplicity, $$W=g^N$$and the entropy, $$S=Nk\ln(g)$$and for a mole of molecules we have $$S=R\ln(g)$$This formula relates the entropy to the degeneracy $g$, the number of microscopic arrangements with the same energy.
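To get a feel for the numbers, here is a minimal Python sketch that evaluates $S=R\ln(g)$ for a few degeneracies.  The function name `molar_entropy` and the choice of J/(mol K) as the unit are my own assumptions for illustration; only the formula itself comes from the text above.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def molar_entropy(g):
    """Return S = R ln(g) in J/(mol K) for g degenerate arrangements."""
    if g < 1:
        raise ValueError("the degeneracy g must be at least 1")
    return R * math.log(g)


# Tabulate the entropy for a few small degeneracies.
for g in (1, 2, 4):
    print(f"g = {g}: S = {molar_entropy(g):.2f} J/(mol K)")
```

Note that doubling the degeneracy always adds the same amount, $R\ln(2)$, or roughly 5.8 J/(mol K), so the entropy grows only logarithmically with $g$.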

A simple example
Let's say two molecules A and B bind to a receptor R through a single hydrogen bond (indicated by "||||" in the figure) with the same strength.

If you mix equal amounts of A, B, and R you will get more R-A than R-B at equilibrium even though the hydrogen bond strength is the same in the two complexes.  This is because molecule A can bind in four different ways while B can only bind one way, i.e. the R-A complex has a degeneracy of four ($g=4$) and the R-B complex has a degeneracy of one ($g=1$). Put another way, the R-A complex is more likely because it has a higher entropy ($S=R\ln(4)$) than the R-B complex ($S=R\ln(1)$).
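The argument can be turned into numbers with a short Python sketch.  It assumes, as in the example, that the only difference between the two complexes is the degeneracy, and it computes the ratio of the two binding constants rather than the concentrations themselves (the actual concentration ratio also depends on how much free A and B is left at equilibrium).  The function name, the optional energy difference `delta_e` and the temperature of 298.15 K are assumptions made for illustration.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def binding_constant_ratio(g_a, g_b, delta_e=0.0, temperature=298.15):
    """Return K_A / K_B for two complexes with degeneracies g_a and g_b.

    delta_e is E(R-A) - E(R-B) in J/mol; it is zero when the hydrogen
    bonds are equally strong, as assumed in the example.
    """
    delta_s = R * math.log(g_a) - R * math.log(g_b)  # S(R-A) - S(R-B)
    delta_g = delta_e - temperature * delta_s        # free energy difference
    return math.exp(-delta_g / (R * temperature))


ratio = binding_constant_ratio(4, 1)
entropic_term = -298.15 * R * math.log(4) / 1000.0  # -T*dS in kJ/mol

print(f"K_A / K_B = {ratio:.2f}")
print(f"-T*dS     = {entropic_term:.2f} kJ/mol")
```

With equal hydrogen bond strengths the ratio is simply $g_A/g_B=4$, which corresponds to an entropic stabilization of roughly 3.4 kJ/mol at room temperature.  Passing a non-zero `delta_e` shows how quickly a small difference in hydrogen bond strength can outweigh this effect.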

Ifs, ands, or buts
Of course this is a simplified picture where we only focus on conformational entropy and ignore contributions from translation, rotation and vibration, not only in the complexes but also for free A and B.

Also it is quite unlikely that the hydrogen bond strength for two molecules will be identical or that molecule A will be perfectly symmetrical so that the four binding modes are perfectly degenerate. In general $S=R\ln(g)$ will give you an estimate of the maximum possible value of the conformational entropy.

See for example this interesting blog post on a paper where the authors rationalize the measured difference in binding entropy in terms of conformation.  As I point out in the comments section, the conformational entropy difference ($S=R \ln(2)$) is smaller than the measured entropy difference, so there must be other - more important - contributions to the entropy change.

Derivation
If $N_1=N_2=...=N/g$ then $$W=\frac{N!}{\left[(N/g)!\right]^g}$$For large $N$ we can use Stirling's approximation, $x!\approx (x/e)^x$: $$W=\frac{(N/e)^N}{(N/ge)^{(N/g)g}}\\W=\left(\frac{N/e}{(N/e)(1/g)}\right)^N\\W=g^N$$
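A quick numerical check of the result $W=g^N$ can be sketched in Python.  The script computes the exact $\ln W$ from the factorials (via `math.lgamma`, since the factorials themselves overflow quickly) and compares it with $N\ln(g)$.  The function name `ln_multiplicity` and the particular values of $N$ and $g$ are arbitrary choices for illustration.

```python
import math


def ln_multiplicity(n, g):
    """Exact ln W for n molecules spread evenly over g arrangements.

    Uses ln(x!) = lgamma(x + 1); n must be divisible by g.
    """
    if n % g != 0:
        raise ValueError("n must be divisible by g")
    return math.lgamma(n + 1) - g * math.lgamma(n // g + 1)


g = 4
for n in (40, 400, 4000, 400000):
    exact = ln_multiplicity(n, g)
    approx = n * math.log(g)  # ln(g^N) from the derivation above
    print(f"N = {n:>6}: ln W / (N ln g) = {exact / approx:.5f}")
```

The ratio stays slightly below one because the simple form of Stirling's approximation drops terms that grow only as $\ln(N)$, and it approaches one as $N$ increases.  For a mole of molecules the difference is completely negligible.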

Other posts on statistical mechanics

Thursday, November 15, 2012

Dear FEBS Journal: I only review for arXiv-friendly journals

From: Jan Halborg Jensen
Sent: Thursday, November 15, 2012 3:50 PM
To: xxx
Subject: Re: FEBS Journal Manuscript xxx
Dear xxx

Thank you for your invitation to review for FEBS Journal.  I only review for journals that allow pre-print deposition on servers such as arXiv.  According to your instructions to authors this does not appear to be the case, so I must decline.  If FEBS Journal does allow for depositions of preprints please let me know.

Best regards, Jan

Wednesday, November 14, 2012

How to deal with rejection in science in two easy steps

Step 1: make a list of your recent rejections

Proposals
Update: Nov 20: Danish Council for Strategic Research (NABIIT, second-round):
The program committee found that this was an interesting, relevant and support-worthy application associated with an important and fundamental issue.
Unfortunately, however, when prioritizing support-worthy applications yours was not given sufficiently high priority to receive support.
Nov 12: European Research Area – Industrial Biotechnology framework (second-round):
We would like to inform you that your proposal EIB.12.031 DZYME was evaluated positively by the ERA-IB Expert Panel in its meeting on the 16th October. Unfortunately, not all positively evaluated proposals can be granted due to a limited budget of some of the national and/or regional funding organisations.

Oct 10: The Danish Council for Independent Research | Technology and Production Sciences (FTP; translated from Danish):
The council finds your application worthy of support.  However, the council's funds are insufficient to fund all qualified applications.  The council has in this round funded 29 applications out of 280.
Oct 3: The Danish Council for Independent Research | Natural Sciences (FNU; translated from Danish):
Your application was found very worthy of support.  This means that your professional qualifications, your CV, and your project were of such quality and character that it would have been funded had there been sufficient funds.
Papers (see this post)
Nov 1: Physical Chemistry Chemical Physics
All manuscripts submitted to Physical Chemistry Chemical Physics are initially evaluated by the Editors to ensure they meet the essential criteria for publication in the journal. I’m sorry to say that on this occasion your paper will not be considered further because it is not of sufficient novelty and impact to appeal to our readership.
Sep 29: Journal of Chemical Information and Modeling
"In my judgment, your submission is inappropriate for JCIM; it would be rejected upon full review."

Step 2: well ... uhm ...
D'oh!

New paper: In silico screening of 393 mutants facilitates enzyme engineering of amidase activity in CalB

Martin's latest paper was submitted to arXiv on September 20th, but I am only getting around to blogging about it now.  Here's the abstract:
Our previously presented method for high throughput computational screening of mutant activity (Hediger et al., arXiv:1203.2950) is benchmarked against experimentally measured amidase activity for 22 mutants of Candida antarctica lipase B (CalB). Using an appropriate cutoff criterion for the computed barriers, the qualitative activity of 15 out of 22 mutants is correctly predicted. The method identifies four of the six most active mutants with ≥3-fold wild type activity and seven out of the eight least active mutants with ≤0.5-fold wild type activity. The method is further used to screen all sterically possible (386) double-, triple- and quadruple-mutants constructed from the most active single mutants. Based on the benchmark test at least 20 new promising mutants are identified.
Here's the story behind the paper:
I was part of a 3-year EU collaborative project that ended this Spring. One of the sub-projects, headed by Allan Svendsen at Novozymes, was to generate mutants of the lipase CalB that increased its amidase activity, i.e. make it hydrolyze (O=)C-N(H) bonds instead of (O=)C-O bonds.  So, every 6-8 months or so Allan would say "we can make a new batch of 5-10 mutants; what should they be?"

Well, for the first two years we didn't really have suggestions since we were developing a method to screen a large number of mutants in a short period of time.  So instead ideas for single-mutants were generated using the usual method of educated guessing based, essentially, on visual inspection of the structure.  During the last year we were able to test the mutants computationally before they were made, and so offer real predictions.  Towards the end of the project our method was finally sufficiently automated to computationally test all possible double-, triple-, and quadruple-mutants that could be made from the single-mutants.  We found some very promising ones, but the grant ran out before they could be tested experimentally.

Comparing to experiment, we found that, had we had the method at the start of the project, we would have found most of the mutants with increased amidase activity and ruled out most of the mutants with amidase activity lower than the wild-type.

However, the mutant with the highest amidase activity is only 11 times more active than wild-type, and we predict this mutant to have a significantly higher barrier than wild-type.  Also, we don't predict the right ranking of activity.  This makes it difficult to publish in academic journals that focus on impact, as we will see next.

Submitting the manuscript
We first sent the paper to Journal of Chemical Information and Modeling who said "In my judgment, your submission is inappropriate for JCIM; it would be rejected upon full review."

Then we considered ChemBioChem but when asked if they consider manuscripts submitted to arXiv they said "ChemBioChem does not consider manuscripts that have been published and available, including on electronic resources such as arXiv. Our statement on this can be found in our Notice to Authors."

Then we tried Physical Chemistry Chemical Physics who said "All manuscripts submitted to Physical Chemistry Chemical Physics are initially evaluated by the Editors to ensure they meet the essential criteria for publication in the journal. I’m sorry to say that on this occasion your paper will not be considered further because it is not of sufficient novelty and impact to appeal to our readership."

So, now the paper is under review at Journal of Molecular Catalysis B: Enzymatic.  I had initially gently vetoed that journal since it is published by Elsevier, which I boycott.  But when I signed the boycott I was aware that I might have to break it, since my name is rarely the only one on the paper.  Anyway, in return the paper gets submitted to PLoS ONE (which was my first choice) if this journal rejects it.

Should this study be published?
We don't identify a very active mutant.  There is little direct correlation between computed barriers and observed activity.  The most promising mutants identified computationally are not tested experimentally.

Yes, but this method appears to be the only method capable of computing the effect of mutations on barriers for hundreds of mutants in a practically relevant amount of time.  If you are faced with the problem of "which mutants do I start making a month from now", this method is a viable alternative to guessing, which, apart from random mutagenesis, is the only other option that I can see.