Wednesday, November 30, 2011

Reinventing Discovery: Practical steps toward open science

What can you do if you're a scientist?

I recently finished reading Michael Nielsen's book Reinventing Discovery.  This is not another review of the book (there are many: Google it).  I'll just say it is a well-written book on a very important topic, so go buy it now.

Towards the end of the book there is a section entitled "Practical steps toward open science" and one subsection is about what you can do if you're a scientist.  There are some good suggestions, some of which I'd like to expand on here, and I'd like to add some new suggestions as well.  The suggestions are roughly ordered in increasing effort (though not necessarily impact!), together with my own examples (where applicable).

If you have read the book it should be obvious why all these things are a good idea, so I'll focus on the "what" and "how", not the "why".  (Any questions, just leave a comment).  In all cases don't worry about the "audience": interested people will find your things using Google.

Recommend Michael's book to your colleagues
The book also makes a great graduation present for PhDs.

Put your Powerpoint presentations on Slideshare
You have many Powerpoint presentations and/or posters lying around.  Make a free account on Slideshare (it takes 5 minutes) and upload them. 

In addition to helping open science, Slideshare gives you viewing statistics, which are great for grant reports where you have to list talks. Furthermore, you have an online backup of your talks.
Example.  If you want to go one step further in sharing talks and papers see this example and this example.

Share papers and other things that interest you on Google+ and/or Twitter
If you have a Google account (or Gmail address) you already have Google+; you just need to set it up.  Setting up a Twitter account takes 5 minutes.  When you come across an interesting paper or web site, share it on these sites.  I use Twitter for this since most journal pages have a "twitter button", but not yet a Google+ button.

You'll also often see a Facebook button, but items posted on Facebook are usually not discoverable by Google, so not very "open".  You might also want to check out Mendeley, CiteULike, Diggit, etc. but I don't have any direct experience with these sites yet.

In addition to helping open science, the list of your Google+ posts/tweets provides a convenient reminder of things you've found interesting.

Start a blog
Setting up a blog (for example using Google's Blogger) takes 10-20 minutes. The main trouble people have with blogs is finding something to write about.  They often blame this on lack of time, but I think it has more to do with preconceived notions about what and how often one should post.

My advice is to think of a blog as a giant bulletin board where you post things currently of interest to you without worrying if they are of interest to anyone else or if they relate to a particular theme.  The blog you're reading now has several examples: conference talks and posters, interesting papers, new papers from the group, summaries of in-class discussions, things you've figured out and don't want to forget (here, here, and here), teaching material, links related to talks, electronic postcards to the group, pictures from conferences, etc., etc., etc.  (Note that this is a group blog: I didn't write all of the posts).

In addition to helping open science, this "bulletin board" can be indexed with keywords, so that I can easily find the things I wrote and share them with other people.
Other examples (including a movie).

Publish papers in PLoS ONE
(disclaimer: I have no direct experience with this yet)
PLoS ONE is an open access journal that accepts papers in any area of science.  If the work described in your paper is not supported by funding it's free; otherwise it is about €/$1000.  One advantage of PLoS ONE is that papers are not accepted based on perceived importance or impact, only on whether the work is carried out properly.  Nevertheless, the impact factor is quite high (4.4 for 2010).

In addition to helping open science, (to put it very crudely) you can publish "low impact" papers (or even negative results!) in a relatively "high impact" journal.  PLoS ONE is indexed in Web of Science and PubMed so even if your colleagues are not familiar with this journal, they'll find your article.

Publish your preprint on arXiv or Nature Precedings
(disclaimer: I have no direct experience with this yet)
Michael's book has a nice explanation of the arXiv preprint server.  If your paper is in the area of physics, math or quantitative biology it's probably acceptable in arXiv.  Otherwise it is surely acceptable in Nature Precedings.  But be careful: some journals (notably ACS journals!) will currently not publish papers that have been deposited at these sites.  So check first.

Make your published data/code freely downloadable under a Creative Commons/open source license
Anything that went into making the paper: spreadsheets, spectra, scripts, software, etc.

Open Notebook Science: Make your unpublished data/code freely downloadable under an open source license

Thursday, November 17, 2011

"Missing" MOPAC parameters

A long, long time ago in a land far, far away I implemented MNDO, AM1, and PM3 in GAMESS.  This was done by taking chunks of code from MOPAC that contained the parameters, integral code, and Fock matrix builder.  In doing so, I never noticed that the parameter file contained more parameters than were published in the papers describing the methods.  This conundrum came back to haunt us recently as we're trying to implement PM6.

Of course, it's not a conundrum at all: the "missing parameters" are "simply" functions of the other parameters, so they are not missing.  Finding these functions is not so "simple" either, though, and I couldn't have done it without some very helpful emails from Jimmy Stewart.

To find expressions for two of the parameters, called $DD$ and $QQ$ in GAMESS, you have to dig out the MNDO paper from 1976 where they are given in equations (11) and (12)

$DD = \frac{5}{\sqrt 3 }\frac{{\left( {4 \cdot ZS \cdot ZP} \right)^{5/2} }}{{\left( {ZS + ZP} \right)^6 }}$   (1)

$QQ = \sqrt {\frac{3}{2}} \cdot \frac{1}{ZP} $   (2)

The $AM$ parameter turns out to be related to $\rho_0$ in the MNDO paper, and is just the $GSS$ parameter in Hartree units (1 Hartree = 27.21 eV).

$AM=\frac{GSS}{27.21}$   (3)
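
As a rough sketch, equations (1)-(3) translate into a few lines of Python.  The function names and the example inputs are illustrative assumptions, not values from any published parameter set; $GSS$ is assumed to be given in eV, as the conversion in equation (3) implies.

```python
import math

EV_PER_HARTREE = 27.21  # conversion factor quoted in the post


def dd_parameter(zs, zp):
    """Equation (1): DD from the s and p orbital exponents ZS and ZP."""
    return (5.0 / math.sqrt(3.0)) * (4.0 * zs * zp) ** 2.5 / (zs + zp) ** 6


def qq_parameter(zp):
    """Equation (2): QQ from the p orbital exponent ZP."""
    return math.sqrt(1.5) / zp


def am_parameter(gss_ev):
    """Equation (3): AM is GSS converted from eV to Hartree."""
    return gss_ev / EV_PER_HARTREE


# Placeholder inputs, chosen only to exercise the functions.
print(dd_parameter(1.8, 1.8), qq_parameter(1.8), am_parameter(12.0))
```

A quick sanity check on equation (1): when $ZS = ZP$ it reduces to $DD = 5/(2\sqrt{3} \cdot ZS)$.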

Similarly, $AD$ and $AQ$ are then related to $\rho_1$ and $\rho_2$ which "are calculated by numerical methods$^{20}$", where reference 20 is a paper on NDDO integrals.  Here, a few sentences on page 95 provided sufficient clues:

$AD=\frac{1}{2 \rho_1}$   (4)

where $\rho_1$ is the solution to
$\frac{1}{2} \left( 4\rho_1^2 \right)^{-1/2}-\frac{1}{2} \left( 4DD^2+4\rho_1^2 \right)^{-1/2}=\frac{HSP}{27.21}$   (5)

which is equation (56) where $R = 0$ and $D_1^A=D_1^B$.  Back then I assume they wrote some Fortran code to solve the equation iteratively, but now this can be done in a few seconds using a web browser.

Similarly, for $AQ$:

$AQ=\frac{1}{2 \rho_2}$   (6)

$\frac{1}{4} \left( 8QQ^2+4\rho_2^2 \right)^{-1/2}-\frac{1}{2} \left( 4QQ^2+4\rho_2^2 \right)^{-1/2}+\frac{1}{4} \left( 4\rho_2^2 \right)^{-1/2}= \frac{HPP}{27.21}$   (7)

where the latter equation is derived from equation (62) in the NDDO integral paper (more on $HPP$ below).
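
Equations (5) and (7) have no closed-form solution for $\rho_1$ and $\rho_2$, but the left-hand sides decrease steadily as $\rho$ grows, so plain bisection is enough.  The sketch below is one way to do it, not the method used in MOPAC or GAMESS; the function names, the bracketing interval, and the example inputs are assumptions made for illustration.

```python
import math

EV_PER_HARTREE = 27.21  # conversion factor quoted in the post


def bisect_decreasing(f, target, lo=1e-8, hi=1e4, iters=200):
    """Solve f(x) = target on [lo, hi], assuming f decreases with x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lhs_eq5(rho, dd):
    """Left-hand side of equation (5)."""
    return 0.5 / math.sqrt(4 * rho**2) - 0.5 / math.sqrt(4 * dd**2 + 4 * rho**2)


def lhs_eq7(rho, qq):
    """Left-hand side of equation (7)."""
    return (0.25 / math.sqrt(8 * qq**2 + 4 * rho**2)
            - 0.5 / math.sqrt(4 * qq**2 + 4 * rho**2)
            + 0.25 / math.sqrt(4 * rho**2))


def ad_parameter(dd, hsp_ev):
    """Equations (4) and (5): AD = 1 / (2 rho_1)."""
    rho1 = bisect_decreasing(lambda r: lhs_eq5(r, dd), hsp_ev / EV_PER_HARTREE)
    return 1.0 / (2.0 * rho1)


def aq_parameter(qq, hpp_ev):
    """Equations (6) and (7): AQ = 1 / (2 rho_2)."""
    rho2 = bisect_decreasing(lambda r: lhs_eq7(r, qq), hpp_ev / EV_PER_HARTREE)
    return 1.0 / (2.0 * rho2)


# Placeholder inputs (DD, QQ dimensionless here; HSP, HPP in eV).
print(ad_parameter(0.8, 2.4), aq_parameter(0.68, 0.62))
```

Both functions require a positive right-hand side, since the left-hand sides of (5) and (7) are positive for every $\rho > 0$.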

This leaves us with the last parameter, $EISOL$, which corresponds to $E_{el}^A$ in the MNDO paper, where these energies "are calculated from restricted single-determinantal wave functions using the same approximations and parameters as in molecular NDDO calculations."  Not much to go on if you ask me, but inspection of the parameters themselves held some clues.  For example, for hydrogen $EISOL=USS$.

So, after a bit of fiddling around I was able to reproduce $EISOL$ for beryllium $\left( [He]2s^2 \right)$ (Remember that semiempirical methods ignore the core electrons):

$\begin{aligned}EISOL&=2h_{11}+J_{11}+4J_{12}-2K_{12}+2h_{22}+J_{22}\\&=2h_{22}+J_{22}\\&=2USS+GSS\end{aligned}$   (8)

Carbon $\left( [He]2s^22p_x^12p_y^1 \right)$ was a bit trickier to verify, mainly because I still didn't know what $HPP$ was:


Luckily, Jimmy Stewart knew this off the top of his head: $HPP = (GPP-GP2)/2$.

Update: I just found some useful pages here and here

Update: For some elements $GPP < GP2$, which makes $HPP$ negative.  In that case use $HPP = \max(0.1,HPP)$  (Thanks again, Jimmy!)
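
The pieces of $EISOL$ and $HPP$ that are spelled out above can be sketched in Python as follows.  The function names are hypothetical.  The floor on $HPP$ is implemented as a maximum with 0.1, which is an assumption about the intent of the update: a minimum would leave a negative $HPP$ unchanged, and equation (7) has no solution for a negative right-hand side.  Only the hydrogen and beryllium cases of $EISOL$ are covered, since those are the ones given explicitly.

```python
def hpp_parameter(gpp_ev, gp2_ev, floor_ev=0.1):
    """HPP = (GPP - GP2) / 2, raised to floor_ev if it falls below it.

    Using max() here is an assumption about how the floor is meant to work.
    """
    return max(floor_ev, 0.5 * (gpp_ev - gp2_ev))


def eisol_hydrogen(uss_ev):
    """EISOL for hydrogen: a single 1s electron, so EISOL = USS."""
    return uss_ev


def eisol_beryllium(uss_ev, gss_ev):
    """Equation (8): EISOL for beryllium, two valence s electrons."""
    return 2.0 * uss_ev + gss_ev


# Placeholder inputs in eV, chosen only to show both branches of the floor.
print(hpp_parameter(11.0, 10.0), hpp_parameter(5.0, 6.0))
```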

Update: Turns out that you can get these parameters printed out in MOPAC using the HCORE keyword:

$DD = DD2$

$QQ = \sqrt{2}DD3$

$AM = \frac{1}{2 PO1}$

$AD = \frac{1}{2 PO2}$

$AQ = \frac{1}{2 PO3}$
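
If you have the numbers from an HCORE run in hand, the five relations above amount to a small conversion.  This is only a sketch: the function name is made up, the arguments are named after the labels used above, and no attempt is made to parse MOPAC output, since its layout is not shown here.

```python
import math


def gamess_from_hcore(dd2, dd3, po1, po2, po3):
    """Map the HCORE quantities (DD2, DD3, PO1, PO2, PO3) to GAMESS names."""
    return {
        "DD": dd2,
        "QQ": math.sqrt(2.0) * dd3,
        "AM": 1.0 / (2.0 * po1),
        "AD": 1.0 / (2.0 * po2),
        "AQ": 1.0 / (2.0 * po3),
    }


# Placeholder inputs, not taken from a real MOPAC run.
print(gamess_from_hcore(0.8, 0.5, 1.0, 2.0, 4.0))
```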

(This post was written using MathJax, which is very easy to install on Blogger)

Friday, November 4, 2011

Pen- and screencasting

Some links for today's "educational" seminar on pen- and screencasting

Tools for pencasting
The Smartpen
The Replaynote app for the iPad (limited to 10 min recordings)
The Wacom Bamboo Tablet/Screenflow (Mac) or Camtasia (Windows)

Free screencasting software, limited to 15 min recordings
The Khan Academy collection of thousands of pencasts
The 9th grade mathematics curriculum as pencasts

Uses for pencasting
Review of lectures
Supplementary lectures
Homework problems/solutions

Other uses for screencasting
Powerpoint lectures
How to use programs such as Maple
Research presentations
Summary of research papers
Animation in Powerpoint