Wednesday, June 27, 2012

Reviews of Casper's second PLoS ONE submission

The reviews for Casper's second PLoS ONE submission, submitted around May 25th, are in.  Let it not be said that all #openaccess papers are carelessly reviewed!

I have only skimmed them, but the only major concern of Reviewer #1 is very easily dealt with (good point, by the way!).

Similarly, the major criticism of Reviewer #2 is easily dealt with.  We were of course aware of the FMOUtil program, but I am not sure what a comparison would entail.  We should point out any differences in the input files produced by FragIt and FMOUtil for polypeptides, and emphasize the advantage of FragIt (general applicability) by adding a few more examples, such as DNA and perhaps a large ligand bound to a protein.
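
One way to think about such a comparison is to treat each program's fragmentation as a list of fragments, each a list of atom indices, and report which fragments the two agree on. The sketch below is only that: the function names and the toy data are made up for illustration, and it does not parse real FragIt or FMOUtil input files.

```python
def normalize(fragments):
    """Turn a list of fragments (lists of atom indices) into a set of frozensets."""
    return {frozenset(atoms) for atoms in fragments}


def compare_fragmentations(a, b):
    """Return (shared, only_in_a, only_in_b) for two fragmentations of one system."""
    set_a, set_b = normalize(a), normalize(b)
    return set_a & set_b, set_a - set_b, set_b - set_a


# Toy data (hypothetical): atom indices per fragment, as they might be
# extracted from the input file written by each program.
fragit_fragments = [[1, 2, 3], [4, 5], [6, 7, 8]]
fmoutil_fragments = [[1, 2, 3], [4, 5, 6, 7, 8]]

shared, only_fragit, only_fmoutil = compare_fragmentations(
    fragit_fragments, fmoutil_fragments
)
print("identical fragments:", len(shared))
print("only in FragIt:", sorted(sorted(f) for f in only_fragit))
print("only in FMOUtil:", sorted(sorted(f) for f in only_fmoutil))
```

Using sets of frozensets makes the comparison independent of fragment order and of atom order within a fragment, which is probably what matters when two tools number fragments differently.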


PONE-D-12-14697
FragIt: A Tool to Prepare Input Files for Fragment Based Quantum Chemical Calculations
PLoS ONE

Dear Mr Steinmann,

Thank you for submitting your manuscript to PLoS ONE. After careful consideration, we feel that it has merit, but is not suitable for publication as it currently stands. Therefore, my decision is "Major Revision."

We invite you to submit a revised version of the manuscript that addresses all the critical points raised by both reviewers.

We encourage you to submit your revision within sixty days of the date of this decision.

When your files are ready, please submit your revision by logging on to http://pone.edmgr.com/ and following the Submissions Needing Revision link. Do not submit a revised manuscript as a new submission.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

Please also include a rebuttal letter that responds to each point brought up by the academic editor and reviewer(s). This letter should be uploaded as a Response to Reviewers file.

In addition, please provide a marked-up copy of the changes made from the previous article file as a Manuscript with Tracked Changes file. This can be done using 'track changes' in programs such as MS Word and/or highlighting any changes in the new document.

If you choose not to submit a revision, please notify us.

Yours sincerely,

 xxx (name removed upon request)
Academic Editor
PLoS ONE

Reviewers' comments:

Reviewer #1: Review of FragIt: A Tool to Prepare Input Files for Fragment Based Quantum Chemical Calculations

Steinmann et al. present a description of FragIt, a command-line program to prepare GAMESS input files for Fragment Molecular Orbital Calculations. Fragments are generated by bond cleavage at bonds that match particular SMARTS patterns, which simplifies the procedure in particular for polymers such as proteins and oligosaccharides. As the authors point out, without a tool of this sort, it would be next to impossible to prepare input files for any but the smallest proteins.

The manuscript is overall clear and well-written, but could have benefited from a final careful edit before submission as there are many trivial issues. I tested out the software, and it worked without problems (thank you for including a test file).

Major:
(1) This paper does not describe a particular version of the software. Please make a version 1.0 release available from the Github download site or elsewhere. Given that most computational chemists may not be familiar with Github, please provide a simple download link for the software to be downloaded.

Minor:
(1) Spellings: Copenahgen, dilengtly, indepth, chargetransfer, "rise to further" not "rise further", "in Figure" not "on Figure", "nescessary", "input file for" not "input file to", "because we" not "because of we", "aswell", "supplimentary"
(2) Page 1, line 52: Please describe what is behind the "etc". (I would like to know)
(3) Is this tool only specific to proteins and oligosaccharides? Can you make this clearer? The MMFF94 forcefield only supports a limited subset of atom types, and already in the introduction at page 1, line 52, you start talking about proteins. If this is the case, perhaps you should also mention this in the title.
(4) This tool appears to be specific to GAMESS. While I would encourage you to develop your software in such a way that it could be used for other comp chem packages, if that is not your goal, perhaps you should name GAMESS specifically in the title, i.e. A Tool to Prepare GAMESS Input Files. Otherwise you may mislead or disappoint the reader, and in this way you also highlight it to GAMESS users.
(5) Colloquialisms: "doable", "mean feat", "we've", "don't". While I have no particular problem with the use of these, such expressions are not generally found in the literature and may be confusing to non-native speakers (who form the majority of the readership).
(6) The figure numbers correspond to the legends, but not the actual figures in the PDF provided. Naturally, this caused me some confusion.
(7) I found it unusual that the Results section preceded the description of the software and dataset. In fact, I skipped reading the Results until the end, and I would recommend you consider rearranging the manuscript accordingly.
(8) Is there software available which can already perform the same action? This wasn't very clear to me. For example, you reference Open Babel on Page 2 but Open Babel has no particular abilities to prepare FMO files.
(9) As someone who is not familiar with FMO calculations, I would have been interested to read a general description of why they are useful, how fragments should be decided, what is the associated decrease in accuracy, the increase in speed, and so forth. Perhaps the authors could consider a paragraph on this as this feeds into why the software is important.
(10) EFMO is mentioned on Page 2, line 33, but not again in the paper. Is this supported or not (and how does it differ)?
(11) Abbreviations such as FMO, EFMO, SMARTS should be listed with capitals, e.g. "Fragment Molecular Orbital" not "fragment molecular orbital"
(12) Page 5, line 39: Remove line referencing Pybel if not used.
(13) Page 5, Line 53: Here it says that "explicitly defined valid pairs of atoms" can be used instead of substructure patterns, but elsewhere in the manuscript there is no example of this. Could the authors provide an example of usage?
(14) Page 6, line 10: do you mean "combines two or more adjacent fragments?" (subsequent means neighbouring in time)
(15) Page 6, line 33: What does FMO/FD stand for?
(16) Page 6, line 37: This sentence is confusing. How about "One defines a central fragment as above and a distance. Fragments which have atoms with this distance..."
(17) Page 8, line 26: "--output-active-atoms" not "--output-active-distance" according to the version of the software I downloaded.
(18) Please add a section in the manuscript describing the contents of the Supplementary Section.
(19) References. Reference formatting is not consistent. Sometimes journal titles are abbreviated, sometimes not. Should be Avogadro in Ref 9 not Open Babel. All titles are in lowercase even though this may be incorrect (e.g. ref 8).
(20) Figures. Which are the active, inactive and frozen regions in Figure 6?
(21) Consider including a LICENSE.txt with the source code. This makes the license more obvious.


Reviewer #2: The manuscript describes FragIt, a novel and flexible method and corresponding
software package to prepare input data for quantum chemical calculations using
fragmentation methods (the FMO method in particular).  The presented results
are rather narrow in focus (FMO method of the GAMESS package for polypeptides
and polysaccharides), but generally applicable and easily extensible in theory.

The described software is open source and builds on other cheminformatics
software (Open Babel and PDB2PQR).  It is written in Python which makes it
easily usable and modifiable by most computational chemists.  Due to the use of
SMARTS patterns which can be specified at run-time, no software engineering is
needed to apply FragIt to different systems.

Our main criticism is the lack of comparison with the available FMOUtil package
(see general comment 4 below).  This point must be addressed before
publication in our opinion.

Less important but still highly recommended for consideration are our general
comments 1, 2, 3 and 5 about the abstract/introduction - we think some
rewriting would add to the clarity of the manuscript.

As such we recommend to accept this manuscript for publication provided the
above points (general comments 1-5) have been addressed and the other points
considered (unless they indicate clear mistakes).

General comments:

1. The beginning of the introduction mentions fragmentation methods in general
but lacks a clear statement about which method they support in particular.
From the rest of the manuscript, it appears that FragIt is (for now) specific
to Kitaura's and Fedorov's FMO method as implemented in GAMESS. As such, it
would appear appropriate to quickly summarize the method and its main features
to the PLoS ONE readership.

2. The second part of the introduction discusses the "tedious tasks" one has to
undertake to set up program packages for fragmentation methods and mentions some
prior software which "can perform some of these tasks".  However, these tasks
are not presented at this point (they appear to be implicitly presented by
explaining the work-flow and fragmentation algorithm later on).  While a
thorough discussion of them should be left for later sections, we think it
would add to the manuscript to present them in the introduction.

3. The discussion of prior software packages with similar scope is lacking.  The
authors cite Avogadro, Open Babel and Facio.  Avogadro is a molecular modelling
application with support for GAMESS input deck generation, but (to our
knowledge) not including fragmentation.  Open Babel is a general cheminformatics
toolkit and has (to our knowledge) no special features targeted at
fragmentation methods besides the SMARTS handling the authors are using in
FragIt.  Facio appears to be a general-purpose molecular-modelling
application including GAMESS input file / visualization support.  The
screenshot at http://www1.bbiq.jp/zzzfelis/FMO2.jpg implies it includes support
for the FMO method, so the authors should discuss which tasks (see above point
2) Facio is unable (or only poorly/with difficulty able) to perform compared to
FragIt.

Further, PEACH (http://www.cbi.or.jp/~nakata/peach/4.8/peachw48.html is a
related link, the main link appears to be gone) appears to be able to fragment
DNA and dump GAMESS output (besides ABINIT output), but its availability and
usage is unclear to us and has not been evaluated.  Nevertheless, it should
probably be mentioned along the others as well.

4. In addition to the above general comment 3, the authors completely omitted
FMOUtil (http://staff.aist.go.jp/d.g.fedorov/fmo/fmoutil.html), written by the
authors of the FMO method itself.  It appears to be the most obvious choice for
comparison: it is similarly licensed as FragIt (GPL version 2, according to the
source file and program output) and of similar usage (command-line tool dumping
a GAMESS input file), albeit of more limited scope (restricted to
proteins/polypeptides).  An additional feature of FMOUtil appears to be grouping
of glycine residues to the previous fragment, the authors could address why this
is not done in FragIt and/or add this if easily implemented via SMARTS.

As most of the results are about polypeptides, we would even advise to include
FMOUtil fragmentation results for comparison in the results sections, rerunning
the fragmentation with FMOUtil should not take long.

5. In the light of general comment 4, the unique advantages of FragIt are not
very prominently mentioned.  The authors supply patterns for polypeptides and
polysaccharides.  Due to the residue-centric PDB format, fragmenting proteins by
residue appears to be a simple problem, while polysaccharides are probably a
much less common target for quantum chemistry applications in need of
fragmentation methods.  The unique advantages compared to e.g. FMOUtil appear to
be (i) the possibility to fragment peptides at different bonds along the
backbone than what is indicated in the PDB file and (ii) to easily fragment
arbitrary other types of polymers without the need to change the source code or
otherwise difficult setups.  Point (ii) is only indirectly addressed on page
7/line 55 in a comment on testing new patterns.

As such it would make a stronger case for FragIt if further patterns for other
types of polymers (the most notable example would be nucleotides) were
presented or at least their easy development stressed more.  The last section
(Availability and Future Directions) mentions DNA (page 9/line 5), but at the
very end of the manuscript, and only one more example (solid state systems) is
mentioned, which underplays the apparent versatility of the SMARTS approach in
FragIt in our opinion.

Rewriting parts of the abstract to this end should be considered as well.

6. The grouping of the sections is unusual: the results are presented before
the Design and Implementation.  From reading the manuscript, one gets the
feeling it has originally been written the other way around, e.g. the PDB codes
are only mentioned at the very end when the dataset is described, not on the
first mention of the respective proteins and the figures 2-4 are mostly generic
figures whose referencing in the Results section looks slightly misplaced.

7. The results are strictly limited to the fragmentation and their respective
merits are discussed without backing from QM calculations.  While full-scale QM
calculations would likely be out-of-scope for the manuscript, a more thorough
discussion (possibly with references to quantitative comparisons) about which
fragmentation results are desirable in general would be useful in order to better
assess the various FragIt options.  The authors refer to references 21 and 22
with respect to the fragmentation of peptide bonds, but fragmentation sizes
etc. are not discussed that much.

8. The terms of the availability (especially the software license) are not
clearly stated in the Availability and Future Directions section.  According to
the code on github, FragIt is licensed under the GPL, version 2 (or later); we
believe it should at least be mentioned that it is distributed under an open
source license.  We would also suggest to include a (web-)citation for the
FragIt code for indexing and linking purposes instead of the single inline URL
on page 8/line 53.

9. Although the FMO-specific parts of the generated GAMESS input are the crucial
part of FragIt, it seems to write a complete GAMESS input including hardcoded
default settings for the ab initio method and basis sets.  Those
non-fragmentation specific parts of the input are probably supposed to be edited
by the user to their needs.  FMOUtil (see general comment 4) apparently includes
support for various methods and basis sets implementing the FMO method, so
discussing the nature of the default SYSTEM/SCF/CONTROL/BASIS groups in the
Writing the Input Files section should be considered.

10. The section Availability and Future Directions could discuss possible
interaction or inclusion of FragIt in other (open-source) software packages, the
most likely being Avogadro (reference 9), which already includes a sophisticated
GAMESS input deck generator and a plugin infrastructure including python
support.  Avogadro being a graphical application might even make it possible to
interactively manipulate fragment boundaries.

11. The paper is sometimes written in a somewhat informal style (see e.g.
page 2/line 15/16, "is [...] no minor task when you have hundreds of fragments")
and we encountered several language issues or errors that we have not explicitly
mentioned.
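
The glycine grouping raised in general comment 4 can be illustrated with a short sketch. It works on residue names only and is an assumption about the general idea, not a description of how FMOUtil or FragIt actually implement it; the function name and the example chain are hypothetical.

```python
def merge_glycines(residues):
    """Group residues into fragments, attaching each GLY to the preceding fragment.

    `residues` is an ordered list of three-letter residue names. The result is a
    list of fragments, each a list of 0-based residue positions.
    """
    fragments = []
    for position, name in enumerate(residues):
        if name == "GLY" and fragments:
            # A glycine joins the fragment that precedes it in the chain.
            fragments[-1].append(position)
        else:
            # Any other residue (or a leading glycine) starts a new fragment.
            fragments.append([position])
    return fragments


chain = ["ALA", "GLY", "SER", "GLY", "GLY", "TRP"]  # toy sequence
fragments = merge_glycines(chain)
print(fragments)
```

A glycine at the start of the chain has no previous fragment to join, so in this sketch it simply forms its own fragment.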

Specific comments:

Page 1, Line 39: we suggest adding a citation to the recent Chem. Rev. review
(DOI: 10.1021/cr200093j) of fragmentation methods to the end of the sentence
"[...] such as fragmentation methods".

Page 1, Line 48: the pointer to the supporting information regarding the
complexity of input files for fragmentation methods compared to conventional
methods seems unnecessary here.  First off, there are numerous fragmentation
methods and some might not require much more complex setup than conventional
ones.  Further, the supporting information does not appear to contain any
conventional input files for comparison anyway.

Page 2, Line 37: PDB2PQR is not only available online, it can be freely
downloaded and installed locally for (offline) use with FragIt.

Page 2, Line 38: the reference to the RECAP algorithm without further
explanation looks misplaced in the introduction and could be moved to the Design
and Implementation section.

Page 3, Lines 14-22: the usefulness of the molecular cluster example is unclear.
If the only reason for its inclusion is to show FragIt will fragment separate
molecules to one fragment each, this could perhaps be folded into the Design and
Implementation section.  The specific example (16 water molecules and one
tyrosine) leads to 16 very small and one big fragment.  It is unclear how this
is a "good" result considering the size difference of the fragments; if this
result is indeed better than grouping maybe 2-3 water molecules into one
fragment, this should be discussed.  We suggest to omit or move this subsection.

Page 4, Lines 10-19: the description of figure 5 is hard to follow.  Several
fragments in figure 5 have the same color, and the text does not discuss them,
just summarizes the fragment sizes.  It would be clearer if the text would
mention the color and/or fragment number according to the panels in figure 5.
Another possibility would be to enhance the caption of figure 5 to include that
explanation.

Page 4, Lines 21/23: it is unclear whether the lack of fragmentation along
disulfide bonds "unless a specific pattern is supplied" is intentional (and thus
desired), or simply a missing feature.  Further, the discussed results include
no example with disulfide bonds, while the test set does (see page 7/line 14),
though also without discussion.

Page 6, Line 30: we assume fragment I is the central fragment mentioned before,
it could be mentioned explicitly to make this clearer.

Page 6, Line 44: (web-)citations for PyMOL and Jmol should be provided
(reference 29 cites Jmol, but only later in the manuscript on page 8/line 46).

Page 6, Line 44: "visually inspect" is vague, we assume the output scripts
include markup so that PyMOL/Jmol will color each fragment differently (as
shown in the figures), this could be mentioned more specifically (perhaps
referencing one of the figures), including possible other features.

Page 7, Line 26: (web-)citation for NumPy could be provided considering Python
got cited earlier in the manuscript as well.

Idem: The strict dependency on Python 2.6 is not explained; if this was indeed
the case, it would seriously narrow the field of use.  Another possibility is
that the "(or greater)" after Numpy applies to Python as well, in this case this
should be made clearer.  If Python 2.6 is just the version used by the authors,
we suggest adding language like "has been tested/validated with Python 2.6".
The same applies for OpenBabel, the supported version should be mentioned here;
as reference 10 specifies version 2.3.0, this could be mentioned here as well.

Comments on References:

1. Several journals are not abbreviated correctly, e.g. reference 1 (J Comput
Chem), reference 2 (J Chem Theory Comput), reference 11 (J Cheminform) or
reference 27 (Chem Cent J).  We did not check all references for this.

2. Reference 5: the bibliographical data for the book "CRC" is incomplete, it
should be something like "CRC Press, Boca Raton, FL".

3. Reference 9: wrong title, it should probably read "Avogadro v.1.0.3.".

4. Reference 16: the ID for the arXiv preprint is missing a dot, it should be
"1202.4935".

5. Reference 23: The URL is not marked up like the other URLs in e.g.
references 24-26.

Comments on Figures:

1. The figures are referenced out-of-order in the text in the sequence
1-2-5-3-4-7-6-8.  Further, the captions and uploaded images do not match;
uploaded figure 2 is labelled "algorithm" (figure 7 caption) and every later
figure is offset by one between the uploaded figure and the figure caption due
to this.

2. Figure 1: similar to the results it depicts (see comments to page 3/lines
14-22 above), the usefulness of figure 1 is dubious.  It displays the
water/tyrosine cluster in two panels, with regular atom-coloring in the first
panel and fragment coloring in the second.  However, for the second panel, there
are fewer different colors than there are fragments, so several fragments have
the same color.  As the fragments are not otherwise labelled, the actual result
is contradicted, i.e. that every independent molecule is a separate fragment.
We suggest removing the figure or at least adding labels to the repeatedly
colored fragments and/or clarify the coloring/fragmentation in the caption.

3. Figure 6: the caption should explain the coloring with respect to the
mentioned regions, possibly also denoting the fragment indices so that
the figure can be understood for black-and-white printouts.

4. Figure 7: the caption looks superfluous and unnecessary.

Comments on Tables:

1. Table 2: the comment "d.o." in rows 2-6 is unclear to us - pardon our
ignorance if this is an otherwise common abbreviation.  If it signifies "see
above", "v.s." (vide supra) could be used instead, or just the first comment
"capped with methylene" be repeated.

Comments on Software:

1. No release has been made of FragIt so far, only the git trunk is available.
We advise on releasing a version and tarball in conjunction with the paper for
future reference.  This need not be a 1.0 release.

2. Only version 2.6 of Python is supported (the documentation mentions ongoing
work to support further python versions).  As FragIt is not a big project (below
2k lines of code, 25% of which belongs to the test suite), we suggest removing
the python-2.6 dependency for a public release (see software comment 1), if
possible.

3. The config file format conflates generic configuration values ("writer", the
default patterns, output options) with specific results for a particular PDB
file ("explicitfragmentpairs" and "explicitprotectatoms").  This might be due to
option parsing, but is non-intuitive.  The same goes for requiring a PDB file
for dumping the configuration values into a file.

4. It is not possible to enable/disable protection patterns via command-line
options (at least according to --help output); the only way to do this seems to
be by changing the config file.  Compared to some of the other command-line
options, this one seems rather important to us so we suggest adding it.

5. Including a (probably trivial) setup.py script for installation and
deployment, as is customary for Python software, is advised.
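
The command-line switch suggested in software comment 4 might look roughly like the sketch below. The flag name `--disable-protection` and the attribute `use_protection` are hypothetical and are not taken from FragIt; the sketch also assumes `argparse`, which is in the standard library from Python 2.7 onwards but not in 2.6.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical switch for protection patterns."""
    parser = argparse.ArgumentParser(
        description="Hypothetical FragIt-like command line (illustration only)"
    )
    parser.add_argument("pdbfile", help="structure file to fragment")
    parser.add_argument(
        "--disable-protection",
        dest="use_protection",
        action="store_false",
        default=True,
        help="do not apply protection patterns (they are applied by default)",
    )
    return parser


# Parse an example argument list instead of sys.argv so the sketch is
# reproducible when run on its own.
args = build_parser().parse_args(["protein.pdb", "--disable-protection"])
print(args.pdbfile, args.use_protection)
```

Keeping the default in the parser rather than only in the config file means a single run can override the behaviour without editing any file, which seems to be the point of the comment.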