Saturday, January 21, 2017

Prediction of the Regioselectivity of Electrophilic Aromatic Substitution Reactions of Heteroaromatic Systems Using Semi-Empirical Quantum Chemical Methods

Art Winter tweeted this paper by Morten Jørgensen and co-workers last year and I decided to see if semi-empirical methods could help here.  The paper uses ChemDraw's chemical shift predictor to predict where a bromine atom will be added to a heteroaromatic molecule in an electrophilic aromatic substitution reaction. They tested this on 132 different compounds and achieved an 80% success rate, which is very good.

Googling a bit led me to this paper by Wang and Streitwieser, where they show a correlation between the rate of electrophilic aromatic substitution reactions and the lowest proton affinity of the protonated species.  This suggests that the protonated carbon with the lowest proton affinity (or pKa if solvent is included) should be the reacting carbon.  So I tested this using semi-empirical QM methods for these 132 compounds.  When I say "I", I should say that Jimmy Charnley Kromann ran many of the calculations and Monika Kruszyk provided most of the structures as ChemDraw files, which I could convert to SMILES strings using OpenBabel. These are preliminary results and may contain errors.

The reactions for the 132 compounds are not all run in the same solvent, so I first tested gas phase, chloroform (i.e. dielectric 4.8) and DMF (dimethylformamide, dielectric 37) using PM3 and COSMO in MOPAC. I chose PM3/COSMO because that gave the best results in a previous pKa study. The most representative choice of solvent seems to be chloroform, where PM3/COSMO predicted the correct bromination site in 95% of the cases, i.e. it fails for 7 of the 132 compounds. Gas phase and DMF fail for 14 and 8 cases, respectively, so it's important to include solvent, but the value of the dielectric constant is not all that important.  Using chloroform as a solvent, I then tested AM1, PM6, PM6-DH+, PM7 and DFTB3/SMD (using GAMESS for the last one), which resulted in 12, 12, 12, 9, and 13 wrong predictions, respectively. One of the compounds includes an Si atom, which the DFTB3 parameter set I used couldn't handle, so the 13 wrong predictions are out of 131 compounds.  Anyway, PM3/COSMO/chloroform works best.
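To make the comparison easier to read, the failure counts above can be turned into success rates. This is a minimal sketch that only redoes the arithmetic on the numbers quoted in this post; the dictionary layout, the method labels and the function name are illustrative choices, not part of the actual analysis code.

```python
# (wrong predictions, total compounds) as quoted in the text above.
# DFTB3/SMD is out of 131 because the Si-containing compound was skipped.
WRONG = {
    "PM3 (gas phase)": (14, 132),
    "PM3 (chloroform)": (7, 132),
    "PM3 (DMF)": (8, 132),
    "AM1 (chloroform)": (12, 132),
    "PM6 (chloroform)": (12, 132),
    "PM6-DH+ (chloroform)": (12, 132),
    "PM7 (chloroform)": (9, 132),
    "DFTB3/SMD (chloroform)": (13, 131),
}


def success_rate(wrong, total):
    """Percentage of compounds whose bromination site was predicted correctly."""
    return 100.0 * (total - wrong) / total


# Rank the methods from best to worst success rate.
ranked = sorted(WRONG, key=lambda m: success_rate(*WRONG[m]), reverse=True)

for method in ranked:
    wrong, total = WRONG[method]
    print(f"{method:24s} {wrong:2d}/{total} wrong  {success_rate(wrong, total):5.1f}%")
```

Note that these rates use the strict counts from this paragraph, before any allowance for near-degenerate pKa values.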

In some cases the lowest pKa value is quite close to some of the other pKa values, so I took an approach similar to that of Jørgensen and co-workers: if the correct bromination site is included in the set of atoms with pKa values within 0.74 pKa units of the lowest value (corresponding to 1 kcal/mol at room temperature) then I counted it as correct.  For PM3/COSMO/chloroform this occurred 10 times. In 9 cases the set included 2 atoms and in 1 case, 3 atoms.  In one of the 9 cases (15) there are only two possible bromination sites, so this case is not a successful prediction and PM3/COSMO/chloroform actually gets 8 wrong. However, in all other cases there are more possibilities than those predicted. Furthermore, in all but 2 of these 10 cases the atom with the lowest pKa is the "correct" atom.
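The scoring rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual analysis code: the function names, the atom indices and the pKa values in the example are all hypothetical. The conversion uses ΔpKa = ΔG / (RT ln 10), which gives roughly 0.73 to 0.74 pKa units per kcal/mol around room temperature.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol K)


def kcal_to_pka_units(delta_g_kcal, temperature_k=298.15):
    """Convert a free energy difference in kcal/mol to pKa units."""
    return delta_g_kcal / (R_KCAL * temperature_k * math.log(10))


def candidate_sites(pka_by_atom, window=0.74):
    """Atoms whose pKa lies within `window` of the lowest pKa."""
    lowest = min(pka_by_atom.values())
    return sorted(a for a, pka in pka_by_atom.items() if pka - lowest <= window)


def is_successful(pka_by_atom, observed_atom, window=0.74):
    """Count a prediction as correct only if the observed site is among the
    candidates AND the candidates do not cover every possible site
    (otherwise nothing has really been predicted)."""
    candidates = candidate_sites(pka_by_atom, window)
    informative = len(candidates) < len(pka_by_atom)
    return observed_atom in candidates and informative


# Hypothetical molecule: atom index -> computed pKa of the protonated carbon.
example = {3: 1.2, 5: 1.7, 8: 4.0}
print(candidate_sites(example))      # atoms within the 0.74 window
print(is_successful(example, 5))     # observed site is one of the candidates
print(kcal_to_pka_units(1.0))        # 1 kcal/mol expressed in pKa units
```

The `informative` check mirrors the case discussed above where the candidate set contains both of the only two possible sites and is therefore not counted as a success.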

Bromination, or more generally, halogenation is often a first step towards adding an aryl group, usually using a Suzuki reaction.  Often there is more than one halogen of the same type so there is also interest in predicting where the aryl group will go.  I tried the PM3/COSMO/chloroform approach on the six molecules in this paper by Houk and co-workers. Computing pKa's of the halogenated carbon atoms led to correct predictions in 4 of the 6 cases, while computing proton affinities of the carbon atoms in the non-halogenated parent compounds led to correct predictions in 2 of the 6 cases. The former approach seems promising but needs to be tested on a much larger set of molecules.

Next step is to write this up and get the set-up and analysis code in such a shape that we can distribute it. I've also started thinking about how to make the approach more generally available and usable for non-experts. A grant proposal is also in the works, so if we're successful that should definitely be possible to achieve.


This work is licensed under a Creative Commons Attribution 3.0 Unported License.
