Correlation of liquid viscosity with molecular structure for organic compounds using different variable selection methods
Bono Lučić, Ivan Bašic, Damir Nadramija, Ante Miličević, Nenad Trinajstić, Takahiro Suzuki, Ruslan Petrukhin, Mati Karelson and Alan R. Katritzky

Abstract

Improved models for predicting viscosities at 20 °C were generated using three different methods for descriptor selection. A data set of 361 diverse organic molecules and their experimental viscosities was used to develop the models. Molecular properties were encoded by 822 initial descriptors computed by the CODESSA program. The CODESSA, GFA and CROMRsel methods, which are automated procedures for generating simple multiregression (MR) models, were each able to select good, simple viscosity models containing only five descriptors. All three methods produced excellent linear models, but the models obtained by the CROMRsel method were somewhat better. In addition, the CROMRsel suite of programs yielded a very good nonlinear MR model with five descriptors (two linear and three cross-product descriptors; R² = 0.908, S = 0.175). The nonlinear models generated in this study show that classical MR-based methods can be used efficiently to obtain simple and very good nonlinear MR models. The best five-descriptor models selected in this study usually contain one geometrical descriptor (the gravitational index), one topological descriptor (the Randić index of order 0), and three electrostatic descriptors that reflect the bonding properties of molecules, i.e. their capacity to form (mainly) hydrogen bonds. Consequently, hydrogen-donor and hydrogen-acceptor surface areas, charges, total molecular surface areas, maximum net atomic charges, and state energies for oxygen atoms appear to be key factors for modeling the viscosity of organic molecules.
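The idea of a nonlinear MR model built from linear and cross-product descriptors can be illustrated with a minimal sketch. It does not reproduce the CODESSA, GFA or CROMRsel selection algorithms: an exhaustive best-subset search is used here purely as a stand-in, the data are synthetic rather than the 361-compound viscosity set, and all function names (`fit_mr`, `add_cross_products`, `best_subset`) are hypothetical. S is assumed to be the standard error of the fit.

```python
import itertools
import numpy as np

def fit_mr(X, y):
    """Ordinary least-squares fit with intercept; returns (coef, R^2, S)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    dof = len(y) - A.shape[1]          # observations minus fitted parameters
    s = (ss_res / dof) ** 0.5          # standard error of the fit (assumed meaning of S)
    return coef, r2, s

def add_cross_products(X):
    """Append all pairwise products x_i * x_j (i < j) to the descriptor matrix."""
    n = X.shape[1]
    pairs = list(itertools.combinations(range(n), 2))
    products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    labels = [(i,) for i in range(n)] + pairs   # (i,) = linear term, (i, j) = product
    return np.hstack([X, products]), labels

def best_subset(X, y, size):
    """Exhaustive search (a stand-in, not CROMRsel) for the subset with highest R^2."""
    best = None
    for cols in itertools.combinations(range(X.shape[1]), size):
        _, r2, s = fit_mr(X[:, list(cols)], y)
        if best is None or r2 > best[1]:
            best = (cols, r2, s)
    return best

# Synthetic example: 4 made-up descriptors, response with one cross-product term.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1]
     + 0.6 * X[:, 0] * X[:, 2] + rng.normal(scale=0.05, size=200))

X_ext, labels = add_cross_products(X)      # 4 linear + 6 cross-product columns
cols, r2, s = best_subset(X_ext, y, size=3)
print([labels[c] for c in cols], round(r2, 3), round(s, 3))
```

Because the cross-products enter as ordinary columns, the model stays linear in its coefficients and can be fitted by classical least squares, which is the sense in which a multiregression procedure can yield a nonlinear model. Exhaustive search is only feasible for small descriptor pools; with hundreds of initial descriptors a more economical selection strategy is needed.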