The “New algorithms and methods” session focuses on the development of new modeling paradigms, data processing and validation techniques, and new ways to describe molecular structure.
The foundations of the modern Quantitative Structure-Activity Relationship (QSAR) approach were laid by Hansch (1918-2011) in the early 1960s. Since then, a plethora of methods now considered classical, such as Free-Wilson analysis, CoMFA, CoMSIA, and 4D- and 5D-QSAR, have been developed, with many new concepts introduced each year. Even more staggering is the number of novel molecular descriptors, now approaching 10 000, a trend often described as a “blessing or a curse”. These advances are often fueled by developments in related fields such as statistics, computational chemistry, and systems biology.
One such new modeling paradigm developed at the NCTR is the so-called three-dimensional spectral data-activity relationship (3D-SDAR) approach. 3D-SDAR is a grid-based approach built on 3D molecular fingerprints constructed from the NMR chemical shifts of pairs of atoms (which determine the X and Y coordinates of the individual fingerprint elements) and their corresponding through-space atom-to-atom distances (which determine the Z coordinate). These fingerprints are then tessellated by regular grids, generating voxel occupancy matrices with thousands of columns. The matrices are processed by bagging-like Partial Least Squares (PLS) or k-Nearest Neighbors (KNN) algorithms, which repeatedly split the modeling set into fully randomized training and hold-out subsets. On each randomization cycle a “blind” prediction set is evaluated. The structural interpretation of these models is based on 3D-SDAR maps and on projections of the most frequently occurring voxels onto the normal coordinate space.
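The fingerprint construction and grid tessellation described above can be illustrated with a minimal Python sketch. It is not the NCTR implementation. The function names `pair_fingerprint` and `voxel_occupancy`, the convention of placing the larger shift on the X axis, the 10 ppm and 1 Å bin widths, and the three-atom toy molecule are all assumptions made for illustration only.

```python
import itertools
import numpy as np

def pair_fingerprint(shifts, coords):
    """Return one (shift_x, shift_y, distance) row per unordered atom pair.

    shifts: chemical shifts, one per atom (ppm assumed).
    coords: Cartesian coordinates, shape (n_atoms, 3) (angstroms assumed).
    """
    shifts = np.asarray(shifts, dtype=float)
    coords = np.asarray(coords, dtype=float)
    rows = []
    for i, j in itertools.combinations(range(len(shifts)), 2):
        # Assumed convention: larger shift on X, smaller on Y, so that
        # the pair (i, j) and the pair (j, i) map to the same point.
        hi, lo = max(shifts[i], shifts[j]), min(shifts[i], shifts[j])
        dist = np.linalg.norm(coords[i] - coords[j])  # through-space distance
        rows.append((hi, lo, dist))
    return np.array(rows, dtype=float).reshape(-1, 3)

def voxel_occupancy(fingerprint, edges):
    """Tessellate a fingerprint with a regular grid and flatten the counts.

    edges: three arrays of bin edges (X, Y, Z). Each voxel becomes one
    column of the occupancy matrix; its value is the number of
    fingerprint elements that fall inside it.
    """
    counts, _ = np.histogramdd(fingerprint, bins=edges)
    return counts.ravel()

# Hypothetical toy molecule: three atoms with made-up shifts and geometry.
shifts = [120.0, 30.0, 128.0]
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 0.0]]

# Assumed grid: 10 ppm bins over 0-240 ppm, 1 angstrom bins over 0-15.
shift_edges = np.arange(0.0, 241.0, 10.0)
dist_edges = np.arange(0.0, 16.0, 1.0)

fp = pair_fingerprint(shifts, coords)
occ = voxel_occupancy(fp, [shift_edges, shift_edges, dist_edges])
```

Stacking one such occupancy vector per compound gives the matrix that a PLS or KNN model would be trained on. With the assumed grid each row has 24 × 24 × 15 = 8640 columns, in line with the "thousands of columns" mentioned above. A finer or coarser grid changes the column count accordingly.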