Arkaprava Banerjee is a Researcher (funded by the Life Sciences Research Board, DRDO, Govt. of India) working at the Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata. Mr. Banerjee has published twenty-two research articles in reputed journals and two book chapters, with 416 citations overall and an h-index of 11 (Scopus). His ORCID identifier is 0000-0001-8468-0784. His expertise lies in similarity-based cheminformatic approaches such as Read-Across and Read-Across Structure-Activity Relationship (RASAR), a novel method that combines the concepts of QSAR and Read-Across. Mr. Banerjee is also a Java programmer who has developed various cheminformatic tools based on QSAR, Read-Across, and RASAR; these tools are freely available from the DTC Laboratory Supplementary Website. He received the Prof. Anupam Sengupta Bronze medal from Jadavpur University for securing the highest marks in Pharmaceutical Chemistry in the MPharm Examination. He has also received a special diploma awarded by the Institute of Biomedical Chemistry, Moscow, Russia, and the ASCCT Travel Award from the American Society for Cellular and Computational Toxicology. Together with Prof. Kunal Roy, he has been one of the first researchers to develop quantitative models using similarity and error-based descriptors (quantitative/classification Read-Across Structure-Activity Relationship: q-RASAR/c-RASAR models) with applications in drug design, materials science, and property modeling. Recently, he coauthored a book on “q-RASAR,” which was published by Springer.
Machine learning-assisted c-RASAR modeling of a curated set of orally active nephrotoxic drugs: Similarity-based predictions from close source neighbors
Arkaprava Banerjee, Kunal Roy*
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
Over the last decade, cheminformatics and Machine Learning (ML) have seen exponential progress in the field of chemical risk assessment, owing to their efficiency, accuracy, and reliability. The constant evolution of New Approach Methodologies (NAMs) has inspired researchers around the globe to deviate from conventional approaches and adopt or develop new, “unconventional” methods. The classification Read-Across Structure-Activity Relationship (c-RASAR) is one such unconventional approach: it incorporates similarity and error-based information from the nearest neighboring compounds into a Machine Learning modeling framework, resulting in enhanced predictivity. Although this technique has so far been applied only to 0-2D molecular descriptors, in the present study we have applied it to molecular fingerprints as well as the conventional 0-2D descriptors, developing ML-based models from a recently reported, highly curated data set on the nephrotoxicity potential of orally active drugs. We initially developed ML models using nine different linear and non-linear algorithms, separately on molecular descriptors and on MACCS fingerprints, thus generating 18 different ML QSAR models. Using the chemical spaces defined by the modeling 0-2D descriptors and fingerprints, the similarity and error-based RASAR descriptors were computed, and the most discriminating RASAR descriptors were used to develop another set of 18 different ML c-RASAR models. All 36 models were cross-validated 20 times with a 5-fold cross-validation strategy, and their predictivity was checked on the test set data. A multi-criteria decision-making strategy, the Sum of Ranking Differences (SRD) approach, was adopted to identify the best-performing model based on robustness and external validation parameters. This statistical analysis suggested that the c-RASAR models performed well overall and that the best-performing model was itself a c-RASAR model.
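As an illustration of the SRD step described above, the following is a minimal Python sketch, not the implementation used in this study. It assumes that each model is described by a list of validation metrics on a comparable scale (higher is better), and that the row-wise maximum across models serves as the reference ranking; the function names `average_ranks` and `srd_scores` and the example values are hypothetical. The validation steps usually associated with SRD (for example, comparison with random rankings) are omitted.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def srd_scores(metrics: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum of ranking differences per model (smaller = closer to the reference).

    `metrics` maps a model name to its metric values; every model must list
    the same metrics in the same order. Assumption: the reference is the
    row-wise maximum, i.e. the best value observed for each metric.
    """
    names = list(metrics)
    n_rows = len(metrics[names[0]])
    reference = [max(metrics[m][r] for m in names) for r in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(a - b) for a, b in zip(average_ranks(metrics[m]), ref_ranks))
        for m in names
    }


if __name__ == "__main__":
    # Hypothetical metric values for three models (not results from the study).
    example = {
        "model_A": [0.90, 0.80, 0.70],
        "model_B": [0.60, 0.85, 0.75],
        "model_C": [0.70, 0.60, 0.95],
    }
    for name, score in sorted(srd_scores(example).items(), key=lambda t: t[1]):
        print(name, score)
```

A model whose metric ranking matches the reference ranking exactly obtains an SRD of zero; larger values indicate a ranking pattern further from the reference.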
This model was used to screen a true external data set prepared from the known nephrotoxic compounds of DrugBankDB. These results also showed that our model efficiently identifies nephrotoxic compounds. The t-SNE analyses of the descriptor, fingerprint, and RASAR descriptor spaces indicated that the RASAR descriptors efficiently encode the chemical information, as evident from the tight and distinct clustering of the data points. Additionally, the molecular descriptors and the corresponding RASAR descriptors were used to identify potential activity cliffs using the ARKA framework.
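To make the idea of similarity-based descriptors from close source neighbors more concrete, the following minimal Python sketch computes a few RASAR-like quantities for a query compound from binary fingerprints. It is an assumption-laden illustration, not the descriptor definitions used in this study: fingerprints are represented as sets of on-bit indices, similarity is the Tanimoto coefficient, and the function name `rasar_like_descriptors`, the descriptor names, and the choice of `k` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rasar_like_descriptors(
    query: Set[int],
    sources: List[Tuple[Set[int], int]],
    k: int = 3,
) -> Dict[str, float]:
    """Similarity-based descriptors from the k closest source compounds.

    `sources` holds (fingerprint, label) pairs with label 1 = positive
    (e.g. nephrotoxic) and 0 = negative. All descriptor names are
    hypothetical and chosen only for this illustration.
    """
    # Keep the k source compounds most similar to the query.
    neighbors = sorted(
        ((tanimoto(query, fp), label) for fp, label in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    if not neighbors:
        return {"mean_sim": 0.0, "weighted_pos_fraction": 0.0,
                "max_sim_pos": 0.0, "max_sim_neg": 0.0}
    total = sum(sim for sim, _ in neighbors)
    positive = sum(sim for sim, label in neighbors if label == 1)
    return {
        # Average similarity to the close source neighbors.
        "mean_sim": total / len(neighbors),
        # Similarity-weighted share of positive neighbors.
        "weighted_pos_fraction": positive / total if total > 0 else 0.0,
        # Highest similarity to a positive and to a negative neighbor.
        "max_sim_pos": max((s for s, lab in neighbors if lab == 1), default=0.0),
        "max_sim_neg": max((s for s, lab in neighbors if lab == 0), default=0.0),
    }


if __name__ == "__main__":
    # Toy fingerprints and labels, invented for the example.
    training = [({1, 2, 3, 4, 5}, 1), ({1, 2}, 0), ({7, 8}, 1), ({1, 2, 3}, 1)]
    print(rasar_like_descriptors({1, 2, 3, 4}, training, k=3))
```

In a workflow of the kind described in the abstract, such per-compound values would be computed for training compounds (against the remaining training compounds) and for test compounds (against the training set), and then used as input features for an ML classifier.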
Keywords: QSAR; c-RASAR; ARKA; Nephrotoxicity; Activity cliffs; Sum of ranking differences