Model Details
PTP: Predictive Target Profile
This model predicts the interactions of a molecule, represented by its chemical structure, with 31 targets selected for their utility in broad early hazard assessment.
- Project: Pharmbio
- Project Owner: hamza.saeed@igp.uu.se
- Version: 0.1.0
- Uploaded: Dec. 21, 2023, 2:21 p.m.
Model Card
Model Card for Predictive Target Profile
Model Details
Overview
This model predicts the interactions of a molecule, represented by its chemical structure, with 31 targets (ACHE, ADORA2A, ADRB1, ADRB2, AR, AVPR1A, CCKAR, CHRM1, CHRM2, CHRM3, CNR1, CNR2, DRD1, DRD2, EDNRA, HTR1A, HTR2A, KCNH2, LCK, MAOA, NR3C1, OPRD1, OPRK1, OPRM1, PDE3A, PTGS1, PTGS2, SCN5A, SLC6A2, SLC6A3, SLC6A4) selected for their utility in broad early hazard assessment. The model uses conformal prediction to deliver a prediction interval for each prediction, and accepts chemical structures as input in SMILES or MOL format.
Version
Owners
- Jonathan Alvarsson
References
- Lampa S, Alvarsson J, Arvidsson Mc Shane S, Berg A, Ahlberg E, Spjuth O. Predicting off-target binding profiles with confidence using Conformal Prediction. Frontiers in Pharmacology 9, 1256 (2018). DOI: 10.3389/fphar.2018.01256
Model Architecture
Chemicals were represented by the signature molecular descriptor, and support vector machines were used as the underlying machine learning method. With conformal prediction, each prediction is returned as a set of p-values, one per class. Data preprocessing and model development were implemented as a SciPipe workflow to make the models reproducible. For more details, including hyper-parameter tuning, see the scientific manuscript https://dx.doi.org/10.3389/fphar.2018.01256.
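As a rough illustration of how per-class p-values can arise, the sketch below computes class-conditional (Mondrian) conformal p-values from nonconformity scores. It is not the model's actual implementation: the function names, the calibration scores, and the test scores are all made up for the example, and in practice the nonconformity scores would be derived from the underlying support vector machine.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration scores at least as nonconforming as the test score.

    The +1 terms count the test example itself, as in standard
    inductive conformal prediction.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def mondrian_p_values(calibration, test_scores):
    """One p-value per class, each calibrated only on examples of that class."""
    return {
        label: conformal_p_value(calibration[label], score)
        for label, score in test_scores.items()
    }


# Hypothetical nonconformity scores ("A" = active, "N" = non-active).
calibration = {
    "A": [0.1, 0.4, 0.7, 0.9],
    "N": [0.2, 0.3, 0.5, 0.8],
}
# Hypothetical scores of one test compound under each candidate label.
test_scores = {"A": 0.6, "N": 0.85}

p_values = mondrian_p_values(calibration, test_scores)
print(p_values)
```

A high p-value for a class means the compound looks typical of that class relative to the calibration examples; a low p-value means it looks unusual.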
Input Format
- A chemical structure in SMILES or MOL format
- A confidence level for predictions
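The confidence level determines which labels are reported for a target. A minimal sketch, assuming the usual conformal prediction rule that a label is kept when its p-value exceeds the significance level (1 minus the confidence level); the function name and the p-values are hypothetical and do not reflect the service's actual interface:

```python
def prediction_set(p_values, confidence):
    """Labels whose p-value exceeds the significance level (1 - confidence)."""
    significance = 1.0 - confidence
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical p-values for one compound against one target
# ("A" = active, "N" = non-active).
p_values = {"A": 0.35, "N": 0.12}

set_80 = prediction_set(p_values, confidence=0.8)  # only "A" survives
set_90 = prediction_set(p_values, confidence=0.9)  # both labels survive
print(set_80, set_90)
```

Raising the confidence level lowers the threshold, so prediction sets can only grow: a compound that gets a single label at 0.8 may get both labels at 0.9.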
Output Format
A prediction interval for each of the 31 targets associated with chemical liabilities.
Considerations
Use Cases
- For novel compounds in drug discovery, the model can serve as early alerts of potential off-target interactions that would warrant additional experiments to rule out potential chemical liabilities.
Accuracy
Efficiency metrics (M Criterion, Observed Fuzziness and Class-Averaged Observed Fuzziness) for Dataset1, Dataset2, and Dataset3. In each panel, circles show the individual results from the three replicate runs, and lines show the median across replicates. (A) Dataset2 without extending with assumed non-actives. Targets are sorted by number of active compounds. (B) Dataset2 after extending with assumed non-actives. Targets are sorted by number of active compounds. (C) Dataset3, the 10 largest target datasets, which were not extended with assumed non-actives. Targets are sorted by total number of compounds.
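For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from conformal p-values; lower values indicate more efficient predictions. The definitions follow the general conformal prediction literature and are assumed here rather than taken from this model's code; the function names and data are hypothetical.

```python
def false_label_sum(p_values, true_label):
    """Sum of the p-values assigned to the incorrect labels of one example."""
    return sum(p for label, p in p_values.items() if label != true_label)


def observed_fuzziness(all_p_values, true_labels):
    """Mean over examples of the summed p-values of the incorrect labels."""
    sums = [false_label_sum(pv, y) for pv, y in zip(all_p_values, true_labels)]
    return sum(sums) / len(sums)


def class_averaged_observed_fuzziness(all_p_values, true_labels):
    """Observed fuzziness computed per true class, then averaged over classes."""
    per_class = {}
    for pv, y in zip(all_p_values, true_labels):
        per_class.setdefault(y, []).append(false_label_sum(pv, y))
    class_means = [sum(v) / len(v) for v in per_class.values()]
    return sum(class_means) / len(class_means)


def m_criterion(all_p_values, significance):
    """Fraction of examples whose prediction set holds more than one label."""
    multiple = sum(
        1 for pv in all_p_values
        if sum(1 for p in pv.values() if p > significance) > 1
    )
    return multiple / len(all_p_values)


# Hypothetical p-values and observed labels ("A" = active, "N" = non-active).
all_p_values = [
    {"A": 0.8, "N": 0.1},
    {"A": 0.3, "N": 0.6},
    {"A": 0.5, "N": 0.4},
    {"A": 0.05, "N": 0.9},
]
true_labels = ["A", "N", "N", "N"]

of = observed_fuzziness(all_p_values, true_labels)
caof = class_averaged_observed_fuzziness(all_p_values, true_labels)
m = m_criterion(all_p_values, significance=0.2)
print(of, caof, m)
```

The class-averaged variant weights each class equally, which matters when non-active compounds greatly outnumber active ones.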
Predicted vs. observed labels, for all targets, for the prediction data, at confidence level 0.8 (A) and 0.9 (B). “A” denotes active compounds, and “N” denotes non-active compounds. The x-axis shows the observed labels (as found in ExCAPE-DB), while the y-axis shows the set of predicted labels. The areas of the circles are proportional to the number of SAR data points for each observed label/predicted label combination.
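The counts behind a plot of this kind can be tallied as sketched below. This is an illustrative reconstruction, not the code used to produce the figure: the function name, the "empty" key for empty prediction sets, and the data are assumptions.

```python
from collections import Counter


def tally_label_sets(observed, all_p_values, confidence):
    """Count each (observed label, predicted label set) combination."""
    significance = 1.0 - confidence
    counts = Counter()
    for y, pv in zip(observed, all_p_values):
        predicted = sorted(label for label, p in pv.items() if p > significance)
        # Encode the predicted set as "A", "N", "AN", or "empty".
        key = "".join(predicted) or "empty"
        counts[(y, key)] += 1
    return counts


# Hypothetical observed labels and p-values ("A" = active, "N" = non-active).
observed = ["A", "A", "N", "N", "N"]
all_p_values = [
    {"A": 0.7, "N": 0.05},
    {"A": 0.3, "N": 0.25},
    {"A": 0.04, "N": 0.6},
    {"A": 0.15, "N": 0.5},
    {"A": 0.02, "N": 0.03},
]

counts_80 = tally_label_sets(observed, all_p_values, confidence=0.8)
counts_90 = tally_label_sets(observed, all_p_values, confidence=0.9)
print(counts_80)
print(counts_90)
```

In this toy data, one non-active compound moves from the single-label set "N" at confidence 0.8 to the two-label set "AN" at 0.9, the kind of shift that appears as a change in circle areas between the two panels.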