Model Details
cpLogD - water–octanol distribution coefficient (logD) for chemical compounds.
This model predicts the water–octanol distribution coefficient (logD) for chemical compounds. LogD is a proxy for lipophilicity, which is a major determinant of drug properties and of the overall suitability of drug candidates.
- Project
- Pharmbio
- Project Owner
- hamza.saeed@igp.uu.se
- Version
- 0.1.0
- Uploaded
- Dec. 21, 2023, 2:16 p.m.
Model Card
Model Card for cpLogD
Model Details
Overview
- This model predicts the water–octanol distribution coefficient (logD) for chemical compounds. LogD is a proxy for lipophilicity, which is a major determinant of drug properties and of the overall suitability of drug candidates.
- The model is based on data for 1.6 million compounds from the ChEMBL database with available predicted values for ACD/logD. For modeling, a support-vector machine with a linear kernel was combined with conformal prediction, so the model outputs prediction intervals at a specified confidence level. The chemical structures were described by signature molecular descriptors calculated with the Chemistry Development Kit.
Version
Owners
- Ola Spjuth
References
- Lapins M, Arvidsson S, Lampa S, Berg A, Schaal W, Alvarsson J, Spjuth O. A confidence predictor for logD using conformal regression and a support-vector machine. Journal of Cheminformatics. 10, 18 (2018). DOI: 10.1186/s13321-018-0271-1
Model Architecture
- The model is based on data for 1.6 million compounds from the ChEMBL database with available predicted values for ACD/logD. A support-vector machine with a linear kernel was combined with conformal prediction, so the model outputs prediction intervals at a specified confidence level.
- For more details, see the scientific paper "A confidence predictor for logD using conformal regression and a support-vector machine" in the Journal of Cheminformatics.
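The way conformal prediction turns point predictions into intervals can be sketched as follows. This is a minimal illustration of split (inductive) conformal regression, not the model's actual implementation: it assumes a plain absolute-residual nonconformity measure (the paper reports results for its best-performing measure, which may differ), it uses a one-feature least-squares line as a stand-in for the linear SVM on signature descriptors, and all data are synthetic. The function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b (a stand-in for the linear SVM)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def conformal_half_width(cal_residuals, confidence):
    """Interval half-width from absolute calibration residuals.

    Takes the ceil((n + 1) * confidence)-th smallest residual, the usual
    split-conformal quantile. Returns infinity when the calibration set is
    too small to support the requested confidence level.
    """
    n = len(cal_residuals)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return math.inf
    return sorted(cal_residuals)[k - 1]


def predict_interval(model, half_width, x):
    """Symmetric prediction interval around the point prediction."""
    a, b = model
    y_hat = a * x + b
    return y_hat - half_width, y_hat + half_width


# Synthetic data only: one made-up feature and a noisy linear target.
rng = random.Random(0)


def make_data(n):
    xs = [rng.uniform(-2.0, 6.0) for _ in range(n)]
    ys = [0.9 * x + 0.3 + rng.gauss(0.0, 0.4) for x in xs]
    return xs, ys


train_x, train_y = make_data(500)   # used to fit the underlying model
cal_x, cal_y = make_data(500)       # held out to calibrate the intervals
test_x, test_y = make_data(2000)    # used to check empirical coverage

model = fit_line(train_x, train_y)
a, b = model
cal_residuals = [abs(y - (a * x + b)) for x, y in zip(cal_x, cal_y)]

hw80 = conformal_half_width(cal_residuals, 0.80)
hw90 = conformal_half_width(cal_residuals, 0.90)

# Fraction of test targets that fall inside their 80% intervals.
hits = 0
for x, y in zip(test_x, test_y):
    low, high = predict_interval(model, hw80, x)
    hits += low <= y <= high
coverage80 = hits / len(test_x)

print(f"half-width at 80%: {hw80:.2f}, at 90%: {hw90:.2f}")
print(f"empirical coverage at 80%: {coverage80:.3f}")
```

Raising the confidence level can only widen the interval, because a larger quantile of the calibration residuals is selected. With this plain residual measure every compound receives the same interval width; measures that normalize the residual yield compound-specific widths.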
Input Format
- A chemical structure in SMILES or MOL format.
- A confidence level for predictions.
- [optional] Size of the image that highlights predictions in the chemical structure.
Output Format
- A prediction interval for cpLogD at the selected confidence level.
- An image highlighting the atoms in the chemical structure that contributed most to the prediction.
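The inputs and outputs listed above could be modeled in client code roughly as below. This is a hypothetical sketch: the class and field names are invented for illustration and are not the service's actual API, the confidence level is assumed to be a fraction between 0 and 1 rather than a percentage, and the interval values in the example are made up rather than model output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PredictionRequest:
    """Hypothetical container for the model inputs."""
    structure: str                     # SMILES string or MOL block
    confidence: float                  # assumed fraction in (0, 1), e.g. 0.8
    image_size: Optional[int] = None   # optional size of the highlight image

    def __post_init__(self):
        if not self.structure.strip():
            raise ValueError("structure must be a non-empty SMILES or MOL string")
        if not 0.0 < self.confidence < 1.0:
            raise ValueError("confidence must lie strictly between 0 and 1")
        if self.image_size is not None and self.image_size <= 0:
            raise ValueError("image_size must be a positive integer")


@dataclass(frozen=True)
class PredictionResult:
    """Hypothetical container for the returned prediction interval."""
    lower: float
    upper: float
    confidence: float

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower bound must not exceed upper bound")

    @property
    def midpoint(self) -> float:
        return (self.lower + self.upper) / 2

    @property
    def half_width(self) -> float:
        return (self.upper - self.lower) / 2

    def contains(self, value: float) -> bool:
        """True if a logD value lies inside the interval."""
        return self.lower <= value <= self.upper


# Illustrative values only; the interval below is not a real prediction.
request = PredictionRequest(structure="CCO", confidence=0.8)
result = PredictionResult(lower=-1.0, upper=0.0, confidence=request.confidence)
print(result.midpoint, result.half_width, result.contains(-0.3))
```

Validating the confidence level on the client side catches a common mistake early, such as passing 80 where 0.8 is expected.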
Considerations
Use Cases
- For novel chemical structures in drug discovery projects, this model can predict their lipophilicity.
- Figure caption: Examples of molecule gradients for the prediction of cpLogD. Shown are gradients for four compounds indicated by arrows in Fig. 3. Upper row: atenolol (logD = −1.82) and sotalol (logD = −1.52). Lower row: tolnaftate (logD = 5.4) and amiodarone (logD = 6.1).
Limitations
- The model is trained on values calculated using the ACD_LogP algorithm, not on measured values.
Accuracy
The model shows a predictive ability of Q² = 0.973. With the best-performing nonconformity measure, the median prediction interval is ±0.39 log units at 80% confidence and ±0.60 log units at 90% confidence. Figure: acd_logd values (x-axis) versus the predicted logD values (y-axis) for 100,000 test set compounds; the root mean square error of prediction (RMSEP) is 0.41 log units.
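For readers who want to compute the same kinds of figures on their own test set, the three metrics quoted above can be sketched as follows. The function names are hypothetical, the toy numbers are invented, and Q² is assumed here to be one minus the ratio of the squared prediction errors to the squared deviations from the mean of the observed test values; the paper's exact definition may differ.

```python
import math
import statistics


def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def q2(observed, predicted):
    """Predictive Q^2 = 1 - PRESS / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss_tot


def median_half_width(intervals):
    """Median of (upper - lower) / 2 over (lower, upper) intervals."""
    return statistics.median((high - low) / 2 for low, high in intervals)


# Toy values for illustration only; these are not model results.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
intervals = [(0.0, 1.0), (1.0, 1.8), (2.0, 3.2)]

print(f"RMSEP = {rmsep(observed, predicted):.3f}")
print(f"Q2 = {q2(observed, predicted):.3f}")
print(f"median half-width = {median_half_width(intervals):.2f}")
```

The median half-width corresponds to the "±" figures reported per confidence level, so it has to be computed separately for each confidence level of interest.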