Model Details

cpLogD - water–octanol distribution coefficient (logD) for chemical compounds.

This model predicts the water–octanol distribution coefficient (logD) of chemical compounds. logD is a proxy for lipophilicity, which is a major determinant of drug properties and of the overall suitability of drug candidates.

Project
Pharmbio
Project Owner
hamza.saeed@igp.uu.se

Version

0.1.0

Uploaded

Dec. 21, 2023, 1:16 p.m.

Model Card

Model Card for cpLogD

API and Docs


API

The API for the model can be accessed via this link.

The API is also used by the user interfaces PredGUI and PredGUIMM.

Model Details


Overview


  • The model is based on 1.6 million compounds from the ChEMBL database that have predicted ACD/logD values. It uses a support-vector machine (SVM) with a linear kernel within the conformal prediction framework, so each prediction is an interval at a specified confidence level. Chemical structures are represented by the signature molecular descriptor, calculated with the Chemistry Development Kit (CDK).
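The conformal step described above can be illustrated with a minimal inductive conformal regression sketch. It assumes an already-trained point predictor (the linear SVM in cpLogD) and a held-out calibration set. The function names are hypothetical and not part of the cpLogD API. The optional per-compound `sigma` is an assumed difficulty estimate that turns the plain absolute-error score into a normalized nonconformity measure. This is a generic textbook formulation, not necessarily the exact procedure used to build cpLogD.

```python
import math
from typing import List, Optional, Sequence, Tuple


def nonconformity_scores(y_true: Sequence[float], y_pred: Sequence[float],
                         sigma: Optional[Sequence[float]] = None) -> List[float]:
    """Calibration scores alpha_i = |y_i - yhat_i| / sigma_i.

    With sigma omitted every sigma_i is 1 (plain absolute error). A per-compound
    sigma (hypothetical difficulty estimate) gives a normalized measure and
    therefore compound-specific interval widths.
    """
    if sigma is None:
        sigma = [1.0] * len(y_true)
    return [abs(y - p) / s for y, p, s in zip(y_true, y_pred, sigma)]


def calibration_quantile(scores: Sequence[float], confidence: float) -> float:
    """Inductive conformal rule: the ceil((n + 1) * confidence)-th smallest score."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    n = len(scores)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        # Too few calibration compounds for this confidence: unbounded interval.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(y_hat: float, alpha_k: float,
                        sigma: float = 1.0) -> Tuple[float, float]:
    """Interval y_hat +/- alpha_k * sigma for a new compound."""
    half_width = alpha_k * sigma
    return (y_hat - half_width, y_hat + half_width)


if __name__ == "__main__":
    # Toy calibration scores, not cpLogD data.
    toy_scores = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    alpha_80 = calibration_quantile(toy_scores, 0.8)  # 8th smallest of 9 scores: 0.7
    print(prediction_interval(2.0, alpha_80))
```

A higher confidence level selects a larger calibration score and therefore a wider interval, which matches the trade-off between the 80% and 90% intervals reported under Accuracy.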


Version

1.0

Owners

  • Ola Spjuth

References

  • Lapins M, Arvidsson S, Lampa S, Berg A, Schaal W, Alvarsson J, Spjuth O. A confidence predictor for logD using conformal regression and a support-vector machine. Journal of Cheminformatics. 10, 18 (2018). DOI: 10.1186/s13321-018-0271-1

Model Architecture


  • For more details, see the paper "A confidence predictor for logD using conformal regression and a support-vector machine" (Journal of Cheminformatics, 2018).
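With a linear kernel, the trained SVM reduces to one weight per feature. A point prediction is therefore a bias plus a weighted sum of signature-descriptor counts, and the conformal interval is then placed around that value. The sketch below shows this computation. The signature strings, weights, and the name `linear_predict` are made up for illustration; they are not the CDK signature format or the trained cpLogD model.

```python
from collections import Counter
from typing import Iterable, Mapping


def linear_predict(signatures: Iterable[str], weights: Mapping[str, float],
                   bias: float) -> float:
    """Point prediction of a linear model over sparse signature counts.

    `signatures` holds one entry per occurrence of a signature in the molecule.
    Signatures without a learned weight (unseen in training) contribute 0.
    """
    counts = Counter(signatures)
    return bias + sum(weights.get(sig, 0.0) * n for sig, n in counts.items())


if __name__ == "__main__":
    # Hypothetical signature strings and weights, for illustration only.
    toy_weights = {"[C]": 0.25, "[O]": -0.5, "[C]([O])": -0.25}
    molecule = ["[C]", "[C]", "[O]", "[C]([O])", "[N]"]
    print(linear_predict(molecule, toy_weights, bias=1.0))  # 1.0 + 0.5 - 0.5 - 0.25 = 0.75
```

A sparse dictionary is used because each molecule contains only a tiny fraction of the signatures seen across 1.6 million training compounds.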


Input Format

  • A chemical structure in SMILES or MOL format.
  • A confidence level for predictions.
  • [optional] Size of the image that highlights the atoms contributing most to the prediction.

Output Format

  • A prediction interval for cpLogD at the selected confidence level.
  • An image of the chemical structure highlighting the atoms that contributed most to the prediction.

Considerations

Use Cases

  • For novel chemical structures in drug discovery projects, the model predicts their lipophilicity (logD) as a prediction interval at the chosen confidence level.

  • Figure: Examples of molecule gradients for the prediction of cpLogD. Shown are gradients for four compounds indicated by arrows in Fig. 3. Upper row: atenolol (logD = −1.82) and sotalol (logD = −1.52). Lower row: tolnaftate (logD = 5.4) and amiodarone (logD = 6.1).
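For a linear model, the derivative of the prediction with respect to the count of signature s is simply the weight of s. One simplified way to turn this into per-atom highlights is to sum the weights of all signatures centered on each atom, then scale the result to [-1, 1] for a color map. This is an assumption-laden sketch. The atom-to-signature mapping, the helper names, and the scaling are hypothetical, and the molecule-gradient method in the cited paper may weight signatures differently.

```python
from typing import Dict, Mapping, Sequence


def atom_contributions(atom_signatures: Mapping[int, Sequence[str]],
                       weights: Mapping[str, float]) -> Dict[int, float]:
    """Sum the linear-model weight of every signature centered on each atom.

    For a linear model, d(prediction)/d(count of signature s) = weights[s],
    so this sum is the atom's share of the prediction gradient.
    """
    return {atom: sum(weights.get(sig, 0.0) for sig in sigs)
            for atom, sigs in atom_signatures.items()}


def scale_for_coloring(contrib: Mapping[int, float]) -> Dict[int, float]:
    """Scale contributions to [-1, 1] by the largest absolute value."""
    largest = max((abs(v) for v in contrib.values()), default=0.0)
    if largest == 0.0:
        return {atom: 0.0 for atom in contrib}
    return {atom: v / largest for atom, v in contrib.items()}


if __name__ == "__main__":
    # Hypothetical two-atom example (atom 0 = C, atom 1 = O).
    sigs = {0: ["[C]", "[C]([O])"], 1: ["[O]", "[O]([C])"]}
    w = {"[C]": 0.5, "[C]([O])": -0.25, "[O]": -1.0, "[O]([C])": -1.0}
    contrib = atom_contributions(sigs, w)   # {0: 0.25, 1: -2.0}
    print(scale_for_coloring(contrib))      # {0: 0.125, 1: -1.0}
```

Negative values mark atoms that lower the predicted logD (more hydrophilic), and positive values mark atoms that raise it.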

Limitations

  • The model is trained on logD values calculated with the ACD/logD algorithm, not on experimentally measured values, so its predictions approximate ACD/logD rather than measured logD.

Accuracy



The model achieves a predictive ability of Q² = 0.973 and a root mean square error of prediction (RMSEP) of 0.41 log units. With the best-performing nonconformity measure, the median prediction interval is ±0.39 log units at 80% confidence and ±0.60 log units at 90% confidence.

Figure: acd_logd values (x-axis) versus predicted logD values (y-axis) for 100,000 test set compounds.
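The accuracy figures above can be recomputed on any test set with a few lines of Python. This sketch uses the common definition of Q² with the test-set mean in the denominator, and the paper may use a different variant. The function names are illustrative. The coverage value checks validity: at 80% confidence, roughly 80% of test compounds should fall inside their intervals. The median half-width corresponds to the "±" figures.

```python
import math
import statistics
from typing import Sequence, Tuple


def q2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Q^2 = 1 - sum((y - yhat)^2) / sum((y - mean(y))^2), test-set mean assumed."""
    mean_y = statistics.fmean(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss = sum((y - mean_y) ** 2 for y in y_true)
    if ss == 0.0:
        raise ValueError("Q^2 is undefined when all observed values are equal")
    return 1.0 - press / ss


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error of prediction."""
    return math.sqrt(statistics.fmean((y - p) ** 2 for y, p in zip(y_true, y_pred)))


def interval_summary(y_true: Sequence[float],
                     intervals: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (empirical coverage, median half-width) of prediction intervals."""
    covered = [lo <= y <= hi for y, (lo, hi) in zip(y_true, intervals)]
    half_widths = [(hi - lo) / 2 for lo, hi in intervals]
    return sum(covered) / len(covered), statistics.median(half_widths)
```

Usage: pass the reference acd_logd values as `y_true`, the model's point predictions as `y_pred`, and the intervals returned at a given confidence level as `intervals`.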