Paper reading (40): Deep Learning for Drug-Induced Liver Injury

Paper title: Deep Learning for Drug-Induced Liver Injury

Scholar citations: 116

Pages: 9

Published: 2015.10

Journal: Journal of Chemical Information and Modeling

Authors: Youjun Xu, Ziwei Dai, ..., and Luhua Lai

Abstract:

Drug-induced liver injury (DILI) has been the single most frequent cause of safety-related drug marketing withdrawals for the past 50 years. Recently, deep learning (DL) has been successfully applied in many fields due to its exceptional and automatic learning ability. In this study, DILI prediction models were developed using DL architectures, and the best model trained on 475 drugs (isn't that sample size rather small?) predicted an external validation set of 198 drugs with an accuracy of 86.9%, sensitivity of 82.5%, specificity of 92.9%, and area under the curve of 0.955, which is better than the performance of previously described DILI prediction models. Furthermore, with deep analysis, we also identified important molecular features that are related to DILI. Such DL models could improve the prediction of DILI risk in humans. The DL DILI prediction models are freely available at http://www.repharma.cn/DILIserver/DILI_home.php (the site no longer seems to be reachable...).
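The accuracy, sensitivity, and specificity quoted in the abstract are all ratios over a confusion matrix. The sketch below shows the standard definitions. The counts are not taken from the paper: they are an assumption, back-calculated here as one set of integers (114 DILI-positive and 84 DILI-negative drugs) that is consistent with the reported percentages on 198 drugs. The function name `classification_metrics` is only illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of DILI-positive drugs caught
        "specificity": tn / (tn + fp),  # fraction of DILI-negative drugs cleared
    }


# ASSUMED counts (not from the paper): chosen only because they sum to 198
# and reproduce the percentages reported in the abstract.
metrics = classification_metrics(tp=94, fn=20, tn=78, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 86.9%
# sensitivity: 82.5%
# specificity: 92.9%
```

The area under the curve (0.955) cannot be recovered this way, since it depends on the ranking of the predicted scores rather than on a single thresholded confusion matrix.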

Conclusion:

  • The DL DILI models were built using large data sets and the UGRNN molecular encoding approach with the least information loss.
  • The deep learning methods may see widespread use in chemical and drug informatics studies covering subjects beyond DILI prediction.
  • Apart from the two sentences above, the conclusion is word-for-word identical to the part of the abstract that follows "In this study"...

Paper outline:

1. Introduction

2. Materials and Methods

2.1 Data sets

2.2 Molecular encoding and DL architecture

2.3 DL architecture settings and models

2.4 DILI-related molecular feature motifs

3. Results and Discussion

3.1 Construction of DL models

3.2 DL-NCTR DILI Model

3.3 DL-Liew DILI Model

3.4 DL-Combined DILI Model

3.5 Influence of the Size of the Training Data Set

3.6 Influence of Different Splits of Training and Prediction Data

3.7 Comparison to Normal NN and DNN Models

3.8 Further Discussion of the DL DILI Models

3.9 Analysis of DILI Feature Motifs

4. Conclusion

Excerpts from the main text:

1. Biological Problem: What biological problems have been solved in this paper?

  • Drug-induced liver injury (DILI) prediction

2. Main discoveries: What are the main discoveries in this paper?

  • The DL DILI models perform better than previously described DILI prediction models.
  • With deep analysis, we also identified important molecular features that are related to DILI.
  • DL models could improve the prediction of DILI risk in humans.

3. ML (Machine Learning) Methods: What are the ML methods applied in this paper?

  • Recently, Lusci et al. developed the novel undirected graph recursive neural networks (UGRNN) method used for molecular structure encoding and used this encoding approach to effectively predict the water solubility of compounds based on DL architectures.
  • Dataset: Four publicly available data sets composed of annotated DILI-positive or DILI-negative properties of drugs or compounds were used in this work. 
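The UGRNN idea mentioned above can be sketched roughly as follows: each atom in turn is treated as the root, the bonds are oriented toward that root along shortest paths, a small recursive network propagates hidden states from the outermost atoms inward, and the root states are summed into one molecular vector. This is a simplified illustration, not the implementation of Lusci et al. or of the paper. Assumptions made here: incoming states are summed rather than placed in fixed parent slots, bond features and the output network are omitted, bonds between atoms equally far from the root are ignored, and the weights are random and untrained. All names (`encode_molecule`, `W_in`, `W_ctx`) are hypothetical.

```python
import math
import random
from collections import deque


def bfs_distances(adj, root):
    """Shortest-path distance (in bonds) from root to every reachable atom."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode_molecule(atom_feats, bonds, W_in, W_ctx):
    """Sum of root hidden states over all choices of root atom."""
    n = len(atom_feats)
    adj = {i: [] for i in range(n)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    hidden = len(W_in)
    total = [0.0] * hidden
    for root in range(n):
        dist = bfs_distances(adj, root)
        state = {}
        # Visit the farthest atoms first so incoming states are ready.
        for v in sorted(dist, key=dist.get, reverse=True):
            ctx = [0.0] * hidden
            for u in adj[v]:
                if dist[u] == dist[v] + 1:  # edge u -> v points toward the root
                    ctx = [c + s for c, s in zip(ctx, state[u])]
            pre = [p + q for p, q in zip(matvec(W_in, atom_feats[v]),
                                         matvec(W_ctx, ctx))]
            state[v] = [math.tanh(p) for p in pre]
        total = [t + s for t, s in zip(total, state[root])]
    return total


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy setup (all values hypothetical): one-hot features [is_C, is_O].
rng = random.Random(0)
HIDDEN = 4
W_in = random_matrix(rng, HIDDEN, 2)
W_ctx = random_matrix(rng, HIDDEN, HIDDEN)

# The same C-C-O heavy-atom skeleton written with two different atom orders.
vec_a = encode_molecule([[1, 0], [1, 0], [0, 1]], [(0, 1), (1, 2)], W_in, W_ctx)
vec_b = encode_molecule([[0, 1], [1, 0], [1, 0]], [(0, 1), (1, 2)], W_in, W_ctx)
print(vec_a)
print(vec_b)
```

Because every atom serves as the root once and the contributions are summed, the two atom orderings should give the same vector up to floating-point rounding. In a trained model this vector would be fed to a classifier and the weights learned end to end.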

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  • Traditional methods:
  1. Radially distributed molecular descriptors with linear discriminant analysis for classification
  2. K-nearest neighbor method with mixed molecular descriptors
  3. Bayesian model with extended connectivity fingerprints and other interpretable descriptors
  4. Mixed machine learning algorithms with PaDEL molecular descriptors
  5. A Decision Forest algorithm with Mold2 chemical descriptors
  6. Theoretically calculated parameters combined with measured in vitro biological data
  • A key advantage of deep learning is that features can be learned automatically using a general-purpose procedure.
  • It has been confirmed that deep-learning architectures have the power to handle big data with little manual intervention.
  • One advantage of this UGRNN is that it relies only minimally on the identification of suitable molecular descriptors because suitable representations are learned automatically from the data.

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • One deficiency of DL models is that they are black-box models without apparent physical meaning.
  • Assuming normal distribution of the two-class samples, a two-sample t-test was used to identify the descriptors that significantly differed between DILI-positive and -negative drugs.
  • Principal component analysis (PCA) was used to determine whether the samples could be distinguished. If the samples were well distinguished by certain combinations of descriptors, the important descriptors were identified through an analysis of clustering and coefficient weights.
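The two-sample t-test step can be illustrated with a short sketch. The notes do not say which variant the paper used, so Welch's unequal-variance form is an assumption here, and the descriptor values are made up purely for illustration. The function name `welch_t` is hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Made-up values of a single descriptor for the two classes (not real data).
dili_positive = [2.0, 4.0, 6.0]
dili_negative = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(dili_positive, dili_negative)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
# t = 1.549, df = 2.94
```

A large absolute t value for a descriptor suggests that its mean differs between the two classes. With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.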

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • In the current study, DILI properties were divided into only two categories: DILI-positive and DILI-negative.
  • However, in many public and canonical data sets, multiple levels of DILI are often used.
  • Multiple-level prediction of DILI might be more accurate, but the lack of data makes it difficult to develop such prediction models at the current time.

7. My Questions (Optional)

  • The best model was trained on only 475 drugs.
  • Might this data set be too small for deep learning?