BACKGROUND: Malaria, or paludism, is a tropical disease caused by parasites of the genus Plasmodium and transmitted to humans through the bite of infected mosquitoes of the genus Anopheles. It is considered one of the leading causes of death in tropical countries and, although several therapies exist, they are highly toxic. Computational methods based on Quantitative Structure-Activity Relationship (QSAR) studies have been widely used in drug design workflows.
OBJECTIVE: The main goal of the current research is to develop computational models for the identification of antimalarial hit compounds.
MATERIALS AND METHODS: A data set suitable for modeling the antimalarial activity of chemical compounds was compiled from the literature and subjected to a thorough curation process to minimize noise. The performance of a diverse set of ensemble-based classification methodologies was then evaluated, and the ensemble most suitable for identifying antimalarial hits was selected on the basis of its virtual screening performance. Among the explored ensemble-based methods, the one combining Genetic Algorithms for the selection of the base classifiers and Majority Vote for their aggregation showed the best performance.
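As an illustration of the ensemble strategy described above, the following is a minimal sketch of how a genetic algorithm can pick a subset of base classifiers whose majority vote is most accurate. It is not the implementation used in this study: the function names, the accuracy-based fitness, the tournament selection, one-point crossover, bit-flip mutation, and all parameter values are assumptions made for the example, and the toy predictions are invented. In practice the fitness would be computed on held-out data rather than on the data used to train the base classifiers.

```python
import random

def majority_vote(predictions):
    """Aggregate 0/1 predictions from several classifiers, sample by sample.
    Ties are resolved as 0 (inactive), an arbitrary choice for this sketch."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def fitness(mask, base_preds, y_true):
    """Accuracy of the majority vote over the classifiers switched on in mask."""
    selected = [p for bit, p in zip(mask, base_preds) if bit]
    if not selected:
        return 0.0  # an empty ensemble cannot predict anything
    return accuracy(y_true, majority_vote(selected))

def ga_select(base_preds, y_true, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search for a bit mask over base classifiers (hypothetical GA settings)."""
    rng = random.Random(seed)
    n = len(base_preds)

    def score(mask):
        return fitness(mask, base_preds, y_true)

    # Random initial population, seeded with the full ensemble as a baseline.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    population[0] = [1] * n
    best = max(population, key=score)

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the best mask always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n - 1)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=score)
    return best

# Toy example: invented labels and predictions from five base classifiers.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
base_preds = [
    [1, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
best_mask = ga_select(base_preds, y_true)
print("selected classifiers:", best_mask)
print("ensemble accuracy:", fitness(best_mask, base_preds, y_true))
```

Because the initial population includes the full ensemble and the best mask is carried over unchanged each generation, the selected subset in this sketch never scores below the vote of all base classifiers on the data used for fitness.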
RESULTS: Our results show that ensemble modeling is an effective strategy for the QSAR modeling of highly heterogeneous datasets in the discovery of potential antimalarial compounds.
CONCLUSION: The best-performing ensembles were those that used Genetic Algorithms to select the base models and Majority Vote to aggregate them.
- Models, Chemical
- Quantitative Structure-Activity Relationship