Machine learning-driven drug discovery

Drug discovery, as it has been practiced over the last few decades, is a brute-force, failure-prone, and expensive process. We aim to change that by building and utilizing assays that trade off high throughput for high clinical relevance, while simultaneously leveraging machine learning to guide the experiments effectively. The situation is particularly dire in oncology, the therapeutic area with the lowest clinical trial success rate: the likelihood of approval from Phase I is only 5.1% (Mullard, 2016), and suboptimal preclinical strategies have been identified as a key problem (Hutchinson and Kirk, 2011). This is despite the fact that highly clinically relevant assays already exist (Friedman et al., 2015), with up to 87% clinical accuracy (Majumder et al., 2015); yet they are restricted to use as diagnostic assays rather than primary drug discovery tools, because established wisdom dictates that drug discovery requires screening millions of chemicals in simplistic high-throughput assays. We argue that the increased efficiency of machine learning-driven experimentation (demonstrated by many researchers, including the PI) can compensate for the lower throughput and enable clinically relevant drug discovery at the preclinical stage.

Computational biology research often remains restricted to hypothetical algorithmic work with results on toy datasets; in contrast, we are committed to building a fully functioning, integrated computational/experimental drug discovery pipeline. That is exactly why we are housed on a biomedical powerhouse campus, the UT Southwestern Medical Center, where we can find the experimental resources necessary to execute this vision.