Drug discovery: an industry in crisis
Drug discovery is in a crisis: the cost of a new drug is estimated to exceed $2B, having doubled every nine years since the 1950s (Scannell et al., 2012). Furthermore, oncology is the therapeutic area with the lowest likelihood of approval from phase I trials at 5.1%, compared with an average of 9.6% for all indications and 19.1% for infectious disease (Mullard, 2016). Among the many reasons for the particularly high failure rate in oncology, experts point to the prevalence of suboptimal preclinical strategies (Hutchinson and Kirk, 2011). Preclinical cancer drug discovery predominantly relies on screening monolayer, monoclonal cancer cell lines on glass or plastic surfaces, typically with a single monochromatic readout (often viability or cell count), not much different from the NCI-60 screen, which started in the mid-1980s (Chabner, 2016). The main improvement since the NCI-60 has been the incorporation of sequencing data and more cell lines: for example, the most recent large-scale study utilizes 1,000 cell lines (Iorio et al., 2016). This is despite the fact that we now know that some fundamental components of the human disease are missing from these assays, such as the tumor microenvironment (e.g. tumor extracellular matrix) and the crosstalk between the stromal, immune, and malignant cells (Gu and Mooney, 2016; Tabassum and Polyak, 2015). Drug activity assays that capture these key properties, termed functional assays, do exist. However, their higher complexity lowers their throughput, which restricts them to clinical diagnostic use, since traditional drug discovery relies on brute-force testing of millions of chemicals (Friedman et al., 2015; Majumder et al., 2015). We want to utilize these functional diagnostic assays for drug discovery by using machine learning to increase hit efficiency.
Machine learning to the rescue
Our work (Cobanoglu et al., 2013; Wise and Cobanoglu, 2016) and the work of many other groups (Kangas et al., 2014; Kearnes et al., 2016; Lavecchia, 2015; Murphy, 2011; Naik et al., 2016; Reker and Schneider, 2015; Riniker et al., 2014; Warmuth et al., 2003) have shown that machine learning can significantly increase the efficiency of drug discovery by intelligently guiding experimentation, as opposed to the brute-force testing approach of high-throughput screening (HTS). The specific focus of our lab is to use machine learning to guide functional-assay-based cancer drug discovery. The idea is to leverage machine learning to maximize the utility of the limited throughput of a functional assay. Since the hits would originate from a clinically relevant functional assay, they should have a high likelihood of translating to the clinic. Previous functional assays report 87% clinical accuracy in predicting the impact of a drug on the clinical outcome (Majumder et al., 2015), which means that hits identified from these assays would be expected to have a similar translation rate. That would be a transformative improvement over the current 5.1% clinical success rate in oncology.
Algorithmic ideation: a proof of concept
Preliminary results from work done by the PI, Murat Can Cobanoglu, Ph.D., and the PI's previous co-founder, Aaron Wise, Ph.D., show that representing the set of drug-target and target-disease relationships (the diseases being cancer types) as a sparse tensor, and then using low-rank tensor completion to predict new interactions, can identify correct interactions 104-fold more efficiently than brute-force random screening (Wise and Cobanoglu, 2016). A detailed description of the proof of concept and its results is available as a preprint on bioRxiv at http://biorxiv.org/content/early/2016/06/08/057901, and the entire source code is open source and available at https://bitbucket.org/aiinc/drugx.
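To make the idea of low-rank tensor completion concrete, the following is a minimal sketch on synthetic data. It is not the code from the preprint or the repository: it assumes a generic rank-2 CP (CANDECOMP/PARAFAC) model fitted by ridge-regularized alternating least squares over the observed entries only. The function names (update_factor, complete_tensor, fold_enrichment, run_demo), the tensor dimensions, the 60% observation rate, and the definition of fold enrichment used here (hit rate among the top-ranked held-out predictions divided by the overall held-out hit rate) are all illustrative assumptions.

```python
import numpy as np


def update_factor(X, M, F1, F2, lam):
    """Ridge least-squares update of the mode-0 factor from observed entries only."""
    n_rows, rank = X.shape[0], F1.shape[1]
    # Row (j, k) of Z holds the elementwise product F1[j] * F2[k].
    Z = np.einsum("jr,kr->jkr", F1, F2).reshape(-1, rank)
    X_flat, M_flat = X.reshape(n_rows, -1), M.reshape(n_rows, -1)
    out = np.zeros((n_rows, rank))
    for i in range(n_rows):
        obs = M_flat[i]          # boolean mask of observed entries in slice i
        Z_obs = Z[obs]
        gram = Z_obs.T @ Z_obs + lam * np.eye(rank)
        out[i] = np.linalg.solve(gram, Z_obs.T @ X_flat[i, obs])
    return out


def complete_tensor(X, M, rank=2, lam=1e-3, n_iter=50, seed=1):
    """Fit a rank-`rank` CP model to the entries where M is True; return the full estimate."""
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(X.shape[1], rank))
    C = rng.normal(size=(X.shape[2], rank))
    for _ in range(n_iter):
        # Update each factor in turn, moving its mode to the front of the tensor.
        A = update_factor(X, M, B, C, lam)
        B = update_factor(X.transpose(1, 0, 2), M.transpose(1, 0, 2), A, C, lam)
        C = update_factor(X.transpose(2, 0, 1), M.transpose(2, 0, 1), A, B, lam)
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def fold_enrichment(scores, is_hit, top_k):
    """Hit rate among the top_k highest-scoring candidates divided by the overall hit rate."""
    top = np.argsort(scores)[::-1][:top_k]
    return is_hit[top].mean() / is_hit.mean()


def run_demo(seed=0):
    """Hide 40% of a synthetic low-rank tensor, complete it, and score the held-out entries."""
    rng = np.random.default_rng(seed)
    shape, rank = (12, 10, 8), 2  # toy sizes, chosen only for illustration
    factors = [rng.normal(size=(n, rank)) for n in shape]
    X_true = np.einsum("ir,jr,kr->ijk", *factors)
    M = rng.random(shape) < 0.6   # True where an entry is "measured"
    X_hat = complete_tensor(np.where(M, X_true, 0.0), M, rank=rank)

    held_out = ~M
    truth, pred = X_true[held_out], X_hat[held_out]
    rmse_model = np.sqrt(np.mean((pred - truth) ** 2))
    # Baseline: predict the mean of the observed entries everywhere.
    rmse_baseline = np.sqrt(np.mean((X_true[M].mean() - truth) ** 2))
    # Assumption: call the top 10% of held-out true values "hits".
    is_hit = truth > np.quantile(truth, 0.9)
    return rmse_model, rmse_baseline, fold_enrichment(pred, is_hit, top_k=20)


if __name__ == "__main__":
    rmse_model, rmse_baseline, enrichment = run_demo()
    print(f"held-out RMSE: model={rmse_model:.3f}, mean baseline={rmse_baseline:.3f}")
    print(f"fold enrichment in top 20 predictions: {enrichment:.1f}")
```

In a screening setting, the held-out entries would correspond to untested combinations, and the ranking by predicted score would determine which experiments to run first. The synthetic tensor here is exactly low rank, so this toy problem is considerably easier than real interaction data.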
We pursue an integrated experimental and computational strategy: developing clinically relevant functional assays, and the machine learning algorithms and software that will guide these assays. For the clinically relevant assay, our goals are to characterize and reconstruct the human in vivo extracellular matrix (ECM); co-culture the stromal, immune, and malignant cells; and incorporate perfusion. For the machine learning, we already have the public prototype, with multiple improvements planned to enable the incorporation of higher-dimensional omics data such as proteomics, metabolomics, and epigenomics. We closely follow work in the active learning and collaborative filtering fields of machine learning research.
Iorio, F., Knijnenburg, T.A., Vis, D.J., Bignell, G.R., Menden, M.P., Schubert, M., Aben, N., Gonçalves, E., Barthorpe, S., Lightfoot, H., et al. (2016). A Landscape of Pharmacogenomic Interactions in Cancer. Cell 166, 740–754.
Majumder, B., Baraneedharan, U., Thiyagarajan, S., Radhakrishnan, P., Narasimhan, H., Dhandapani, M., Brijwani, N., Pinto, D.D., Prasath, A., Shanthappa, B.U., et al. (2015). Predicting clinical response to anticancer drugs using an ex vivo platform that captures tumour heterogeneity. Nat. Commun. 6, 6169.
Warmuth, M.K., Liao, J., Ratsch, G., Mathieson, M., Putta, S., and Lemmen, C. (2003). Active Learning with Support Vector Machines in the Drug Discovery Process. J. Chem. Inf. Comput. Sci. 43, 667–673.