Detailed Vision

Drug discovery: an industry in crisis

Drug discovery is in crisis: the cost of a new drug is estimated to exceed $2B, after having doubled roughly every nine years since the 1950s (Scannell et al., 2012). Furthermore, oncology is the therapeutic area with the lowest likelihood of approval from phase I, at 5.1%, compared with an average of 9.6% across all indications and 19.1% for infectious disease (Mullard, 2016). Among the many reasons for the particularly high failure rate in oncology, experts point to the prevalence of suboptimal preclinical strategies (Hutchinson and Kirk, 2011). Preclinical cancer drug discovery relies predominantly on screening monolayer, monoclonal cancer cell lines on glass or plastic surfaces, typically with a single readout (often viability or cell count), an approach not much different from the NCI-60 screen that started in the mid-1980s (Chabner, 2016). The main improvements since the NCI-60 have been the incorporation of sequencing data and the use of more cell lines: for example, the most recent large-scale study uses 1,000 cell lines (Iorio et al., 2016). This persists even though we now know that fundamental components of the human disease are missing from these assays, such as the tumor microenvironment (e.g., the tumor extracellular matrix) and the crosstalk among stromal, immune, and malignant cells (Gu and Mooney, 2016; Tabassum and Polyak, 2015). Drug activity assays that capture these key properties, termed functional assays, do exist (Friedman et al., 2015; Majumder et al., 2015). However, their greater complexity limits their throughput, and because traditional drug discovery relies on brute-force testing of millions of chemicals, these assays have been restricted to clinical diagnostic use. We aim to bring these functional diagnostic assays into drug discovery by using machine learning to increase hit efficiency.
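Taken at face value, a constant nine-year doubling time compounds quickly. As a back-of-the-envelope check (our own arithmetic, assuming the rate held steadily over the roughly 60 years from 1950 to 2010, with C(t) denoting the cost per new drug in year t), the implied cost ratio is:

```latex
\frac{C(t)}{C(t_0)} = 2^{(t - t_0)/9},
\qquad
\frac{C(2010)}{C(1950)} = 2^{60/9} \approx 2^{6.7} \approx 100
```

That is, a sustained nine-year doubling time corresponds to roughly a hundredfold increase in cost per new drug over six decades.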

Machine learning to the rescue

Our work (Cobanoglu et al., 2013; Wise and Cobanoglu, 2016) and that of many other groups (Kangas et al., 2014; Kearnes et al., 2016; Lavecchia, 2015; Murphy, 2011; Naik et al., 2016; Reker and Schneider, 2015; Riniker et al., 2014; Warmuth et al., 2003) has shown that machine learning can significantly increase the efficiency of drug discovery by intelligently guiding experimentation, as opposed to the brute-force testing approach of high-throughput screening (HTS). The specific focus of our lab is to use machine learning to guide cancer drug discovery based on functional assays. The idea is to leverage machine learning to make the most of a functional assay's limited throughput. Because the hits would originate from a clinically relevant functional assay, we expect them to have a high likelihood of translating to the clinic. Previous functional assays report 87% clinical accuracy in predicting the impact of a drug on clinical outcome (Majumder et al., 2015). If hits identified from such assays translate at a comparable rate, that would be a transformative improvement over the current 5.1% clinical success rate in oncology.
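To make the contrast with brute-force screening concrete, below is a toy simulation (not our published method) in which a simple model chooses which compounds to test next. Everything in it is a hypothetical assumption: synthetic compound features, a linear ground truth with a noisy continuous assay readout, and greedy selection by ridge regression standing in for the more sophisticated active learning strategies cited above. The function `run_screen` and all of its parameters are illustrative names only.

```python
import numpy as np


def run_screen(guided, n_compounds=2000, n_features=20, hit_rate=0.02,
               budget=200, batch_size=20, noise=0.5, ridge=1.0, seed=0):
    """Simulate one screening campaign and return the number of hits found.

    Hypothetical toy setup: compounds are random feature vectors, the assay
    returns a noisy continuous activity, and the top `hit_rate` fraction of
    true activities counts as a hit.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical compound library with a hidden linear structure-activity relationship.
    X = rng.normal(size=(n_compounds, n_features))
    w_true = rng.normal(size=n_features)
    activity = X @ w_true
    is_hit = activity >= np.quantile(activity, 1 - hit_rate)

    def assay(ids):
        # Noisy continuous readout for the tested compounds.
        return activity[ids] + noise * rng.normal(size=len(ids))

    # Both strategies start from the same random first batch (same seed).
    tested = rng.choice(n_compounds, size=batch_size, replace=False)
    readout = assay(tested)

    while tested.size < budget:
        untested = np.setdiff1d(np.arange(n_compounds), tested)
        if guided:
            # Fit ridge regression to all readouts collected so far ...
            A = X[tested]
            w_hat = np.linalg.solve(A.T @ A + ridge * np.eye(n_features),
                                    A.T @ readout)
            # ... and test the untested compounds with the highest predicted activity.
            pick = untested[np.argsort(X[untested] @ w_hat)[-batch_size:]]
        else:
            # Brute-force baseline: test a random batch.
            pick = rng.choice(untested, size=batch_size, replace=False)
        tested = np.concatenate([tested, pick])
        readout = np.concatenate([readout, assay(pick)])

    return int(is_hit[tested].sum())
```

With the default settings the library contains roughly 40 hits, and random testing of 200 compounds finds about 4 of them on average (2% of 200), so `run_screen(guided=True)` should recover substantially more than `run_screen(guided=False)`. Greedy exploitation is the simplest possible selection rule; practical active learning strategies usually balance it against exploring compounds the model is uncertain about.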

Algorithmic ideation: a proof of concept

Preliminary results from work by the PI’s previous co-founder, Aaron Wise, Ph.D., and the PI, Murat Can Cobanoglu, Ph.D., show that representing the set of drug-target and target-disease relationships (where the diseases are cancer types) as a sparse tensor, and then using low-rank tensor completion to predict new interactions, can identify correct interactions 104-fold more efficiently than brute-force random screening (Wise and Cobanoglu, 2016). A detailed description of the proof of concept and its results is available as a preprint on bioRxiv at: and the entire source code is open source and available here:

Future Directions

We pursue an integrated experimental and computational strategy: developing clinically relevant functional assays together with the machine learning algorithms and software that will guide them. For the clinically relevant assay, our goals are to characterize and reconstruct the human in vivo extracellular matrix (ECM); to co-culture stromal, immune, and malignant cells; and to incorporate perfusion. For the machine learning, we already have a public prototype, and we plan several improvements to it that will enable the incorporation of higher-dimensional omics data such as proteomics, metabolomics, and epigenomics. We closely follow work in the active learning and collaborative filtering fields of machine learning research.

References

Chabner, B.A. (2016). NCI-60 Cell Line Screening: A Radical Departure in its Time. J. Natl. Cancer Inst. 108.

Cobanoglu, M.C., Liu, C., Hu, F., Oltvai, Z.N., and Bahar, I. (2013). Predicting drug-target interactions using probabilistic matrix factorization. J. Chem. Inf. Model. 53, 3399–3409.

Friedman, A.A., Letai, A., Fisher, D.E., and Flaherty, K.T. (2015). Precision medicine for cancer with next-generation functional diagnostics. Nat. Rev. Cancer 15, 747–756.

Gu, L., and Mooney, D.J. (2016). Biomaterials and emerging anticancer therapeutics: engineering the microenvironment. Nat. Rev. Cancer 16, 56–66.

Hutchinson, L., and Kirk, R. (2011). High drug attrition rates—where are we going wrong? Nat. Rev. Clin. Oncol. 8, 189–190.

Iorio, F., Knijnenburg, T.A., Vis, D.J., Bignell, G.R., Menden, M.P., Schubert, M., Aben, N., Gonçalves, E., Barthorpe, S., Lightfoot, H., et al. (2016). A Landscape of Pharmacogenomic Interactions in Cancer. Cell 166, 740–754.

Kangas, J.D., Naik, A.W., and Murphy, R.F. (2014). Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics 15, 143.

Kearnes, S., McCloskey, K., Berndl, M., Pande, V., and Riley, P. (2016). Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des.

Lavecchia, A. (2015). Machine-learning approaches in drug discovery: methods and applications. Drug Discov. Today 20, 318–331.

Majumder, B., Baraneedharan, U., Thiyagarajan, S., Radhakrishnan, P., Narasimhan, H., Dhandapani, M., Brijwani, N., Pinto, D.D., Prasath, A., Shanthappa, B.U., et al. (2015). Predicting clinical response to anticancer drugs using an ex vivo platform that captures tumour heterogeneity. Nat. Commun. 6, 6169.

Mullard, A. (2016). Parsing clinical success rates. Nat. Rev. Drug Discov. 15, 447.

Murphy, R.F. (2011). An active role for machine learning in drug development. Nat. Chem. Biol. 7, 327–330.

Naik, A.W., Kangas, J.D., Sullivan, D.P., and Murphy, R.F. (2016). Active machine learning-driven experimentation to determine compound effects on protein patterns. eLife 5.

Reker, D., and Schneider, G. (2015). Active-learning strategies in computer-assisted drug discovery. Drug Discov. Today 20, 458–465.

Riniker, S., Wang, Y., Jenkins, J.L., and Landrum, G.A. (2014). Using information from historical high-throughput screens to predict active compounds. J. Chem. Inf. Model. 54, 1880–1891.

Scannell, J.W., Blanckley, A., Boldon, H., and Warrington, B. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discov. 11, 191–200.

Tabassum, D.P., and Polyak, K. (2015). Tumorigenesis: it takes a village. Nat. Rev. Cancer 15, 473–483.

Warmuth, M.K., Liao, J., Rätsch, G., Mathieson, M., Putta, S., and Lemmen, C. (2003). Active Learning with Support Vector Machines in the Drug Discovery Process. J. Chem. Inf. Comput. Sci. 43, 667–673.

Wise, A., and Cobanoglu, M.C. (2016). Predicting Targeted Cancer Therapeutics. bioRxiv.