Researchers develop computer application to 'read' medical literature, find significant data relationships
DALLAS - Jan. 22, 2004 - Until recently, researchers and their assistants spent countless hours poring over seemingly endless volumes of journals and scientific literature for information pertinent to their studies in fields such as cancer, AIDS, pediatrics and cardiology.
But thanks to new software developed by bioinformatics researchers at UT Southwestern Medical Center at Dallas, scientists can now easily identify obscure commonalities in research data and directly relate them to their studies, saving money and speeding the process of discovery.
The computer application is unique because it "emulates the scientific thought process" in researching data, said Dr. Harold "Skip" Garner, professor of biochemistry and internal medicine, who with former graduate student Dr. Jonathan Wren developed the system.
"This work is about teaching computers to 'read' the literature and make relevant associations so they can be summarized and scored for their potential relevance," said Dr. Wren, now a researcher in the department of botany and microbiology at the University of Oklahoma. "For humans to answer the same questions objectively and comprehensively could entail reading tens of thousands of papers."
The proprietary software, the basis for a new company called etexx Biopharmaceuticals, has already helped predict new uses for existing drugs to combat cardiac disease. The UT Southwestern researchers' work with this computer application appears in the current issue of the journal Bioinformatics and is available online.
The software, called IRIDESCENT, constructs a network of related objects starting with their co-occurrence within MEDLINE abstracts. MEDLINE is the National Library of Medicine's prime bibliographic database covering medicine, nursing, dentistry, health care and the preclinical sciences.
To identify and evaluate what biomedical objects (such as genes, phenotypes, chemicals and diseases) have in common, most researchers read through volumes of published scientific literature indexed in the MEDLINE database. IRIDESCENT aids researchers by allowing sets of objects within this network to be queried for shared statistical relationships, which it identifies by comparing how frequently the objects appear together with how often they would be expected to do so by chance.
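The idea of comparing observed co-occurrence with chance can be illustrated with a minimal Python sketch. This is not IRIDESCENT's actual scoring method, which the release does not describe in detail; the function name `cooccurrence_scores`, the object names and the toy abstracts are all hypothetical, and the sketch assumes a simple independence model in which two objects are expected to share an abstract in proportion to how often each appears alone.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(abstracts):
    """Return observed/expected co-occurrence ratios for each object pair.

    abstracts: list of sets, each holding the objects named in one abstract.
    Assumption: under chance, objects occur independently, so the expected
    number of shared abstracts is count(a) * count(b) / total abstracts.
    """
    total = len(abstracts)
    single = Counter()
    pair = Counter()
    for objects in abstracts:
        single.update(objects)
        pair.update(frozenset(p) for p in combinations(sorted(objects), 2))

    scores = {}
    for key, observed in pair.items():
        a, b = sorted(key)
        expected = single[a] * single[b] / total
        scores[(a, b)] = observed / expected
    return scores


# Hypothetical toy data, not drawn from MEDLINE.
toy_abstracts = [
    {"gene_A", "disease_Y"},
    {"gene_A", "disease_Y"},
    {"gene_A", "drug_X"},
    {"drug_X"},
]

scores = cooccurrence_scores(toy_abstracts)
for pair_names, ratio in sorted(scores.items()):
    print(pair_names, round(ratio, 2))
```

A ratio above 1 means the pair appears together more often than the independence assumption predicts; a ratio below 1 means less often. A real system would also need to account for synonyms, abstract length and statistical significance, none of which this sketch attempts.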
The number of articles indexed in MEDLINE is growing exponentially, reflecting an explosion of information driven by technological improvements in generating data. The database currently covers more than 4,600 journals and contains a total of 12.7 million records written during the past 35 years, with another 500,000 abstracts added annually, the researchers said.
IRIDESCENT can identify general themes, Dr. Garner said, along with statistically exceptional groupings within the list (such as drugs affecting the activity of a group of genes). Researchers also can infer how cohesive an experimental grouping is based upon relationships documented in the literature and identify missing members in a set by their relationship to the group as a whole.
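One way to picture the search for missing members of a set is to rank outside objects by how many members of the group they are linked to in the literature network. The Python sketch below illustrates that idea only; it is an assumption about how such a ranking could work, not the method used by IRIDESCENT, and the function name `rank_missing_members`, the network and the object names are hypothetical.

```python
def rank_missing_members(network, group):
    """Rank objects outside `group` by the share of group members they touch.

    network: dict mapping an object to the set of objects it is related to.
    group:   iterable of objects forming the experimental grouping.
    Returns a list of (fraction_of_group_linked, object) tuples, best first.
    """
    group = set(group)
    counts = {}
    for member in group:
        for neighbor in network.get(member, set()):
            if neighbor not in group:
                counts[neighbor] = counts.get(neighbor, 0) + 1
    return sorted(
        ((count / len(group), name) for name, count in counts.items()),
        reverse=True,
    )


# Hypothetical toy network, not drawn from MEDLINE.
toy_network = {
    "gene_A": {"gene_B", "gene_D", "drug_X"},
    "gene_B": {"gene_A", "gene_D"},
    "gene_C": {"gene_D", "drug_X"},
}

ranked = rank_missing_members(toy_network, ["gene_A", "gene_B", "gene_C"])
for fraction, name in ranked:
    print(name, round(fraction, 2))
```

In this toy case the candidate linked to every member of the group ranks first, which mirrors the notion of an object being related "to the group as a whole."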
"Many new high throughput technologies, such as microarrays for gene expression analysis, generate so much data that it is often hard to interpret," Dr. Garner said. "IRIDESCENT can do a much better job because it emulates the scientific thought process. Having assimilated all of MEDLINE, IRIDESCENT can compile diverse facts to present a list of 'hypotheses' to the user for finding hidden knowledge in the data."
Dr. Wren noted that scientists are still a "long way off from turning over the interpretation phase of research to computers. But, they are not only useful in organizing this growing body of scientific knowledge, they also are rapidly becoming indispensable."
Founded in September 2003, etexx Biopharmaceuticals seeks new uses for existing Food and Drug Administration-approved compounds in various classes of cardiac disease for which no therapeutics are available, Dr. Garner said. Candidate drug-disease relationships predicted by the software are studied to select the best-suited FDA-approved compounds for more extensive studies. Laboratory tests for a given class of heart disease are then conducted to confirm the predicted therapeutic potential of the compounds.
The traditional approach to drug discovery, combined with the rigors of clinical trials, typically takes up to 15 years and costs almost $1 billion to bring a new drug to market, Dr. Garner said. High-throughput screening, toxicology testing and manufacturing qualifications are a few of the costly and time-consuming development components that etexx Biopharmaceuticals can significantly reduce, because the compounds' safety profiles and manufacturing processes already have FDA approval.
UT Southwestern researchers have considered other potential applications for IRIDESCENT, including the ability to forecast new uses for existing drugs. Through several lab trials, they demonstrated its success by finding that the drug chlorpromazine would reduce the progression of cardiac hypertrophy, Dr. Garner said.
Cardiac hypertrophy, or enlargement of the heart, is the organ's natural response to stress. Chlorpromazine, also known as Thorazine, is used to treat psychotic disorders and symptoms such as hallucinations, delusions and hostility. It also is used to prevent and treat nausea and vomiting, to treat behavior problems in children, and to relieve severe hiccups.
The research was supported by the National Science Foundation, the state of Texas, National Institutes of Health, the Hudson Foundation, the American Heart Association and the Biological Chemical Countermeasures Program of The University of Texas.
Media Contact: Scott Maier