GIGGLE: Scanning Genome Databases Faster to Accelerate Identification of Disease
Personalized medicine tailors medical treatments to a patient’s genomic profile, avoiding the one-size-fits-all approach. To diagnose an illness and identify the best treatment options, researchers must find genetic culprits by comparing the patient’s genome to information in public and local databases. This is easier said than done. Researchers face a tangled maze of information as they grapple with many different genomic databases, each with its own interface and organizational structure. To overcome these obstacles, scientists at University of Utah Health developed a new search engine that may accelerate the application of personalized medicine.
“Genomics is a total soup of file formats,” said Ryan Layer, Ph.D., senior postdoctoral fellow in Aaron Quinlan’s lab at the Department of Human Genetics at U of U Health and the USTAR Center for Genetic Discovery. He likens the current genomic computational environment to the early days of the internet. “Before Google, you had to know exactly what web page to go to find the information you needed,” he said.
Layer thought, why not create a Google-like search that makes the genomic information in these databases more accessible? With such a search engine, researchers could perform rapid and effective searches of these databases to advance medical diagnoses and treatments.
Enter GIGGLE, a new search engine with a web interface where researchers enter experimental results to search for, identify, and rank regions of the genome that could be associated with disease.
According to Layer, GIGGLE may accelerate how researchers treat patients.
GIGGLE identifies the overlaps between the genome features of interest, like a patient’s genetic mutations, and the available databases by indexing the data before searching it. This approach minimizes the number of times the search has to read data from the hard drive, which accelerates the search. The results are ranked using statistics (the negative log of the Fisher’s Exact Test p-value as well as the binary log of the odds ratio) and visualized with a heat map to show the strongest and weakest relationships in the search.
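To make the idea of indexing, overlap counting, and statistical ranking concrete, here is a minimal Python sketch. It is not GIGGLE's actual implementation: the sorted-list index, the function names, the half-open interval convention, the base-10 logarithm for the p-value, and the hand-supplied 2x2 contingency table are all assumptions made for illustration.

```python
import bisect
import math


def build_index(database):
    """Illustrative index (not GIGGLE's): sort intervals by start and store
    the running maximum of their ends. Intervals are half-open (start, end)."""
    ordered = sorted(database)
    starts = [s for s, _ in ordered]
    max_ends, running = [], -math.inf
    for _, end in ordered:
        running = max(running, end)
        max_ends.append(running)
    return starts, max_ends


def count_overlapping(queries, index):
    """Count query intervals that overlap at least one indexed interval."""
    starts, max_ends = index
    hits = 0
    for q_start, q_end in queries:
        # Number of indexed intervals that start before the query ends.
        i = bisect.bisect_left(starts, q_end)
        # One of them overlaps if it also ends after the query starts.
        if i > 0 and max_ends[i - 1] > q_start:
            hits += 1
    return hits


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def log2_odds_ratio(a, b, c, d):
    """Binary log of the odds ratio (a*d)/(b*c); infinite when a product is zero."""
    num, den = a * d, b * c
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log2(num / den)


def ranking_stats(a, b, c, d):
    """Return (-log10 p-value, log2 odds ratio) for one 2x2 table."""
    p = fisher_exact_two_sided(a, b, c, d)
    neg_log_p = -math.log10(p) if p > 0 else math.inf
    return neg_log_p, log2_odds_ratio(a, b, c, d)


# Hypothetical toy data on a single chromosome.
index = build_index([(10, 20), (15, 40), (100, 120)])
n_hits = count_overlapping([(5, 12), (40, 50), (110, 111), (200, 300)], index)
```

Because the index is built once, each query costs a binary search rather than a scan of the whole database. How the four cells of the contingency table are derived from overlap counts is left to the caller here, since the article does not describe that step.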
And it is fast. To test this new tool, the researchers searched up to 1,000,000 genome intervals, and GIGGLE outperformed common genomic search tools, proving to be 2,336 times faster than TABIX and 25 times faster than BEDTOOLS.
“A fast search is how we attack big data in any field, but just searching quickly is useless,” Layer said. “GIGGLE is a foundational technology that enables researchers to identify unexpected relationships by finding new threads or connections among genomic data.”
Layer and his team hope to build on GIGGLE’s foundation by expanding on the dimensional searches to help refine the results so researchers can home in on genetic hotspots, and more appropriate treatments, faster.
GIGGLE is available at https://github.com/ryanlayer/giggle.