AIDS continues to be a deadly barrier to human progress, claiming millions of lives every year. In 2015, around 36.7 million people were living with HIV, and the disease caused 1.1 million deaths. The epidemic has claimed around 39 million lives globally since its discovery, and it has hit developing countries hardest. It remains present throughout the world, is actively spreading, and continues to weigh on society and the economy. World AIDS Day is an occasion to take stock of research on the disease and the fight against it.
One of the problems scientific researchers face is that the information coming from all directions is highly voluminous and diverse. A search for HIV-related publications on PubMed, the world's most trusted repository of medical knowledge, yields many thousands of journal articles, reports, gene mappings, opinion pieces, and more. And that is not the end: the list grows by more than 40 studies every day, to say nothing of other such databases. Genomic screens and scans regularly throw up results across all of the roughly 20,000 human genes. Additionally, with the development of high-speed, high-accuracy devices, a veritable data tsunami confronts AIDS research. For instance, the Solexa machine yields up to 100 billion bases of sequence information in a single run. The flood doesn't stop there; data pours in from HIV resistance testing, genome-wide studies, and epidemiological tracking.
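A rough back-of-the-envelope calculation puts that sequencing figure in perspective. This is only a sketch: the 100-billion-base-per-run number comes from the article, and the roughly 3.1-billion-base size of the human genome is a widely cited approximation.

```python
# Rough scale of one sequencing run relative to the human genome.
# 100 billion bases per run is the figure cited in the article;
# ~3.1 billion bases for the human genome is a common approximation.
BASES_PER_RUN = 100_000_000_000
HUMAN_GENOME_BASES = 3_100_000_000

coverage = BASES_PER_RUN / HUMAN_GENOME_BASES
print(f"One run yields roughly {coverage:.0f}x coverage of a human genome")
```

In other words, a single run produces enough raw sequence to cover a human genome about thirty times over, which is why storage and analysis, not data generation, have become the bottleneck.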
This is where analytics comes into the picture. New analytical tools need to be developed to contend with such data, particularly ways and means of analyzing genome-wide screens and scans for human genes that affect HIV replication.
Data repositories are many, but culling out relevant data in real time remains a challenge. The National Center for Biotechnology Information (NCBI) centralizes data on scientific literature, gene structure, DNA sequences, and more. Other institutions collect and curate similarly huge data sets, such as the Los Alamos HIV Database and the Stanford University HIV Drug Resistance Database (which catalogues HIV mutations and resistance to antiviral agents), but analyzing all of it is still a problem.
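Much of this data is at least programmatically reachable. NCBI exposes its literature and sequence databases through the public E-utilities interface, whose `esearch` endpoint accepts `db`, `term`, and `retmax` parameters. As a minimal sketch, the few lines below only construct a query URL for a PubMed search; the search term is illustrative, and a real pipeline would fetch and parse the XML response as well.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (part of NCBI's public interface).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that lists PubMed IDs matching `term`."""
    return ESEARCH + "?" + urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax}
    )

# Illustrative query for HIV drug-resistance literature.
print(pubmed_search_url("HIV[Title] AND drug resistance"))
```

Automating retrieval this way is what makes it feasible to keep pace with the 40-plus new studies added daily.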
“It is not humanly possible to remember or even to read the huge corpus of information. Naturally, medical researchers are unaware of the full extent of literature on HIV, even as the data flood keeps getting worse. Consequently, it is now imperative to devise analytical tools that would process and visualize the data in an easy-to-understand form. Quick and efficient summarization of the studies and reports is an urgent need. In fact, this is more important than the mere accumulation of data,” said Shashank Dixit, CEO of Deskera, a cloud technology firm that has developed its own Big Data tool.
The problem is now being addressed, with modern analytical tools being developed to distill the data. The Gene Overlapper collects genome-wide screens and scans of human genes, enabling analysis across data sets. That analysis, paired with another resource, the HIV Replication Cycle site, provides context for the genome-wide data. For the tools to work efficiently, however, large data sets need to be rich in context and user-friendly, and the data needs to be paired with summaries, which are generally web-based.
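At its core, this kind of cross-screen analysis is simple set arithmetic: given hit lists from several genome-wide screens, find the genes reported by more than one, since independent replication strengthens a candidate. The sketch below illustrates the idea; the screen names and hit lists are hypothetical, not taken from the actual Gene Overlapper data.

```python
from collections import Counter

# Hypothetical hit lists from three genome-wide screens for host
# genes affecting HIV replication (contents are illustrative only).
screens = {
    "screen_A": {"CD4", "CCR5", "CXCR4", "TNPO3"},
    "screen_B": {"CCR5", "TNPO3", "LEDGF"},
    "screen_C": {"CD4", "TNPO3", "NUP153"},
}

# Count how many screens report each gene.
hits = Counter(gene for genes in screens.values() for gene in genes)

# Genes found in at least two independent screens are stronger candidates.
overlap = sorted(gene for gene, n in hits.items() if n >= 2)
print(overlap)  # → ['CCR5', 'CD4', 'TNPO3']
```

Real screens disagree far more than this toy example suggests, which is exactly why tools that compute and visualize such overlaps systematically are valuable.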
Three-dimensional visualization of HIV proteins and of the various HIV integration sites aids quick and reliable assimilation of information, and these web resources are linked to hundreds of other such sites for quick referencing. The day may not be far off when efficient tools to tackle the HIV problem will be available everywhere.