Life Science Leader Magazine

NOV 2013

The vision of Life Science Leader is to be an essential business tool for life science executives. Our content is designed not only to inform readers of best practices but also to motivate them to implement those practices in their own businesses.

Issue link: https://lifescienceleadermag.epubxp.com/i/201345

Information Technology

"...and acting as the hub of the consortium, and AstraZeneca strongly bought into this opportunity," says Al Dossetter, founder and managing director of MedChemica. "AstraZeneca has been joined by Roche/Genentech, and the database contains around 1.2 million data points so far. However, the more data there is to mine, the better the results will be."

The consortium is open to other large biopharma companies, and discussions are ongoing. As a consortium, all partners have a say and can suggest where additional data could improve the dataset overall, even agreeing to share costs where further testing would be advantageous or match the addition of equivalent amounts of data. There will be no "reach-through" claims or tiebacks for any molecules generated as a result of the collaboration. "More companies will create bigger databases and, therefore, better rules. This should be synergistic rather than additive," says Boehm.

There will also be opportunities for collaborations with academia. The benefits of these will be two-way, for both the academic researchers and the science behind the database. "We plan to have an online tool available by the end of 2013. This could give academia and small companies access to the technology on a pay-as-you-go basis. This would support research and provide us with another revenue stream," says Dossetter.

As with all precompetitive collaborations, security is an important issue. However, Boehm is reassuring, saying, "The beauty of the collaboration is that the data is extracted and analyzed in such a way that we share the rules but not the structures of the molecules. Many companies are recognizing the advantages of precompetitive collaborations, and I expect to see more in the future. I look forward to seeing what comes out of this collaboration. It could be a big step in drug development."

NextBio has created a database with billions of data points from a range of different types of information, such as genomic, proteomic, and metabolomic data, molecular profiles, and clinical trial results from public and private databases, as well as clinical data from individual patients. The company analyzes the data using its proprietary algorithms.

"One of the drivers for the advances in Big Data in healthcare research is the improved efficiency in producing molecular profiles, as sequencing costs are falling," says Saeid Akhtari, cofounder, president, and CEO of NextBio. "Each patient whose data is added to the system makes it smarter."

CUTTING THROUGH THE ROCK FACE: THE BIG DATA CHALLENGES

Data mining and Big Data bring with them many challenges. One of the biggest challenges in data mining is the consistency of the data, which can come from many sources. However, as Akhtari explains, this is important since it reduces the risk of false positives and is the point where a human touch can be essential to provide quality control. "There are many data repositories worldwide containing a lot of heterogeneous data. This data has to be standardized and indexed to be searchable, and results from queries need to be returned in real time via an intuitive interface to enable scientists to continue their research," Akhtari adds.
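The "standardize, then index" idea Akhtari describes can be pictured with a small sketch. The snippet below is only an illustration of the general approach, not NextBio's system; the field names, synonym table, and records are hypothetical.

```python
# Minimal sketch of standardizing heterogeneous records and indexing them for fast queries.
# Everything here (field names, synonym table, records) is hypothetical; a production
# platform would rely on full ontologies, curation, and a real search engine.
from collections import defaultdict

# Map source-specific spellings onto one canonical vocabulary term.
SYNONYMS = {
    "nsclc": "non-small cell lung cancer",
    "non small cell lung carcinoma": "non-small cell lung cancer",
}

def standardize(record):
    """Lower-case each field value and map it to its canonical term."""
    return {k: SYNONYMS.get(str(v).strip().lower(), str(v).strip().lower())
            for k, v in record.items()}

def build_index(records):
    """Inverted index (term -> record ids) so lookups return in near real time."""
    index = defaultdict(set)
    for rec_id, record in enumerate(records):
        for value in standardize(record).values():
            index[value].add(rec_id)
    return index

records = [
    {"source": "repo_a", "diagnosis": "NSCLC", "assay": "RNA-seq"},
    {"source": "repo_b", "diagnosis": "Non small cell lung carcinoma", "assay": "rna-seq"},
]
index = build_index(records)
print(index["non-small cell lung cancer"])  # {0, 1}: both records now match one query term
```

A synonym table barely scratches the surface of real repositories, which is exactly the curation burden Boehm describes next.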
As Boehm explains, this isn't always as easy as it looks: "There have been some interesting papers on how to analyze Big Data, but when you look closely, you realize it takes a huge amount of curation and isn't necessarily scalable. What's possible on a thousand records won't necessarily work on millions. What's needed is a way to build compatible and well-annotated databases and analyze the databases using processes that can be scaled up."

Textual information makes up the bulk of the information generated by the biopharma industry, and one of the exciting possibilities for data mining would be to be able to link this with the other available information and analyze it. However, as Gardner explains, this has its own issues. "Analyzing text is challenging because so many meanings of words are changed by their context. You can't assume that two people using the same word will necessarily mean the same thing. It will be necessary to resolve issues at a very detailed level."

It is also important to know the data well, as this will influence how it is searched and analyzed and the quality of the outputs. Understanding the data also has an impact on the questions asked of the data. "For example, do you know the context in which the data was discovered? Have patients been diagnosed using a specific methodology or were they self-diagnosed? Were they given the same treatment protocol or even the same dose? Were the endpoints the same?" asks Gardner.

Another key challenge is data security. This is important both for patients and drug developers. "Data security and patient privacy is critical. We remove identifiers to protect privacy and store data in a private cloud to ensure it is secure and to provide confidence for our clients," says Akhtari.

THE FUTURE OF BIG DATA AND DATA MINING: THE ROUTE TO THE MOTHER LODE

If these challenges can be resolved and large sets of data (e.g., drug information, FDA-approval documentation, patents) can be combined successfully, then the future of Big Data and data mining could be very exciting. "The future of data mining, we believe, is in making data available to the community and connecting stakeholders," says Akhtari.
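Akhtari stresses above that identifiers are removed before patient data is stored and mined. As a closing illustration, here is a minimal sketch of that kind of de-identification step; the field names and function are hypothetical, not NextBio's pipeline, and real de-identification (for example under HIPAA's Safe Harbor provision) covers many more data elements.

```python
# Hypothetical sketch of removing direct identifiers before a patient record is stored.
# Field names are invented for illustration only.
import uuid

DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "medical_record_number"}

def deidentify(patient_record):
    """Drop direct identifiers and attach a random subject key instead."""
    cleaned = {k: v for k, v in patient_record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["subject_key"] = uuid.uuid4().hex  # random pseudonym, not derived from identity
    return cleaned

record = {
    "name": "Jane Doe",
    "medical_record_number": "MRN-0042",
    "diagnosis": "type 2 diabetes",
    "hba1c": 7.9,
}
print(deidentify(record))
# {'diagnosis': 'type 2 diabetes', 'hba1c': 7.9, 'subject_key': '<random hex>'}
```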
