AnVil develops and implements data analysis and visualization strategies for very large, high-dimensional data sets. This brief tutorial will help you better understand what drives AnVil's business.
What do we mean by "very large" and "high-dimensional"?
These terms refer to data sets with hundreds of thousands, or even millions, of variables or records. Instead of a few records with a dozen attributes each, we can uncover valuable relationships within records that contain thousands of columns or more. Think of a typical spreadsheet you might use in your business; now, instead of a dozen columns, imagine one with 50,000. That is a very large data set. AnVil's unique and proprietary analysis and visualization tools and methods enable us to work with these data sets without prior dimensional reduction, preserving all the data your researchers have spent time, money, and effort to obtain.
Where do these large data sets come from?
Life sciences and drug discovery technology give researchers the ability to create truly massive amounts of data. Instead of synthesizing and testing a dozen new drug candidates, researchers run combinatorial libraries of hundreds of thousands of compounds through automated assays. This generates very large data sets that can present overwhelming analysis challenges to most data miners and informatics scientists.
Consider another example where researchers test DNA chips with thousands of genes against tissue samples from hundreds of patients. The results of these technologies are the very large data sets that benefit the most from AnVil's unique analysis and visualization processes and methods.
What are the goals of analyzing this data? What applications might it have?
Most organizations are not prepared to deal with this data avalanche. They either store all the data, hoping to analyze it someday, or keep only the small fraction they can handle and discard the rest. Either way, they lose the information that resides in these masses of data. Every data point tells something about the characteristics of a new drug candidate or the action of a gene. The goal is to keep all the data and use it to extract the maximum amount of information: the patterns, trends, and subtle relationships of natural processes. The benefit is learning more, earlier in the research cycle, which provides a better basis for decision-making and for allocating effort toward the most promising lines of research. This is where AnVil's expertise and experience can help. Using the power of high-dimensional analysis and visualization, AnVil enables clients to see and interpret multiple data sets in integrated, information-rich views. This approach turns data into decision-quality information.
How does AnVil develop a data exploration and analysis strategy?
AnVil's methods combine traditional statistical analysis, innovative techniques derived from research, and expert scientific knowledge grounded in experience. We recognize that there is no single right way to explore data and get results: each objective may present several solution options. AnVil identifies and provides the best team, methodology, and tools to forge the optimal path to discovery.
Developing a data exploration strategy involves viewing and clustering the data, deriving proposed sets of classifiers, and testing the scoring or efficiency of those classifiers. In other words, we seek the smallest practical set of variables that accounts for the greatest amount of variability or spread in the data. We do this without simplifying or reducing the dimensions of the problem. Only after the full dataset is explored is the data dimensionally reduced for finer discrimination.
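AnVil's actual tools and methods are proprietary and not shown here. As a minimal illustrative sketch of one step described above, seeking the smallest practical set of variables that accounts for the greatest amount of variability, the following Python snippet ranks the columns of a synthetic data matrix by variance and selects the smallest set covering 90% of the total spread. The synthetic data, the 90% threshold, and the simple variance-ranking approach are all assumptions for illustration, not AnVil's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "assay" matrix: 200 records x 1,000 variables, where a
# handful of variables carry most of the spread (illustrative only).
X = rng.normal(0.0, 0.1, size=(200, 1000))
X[:, :10] += rng.normal(0.0, 5.0, size=(200, 10))

# Rank variables by variance, then find the smallest set that
# accounts for at least 90% of the total variability in the data.
variances = X.var(axis=0)
order = np.argsort(variances)[::-1]          # highest-variance first
cumulative = np.cumsum(variances[order]) / variances.sum()
k = int(np.searchsorted(cumulative, 0.90)) + 1
selected = order[:k]

print(f"{k} of {X.shape[1]} variables explain 90% of the variability")
```

In practice a much richer toolkit (clustering, classifier scoring, and, only after full exploration, dimensional reduction for finer discrimination) would follow this kind of first pass.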