StATS: Acuity microarray analysis software.

I got a request to evaluate some software by Axon Instruments for the analysis of microarray data. The software, Acuity version 3.1, costs $4,000 per person and has to compete with other commercial software such as

StatSci.org has a nice list of companies and institutions that produce statistical analysis software for microarrays as part of their overview of microarray data analysis.

Perhaps the most interesting product of the bunch is Bioconductor, an open source library for the R programming language. Both Bioconductor and R are distributed under the GNU General Public License, which allows you to modify the source code as long as you agree that if you distribute your modifications, you distribute them with the source code and allow others to make further modifications. This requirement is called "copyleft" to distinguish it from "copyright."

I wrote a brief introduction to the R language in PDF format. There are lots of other tutorials for R and Bioconductor that are better written than mine. In particular, the "Overview of R and Bioconductor" handout in PDF format is a nice place to start.

R (and Bioconductor) is an object-oriented language, which makes it both difficult to learn and very powerful. Microarray data is stored in a class called exprSet. This class includes information about expression levels, standard errors of expression levels, phenotypic data, annotation, description, and notes.

Phenotypic data represents information about the individual or cell line that produced this particular microarray, such as the gender of the patient, the tissue being sampled, and so forth. You can get the phenotypic data from an exprSet object by using the pData() function.
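To make the layout of such an object concrete, here is a minimal sketch in Python. It is not the Bioconductor API: the class name ExpressionData, the methods p_data() and arrays_where(), and all of the values are hypothetical, chosen only to mirror the pieces listed above (expression levels, standard errors, phenotypic data, annotation, description, and notes).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExpressionData:
    """Hypothetical Python analogue of the pieces described above.

    This is an illustration only, not the Bioconductor exprSet class.
    """
    exprs: Dict[str, List[float]]      # gene id -> expression level per array
    se_exprs: Dict[str, List[float]]   # gene id -> standard error per array
    pheno: Dict[str, Dict[str, str]]   # array name -> phenotypic covariates
    annotation: str = ""               # name of the chip the data came from
    description: str = ""
    notes: str = ""

    def p_data(self) -> Dict[str, Dict[str, str]]:
        """Return the phenotypic data, loosely in the spirit of pData()."""
        return self.pheno

    def arrays_where(self, covariate: str, value: str) -> List[str]:
        """Names of arrays whose phenotypic covariate equals the given value."""
        return [name for name, row in self.pheno.items()
                if row.get(covariate) == value]


# Made-up values, for illustration only.
eset = ExpressionData(
    exprs={"gene_1": [7.1, 6.8, 9.4], "gene_2": [5.0, 5.2, 5.1]},
    se_exprs={"gene_1": [0.2, 0.3, 0.2], "gene_2": [0.1, 0.1, 0.2]},
    pheno={
        "array_1": {"gender": "F", "tissue": "liver"},
        "array_2": {"gender": "M", "tissue": "liver"},
        "array_3": {"gender": "F", "tissue": "kidney"},
    },
    annotation="hu6800",
)

print(eset.p_data()["array_1"])
print(eset.arrays_where("gender", "F"))
```

The point of keeping the phenotypic data next to the expression levels is that you can pick out arrays by patient or tissue characteristics without tracking a separate spreadsheet.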

The hardest thing about microarrays is that there are just so many genes whose expression levels you can evaluate. Bioconductor allows you to annotate these genes. For example, if you ran an experiment with the Affymetrix Hu6800 chip, your data set will have the Affymetrix proprietary names for the genes. Bioconductor has a library that allows you to find out the symbol, genename, locusid, and other information using the Affymetrix name. You can also get the PMIDs (PubMed identifiers) for papers that cite the particular gene of interest.

I will write up more details about Bioconductor when I get the time and energy.

In contrast to Bioconductor, Acuity is menu driven, so you just point and click to get a data analysis. It sounds easier, but quite honestly, you have to work just as hard with either program because there is no such thing as an easy analysis for microarray data. Acuity has 9 menus:

1. Import Data

2. Organize Microarrays into Folders

3. View Data

4. View Statistics

5. Normalize Microarrays

6. Create a Dataset for Analysis

7. Manipulate a Dataset

8. Cluster a Dataset

9. Advanced Tips

Notice that most of the menus deal with data management. This is not surprising, since the size of a typical microarray experiment makes data management quite difficult.

As far as I could tell, Acuity could not analyze data from gene chips (Affymetrix chips). Version 4.0 of the software, I'm told, will have support for gene chip data.

The normalization features of Acuity look quite good and are very easy to use. The problem, of course, is that there are a lot of choices for normalization, and the typical user won't know what choices are best. That's not something that you can really blame Acuity for, any more than you can blame Ford Motor Company because some of its cars are driven by 85-year-olds who are half blind. When I find the time, I will write a short web page about normalization.

The disappointing aspect of this software is that all of its methods fit into a category of analysis called unsupervised learning or class discovery. I did not find any methods in the category of supervised learning (class prediction).

If you want to identify genes that have differential expression across two groups of microarrays, Acuity offers you either a t-test or the Mann-Whitney test. The typical microarray has thousands of genes, so some sort of adjustment is needed for the p-values for these tests. Acuity offers the Bonferroni and Benjamini-Hochberg corrections for multiple comparisons.
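To show what these two corrections do to a list of p-values, here is a minimal sketch in plain Python. It is the textbook form of each adjustment, not Acuity's implementation, which I have not seen. The function names and the five raw p-values are made up for illustration, and the sketch assumes the per-gene p-values have already been computed by a t-test or Mann-Whitney test.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that the adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni multiplies every p-value by the number of genes, which gets very conservative with thousands of genes. Benjamini-Hochberg scales each p-value by the number of genes divided by its rank, so it penalizes the smallest p-values most and the larger ones progressively less.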

There are several levels of software integration which add to the functionality of this program.

The Help files are quite good. There were a few statements that were poorly or ambiguously worded, but for something as complex as microarray analysis, some imperfections are to be expected. I did notice some stray entries in the keyword index (<invalid> and <no data>). Also, the entry under "Calculate Multiple Groups" was blank.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.