FAQs

GNomEx

How do I link directly to GNomEx datasets?

Look up the Analysis, Experiment, or DataTrack number and modify the examples:

Analysis

Microarray

What is P-value adjustment?

P-value adjustment is an essential part of next-generation sequencing and microarray analysis. Raw (unadjusted) p-values are calculated for each gene in a dataset using a test statistic (for example, a t-test). Since a typical gene expression experiment measures 30,000–40,000 genes, you are performing 30,000–40,000 separate statistical tests on the data. Even at a conventional threshold of 0.05 or 0.01, you should expect many false positives: with 30,000 tests at p < 0.05, about 1,500 genes would pass by chance alone if no gene were truly changing. P-value adjustment methods such as the Benjamini and Hochberg method or the q-value method use the unadjusted p-values of all the genes to control or estimate the false discovery rate (FDR), the expected proportion of false positives among the genes called significant.
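As a minimal sketch of the Benjamini and Hochberg adjustment (written for illustration, not taken from any particular analysis package): sort the m p-values, scale the i-th smallest by m/i, then take a running minimum from the largest value down so the adjusted values never decrease as the raw p-values increase, capping them at 1.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input.

    For the i-th smallest of m p-values, the adjusted value is the minimum of
    p_(j) * m / j over all j >= i, capped at 1.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 caps every adjusted value at 1
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values never decrease as the raw p-values increase
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, benjamini_hochberg([0.01, 0.04, 0.03, 0.005]) returns [0.02, 0.04, 0.04, 0.02] (up to floating-point rounding). Genes whose adjusted value is at or below your chosen FDR are the ones to report. The q-value method additionally estimates the proportion of truly unchanged genes and is not shown here.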

Which expression microarray gene selection method should I use?

Read Cody Olsen's comparison of expression microarray gene selection methods.

How do I run the TiMAT2 CorrelationMap application on ChIP-chip promoter array data?

Here's an example:

# Make all of the possible intervals from a T2 run serialized window file
java -Xmx1000M -jar ~/Apps/IntervalMaker -s -50 -i 1 -o 2 -g 250 -z 60 -f \
  /Users/nix/HCI/PIs/Cairns/ZebraFish/Results/H3K4K27me3Combine/Win/all_Win

# Make a text file containing chromosome, start, and stop for each interval;
#   this represents all of the interrogated promoters on the zebrafish array
java -Xmx1000M -jar ~/Apps/IntervalReportPrinter -c -f \
  /Users/nix/HCI/PIs/Cairns/ZebraFish/Results/H3K4K27me3Combine/Win/all_Win1Indx86672

# Find the best window within each promoter (if desired, use the -m option to
#   find the lowest scoring window for identifying reduced regions from a diff analysis)
java -Xmx1000M -jar ~/Apps/BestWindowScoreExtractor -w \
  /Users/nix/HCI/PIs/Cairns/ZebraFish/Results/H3K4K27me3Combine/Win/all_Win \
  -r /Users/nix/HCI/PIs/Cairns/ZebraFish/Results/H3K4K27me3Combine/Win/all_Win1Indx86672.xls \
  -z 60 -i 1 > /Users/nix/HCI/PIs/Cairns/ZebraFish/Results/H3K4K27me3Combine/Win/bestWin.xls

# Parse the output file, printing the row number as the first column, the
#   first four columns, and skipping the first four lines
java -jar ~/Apps/PrintSelectColumns -i 0,1,2,3 -n 4 -r -f \
  /Users/nix/HCI/PIs/Cairns/ZebraFish/Results/H3K4K27me3Combine/Win/bestWin.xls

# Run the CorrelationMap application on the parsed file
java -jar ~/Apps/CorrelationMaps -w 1000000 -g zv7 -f \
  /Users/nix/HCI/PIs/Cairns/ZebraFish/Results/H3K4K27me3Combine/Win/bestWin.PSC.xls

How do I convert TiMAT2 xxx.bar files to text files?

There is a converter called Bar2Gr in the TiMAT2 (T2) package. Download it from SourceForge or use the installed version on hci-bio. Running it without arguments prints its usage information:

nixlaptop:~ nix$ ssh u0028003@hci-bio
u0028003@hci-bio's password:
Last login: Wed Aug  8 09:36:36 2007 from 155.100.234.87
[u0028003@hci-bio ~]$ java -jar /home/BioApps/T2/Apps/Bar2Gr

**************************************************************************************
**                                 Bar2Gr: Nov 2006                                 **
**************************************************************************************
Converts xxx.bar to text xxx.gr files.

-f The full path directory/file name for your xxx.bar file(s).

Example: java -Xmx1500M -jar pathTo/T2/Apps/Bar2Gr -f /affy/BarFiles/

**************************************************************************************

How do I convert between different gene names?

There are several websites that can get you 90% of the way there: check out MatchMiner or the DAVID Gene ID Conversion Tool. For names with no match, try entering them manually into the UCSC Genome Browser Gateway or the Ensembl search bar.

Data

Sequencing

How do I convert refFlat or refSeq UCSC gene table files into brs (binary ref seq) files for uploading to a DAS/2 server like GNomEx?

  1. Follow steps 1-6 of the fasta-to-bnib instructions below to retrieve and unzip the igb_exe.jar file.
  2. Launch the BrsParser (java -Xmx2G -cp genometry.jar:sam--igbext.bnd.jar com.affymetrix.genometryImpl.parsers.BrsParser) to get the usage information. Note: You must launch this from the IGBCode directory.
  3. Launch the BrsParser on a refFlat file (e.g. java -Xmx2G -cp genometry.jar:sam--igbext.bnd.jar com.affymetrix.genometryImpl.parsers.BrsParser /my/anno/hg19EnsGenes.ucsc /my/anno/hg19EnsGenes.brs).

What is the md5_checksums.txt file?

The Fastq folder for each sequencing experiment contains, in addition to the compressed Fastq files themselves, a file named "md5_checksums.txt". This file contains the MD5 checksum of each Fastq file from your experiment as stored on our server. The MD5 checksum is a commonly used way to check data integrity. Once you have downloaded your sequence data files from GNomEx, calculate the MD5 checksums of your local copies and compare them with the checksums in md5_checksums.txt. The checksums should match exactly. If they do not, the file was corrupted during transfer and must be downloaded again.

To calculate the MD5 checksum, use the "md5sum" command on Linux or the "md5" command in the terminal on Mac OS X:

$ md5sum 9835X1_130311_SN141_0663_BC1P97ACXX_5.txt.gz
946235368cf6b055f96b957c6640d8ea  9835X1_130311_SN141_0663_BC1P97ACXX_5.txt.gz
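The comparison can also be scripted. This is a minimal Python sketch, assuming that md5_checksums.txt uses the same layout as md5sum output (a hex checksum, whitespace, then the file name, one file per line) and that it sits in the same folder as the downloaded files; adjust the parsing if your copy is laid out differently. Files are read in chunks so large compressed Fastq files do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it 1 MB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Return {file_name: True/False}, comparing local files with the listed MD5s.

    Assumes md5sum-style lines: '<md5>  <file name>'. A missing local file
    counts as a mismatch.
    """
    checksum_file = Path(checksum_file)
    folder = checksum_file.parent
    results = {}
    for line in checksum_file.read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts
        name = name.strip().lstrip("*")  # md5sum prefixes binary-mode names with '*'
        local = folder / name
        results[name] = local.exists() and md5_of_file(local) == expected.lower()
    return results
```

For example, verify_checksums("path/to/Fastq/md5_checksums.txt") maps each listed file name to True when the checksum matches and to False otherwise; re-download any file marked False.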

How do I convert fasta files into bnib files for uploading to a DAS/2 server like GNomEx?

  1. Make a directory called IGBCode (mkdir IGBCode).
  2. Change into that directory (cd IGBCode).
  3. Download the IGB.zip file (NOT Webstart) from the latest IGB release into the IGBCode directory.
  4. Unzip IGB.zip.
  5. Move the igb_exe.jar file from the uncompressed igb folder to the IGBCode folder (mv igb/igb_exe.jar .).
  6. Unzip the igb_exe.jar file.
  7. Launch the NibbleParser (java -Xmx2G -cp genometry.jar:sam--igbext.bnd.jar com.affymetrix.genometryImpl.parsers.NibbleResiduesParser) to get the usage information. Note: You must launch this from the IGBCode directory.
  8. Launch the NibbleParser on a directory of uncompressed xxx.fasta files (e.g. for x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 X Y M; do echo $x; java -Xmx2G -cp genometry.jar:sam--igbext.bnd.jar com.affymetrix.genometryImpl.parsers.NibbleResiduesParser chr$x /my/fasta/files/chr$x".fasta" /my/bnib/files/chr$x".bnib" H_sapiens_Feb_2009; done).
  9. Save the chromosome length information for each converted file (this will be needed by GNomEx for creating a new genome build).
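Step 9 does not say how to obtain the lengths. One option, sketched here in Python, is to count the residues in each record of the uncompressed FASTA files from step 8. The function name is hypothetical, and the result is a plain name/length dictionary; check which format GNomEx expects when you create the genome build.

```python
def fasta_lengths(path):
    """Return {sequence_name: number_of_residues} for each record in a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The record name is the first word after '>'
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence lines add their residue count to the current record
                lengths[name] += len(line)
    return lengths
```

Calling fasta_lengths on each /my/fasta/files/chr$x.fasta file from step 8 gives one length per chromosome to record.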

Microarray

What are all these Agilent data files?

  • The data files produced during an Agilent microarray experiment (whether gene expression, CGH, or ChIP-chip) include raw data in the form of TIFF images, low-resolution JPEG images for visual examination, quality control files, and numerical data in the form of a text file. All the files for a particular array will have names that begin with the array's experiment number. For example, in request number 5054R, the names of all the data files for the first array will begin with 5054E1.
  • Here is a list of files produced from a typical gene expression experiment:
      • experiment_251486815301_S01_GE2-v5_95_Aug07_1_1.jpg: JPEG image of the array
      • experiment_251486815301_S01_GE2-v5_95_Aug07_1_1.pdf: quality control report
      • experiment_251486815301_S01_GE2-v5_95_Aug07_1_1.txt: text file with the numeric data for each spot on the array
      • request.fep: XML document that contains the parameters used for this run of the Agilent Feature Extraction software (the software that translates the image of the array into numerical data)
      • request_200708151242.rtf: RTF (Word) document with the Project Run Summary for the array, a brief report describing when the array was scanned, the array's format, and the number of saturated spots
      • request_251486815301_S01_H.tif: TIFF from the high-intensity scan
      • request_251486815301_S01_L.tif: TIFF from the low-intensity scan
      • H002334_LastBatchReport.rtf: same as the Project Run Summary
      • QCReport_Graphs: a folder that contains files used in the Quality Control Report
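Because every file for a particular array begins with that array's experiment number (5054E1 in the example above), the files can be grouped per array by that prefix. This Python sketch assumes experiment numbers look like that example (digits, the letter E, digits); the pattern is inferred from that single example and the function name is hypothetical. Matching the whole experiment number with a regular expression, rather than testing startswith("5054E1"), keeps files for 5054E10 from being lumped in with 5054E1.

```python
import re
from collections import defaultdict

# Assumed experiment-number format, inferred from the "5054E1" example
EXPERIMENT_PREFIX = re.compile(r"^(\d+E\d+)")


def group_by_experiment(filenames):
    """Map each experiment number (e.g. "5054E1") to the file names that start with it."""
    groups = defaultdict(list)
    for name in filenames:
        match = EXPERIMENT_PREFIX.match(name)
        if match:  # names without an experiment-number prefix are skipped
            groups[match.group(1)].append(name)
    return dict(groups)
```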

Where do I get the design file for an Agilent array?

The layout of each Agilent microarray is described in a "design file," which is needed for loading Agilent-format microarray data into analysis software such as MeV or Agilent CGH Analytics.

Download

Obtain the Agilent array design files from Agilent. You will need the microarray's barcode to download its design file. Each array's barcode is a 12-digit number (typically beginning with "251") that is embedded in the names of your Agilent microarray data files. You can also find the barcode in the header of the .txt format Agilent data file: it should be on row 3, directly beneath a cell containing the word "FeatureExtractor_Barcode." If the data file is opened in Excel, the barcode is usually in cell T3.
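Both ways of finding the barcode can be scripted. This is a hedged Python sketch, not an Agilent tool: it assumes the barcode appears in file names as a 12-digit run beginning with 251 (as in experiment_251486815301_S01_...), and that the data file header is tab-delimited with the barcode in the cell directly beneath FeatureExtractor_Barcode. Function names are hypothetical; barcodes that do not start with 251 would need a looser pattern such as \d{12}.

```python
import re
from itertools import islice

# Assumption: a 12-digit barcode starting with "251", not part of a longer digit run
BARCODE_IN_NAME = re.compile(r"(?<!\d)251\d{9}(?!\d)")


def barcode_from_filename(filename):
    """Return the first 251... barcode embedded in a data file name, or None."""
    match = BARCODE_IN_NAME.search(filename)
    return match.group(0) if match else None


def barcode_from_header(path, label="FeatureExtractor_Barcode", max_lines=10):
    """Return the cell directly beneath `label` within the first few tab-delimited rows."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # Only the header is needed, so read just the first max_lines lines
        rows = [line.rstrip("\r\n").split("\t") for line in islice(handle, max_lines)]
    for r in range(len(rows) - 1):
        if label in rows[r]:
            col = rows[r].index(label)
            below = rows[r + 1]
            return below[col] if col < len(below) else None
    return None
```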

Formats

The design files are available in several different formats. The format you need depends on which analysis software you use.

  • Agilent CGH Analytics: GEML (.xml) format
  • MeV: tab-delimited text (.txt) format
  • Agilent scanner configuration:
      • DNA: Back of slide
      • Barcode: Left side
      • Scan: Landscape

What analysis software is available?

See Software

How do I report a problem or issue with a USeq application?

Please fill out a bug report. Do not send an email—they get buried and lost. This bug report should contain at minimum three items:

  • A short description of the problem
  • A complete copy of the command line used and all terminal output
  • Links to all of the files used to execute the command line. Place the files in a web-accessible directory, on moab, alta, or ember5, or (if you have an account) in GNomEx under a new Analysis

Without these three essential items, we will not be able to reproduce your bug and your bug report will in all likelihood be deleted.

How do I upload my results to GEO or ArrayExpress?

How do I get the sequence for a particular microarray probe?

Agilent Probes

Agilent probe sequences are frequently reported in the .txt files that contain the raw data for an Agilent microarray experiment. These are tab-delimited text files, easily opened in Microsoft Excel. Check there first.
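A minimal Python sketch for pulling one probe's sequence out of such a file. It assumes the per-spot table's header row begins with FEATURES and that the probe ID and sequence columns are named ProbeName and Sequence; these column names are assumptions, so check your file's header and pass the actual names if they differ. The function returns None when the file has no sequence column, in which case eArray is the fallback.

```python
def probe_sequence(path, probe_id, id_column="ProbeName", seq_column="Sequence"):
    """Return the sequence listed for `probe_id`, or None if it is not found.

    Assumes the per-spot table's header row starts with "FEATURES" and that
    the ID and sequence columns are named `id_column` and `seq_column`
    (assumed defaults; change them to match your file's header).
    """
    id_idx = seq_idx = None
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            cells = line.rstrip("\r\n").split("\t")
            if cells[0] == "FEATURES":
                if id_column not in cells or seq_column not in cells:
                    return None  # this file does not report probe sequences
                id_idx, seq_idx = cells.index(id_column), cells.index(seq_column)
            elif id_idx is not None and len(cells) > max(id_idx, seq_idx):
                # Data rows after the FEATURES header: match on the probe ID column
                if cells[id_idx] == probe_id:
                    return cells[seq_idx]
    return None
```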

Agilent probe sequences are also available via Agilent's eArray website. You will need a login to use this site, which can be obtained from Brian Dalley or the Bioinformatics Core.

Step-by-Step Instructions

Once you've logged into eArray, follow these steps to find the sequence of a probe:

  1. Click on the "Probe" tab.
  2. From the Search type menu, select "Gene Symbol," "Accessions," or "ProbeID"—whichever is appropriate for your search.
  3. In the "Search Term" field, enter a probe ID (e.g. A_44_P409518), an accession number (e.g. NM_057188), or a gene symbol (e.g. Gmpr).
  4. Select a species from the Species drop-down menu.
  5. Select a Folder for the search: either "Agilent Catalog" for Agilent-designed probes (including commercial arrays) or "University of Utah" for custom-designed probes.
  6. Click the "Search" button.
  7. Finally, click on the "View" link for your probe of interest at the bottom of the page.

Affymetrix Probes

Affymetrix array probe sequences can be found by searching for the platform (array) identifier at the Affymetrix website.

Where do I report problems, request new features, or post questions?

GNomEx

USeq

  • How to report a problem with USeq
    1. Please fill out a bug report. Do not send an email—they get buried and lost. This bug report should contain at minimum three items:
      • A short description of the problem
      • A complete copy of the command line used and all terminal output
      • Links to all of the files used to execute the command line
    2. Place the files in a web-accessible directory, on moab, alta, or ember5, or (if you have an account) in GNomEx under a new Analysis.
  • Bug Reports
  • Feature Requests
  • General Help Requests
  • USeq User Group

IGB

Pysano

Requests for Analysis Assistance

General Analysis Inquiries

Contact Us

Genomics Director
Brian K. Dalley, PhD
brian.dalley@hci.utah.edu
801-585-7192

Bioinformatic Analysis Director
David Nix, PhD
david.nix@hci.utah.edu
801-587-4611

Governance

HCI Senior Director Oversight
Bradley Cairns, PhD

Faculty Advisory Committee Chair
Bradley Cairns, PhD

Faculty Advisory Committee Members
Richard Clark, PhD
Jason Gertz, PhD
Christopher Gregg, PhD
Philip Moos, PhD
Sean Tavtigian, PhD
Katherine Varley, PhD
Joseph Yost, PhD