Step 1: For the two channels of an image, generate a file containing spot intensities and addresses and display a scatterplot.
Step 2: Compare spot intensities between the red and green channels and generate a list of over and under expressed genes.
Step 3: Visually examine pairs of spots reported as differentially expressed.
Step 4: For more sophisticated statistical analysis, prepare a file containing the intensities of all images, to be opened in statistical packages such as JMP.
Step 5: Analyse a series of experiments, including replicates.
Step 6: Display a true color composite image of the two channels.
Prerequisites and installation:
1) ~256 MB RAM
2) F-SCAN (http://abs.cit.nih.gov/fscan) is written in MATLAB.
Install MATLAB 5.2 or a later version (http://www.mathworks.com) before using F-SCAN.
Set MATLAB memory to ~256 MB.
Place the F-SCAN directory on your hard disk.
Append the F-SCAN directory to the end of the MATLAB path as follows.
Mac users should click the icon with a picture of two folders. This opens the MATLAB Path Browser. On the left side of the Path Browser, find the F-SCAN folder and highlight it. Click the APPEND button in the center of the Path Browser. Finally, click the Save button at the bottom right of the Path Browser and then close it.
PC users should click the icon with a picture of two folders. This opens the MATLAB Path Browser. Click the Browse button to locate the F-SCAN folder. Next, click in the space labeled "Current Directory" to ensure the F-SCAN path is highlighted. Then select the Path menu item "Add to Path...". Most importantly, select "Add to back", then click OK. Finally, from the File menu of the Path Browser choose "Save Path" and exit the Path Browser.
UNIX users should cd to their home directory and create a directory there called matlab. In the matlab directory, create a file startup.m containing the following line:
path(path,'insert F-SCAN path here')
where 'insert F-SCAN path here' should be replaced by the location of F-SCAN on the user's computer; for example, '/usr/people/fscanuser/F-SCAN_1.3'.
MATLAB 6 users should go to the "File" menu and select "Set Path...". Next, select "Add Folder...", highlight the F-SCAN_1.3 folder with a single click, and select "OK". Next, select "Move to Bottom". Finally, select "Save" and then "Close".
In the command window type "fscan".
Step 1. To generate a file containing spot intensities and addresses and display a scatterplot:
Click the OK button.
Choose the first track "Find spot intensities in an image".
Using the dialog box, choose the files containing the two channels and the gene list, then click "Open".
The image should now be visible.
Step 2. To compare spot intensities between the two channels and generate a list of over and underexpressed genes:
Click the OK button.
Choose the second track "Compare channels interactively".
Choose the gene list.
F-SCAN draws a scatter plot of the two channels on a log-log scale. The rectangular box marks the background cutoff. Spots above the upper line are over expressed at least 2-fold, and spots below the lower line are under expressed at least 2-fold. One can click on spots in the image to identify their positions on the scatter plot, or vice versa. For each chosen spot, a small subsegment of the image around the spot is displayed, and a message box shows the gene name and ID. The 'WWW Clone' button identifies the clone in standard databases via the web. The menu bar contains an 'F-SCAN tools' menu; to examine spots that are 3-, 5-, 10-, 100-, or 1000-fold over or under expressed, choose 'F-SCAN tools -> plot -> fold'.
One can also manually alter the background cutoff and the upper and lower lines. Choose 'F-SCAN tools -> plot -> Redo background cutoff'. The old lines disappear and crosshairs appear. Click at the desired location and the new line(s) will pass through that point. The altered numerical values appear in the message boxes.
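The spot classification described in Step 2 can be sketched in Python. This is a minimal sketch, not F-SCAN's MATLAB code: the function names, the default background cutoff of 100, the choice of red on the y axis against green on the x axis, and the rule that a spot lies inside the background box when both intensities fall below the cutoff are all illustrative assumptions. On log-log axes the fold lines are straight lines parallel to the diagonal, offset by log(fold), so testing the plain ratio against the fold is equivalent. The second function assumes the redrawn upper and lower lines stay symmetric about the diagonal, so a clicked point fixes the new fold.

```python
def classify_spots(red, green, background=100.0, fold=2.0):
    """Label each spot 'over', 'under', 'neutral', or 'background'.

    red, green: spot intensities in the same spot order, assumed positive
    as a log-log plot requires. Hypothetical helper, not F-SCAN code.
    """
    labels = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive for a log-log plot")
        if r < background and g < background:
            labels.append("background")  # inside the background box
            continue
        ratio = r / g  # red on the y axis, green on the x axis (assumed)
        # On log-log axes these tests are the lines log(r) = log(g) +/- log(fold).
        if ratio >= fold:
            labels.append("over")
        elif ratio <= 1.0 / fold:
            labels.append("under")
        else:
            labels.append("neutral")
    return labels


def fold_from_click(x, y):
    """Fold implied by a clicked point (x = green, y = red), assuming the
    new upper and lower lines are symmetric about the diagonal."""
    return max(y / x, x / y)


# Count spots beyond several fold thresholds, similar in spirit to 'plot -> fold'.
red = [400, 50, 120, 90]
green = [100, 100, 110, 80]
for k in (2, 3, 5, 10):
    labels = classify_spots(red, green, fold=k)
    print(k, labels.count("over"), labels.count("under"))
```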
Step 3. To visually examine pairs of spots reported as differentially expressed between channels:
Having compared the two channels, one obtains a list of spots that are differentially expressed by a significant amount. One would like to verify the differential expression visually by referring back to the relevant areas in the original images. This track lets the user supply the address of a given spot and obtain slices, centered at that spot, from a given number of images. This makes it possible to confirm that a calculated difference is genuine, or to dismiss it as the result of bleed from neighbouring spots or some other artifact.
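Extracting the slice around a spot can be sketched in Python. This is a minimal sketch: it assumes a spot address is a (row, column) pixel position and an image is a list of pixel rows, neither of which the manual specifies, and the helper names and default half-width are hypothetical. Slices near the image border are clipped rather than padded.

```python
def spot_window(image, row, col, half_width=5):
    """Square sub-image centered on (row, col), clipped at the image edges.

    image: list of rows of pixel values. Hypothetical helper.
    """
    if not image or not image[0]:
        return []
    n_rows, n_cols = len(image), len(image[0])
    r0, r1 = max(0, row - half_width), min(n_rows, row + half_width + 1)
    c0, c1 = max(0, col - half_width), min(n_cols, col + half_width + 1)
    return [line[c0:c1] for line in image[r0:r1]]


def windows_across_images(images, row, col, half_width=5):
    """The same window taken from each image, for side-by-side comparison."""
    return [spot_window(img, row, col, half_width) for img in images]


image = [[10 * r + c for c in range(5)] for r in range(5)]
print(spot_window(image, 2, 2, half_width=1))
```

Taking the same window from every image keeps a neighbouring spot's bleed in the same relative position across images, which is what makes the visual comparison meaningful.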
Users must supply lists of addresses of the spots they would like to visualise. This can be done in two ways: a) enter the addresses one at a time using the keyboard; or b) read in an entire file of addresses. Such a file can be created by saving the list of over/under expressed genes from the scatter plot in track 2.
Click the OK button.
Choose the third track "Compare images at address".
(If the files were produced with older versions of F-SCAN, the user may also be prompted for the names of the .pGL files, and F-SCAN may ask the user to select the .mat files that correspond to the .pGL files.)
To enter a list by file name click "input from file" and choose the file in the dialog box.
Step 4. To prepare a file containing the intensities of all images for more sophisticated statistical analysis in packages such as JMP:
JMP provides the user with a three-way connection between a replica of the image, the scatter plot/histogram, and the peak intensities. Having observed the scatter plot, one can trace its features to actual locations in the image. Conversely, one can exclude defective or artifactual areas of the image from the scatter plot and so remove their distorting influence. Multiple scatter plots tracing time courses can also be generated in this package.
One needs a single file containing peak intensities of all images under study for this analysis. This track creates such a file with the following features:
F. Clicking "Proceed" creates one combined output file (to be opened in JMP).
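The idea of one combined file can be sketched in Python as a tab-delimited table with one row per spot and one column per image, a layout that statistical packages such as JMP can import. This is a hypothetical layout, not F-SCAN's actual output format: the column names, the address labels such as "A1", and the blank cell for a spot missing from an image are all assumptions.

```python
import csv
import io


def combine_intensities(per_image, out):
    """Write one tab-delimited table: one row per spot, one column per image.

    per_image maps an image name to {spot_address: peak_intensity}.
    Spots missing from an image are left blank. Hypothetical layout.
    """
    names = sorted(per_image)
    addresses = sorted(set().union(*(per_image[n].keys() for n in names)))
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["address"] + names)
    for addr in addresses:
        writer.writerow([addr] + [per_image[n].get(addr, "") for n in names])


buf = io.StringIO()
combine_intensities({"exp11": {"A1": 120.5, "A2": 80}, "exp12": {"A1": 130}}, buf)
print(buf.getvalue())
```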
Step 5: Analyse a series (including replicates)
Here one can perform one of four tasks.
a) Analyse replicates:
WHAT IS DONE HERE
This analysis attempts to winnow out over expressed and under expressed genes that are consistent between replicate experiments. For each series point, the expected number (E) of over expressed (S=+1) and under expressed (S=-1) genes consistent between replicates is compared with the observed number (#). The cutoffs that define over and under expression are also recorded in the output.
HOW TO ACCOMPLISH IT
i) From the F-SCAN menu, select Track 5 "Analyze a series (including replicates)"
ii) From the File Selection menu, choose "Select interactively" if this set of files is being selected for the first time. Otherwise, choose "Input from file" to read the .prj file generated by a previous series analysis.
iii a) If "Input from file" is chosen, proceed to step iv.
iii b) If "Select interactively" is chosen, proceed as follows. Select the gene list. Specify the lattice parameters. Browse to the folder containing the data files and specify an output file name. Select the files to combine, holding down the Control key (the Apple key on the Mac) to select multiple files.
The Series Selection window should appear, listing the files chosen. This window allows the assignment of replicates and series points. To assign series points, select an integer from the pull-down menus in the "Point in series" column; replicates at the same series point get the same integer. For example, if the first and last experiments listed are for the zero hour of a time series, the first and last "Point in series" menus should be set to one. If the next time point is at three hours, the "Point in series" menus for those files should be set to two. Up to six series points are allowed.
The columns labeled Channel 1 and Channel 2 allow the user to specify forward and reverse (dye-swapped) experiments. Replicates should have the same labels within each series point; if a replicate is a reverse experiment, its labels should be reversed. Each subsequent series point gets new labels. For example, suppose the files exp11.pGL, exp12.pGL, exp13.pGL, exp14.pGL, exp15.pGL, and exp16.pGL are selected; the time points are zero hour (exp11.pGL and exp16.pGL), three hours (exp12.pGL and exp14.pGL), and six hours (exp13.pGL and exp15.pGL); and the dyes have been reversed for exp14.pGL and exp15.pGL. Then the Series Selection would be:
Experiment   Channel 1   Channel 2   Point in series
exp11        c1          t1          1
exp12        c2          t2          2
exp13        c3          t3          3
exp14        t2          c2          2
exp15        t3          c3          3
exp16        c1          t1          1
Finally, the user should choose a Peak Intensity Type of either LR** or LR*, where LR* is the log of the ratio of the normalized peak heights of the two channels, and LR** is the log of the ratio of the curvature corrected peak heights of the two channels.
The Select Ratio Order subpanel defines which channel is used as the numerator of the ratio in the over/under expression designation. Ch1/Ch2 yields positive log ratios when the intensities in channel one are higher than in channel two; Ch2/Ch1 yields negative log ratios in that case.
Once selections have been made choose the "PROCEED" button.
iv) The PLOT SELECTION window appears next. In the upper panel "Screening Criteria", change the default cutoff values if desired.
In the lower left panel, change the integer labels for each series point to short labels relevant to the experiment. For example, in a time course experiment the labels might be 0 hr, 3 hr, 6 hr. The labels will be used in the line plots. Also, change the log ratio cutoffs if desired.
Finally, to analyze the replicates, click the "Analyze Replicates" button. A tab delimited file will be generated containing observed and expected values for each series point.
In this file,
E(S=+1) is the expected value of over expressed genes.
E(S=-1) is the expected value of under expressed genes.
#(S=+1) is the observed number of over expressed genes.
#(S=-1) is the observed number of under expressed genes.
+1 cutoff is the log ratio value above which a gene is defined as over expressed.
-1 cutoff is the log ratio value below which a gene is defined as under expressed.
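The bookkeeping behind these columns can be sketched in Python for readers who want to check the output by hand. This is a minimal sketch under stated assumptions, not F-SCAN's algorithm: the log base (10), the helper names, the rule that a gene counts toward #(S=+1) or #(S=-1) only when it has the same nonzero state in every replicate, and the independence model used for E(S=+1) and E(S=-1) (number of genes times the product of each replicate's fraction of genes in that state) are all assumptions, since the manual does not say how E is computed.

```python
import math


def log_ratio(ch1, ch2, order="Ch1/Ch2"):
    """Log ratio (base 10, assumed) of two normalized peak heights."""
    ratio = ch1 / ch2 if order == "Ch1/Ch2" else ch2 / ch1
    return math.log10(ratio)


def call_state(lr, plus_cutoff, minus_cutoff):
    """S = +1 above the +1 cutoff, S = -1 below the -1 cutoff, else 0."""
    if lr > plus_cutoff:
        return 1
    if lr < minus_cutoff:
        return -1
    return 0


def replicate_summary(states):
    """states[i][g] is S for gene g in replicate i (equal-length lists).

    Returns (observed, expected) dicts keyed by +1 and -1. A gene is observed
    only if it has the same nonzero S in every replicate; expected counts
    assume the replicates are independent (an assumption).
    """
    n_genes = len(states[0])
    observed = {1: 0, -1: 0}
    for g in range(n_genes):
        calls = {rep[g] for rep in states}
        if len(calls) == 1 and 0 not in calls:
            observed[calls.pop()] += 1
    expected = {}
    for s in (1, -1):
        p = 1.0
        for rep in states:
            p *= sum(1 for x in rep if x == s) / n_genes
        expected[s] = n_genes * p
    return observed, expected


# Two replicates of four genes at one series point (made-up intensities).
lrs = [[log_ratio(a, b) for a, b in pairs] for pairs in (
    [(400, 100), (300, 100), (50, 200), (100, 100)],
    [(500, 100), (110, 100), (40, 200), (20, 100)],
)]
states = [[call_state(x, 0.3, -0.3) for x in rep] for rep in lrs]
print(states)
print(replicate_summary(states))
```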
b) Plot all expression patterns for the series as line plots
WHAT IS DONE HERE
A pattern for each gene is generated as follows.
To each gene, we assign at each series point a symbol representing over expression (+), under expression (-), neutral (0), flagged (f), or absent expression (x). The symbols for all series points of a gene are concatenated into a string, and each string thus defines the expression pattern of that gene.
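The pattern construction can be sketched in Python. This is a minimal sketch under assumptions: each gene's call at a series point is taken as already decided (this part of the manual does not say how replicate calls are merged into one call per point), the call values 1, -1, 0, "flagged" and "absent" and all function names are hypothetical, and genes are grouped by identical pattern strings.

```python
from collections import defaultdict

# Symbols from the text; the call values on the left are hypothetical.
SYMBOL = {1: "+", -1: "-", 0: "0", "flagged": "f", "absent": "x"}


def pattern_strings(calls_by_gene):
    """Map each gene to its pattern string, one symbol per series point."""
    return {gene: "".join(SYMBOL[c] for c in calls)
            for gene, calls in calls_by_gene.items()}


def group_by_pattern(pattern_of):
    """Group genes that share the same pattern string."""
    groups = defaultdict(list)
    for gene, pattern in pattern_of.items():
        groups[pattern].append(gene)
    return dict(groups)


def is_plotted(pattern):
    """Line plots skip patterns containing flagged or absent points."""
    return "f" not in pattern and "x" not in pattern


calls = {"geneA": [1, 1, -1], "geneB": [1, 1, -1], "geneC": [0, "flagged", 1]}
patterns = pattern_strings(calls)
print(group_by_pattern(patterns))
```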
Genes with the same pattern are grouped together. All possible groups are shown in the output. Line plots are shown only for patterns that contain no flagged (f) or absent (x) points.
HOW TO ACCOMPLISH IT
i) to iii) From the F-SCAN menu, select Track 5 "Analyze a series (including replicates)", then choose the files and assign replicates and series points exactly as described in steps i) to iii) of section a) above.
iv) In the PLOT SELECTION window, change the default Screening Criteria, the series point labels in the lower left panel, and the log ratio cutoffs if desired, as described in step iv) of section a).
Finally, to plot all patterns, click the "PLOT ALL PATTERNS" button. Notice that patterns containing flagged spots or missing data are not plotted.
c) Plot selected expression patterns for the series collectively
WHAT IS DONE HERE
This section enables the user to select and display all genes whose behavior along a series of experimental points conforms to a pattern of interest. Several patterns may be selected, and the collections of genes conforming to these patterns are presented in a single plot. The next section presents individual patterns in separate plots.
The user defines the patterns to be displayed by choosing "any", "over", "under", "either", or "neither" at each series point. "Any" selects all genes at that point; "over" selects the over expressed genes; "under" selects the under expressed genes; "either" selects regulated genes, whether over or under expressed; and "neither" selects genes that are neither over nor under expressed. The user also chooses the line colors. Lines for genes satisfying these patterns are displayed in a single plot.
HOW TO ACCOMPLISH IT
i) to iii) From the F-SCAN menu, select Track 5 "Analyze a series (including replicates)", then choose the files and assign replicates and series points exactly as described in steps i) to iii) of section a) above.
iv) In the PLOT SELECTION window, change the default Screening Criteria if desired. In the lower left panel "Label Points in Series", change the integer labels for each series point to short labels relevant to the experiment (for example, 0 hr, 3 hr, 6 hr in a time course experiment); these labels are used in the line plots. Also, change the log ratio cutoffs if desired.
In the lower right panel "Select Expression Patterns", up to five patterns, A) to E), may be designated. If fewer are desired, simply set the color of each unused pattern to "none". Patterns are designated by selecting the gene expression ratio qualifier "any", "over", "under", "either", or "neither" at each series point.
Finally, to generate the plot of the designated patterns, click on the "Plot Selected" button.
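Matching genes against the designated patterns can be sketched in Python. This is a minimal sketch under assumptions: genes are represented by pattern strings built from the symbols +, -, 0, f and x used in section b), flagged or absent points are assumed not to satisfy "neither", and the function names are hypothetical. The same selection could feed a single combined plot, as in this section, or one plot per pattern, as in section d).

```python
# Qualifiers from the text, applied to one pattern symbol ("+", "-", "0", "f", "x").
QUALIFIERS = {
    "any": lambda s: True,
    "over": lambda s: s == "+",
    "under": lambda s: s == "-",
    "either": lambda s: s in ("+", "-"),
    "neither": lambda s: s == "0",  # assumed: flagged/absent points do not count
}


def matches(pattern, qualifiers):
    """True if every series point satisfies the qualifier chosen for it."""
    return len(pattern) == len(qualifiers) and all(
        QUALIFIERS[q](s) for s, q in zip(pattern, qualifiers))


def select_genes(pattern_of, designations):
    """designations: (color, qualifiers) pairs for patterns A) to E)."""
    selected = []
    for color, qualifiers in designations:
        if color == "none":  # pattern switched off in the panel
            continue
        genes = [g for g, p in pattern_of.items() if matches(p, qualifiers)]
        selected.append((color, genes))
    return selected


pattern_of = {"geneA": "++-", "geneB": "0+-", "geneC": "000"}
designations = [
    ("red", ["any", "over", "under"]),
    ("blue", ["neither", "neither", "neither"]),
    ("none", ["over", "over", "over"]),
]
print(select_genes(pattern_of, designations))
```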
d) Plot selected expression patterns for the series separately
WHAT IS DONE HERE
This section enables the user to select and display all genes whose behavior along a series of experimental points conforms to a pattern of interest. It differs from the preceding section in that the collection of genes for each pattern is displayed in its own plot, instead of all collections being shown together.
The user defines the patterns to be displayed by choosing "any", "over", "under", "either", or "neither" at each series point. "Any" selects all genes at that point; "over" selects the over expressed genes; "under" selects the under expressed genes; "either" selects regulated genes, whether over or under expressed; and "neither" selects genes that are neither over nor under expressed. The user also chooses the line colors. Lines for genes satisfying these patterns are displayed using one plot for each pattern.
HOW TO ACCOMPLISH IT
Step 6. To display a true color composite image of the two channels:
Click Track 6 of the Main Menu, 'Show image'. A dialog box prompts the user to choose the files containing the two channels, and a composite image is displayed.
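One common way to build such a two-channel overlay is sketched below in Python. The assumptions are not taken from F-SCAN: each channel is scaled linearly to 0-255 by its own maximum, the red channel is placed in R, the green channel in G, and B is left at zero, so spots strong in both channels appear yellow. The function name is hypothetical.

```python
def composite_rgb(red, green):
    """Combine two equally sized channels (lists of rows) into RGB triples.

    Assumption: each channel is scaled linearly to 0-255 by its own maximum;
    red goes to R, green goes to G, and B is zero.
    """
    def scale(channel):
        peak = max(max(row) for row in channel) or 1  # avoid dividing by zero
        return [[round(255 * v / peak) for v in row] for row in channel]

    r, g = scale(red), scale(green)
    return [[(rv, gv, 0) for rv, gv in zip(r_row, g_row)]
            for r_row, g_row in zip(r, g)]


print(composite_rgb([[0, 100]], [[50, 50]]))
```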
Reference
"Development of a Prostate cDNA Microarray and Statistical Gene Expression Analysis Package"; A. J. Carlisle, V. V. Prabhu, A. Elkahloun, J. Hudson, J. M. Trent, W. M. Linehan, E. D. Williams, M. R. Emmert-Buck, L. A. Liotta, P. J. Munson, and D. B. Krizman; Molecular Carcinogenesis 28:12-22 (2000)
P. J. Munson munson@helix.nih.gov , V. V. Prabhu prabhu@helix.nih.gov , and L. Young lynny@helix.nih.gov
Analytical Biostatistics Section; Mathematical and Statistical Computing Laboratory; Center for Information Technology; Building 12A, Room 2039; National Institutes of Health; Bethesda, MD 20892
This manual was translated into HTML by Eric Faden.
Last modified 8 August 2001