F-SCAN Manual

Index



F-SCAN Functions:

Step 1: For the two channels of an image, generate a file containing spot intensities and addresses and display a scatterplot.

Step 2: Compare spot intensities between the red and green channels and generate a list of over and under expressed genes.

Step 3: Visually examine pairs of spots reported as differentially expressed.

Step 4: For sophisticated statistical analysis, prepare a file containing intensities of all images to be opened in statistical packages like JMP. JMP provides a user with a three-way connection between a replica of the image, scatter plot/histogram, and peak intensities. Having observed the scatter plot, one can trace its features to actual locations in the image. Conversely, one can subtract defective artifactual areas on an image from the scatter plot and remove their distorting influences. Also, generation of multiple scatter plots for time course studies is possible within this package.

Step 5: Here we analyse a series of experiments including replicates.

Step 6: A true color composite image of the two channels is displayed.

 
 


Pre-requisites and installation:

1) ~256 MB RAM

2) F-SCAN (http://abs.cit.nih.gov/fscan) is written in MATLAB.

Install MATLAB 5.2 or a subsequent version (http://www.mathworks.com) before using F-SCAN.

Set MATLAB memory to ~256 MB.

Place the F-SCAN directory on your hard disk.

APPEND F-SCAN to the MATLAB path as follows.

     MAC  users should click on the icon with a picture of two folders.  This activates the MATLAB Path browser.  On the left side of the path browser, find the F-SCAN folder and highlight it.  Click on the APPEND button in the center of the path browser.  Finally, click on the Save button in the bottom right of the path browser and then close.

     PC  users should click on the icon with a picture of two folders.  This activates the MATLAB Path browser.  Click on the Browse button to locate the F-SCAN folder.  Next, click in the space labeled "Current Directory" to ensure the F-SCAN path is highlighted.  Then select the Path menu item "Add to Path...".  Most importantly, select "Add to back".  Then click OK.  Finally, from the File menu of the Path Browser choose "Save Path" and exit the Path Browser.

     UNIX  users should cd to their home directory and create a directory there called matlab.  In the matlab directory, create a file startup.m containing the following line:

         path(path,'insert fscan path here')

     where 'insert fscan path here' should be replaced by the location of F-SCAN on the user's computer;  for example, '/usr/people/fscanuser/F-SCAN_1.3'.

  MATLAB 6 users should go to the "File" menu and select "Set Path...".  Next select "Add Folder...".  Highlight the F-SCAN_1.3 folder by a single click, and next select "OK".  Next select "Move to Bottom".  Finally, select "Save" and then "Close".
 

In the command window type "fscan".

 
 


How to operate F-SCAN:

Step 1. To generate a file containing spot intensities and addresses and display a scatterplot:

  1. Type "fscan" in the MATLAB command window.

  2. Click the OK button.

    Choose the first track "Find spot intensities in an image".

    Using the dialog box, choose the files containing the two channels and the gene list, and click "Open".

    The image should now be visible.

  3. Enter 3 letters in the file naming box. The output file will contain these 3 letters and end with the suffix .pGL.
  4. The image should be segmented into constituent fields. First specify whether the fields of the image are further divided into subfields. If subfields exist, proceed to the next item. Otherwise, click the 'no' button and specify the lattice parameters. Then click on the top left corner and the bottom right corner of the array, leaving a margin whose width is approximately one half of the spacing between the fields. You are presented with a putative segmentation of the image. A proper segmentation is one in which the lines separating the fields lie in between them. If this does not occur, 'Redo' the segmentation, clicking slightly above or below your original choices. When an acceptable segmentation occurs, click 'Next'.
  5. If the image has subfields, it must be segmented into constituent grids. To accomplish this, first click the 'yes' button and specify the lattice parameters. Then click on the top left spot of the entire image, then on the bottom right spot of the first subfield, and finally on the bottom right spot of the entire image. You can redo this step if necessary. If satisfied, press "Next".
  6. The program presents one segment at a time for grid placement. Click the top left and bottom right spots and a grid will be laid on the segment. One can alter this grid by using the buttons on the window or by redoing it entirely. Click "Next" to continue through the remaining segments of the image.
  7. After grids have been placed on all segments, F-SCAN evaluates the peak heights of all spots and writes a file with the addresses and intensities of all spots. A scatterplot of the two channels is displayed along with a composite image. One can click on spots in the image and identify their position on the scatterplot, or vice versa. For each spot chosen, a small subsegment of the image around the spot is displayed. A message box shows the gene name and ID. One can access standard databases and identify the clone via the web using the 'WWW Clone' button. On the menubar find 'F-SCAN tools'. With it one can alter the fold difference lines on the scatterplot, display a list of genes above a certain fold, or save this list in a file.
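
The grid placement in item 6 can be pictured with a short sketch. It is written in Python purely for illustration (F-SCAN itself is MATLAB), and the function name `place_grid`, the coordinate convention, and the assumption of evenly spaced spots are hypothetical rather than taken from the F-SCAN source. Given the two clicked spots and the lattice parameters, the remaining spot centres follow by linear interpolation:

```python
def place_grid(top_left, bottom_right, n_rows, n_cols):
    """Return a list of rows, each a list of (x, y) spot centres.

    top_left, bottom_right: (x, y) positions of the two clicked spots.
    n_rows, n_cols: lattice parameters of the segment (assumed regular).
    """
    (x0, y0), (x1, y1) = top_left, bottom_right
    # Spacing between neighbouring spots; a single row or column has none.
    dx = (x1 - x0) / (n_cols - 1) if n_cols > 1 else 0.0
    dy = (y1 - y0) / (n_rows - 1) if n_rows > 1 else 0.0
    return [[(x0 + c * dx, y0 + r * dy) for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical 4 x 4 segment with clicks at (10, 20) and (40, 50).
grid = place_grid((10.0, 20.0), (40.0, 50.0), n_rows=4, n_cols=4)
```

A real array is rarely perfectly regular, which is presumably why the window offers buttons to adjust the grid after it is laid down.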
 

Step 2. To compare spot intensities between the two channels and generate a list of over and underexpressed genes:
 

  1. One must first generate a .mat file for the image using Step 1 before using Step 2. Then start the track in one of two ways:

    a. If entering F-SCAN anew, type "fscan" in the MATLAB command window, click the OK button, and choose the second track "Compare channels interactively".

    b. If already in F-SCAN, return to the "MAIN MENU" and choose the second track "Compare channels interactively".

  2. Choose the .mat file that Step 1 generates for the image.

  3. Choose the gene list.

    F-SCAN draws a scatter plot and presents it on a log-log scale. The rectangular box depicts the background cutoff. Spots above the upper line are over expressed 2-fold, and spots below the lower line are under expressed 2-fold. One can click on spots in the image and identify their position on the scatterplot, or vice versa. For each spot chosen, a small subsegment of the image around the spot is displayed. A message box shows the gene name and ID. One can access standard databases and identify the clone via the web using the 'WWW Clone' button. On the menubar one finds 'F-SCAN tools'. To examine spots 3, 5, 10, 100, or 1000 fold over/under, click 'F-SCAN tools -> plot -> fold'.

    One can manually alter the background cutoff as well as the upper/lower lines. Click 'F-SCAN tools -> plot -> Redo background cutoff'. The old lines disappear and crosshairs appear. Click at the desired location and the new line(s) will pass through that point. The altered numerical values will appear in the message boxes.

  4. To see a list of gene names and ratios together with addresses, click 'F-SCAN tools -> List Genes -> WWW Show Genes over/under'. A web page with html links to databases opens up.

  5. To save files containing over/underexpressed genes, click 'F-SCAN tools -> List genes -> Save genes over/under'.
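
The logic behind the fold lines and the background box can be sketched as follows. This is a Python illustration only, not F-SCAN's code: the function name `classify_spots`, the choice of which channel forms the numerator, and the rule that a spot counts as background when both channels fall at or below the cutoff are all assumptions made for the example. On log-log axes, a constant ratio y/x = fold is a straight line parallel to the diagonal, which is why a plain ratio test is equivalent to the lines on the plot.

```python
def classify_spots(x, y, background, fold=2.0):
    """Label each spot 'over', 'under', 'neutral' or 'background'.

    x, y       : equal-length lists of intensities (horizontal and
                 vertical axis of the scatter plot, respectively)
    background : cutoff; spots with both channels at or below it are
                 treated as background (an assumption of this sketch)
    fold       : ratio y/x at or above which a spot is called 'over';
                 at or below 1/fold it is called 'under'
    """
    labels = []
    for a, b in zip(x, y):
        if a <= 0 or b <= 0 or (a <= background and b <= background):
            labels.append("background")
        elif b / a >= fold:
            labels.append("over")
        elif b / a <= 1.0 / fold:
            labels.append("under")
        else:
            labels.append("neutral")
    return labels


# Four hypothetical spots with a background cutoff of 10.
labels = classify_spots([100, 300, 100, 5], [300, 100, 120, 8], background=10)
```

Raising `fold` to 3, 5, 10 and so on narrows the lists in the same way the 'fold' menu item does.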
 

Step 3. To visually examine pairs of spots reported as differentially expressed between channels:

Having compared the two channels, one obtains a list of spots differentially expressed by a significant amount. One would like to visually verify the differential expression by referring back to relevant areas in the original images. Here we provide a tool wherein the reader can supply the address of a given spot and obtain slices of an image centered at that spot in a given number of images. This enables a visual verification of the authenticity of the calculated difference, or a dismissal of the same as having resulted either from bleeds of neighbouring spots or some other artifact.

Users must supply lists of addresses of the spots they would like to visualise. This can be done in two ways: a) enter the address one at a time using the keyboard. b) enter an entire file of addresses. This can be done by saving a file of genes expressed over/under from the scatter plot in track 2 and reading this file in.

  1. If one is comparing only a few spots (4-5), proceed to item 2. If one is perusing a long list of spots, have a file ready as explained in the introduction of this step before proceeding to item 2.

  2. Start the track in one of two ways:

    a. If entering F-SCAN anew, type "fscan" in the MATLAB command window, click the OK button, and choose the third track "Compare images at address".

    b. If already in F-SCAN, return to the "MAIN MENU" and choose the third track "Compare images at address".

  3. Change the directory in the dialog box to the one that contains the files being compared. All files that are to be compared to each other should reside in one directory. Provide an output file name for the file that will receive data in this track. (This file is created only if the button "Add to list" is clicked in item 6.)

  4. Choose all the .mat files of the images being compared. These are created for each image by Step 1.

    (If these were produced with older versions of F-SCAN, the user may be prompted for the names of the .pGL files as well. F-SCAN may also prompt the user to select the .mat files which correspond to the .pGL files.)

  5. To compare only a few spots, click "Input from keyboard" and enter the addresses by hand.

    To enter a list by file name, click "Input from file" and choose the file in the dialog box.

  6. Image slices of corresponding spots will be presented to you per address. To save information pertaining to a pair, click "Add to list"; otherwise, click "Skip".
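
Cutting an image slice centred at a spot can be sketched as below. The example is Python for illustration only; `spot_slice`, the use of pixel row/column coordinates as the "address", and the clipping behaviour at the image border are assumptions of the sketch and may differ from what F-SCAN does internally.

```python
def spot_slice(image, row, col, half_width):
    """Return the square sub-image centred on (row, col).

    image is a list of equal-length rows of pixel values. The window is
    clipped where it would extend past the image border.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    r0, r1 = max(0, row - half_width), min(n_rows, row + half_width + 1)
    c0, c1 = max(0, col - half_width), min(n_cols, col + half_width + 1)
    return [line[c0:c1] for line in image[r0:r1]]


# Hypothetical 5 x 6 image whose pixel value encodes its position.
image = [[10 * r + c for c in range(6)] for r in range(5)]
centre = spot_slice(image, 2, 3, 1)   # full 3 x 3 window
corner = spot_slice(image, 0, 0, 1)   # clipped to 2 x 2 at the border
```

Applying the same call to every image in the comparison gives the side-by-side slices that are inspected for bleed from neighbouring spots.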
 

Step 4. For sophisticated statistical analysis, to prepare a file containing intensities of all images to be opened in statistical packages like JMP:

JMP provides a user with a three way connection between a replica of the image, scatter plot/histogram and peak intensities. Having observed the scatter plot, one can trace its features to actual locations in the image. Conversely, one can subtract defective artifactual areas on an image from the scatter plot and remove their distorting influences. Also, multiple scatter plots tracing time courses are available in this package.

One needs a single file containing peak intensities of all images under study for this analysis. This track creates such a file with the following features:

  1. three columns per image - peak heights (arithmetical), the logarithm of peak heights, and the number of pixels used for the spot
  2. four columns per pair of images (A and B) compared: A/B,B/A,log(A)-log(B),log(B)-log(A)
  3. gene list as supplied by the company.
We add layout information that enables drawing a replica of the image and also information on spot kind.
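
The four columns written for each pair of images can be sketched as follows. This is a Python illustration, not the F-SCAN implementation; the function name `pair_columns` is hypothetical, and base-10 logarithms are assumed because the manual does not state the base.

```python
import math


def pair_columns(a, b):
    """Return the four per-pair values for peak heights a and b.

    Both peak heights must be positive for the logarithms to exist.
    """
    log_a, log_b = math.log10(a), math.log10(b)  # base 10 is an assumption
    return {
        "A/B": a / b,
        "B/A": b / a,
        "log(A)-log(B)": log_a - log_b,
        "log(B)-log(A)": log_b - log_a,
    }


# Hypothetical peak heights for one spot in images A and B.
cols = pair_columns(200.0, 50.0)
```

The two log columns are negatives of each other, so either one carries the full information; both are presumably written for convenience when plotting in JMP.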
  1. Choose Track 4 "Combine several files into one".
  2. For images where the lattice reliably locates the spot center, choose P1; otherwise choose P2 as peak heights. P2 locates the peak center around the lattice point by optimising the intensity.
  3. Choose the gene list that corresponds to the images chosen.  In the subsequent box, specify the lattice parameters.
  4. In the next dialog box, change to the folder that contains the files to be combined. All of these files should reside in one directory. Provide a name for the combined output file in the lower portion of the dialog box.  In the subsequent list box, choose the files of all images that are to be collectively studied.
  5. The next window is a menu of all calculated values available for a given file and also a list of selected files. Choose the values you wish to incorporate in the final table.  If you wish to include a given value for all the files selected, it is enough to click the first row. Otherwise, for a given value, choose all the files you wish to include in the final output. The meaning of each column heading, such as 'P', 'Lc21' etc., is ascertained by clicking on the button which carries it.  For example, to find the meaning of P, click the button containing it and you will see a display bar containing the expression "Unnormalised Peak heights P1 and P2 in channels 1 and 2". The available values are
  6. Clicking "Proceed" creates one combined output file (to be opened in JMP).
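
The difference between the P1 and P2 peak heights can be pictured with a small sketch. It is Python for illustration only, and it interprets "optimising the intensity" as a search for the brightest pixel in a small window around the lattice point; that interpretation, the window radius, and the names `peak_p1` and `peak_p2` are assumptions rather than F-SCAN's documented algorithm.

```python
def peak_p1(image, row, col):
    """P1-style value: the intensity exactly at the lattice point."""
    return image[row][col]


def peak_p2(image, row, col, radius=1):
    """P2-style value: the brightest pixel within `radius` of the lattice
    point, together with its (row, col) position."""
    best_value, best_pos = image[row][col], (row, col)
    for r in range(max(0, row - radius), min(len(image), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(image[r]), col + radius + 1)):
            if image[r][c] > best_value:
                best_value, best_pos = image[r][c], (r, c)
    return best_value, best_pos


# Hypothetical patch where the true peak sits one pixel right of the lattice point.
image = [[1, 2, 1, 0],
         [2, 5, 9, 1],
         [1, 3, 4, 0]]
p1 = peak_p1(image, 1, 1)
p2, p2_pos = peak_p2(image, 1, 1)
```

When the grid sits accurately on the spots the two values agree, which is the case in which the manual recommends P1.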

 

Step 5:  Analyse a series (including replicates)

Here one can perform one of four tasks.

A) Analyse replicates:

      WHAT IS DONE HERE

      This is an attempt to winnow out the over expressed and under expressed genes that are consistent between replicate experiments. One compares the expected (E) to the observed (#) number of over expressed (S=+1) and under expressed (S=-1) genes that are consistent between replicates. The cutoffs which define over and under expression are also recorded in the output.

     HOW TO ACCOMPLISH IT

i) From the F-SCAN menu, select Track 5 "Analyze a series (including replicates)"

ii) From the File Selection menu choose "Select interactively" if it is the first time for selecting the  particular set of files.  Otherwise, a previous series analysis would have generated a .prj file which can be read by choosing "Input from file".

iii a)  If "Input from file" is chosen, proceed to step iv.

iii b)  If "Select interactively" is chosen, proceed as follows.  Select the gene list.  Specify the lattice parameters.  Browse to the folder containing the data files and specify an output file name.  Select the files to combine, holding down the control key (apple key on the MAC) to select multiple files.

The Series Selection window should appear listing the files chosen.  This window allows the assignment of replicates and series points.  To assign series points, select an integer from the pull down menus in the "Point in series" column with replicates for a particular series getting the same integer.  For example,  if the first and last experiments listed are for the zero hour in a time series, the first and last "Point in series" menus should be set to one.  If the next time point is at three hours the "Point in series" menus for those files should be set to two.  A total of six series points are allowed.

The columns labeled Channel 1 and Channel 2 allow the user to specify forward and reverse experiments.  Replicates should have the same labels within each series point.  If the replicate is a reverse experiment, then the labels should be reversed.  For subsequent series points, a new label should be applied.  For example, if the following files are selected:  exp11.pGL, exp12.pGL, exp13.pGL, exp14.pGL, exp15.pGL, and exp16.pGL;  the time points are zero hour (exp11.pGL and exp16.pGL), three hours (exp12.pGL and exp14.pGL), and six hours (exp13.pGL and exp15.pGL);  and the dyes have been reversed for exp14.pGL and exp15.pGL;  then the Series Selection would be

Experiment     Channel 1     Channel 2     Point in series
exp11          c1            t1            1
exp12          c2            t2            2
exp13          c3            t3            3
exp14          t2            c2            2
exp15          t3            c3            3
exp16          c1            t1            1
 

Finally, the user should choose a Peak Intensity Type of either LR** or LR*, where LR* is the log of the ratio of the normalized peak heights of the two channels, and LR** is the log of the ratio of the curvature corrected peak heights of the two channels.

The Select Ratio Order sub panel defines which channel will be used in the over/under expression designation.  Ch1/Ch2 will yield positive numbers for the log ratio when the intensities in channel one are higher than in channel two.  Ch2/Ch1 will yield negative numbers for the log ratio when the intensities in channel one are higher than in channel two.

Once selections have been made choose the "PROCEED" button.
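
The effect of the Select Ratio Order choice on the sign of the log ratio can be checked with a few lines. The sketch is Python for illustration only; the function name `log_ratio` and the use of base-10 logarithms are assumptions, since the manual does not state the base.

```python
import math


def log_ratio(ch1, ch2, order="Ch1/Ch2"):
    """Log ratio of two positive channel intensities for a given ratio order."""
    if order == "Ch1/Ch2":
        return math.log10(ch1 / ch2)
    if order == "Ch2/Ch1":
        return math.log10(ch2 / ch1)
    raise ValueError("order must be 'Ch1/Ch2' or 'Ch2/Ch1'")


# Hypothetical spot that is four times brighter in channel one.
forward = log_ratio(400.0, 100.0, "Ch1/Ch2")   # positive
reverse = log_ratio(400.0, 100.0, "Ch2/Ch1")   # negative, same magnitude
```

Switching the order therefore swaps which genes are reported as over and which as under expressed, without changing the magnitude of any log ratio.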

iv) The PLOT SELECTION window appears next.  In the upper panel "Screening Criteria", change the default cutoff values if desired.

In the lower left panel, change the integer labels for each series point to short labels relevant for the experiment.  For example, in a time course experiment the labels might be 0 hr, 3 hr, 6 hr.  The labels will be used in the line plots.  Also, change the log ratio cutoffs if desired.

Finally, to analyze the replicates, click the "Analyze Replicates" button.  A tab delimited file will be generated containing observed and expected values for each series point.
In this file,
E(S=+1) is the expected value of over expressed genes.
E(S=-1) is the expected value of under expressed genes.
#(S=+1) is the observed number of over expressed genes.
#(S=-1) is the observed number of under expressed genes.
+1 cutoff is the log ratio value above which a gene is defined as over expressed.
-1 cutoff is the log ratio value below which a gene is defined as under expressed.
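
The manual does not say how the expected value E is computed. One plausible reading, offered here only as an assumption, is a null model in which replicates are independent: if each replicate calls a certain fraction of genes over expressed, the expected number called over expressed in every replicate is the number of genes times the product of those fractions. The Python sketch below (illustrative only; `expected_and_observed` is a hypothetical name) computes that expectation alongside the observed count.

```python
def expected_and_observed(calls, state):
    """Compare expected and observed counts of consistently called genes.

    calls : list of replicates, each a list of per-gene calls
            (+1 over, -1 under, 0 neither), all of the same length
    state : +1 or -1, the call to test for consistency
    Returns (expected, observed) under the independence assumption.
    """
    n_genes = len(calls[0])
    expected = float(n_genes)
    for replicate in calls:
        # Fraction of genes given this call in this replicate.
        expected *= sum(1 for s in replicate if s == state) / n_genes
    observed = sum(1 for gene in zip(*calls) if all(s == state for s in gene))
    return expected, observed


# Two hypothetical replicates of an 8-gene array.
rep1 = [1, 1, 0, -1, 0, 1, 0, 0]
rep2 = [1, 1, 0, -1, 1, 0, 0, 0]
e_over, n_over = expected_and_observed([rep1, rep2], +1)
e_under, n_under = expected_and_observed([rep1, rep2], -1)
```

An observed count well above the expected one suggests that the replicates agree more often than chance alone would produce.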
 

B) Plot all line plot expression patterns for the series

    WHAT IS DONE HERE

    A pattern for each gene is generated as follows.
To each gene, we assign at each series point a symbol representing over expression (+), under expression (-), neutral (0), flagged (f), or absent in expression (x).  All series point symbols of a gene are connected together in a string.  Each string thus defines the expression pattern of the gene.
Genes similar in pattern are grouped together.  All possible groups are shown in the output.  The line plots are shown only for those patterns which contain no flagged genes or genes that are absent in expression.
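
The pattern strings described above can be sketched in a few lines of Python (illustrative only, not F-SCAN's code). The names `symbol` and `group_by_pattern`, the gene names, the default cutoffs, and the way flagged and absent values are encoded in the input ("f" and None) are assumptions made for the example.

```python
from collections import defaultdict


def symbol(value, over_cutoff, under_cutoff):
    """Map one series-point value to '+', '-', '0', 'f' or 'x'."""
    if value == "f":          # flagged spot (input encoding assumed)
        return "f"
    if value is None:         # absent in expression (input encoding assumed)
        return "x"
    if value >= over_cutoff:
        return "+"
    if value <= under_cutoff:
        return "-"
    return "0"


def group_by_pattern(series, over_cutoff=0.3, under_cutoff=-0.3):
    """Group gene names by their expression pattern string."""
    groups = defaultdict(list)
    for gene, values in series.items():
        pattern = "".join(symbol(v, over_cutoff, under_cutoff) for v in values)
        groups[pattern].append(gene)
    return dict(groups)


# Hypothetical log ratios at three series points.
series = {
    "geneA": [0.5, 0.0, -0.6],
    "geneB": [0.4, 0.1, -0.4],
    "geneC": [None, 0.5, 0.5],
    "geneD": ["f", 0.0, 0.0],
}
groups = group_by_pattern(series)
# Only patterns free of flagged or absent points would be drawn as line plots.
plottable = [p for p in groups if "f" not in p and "x" not in p]
```

Here geneA and geneB share the pattern "+0-" and would appear in the same line plot, while the patterns of geneC and geneD are listed but not plotted.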

      HOW TO ACCOMPLISH IT
 

Follow steps i) through iv) of part A above, up to but not including the final "Analyze Replicates" click.

Finally, to plot all patterns, click the "PLOT ALL PATTERNS" button.  Notice that patterns containing flagged spots or missing data are not plotted.

C) Plot selected expression patterns for the series collectively

      WHAT IS DONE HERE

This section is to enable a user to select and display all genes whose behavior along a series of experimental points conforms to a pattern of interest.  Several patterns may be selected, and the collections of genes conforming to these patterns are presented in a single plot.  The succeeding section enables the user to present individual patterns separately.
     The user defines the patterns to be displayed by choosing from "any", "over", "under", "either", and "neither" at each series point.  "Any" selects all genes at that point;  "over" is for the over expressed genes;  "under" is for the under expressed genes;  "either" is for the regulated genes, either over or under expressed;  and "neither" is for genes neither over nor under expressed.  The colors of the lines are also dictated by the user.  Lines for genes satisfying these patterns are displayed in a single plot.
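
Matching a gene against a user-defined pattern can be sketched as follows, in Python for illustration only. The sketch reuses the +, - and 0 symbols from the preceding part; the name `matches` and the decision that "any" accepts every symbol, including flagged or absent points, are assumptions rather than documented F-SCAN behaviour.

```python
# Symbols each qualifier accepts; None means no restriction (assumed for "any").
ALLOWED = {
    "any": None,
    "over": {"+"},
    "under": {"-"},
    "either": {"+", "-"},
    "neither": {"0"},
}


def matches(pattern, qualifiers):
    """True if the gene's pattern string satisfies one qualifier per series point."""
    if len(pattern) != len(qualifiers):
        return False
    for sym, qualifier in zip(pattern, qualifiers):
        allowed = ALLOWED[qualifier]
        if allowed is not None and sym not in allowed:
            return False
    return True


# A gene that is over expressed early, neutral in the middle, and under expressed late.
up_then_down = matches("+0-", ["over", "any", "under"])
regulated_at_ends = matches("+0-", ["either", "neither", "either"])
down_first = matches("+0-", ["under", "any", "any"])
```

Each of the up to five patterns chosen in the window would be tested this way against every gene, and the genes that match are drawn in the colour assigned to that pattern.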

     HOW TO ACCOMPLISH IT
 

Follow steps i) through iii) of part A above.

iv) The PLOT SELECTION window appears next.  In the upper panel "Screening Criteria", change the default cutoff values if desired.

In the lower left panel "Label Points in Series", change the integer labels for each series point to short labels relevant for the experiment.  For example, in a time course experiment the labels might be 0 hr, 3 hr, 6 hr.  The labels will be used in the line plots.  Also, change the log ratio cutoffs if desired.

In the lower right panel "Select Expression Patterns",  a total of five patterns A) - E) may be designated.  If fewer are desired, simply change the color selection to "none".   Patterns are designated by selecting the gene expression ratio qualifier "any", "over", "under", "either", or "neither" at each series point.

Finally, to generate the plot of the designated patterns, click on the "Plot Selected" button.

D) Plot selected expression patterns for the series separately

     WHAT IS DONE HERE

This section is to enable a user to select and display all genes whose behavior along a series of experimental points conforms to a pattern of interest.  It differs from the section above in that the patterns comprising the collections of genes are separately displayed (instead of as one collection as in the preceding section).
      The user defines the patterns to be displayed by choosing from "any", "over", "under", "either", and "neither" at each series point.  "Any" selects all genes at that point;  "over" is for the over expressed genes;  "under" is for the under expressed genes;  "either" is for the regulated genes, either over or under expressed;  and "neither" is for genes neither over nor under expressed.  The colors of the lines are also dictated by the user.  Lines for genes satisfying these patterns are displayed using one plot for each pattern.

    HOW TO ACCOMPLISH IT

    Follow the instructions for part c, except for the final step:  click on the "Plot Selected Separately" button.

Step 6. To display a true color composite image of the two channels:

Click Track 6 of the Main Menu - 'Show image'. A dialog box requires the user to choose files containing the two channels and a composite image is displayed.

 
  Reference

"Development of a Prostate cDNA Microarray and Statistical Gene Expression Analysis Package"; A. J. Carlisle, V. V. Prabhu, A. Elkahloun, J. Hudson, J. M. Trent, W. M. Linehan, E. D. Williams, M. R. Emmert-Buck, L. A. Liotta, P. J. Munson, and D. B. Krizman; Molecular Carcinogenesis 28:12-22 (2000)

Contact information

P. J. Munson munson@helix.nih.gov , V. V. Prabhu prabhu@helix.nih.gov , and L. Young lynny@helix.nih.gov

Analytical Biostatistics Section; Mathematical and Statistical Computing Laboratory; Center for Information Technology; Building 12A, Room 2039; National Institutes of Health; Bethesda, MD 20892

Acknowledgement

This manual was translated into HTML by Eric Faden.


Last modified  8 August 2001