Submitting BLAST queries

GeneStudio's BLAST interface lets you submit sequences to the NCBI BLAST server through a dialog box.  The BLAST results viewer lets you retrieve, view and store BLAST search results, and download hit sequences directly into GeneStudio components.

Sequences can be submitted to the BLAST server from all three GeneStudio components.  To submit a sequence:

  • Click on the sequence in the component window or highlight a region of a sequence.
  • Select the Internet menu.
  • Select the BLAST search menu item (you may also access this function from a right-click on a sequence or sequence name).  This will display the Online BLAST search dialog box.
  • Select the desired program options.  Note: Press More to view advanced options.
  • Press the Submit button.
  • A message box will be displayed indicating the estimated time to completion.
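
For readers curious how the dialog options relate to a BLAST request, the following is a minimal Python sketch, not a description of GeneStudio's internals. It assumes the options are sent as name/value pairs in the style of NCBI's public BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, EXPECT, HITLIST_SIZE). The function name build_blast_put_params and its default values are hypothetical. Nothing is sent over the network; the sketch only builds the encoded parameter string.

```python
from urllib.parse import urlencode


def build_blast_put_params(query, program="blastn", database="nt",
                           expect=10, max_hits=50):
    """Encode dialog-style options as a form-encoded parameter string.

    Assumption: parameter names follow NCBI's public BLAST URL API.
    GeneStudio may submit its searches differently.
    """
    params = {
        "CMD": "Put",              # "Put" submits a new search
        "PROGRAM": program,        # BLAST program
        "DATABASE": database,      # Choose database
        "QUERY": query,            # the selected sequence or region
        "EXPECT": expect,          # Expect value
        "HITLIST_SIZE": max_hits,  # Maximum hits
    }
    return urlencode(params)


encoded = build_blast_put_params("ACGTACGTACGT")
print(encoded)
```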

Note: To learn how to view the search results, see the Retrieving BLAST results tutorial.

Downloaded results are stored in XML format on your computer in the folder [APPLICATION_DATA]\GeneStudio\blast_results. [APPLICATION_DATA] is a system-defined folder.
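
If you want to locate the stored result files outside GeneStudio, a short Python sketch such as the one below may help. It assumes that [APPLICATION_DATA] corresponds to the Windows APPDATA environment variable, which this page does not confirm. The function names are hypothetical. The sketch only lists file names and makes no assumption about the XML schema.

```python
import os
from pathlib import Path


def blast_results_dir(app_data=None):
    """Return the expected results folder.

    Assumption: [APPLICATION_DATA] maps to the APPDATA environment
    variable; pass app_data explicitly if your system differs.
    """
    base = app_data or os.environ.get("APPDATA", "")
    return Path(base) / "GeneStudio" / "blast_results"


def list_result_files(folder):
    """Return the sorted names of the XML files in folder, or [] if the folder is missing."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.xml"))


print(list_result_files(blast_results_dir()))
```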

BLAST options reference table

Basic options:

  • BLAST program
    Select the BLAST program you wish to use.  The listed options will depend on the query sequence type.
  • Choose database
    Choose the database to query.

Results retrieval

  • Maximum hits
    Limit the number of hits to download.  Choose a lower number for slower Internet connections.

Set subsequence:

  • Start
    Start point in the query sequence.
  • End
    End point in the query sequence.
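
The effect of the Start and End fields can be illustrated with a small Python sketch. It assumes that positions are 1-based and that End is inclusive, which is the usual convention for sequence coordinates but is not stated on this page. The function name query_subsequence is hypothetical.

```python
def query_subsequence(sequence, start, end):
    """Return the region start..end of sequence.

    Assumption: coordinates are 1-based and end is inclusive.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("start and end must satisfy 1 <= start <= end <= length")
    return sequence[start - 1:end]


print(query_subsequence("ACGTACGT", 3, 6))  # prints GTAC
```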

Limit by Entrez query:

  • List box
    BLAST searches can be limited to the results of an Entrez query against the chosen database.  This can be used to limit searches to subsets of the BLAST databases.  Any terms can be entered that would normally be allowed in an Entrez search session.  To limit the search to a specific organism, you can either select it from the pull-down menu, which lists the most common organisms in the databases, or enter the name of the organism in the Limit by Entrez query field with the [Organism] qualifier.  For example: Mus musculus [Organism].
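
As an illustration of the text that goes into the field, the Python sketch below builds an organism limit in the form shown above. The function name organism_limit is hypothetical, and combining extra terms with the Entrez AND operator is an assumption about how you might want to narrow the search further.

```python
def organism_limit(organism, extra_terms=None):
    """Build an Entrez limit string using the [Organism] qualifier."""
    term = f"{organism.strip()} [Organism]"
    if extra_terms:
        # Assumption: additional Entrez terms are combined with AND.
        return f"({extra_terms.strip()}) AND {term}"
    return term


print(organism_limit("Mus musculus"))            # prints Mus musculus [Organism]
print(organism_limit("Mus musculus", "kinase"))  # prints (kinase) AND Mus musculus [Organism]
```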

Choose filters

  • Low Complexity
    Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993) or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation).  Filtering can eliminate statistically significant but biologically uninteresting reports from the BLAST output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences.
  • Human repeats
    This option masks human repeats (LINEs and SINEs) and is especially useful for human sequences that may contain these repeats.  Filtering for repeats can increase the speed of a search, especially with very long sequences (>100 kb) and against databases that contain a large number of repeats (htgs).
  • Mask for lookup table only
    This option masks only for purposes of constructing the lookup table used by BLAST. BLAST searches consist of two phases, finding hits based upon a lookup table and then extending them.  The option to "Mask for lookup table only" masks only for the lookup table so that no hits are found based upon low-complexity sequence. The BLAST extensions are performed without masking and so they can be extended through low-complexity sequence.
  • Mask lower case
    With this option selected you can denote areas you would like filtered with lower case.  This allows you to customize what is filtered from the sequence during the comparison to the BLAST databases.
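
To see which positions the Mask lower case option would exclude, the Python sketch below replaces every lower-case residue with a mask character. This is only an illustration of the idea: BLAST applies the mask internally, and the function name mask_lowercase and the choice of N as the mask character are assumptions.

```python
def mask_lowercase(sequence, mask_char="N"):
    """Replace lower-case residues with mask_char, leaving upper-case residues unchanged."""
    return "".join(mask_char if ch.islower() else ch for ch in sequence)


print(mask_lowercase("ACGTacgtACGT"))  # prints ACGTNNNNACGT
```

For a protein query you might pass mask_char="X" instead.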

BLAST program options

  • Gap opening penalty
    Cost to open a gap (default = 5 for nucleotides, 11 for proteins).
  • Gap extension penalty
    Cost to extend a gap (default = 2 for nucleotides, 1 for proteins).
  • Match reward
    Reward for nucleotide match [Integer]. Default = 1.
  • Mismatch penalty
    Penalty for nucleotide mismatch (default = -3).
  • Expect value
    The statistical significance threshold for reporting matches against database sequences; the default value is 10, meaning that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990).  If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported.  Increasing the threshold shows less stringent matches. Fractional values are acceptable.
  • Word size
    Word size [Integer] (default = 11 for nucleotides, 3 for proteins).
  • Dropoff (X) for BLAST extension
    Dropoff (X) for BLAST extensions, in bits.  A value of zero selects the default.  Default = 20 for BLASTN, 7 for other programs.
  • X Dropoff value for gapped alignments
    X dropoff value for gapped alignment (in bits).  Default = 15 for all programs except BLASTN, for which it does not apply.
  • Final X dropoff value for gapped alignment
    Final X dropoff value for gapped alignment (in bits).  Default = 50 for BLASTN, 25 for other programs.
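
To make the scoring options above more concrete, the Python sketch below computes the raw score of an already-aligned nucleotide pair using the default match reward, mismatch penalty and gap penalties listed in this table. It assumes that a gap of length k costs the opening penalty plus k times the extension penalty. The function name score_nucleotide_alignment is hypothetical, and the sketch scores a given alignment only; it does not search for one.

```python
def score_nucleotide_alignment(query, subject, match=1, mismatch=-3,
                               gap_open=5, gap_extend=2):
    """Score two aligned strings of equal length, using '-' for gap positions.

    Assumption: a gap of length k costs gap_open + k * gap_extend.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    gap_side = None  # which sequence the current gap is in, if any
    for q, s in zip(query, subject):
        side = "query" if q == "-" else "subject" if s == "-" else None
        if side is None:
            score += match if q == s else mismatch
        else:
            if side != gap_side:
                score -= gap_open  # a new gap starts here
            score -= gap_extend    # charged for every gap position
        gap_side = side
    return score


print(score_nucleotide_alignment("ACGTAC", "ACCTAC"))      # prints 2
print(score_nucleotide_alignment("ACGTACGT", "ACGT--GT"))  # prints -3
```

In the first example, five matches and one mismatch give 5 - 3 = 2.  In the second, six matches and one gap of length two give 6 - (5 + 2 × 2) = -3.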