Changeset 1327:0244c808a973

Timestamp:
05/27/08 15:25:54 (7 months ago)
Author:
Greg Von Kuster <greg@bx.psu.edu>
branch:
default
convert_revision:
svn:9bcadc22-80f8-0310-8a53-c8f022958886/galaxy/trunk@2688
Message:

Add support ( sans sniffer ) for FASTQ data type.

Files:

  • lib/galaxy/datatypes/registry.py

    r1298 r1327  
    4545                'customtrack' : interval.CustomTrack(), 
    4646                'fasta'       : sequence.Fasta(), 
     47                'fastq'       : sequence.Fastq(), 
    4748                'gff'         : interval.Gff(), 
    4849                'gff3'        : interval.Gff3(),   
     
    6667                'customtrack' : 'text/plain', 
    6768                'fasta'       : 'text/plain', 
     69                'fastq'       : 'text/plain', 
    6870                'gff'         : 'text/plain', 
    6971                'gff3'        : 'text/plain', 
  • lib/galaxy/datatypes/sequence.py

    r1096 r1327  
    8989            return False 
    9090 
     91class Fastq( Sequence ): 
     92    """Class representing a FASTQ sequence""" 
     93    # FASTQ format stores sequences and Phred qualities in a single file. It is concise and compact.  
     94    # FASTQ was first widely used at the Sanger Institute, so we usually take the Sanger  
     95    # specification as the standard FASTQ format, or simply FASTQ format. Although Solexa/Illumina  
     96    # read files look much like FASTQ, they differ in that the qualities are scaled  
     97    # differently. If the quality string contains a character with an ASCII code higher than  
     98    # 90, your file is probably in the Solexa/Illumina format. 
     99    # 
     100    # For details, see http://maq.sourceforge.net/fastq.shtml 
     101    file_ext = "fastq" 
     102 
     103    def set_peek( self, dataset ): 
     104        Sequence.set_peek( self, dataset ) 
     105        sequences = 0 
     106        for line in file( dataset.file_name ): 
     107            if line and line.startswith( "@" ): 
     108                sequences += 1 
     109        dataset.blurb = '%d sequences' % sequences 
     110 
    91111try: 
    92112    import pkg_resources; pkg_resources.require( "bx-python" ) 
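
Note on the `set_peek` count in the sequence.py diff above: it increments for every line that starts with `@`. In Sanger-encoded FASTQ, `@` (ASCII 64) is also a valid quality character, so a quality line that happens to begin with `@` would inflate the count. The following is only a minimal sketch of one alternative, assuming every record occupies exactly four lines (no wrapped sequence or quality lines, no blank lines). The names `count_fastq_records` and `looks_like_solexa` are hypothetical and are not part of Galaxy; the second function merely restates the "ASCII code higher than 90" heuristic from the class comment.

```python
import io


def count_fastq_records(handle):
    """Count records in a FASTQ stream, assuming four lines per record."""
    records = 0
    for index, line in enumerate(handle):
        # Only the first line of each four-line group can be a header,
        # so a quality line starting with '@' is never counted.
        if index % 4 == 0 and line.startswith("@"):
            records += 1
    return records


def looks_like_solexa(quality_line, threshold=90):
    """Heuristic from the class comment: any character with ASCII code > threshold."""
    return any(ord(ch) > threshold for ch in quality_line.strip())


example = io.StringIO(
    "@read1\n"
    "ACGT\n"
    "+\n"
    "@;;;\n"      # quality line that begins with '@'
    "@read2\n"
    "TTGA\n"
    "+read2\n"
    ";;3;\n"
)
print(count_fastq_records(example))
```

A plain `startswith( "@" )` scan would report three sequences for this input; grouping by four lines reports two. The trade-off is that the sketch does not handle files with wrapped lines.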
  • tools/data_source/upload.xml

    r796 r1327  
    2121**Auto-detect** 
    2222 
    23 The system will attempt to detect AXT, FASTA, Gff, HTML, LAV, Maf, Tabular, Wiggle, BED and Interval (BED with headers) formats. If your file is not detected properly as one of the known formats, it most likely means that it has some format problems (e.g., different number of columns on different rows). You can still coerce the system to set your data to the format you think it should be (please send us a note if you see a case when a valid format is not detected).  You can also upload valid files that are compressed (gzipped), which will automatically be decompressed upon upload.  
     23The system will attempt to detect AXT, BED, FASTA, Gff, Gff3, Interval (BED with headers), LAV, Maf, Tabular and Wiggle formats. If your file is not detected properly as one of the known formats, it most likely means that it has some format problems (e.g., different number of columns on different rows). You can still coerce the system to set your data to the format you think it should be (please send us a note if you see a case when a valid format is not detected).  You can also upload valid files that are compressed (gzipped), which will automatically be decompressed upon upload.  
    2424 
    2525----- 
     
    3434 
    3535blastz pairwise alignment format.  Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines.  Blocks are separated from one another by blank lines.  The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields. 
    36  
    37 ----- 
    38  
    39 **Binseq.zip** 
    40  
    41 A zipped archive consisting of binary sequence files in either 'ab1' or 'scf' format.  All files in this archive must have the same file extension which is one of '.ab1' or '.scf'.  You must manually select this 'File Format' when uploading the file. 
    4236 
    4337----- 
     
    7266----- 
    7367 
     68**Binseq.zip** 
     69 
     70A zipped archive consisting of binary sequence files in either 'ab1' or 'scf' format.  All files in this archive must have the same file extension which is one of '.ab1' or '.scf'.  You must manually select this 'File Format' when uploading the file. 
     71 
     72----- 
     73 
    7474**FASTA** 
    7575 
     
    8585----- 
    8686 
     87**FASTQ** 
     88 
     89FASTQ format stores sequences and Phred qualities in a single file. FASTQ was first widely used at the Sanger Institute, so we usually take the Sanger specification as the standard FASTQ format, or simply FASTQ format.  You must manually select this 'File Format' when uploading the file:: 
     90 
     91        @EAS54_6_R1_2_1_413_324 
     92        CCCTTCTTGTCTTCAGCGTTTCTCC 
     93        + 
     94        ;;3;;;;;;;;;;;;7;;;;;;;88 
     95        @EAS54_6_R1_2_1_540_792 
     96        TTGGCAGGCCAAGGCCGATGGATCA 
     97        + 
     98        ;;;;;;;;;;;7;;;;;-;;;3;83 
     99        @EAS54_6_R1_2_1_443_348 
     100        GTTGCTTCTGGCGTGGGTGGGGGGG 
     101        +EAS54_6_R1_2_1_443_348 
     102        ;;;;;;;;;;;9;7;;.7;393333 
     103 
     104----- 
     105 
    87106**Gff** 
    88107 
     
    129148----- 
    130149 
     150**Qual** 
     151 
     152The qual sequence format is a FASTA-like format which stores numerical quality values for each nucleotide or amino acid. It is used by CAP3 and Phrap.  You must manually select this 'File Format' when uploading the file:: 
     153 
     154        >HSMETOO 134bp 
     155        10 20 30 40 50 50 50 50 50 20 25 25 30 30 20 15 20 35 50 50 50 50 50 50  
     156        50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50  
     157        50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50  
     158        50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50  
     159        50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50  
     160        50 50 50 20 30 20 10 10 
     161 
     162----- 
     163 
    131164**Scf** 
    132165 
     
    141174----- 
    142175 
     176**Taxonomy** 
     177 
     178Tabular data containing at least 24 columns.  You must manually select this 'File Format' when uploading the file. 
     179 
     180----- 
     181 
     182**Txt** 
     183 
     184Any text file. 
     185 
     186----- 
     187 
    143188**Txtseq.zip** 
    144189 
     
    151196The wiggle format is line-oriented.  Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track. 
    152197 
    153 ----- 
    154  
    155 **Other text type** 
    156  
    157 Any text file 
    158198 
    159199  </help> 
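
Note on the FASTQ and Qual help entries above: the help text shows a quality string and a list of numeric qualities but not how the two relate. The sketch below assumes the Sanger convention, in which each quality character encodes a Phred score as its ASCII code minus 33. The helper names `phred_scores` and `to_qual_line` are hypothetical and do not exist in Galaxy; the quality string is copied from the first record of the FASTQ example.

```python
def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Sanger offset assumed)."""
    return [ord(ch) - offset for ch in quality]


def to_qual_line(scores):
    """Render scores as the space-separated numbers used in Qual files."""
    return " ".join(str(score) for score in scores)


quality = ";;3;;;;;;;;;;;;7;;;;;;;88"
scores = phred_scores(quality)
print(to_qual_line(scores))
```

Under this assumption `;` decodes to 26, `3` to 18, `7` to 22 and `8` to 23. A Solexa/Illumina file would need a different offset and scaling, which this sketch does not attempt.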
  • universe_wsgi.ini.sample

    r1292 r1327  
    175175data = galaxy.datatypes.data:Data,application/octet-stream 
    176176fasta = galaxy.datatypes.sequence:Fasta 
    177 gbrowsetrack = galaxy.datatypes.interval:GBrowseTrack 
     177fastq = galaxy.datatypes.sequence:Fastq 
    178178gff = galaxy.datatypes.interval:Gff 
    179179gff3 = galaxy.datatypes.interval:Gff3
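
Note on the universe_wsgi.ini.sample entries above: each line has the shape `extension = module:Class`, optionally followed by `,mimetype` (as in the `data` entry). How Galaxy itself loads these entries is not shown in this changeset, so the following is only a sketch of one plausible way to split and resolve such a line. The function names `parse_datatype_entry` and `resolve_class` are hypothetical, and the resolution step is demonstrated with a standard-library class because Galaxy may not be importable.

```python
import importlib


def parse_datatype_entry(entry):
    """Split 'ext = module:Class[,mimetype]' into (ext, module, class, mimetype)."""
    extension, _, spec = entry.partition("=")
    target, _, mimetype = spec.strip().partition(",")
    module_name, _, class_name = target.partition(":")
    return (
        extension.strip(),
        module_name.strip(),
        class_name.strip(),
        mimetype.strip() or None,  # None when no mimetype is given
    )


def resolve_class(module_name, class_name):
    """Import the module and fetch the named attribute from it."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


print(parse_datatype_entry("fastq = galaxy.datatypes.sequence:Fastq"))
print(parse_datatype_entry("data = galaxy.datatypes.data:Data,application/octet-stream"))
# Stand-in for a Galaxy datatype class, using only the standard library.
print(resolve_class("collections", "OrderedDict"))
```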