Changeset 1327:0244c808a973
- Timestamp: 05/27/08 15:25:54
- Author: Greg Von Kuster <greg@bx.psu.edu>
- branch: default
- convert_revision: svn:9bcadc22-80f8-0310-8a53-c8f022958886/galaxy/trunk@2688
- Message: Add support ( sans sniffer ) for FASTQ data type.
| r1298 |
r1327 |
|
| 45 | 45 | 'customtrack' : interval.CustomTrack(), |
|---|
| 46 | 46 | 'fasta' : sequence.Fasta(), |
|---|
| | 47 | 'fastq' : sequence.Fastq(), |
|---|
| 47 | 48 | 'gff' : interval.Gff(), |
|---|
| 48 | 49 | 'gff3' : interval.Gff3(), |
|---|
| … | … | |
| 66 | 67 | 'customtrack' : 'text/plain', |
|---|
| 67 | 68 | 'fasta' : 'text/plain', |
|---|
| | 69 | 'fastq' : 'text/plain', |
|---|
| 68 | 70 | 'gff' : 'text/plain', |
|---|
| 69 | 71 | 'gff3' : 'text/plain', |
|---|
| r1096 |
r1327 |
|
| 89 | 89 | return False |
|---|
| 90 | 90 | |
|---|
| | 91 | class Fastq( Sequence ): |
|---|
| | 92 | """Class representing a FASTQ sequence""" |
|---|
| | 93 | # FASTQ format stores sequences and Phred qualities in a single file. It is concise and compact. |
|---|
| | 94 | # FASTQ was first widely used at the Sanger Institute, so the Sanger specification is usually |
|---|
| | 95 | # taken as the standard FASTQ format, or simply FASTQ format. Although Solexa/Illumina read |
|---|
| | 96 | # files look much like FASTQ, they differ in that the qualities are scaled differently. If the |
|---|
| | 97 | # quality string contains a character with an ASCII code higher than 90, the file is probably |
|---|
| | 98 | # in the Solexa/Illumina format. |
|---|
| | 99 | # |
|---|
| | 100 | # For details, see http://maq.sourceforge.net/fastq.shtml |
|---|
| | 101 | file_ext = "fastq" |
|---|
| | 102 | |
|---|
| | 103 | def set_peek( self, dataset ): |
|---|
| | 104 | Sequence.set_peek( self, dataset ) |
|---|
| | 105 | sequences = 0 |
|---|
| | 106 | for line in file( dataset.file_name ): |
|---|
| | 107 | if line and line.startswith( "@" ): |
|---|
| | 108 | sequences += 1 |
|---|
| | 109 | dataset.blurb = '%d sequences' % sequences |
|---|
| | 110 | |
|---|
| 91 | 111 | try: |
|---|
| 92 | 112 | import pkg_resources; pkg_resources.require( "bx-python" ) |
|---|
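A note on the `set_peek` method added in this hunk: it counts every line that starts with `@`. Because `@` (ASCII 64) is also a legal character in a Sanger quality string, a quality line can begin with `@` and be counted as a header. The sketch below is one possible alternative, not part of this changeset. It assumes every record occupies exactly four lines (no wrapped sequences or qualities), and the names `count_fastq_records` and `naive_count` are hypothetical. It is written for Python 3, whereas the changeset targets Python 2.

```python
import io


def naive_count(handle):
    """Count lines starting with '@', as set_peek in this changeset does."""
    return sum(1 for line in handle if line.startswith("@"))


def count_fastq_records(handle):
    """Count records, assuming exactly four lines per record (an assumption)."""
    records = 0
    for index, line in enumerate(handle):
        # Only the first line of each four-line group can be a header.
        if index % 4 == 0 and line.startswith("@"):
            records += 1
    return records


# Two records; the first quality line happens to begin with '@'.
sample_text = (
    "@read1\nACGT\n+\n@;;;\n"
    "@read2\nTTGG\n+\n;;;;\n"
)

print(naive_count(io.StringIO(sample_text)))          # 3
print(count_fastq_records(io.StringIO(sample_text)))  # 2
```

The four-line assumption holds for the example shown in the help text of this changeset, but a file with line-wrapped sequences would need a real parser.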
| r796 |
r1327 |
|
| 21 | 21 | **Auto-detect** |
|---|
| 22 | 22 | |
|---|
| 23 | | The system will attempt to detect AXT, FASTA, Gff, HTML, LAV, Maf, Tabular, Wiggle, BED and Interval (BED with headers) formats. If your file is not detected properly as one of the known formats, it most likely means that it has some format problems (e.g., different number of columns on different rows). You can still coerce the system to set your data to the format you think it should be (please send us a note if you see a case when a valid format is not detected). You can also upload valid files that are compressed (gzipped), which will automatically be decompressed upon upload. |
|---|
| | 23 | The system will attempt to detect AXT, BED, FASTA, Gff, Gff3, Interval (BED with headers), LAV, Maf, Tabular and Wiggle formats. If your file is not detected properly as one of the known formats, it most likely means that it has some format problems (e.g., different number of columns on different rows). You can still coerce the system to set your data to the format you think it should be (please send us a note if you see a case when a valid format is not detected). You can also upload valid files that are compressed (gzipped), which will automatically be decompressed upon upload. |
|---|
| 24 | 24 | |
|---|
| 25 | 25 | ----- |
|---|
| … | … | |
| 34 | 34 | |
|---|
| 35 | 35 | blastz pairwise alignment format. Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields. |
|---|
| 36 | | |
|---|
| 37 | | ----- |
|---|
| 38 | | |
|---|
| 39 | | **Binseq.zip** |
|---|
| 40 | | |
|---|
| 41 | | A zipped archive consisting of binary sequence files in either 'ab1' or 'scf' format. All files in this archive must have the same file extension which is one of '.ab1' or '.scf'. You must manually select this 'File Format' when uploading the file. |
|---|
| 42 | 36 | |
|---|
| 43 | 37 | ----- |
|---|
| … | … | |
| 72 | 66 | ----- |
|---|
| 73 | 67 | |
|---|
| | 68 | **Binseq.zip** |
|---|
| | 69 | |
|---|
| | 70 | A zipped archive consisting of binary sequence files in either 'ab1' or 'scf' format. All files in this archive must have the same file extension which is one of '.ab1' or '.scf'. You must manually select this 'File Format' when uploading the file. |
|---|
| | 71 | |
|---|
| | 72 | ----- |
|---|
| | 73 | |
|---|
| 74 | 74 | **FASTA** |
|---|
| 75 | 75 | |
|---|
| … | … | |
| 85 | 85 | ----- |
|---|
| 86 | 86 | |
|---|
| | 87 | **FASTQ** |
|---|
| | 88 | |
|---|
| | 89 | FASTQ format stores sequences and Phred qualities in a single file. FASTQ was first widely used at the Sanger Institute, so the Sanger specification is usually taken as the standard FASTQ format, or simply FASTQ format. You must manually select this 'File Format' when uploading the file:: |
|---|
| | 90 | |
|---|
| | 91 | @EAS54_6_R1_2_1_413_324 |
|---|
| | 92 | CCCTTCTTGTCTTCAGCGTTTCTCC |
|---|
| | 93 | + |
|---|
| | 94 | ;;3;;;;;;;;;;;;7;;;;;;;88 |
|---|
| | 95 | @EAS54_6_R1_2_1_540_792 |
|---|
| | 96 | TTGGCAGGCCAAGGCCGATGGATCA |
|---|
| | 97 | + |
|---|
| | 98 | ;;;;;;;;;;;7;;;;;-;;;3;83 |
|---|
| | 99 | @EAS54_6_R1_2_1_443_348 |
|---|
| | 100 | GTTGCTTCTGGCGTGGGTGGGGGGG |
|---|
| | 101 | +EAS54_6_R1_2_1_443_348 |
|---|
| | 102 | ;;;;;;;;;;;9;7;;.7;393333 |
|---|
| | 103 | |
|---|
| | 104 | ----- |
|---|
| | 105 | |
|---|
| 87 | 106 | **Gff** |
|---|
| 88 | 107 | |
|---|
| … | … | |
| 129 | 148 | ----- |
|---|
| 130 | 149 | |
|---|
| | 150 | **Qual** |
|---|
| | 151 | |
|---|
| | 152 | The qual sequence format is a FASTA-like format which stores numerical quality values for each nucleotide or amino acid. It is used by CAP3 and Phrap. You must manually select this 'File Format' when uploading the file:: |
|---|
| | 153 | |
|---|
| | 154 | >HSMETOO 134bp |
|---|
| | 155 | 10 20 30 40 50 50 50 50 50 20 25 25 30 30 20 15 20 35 50 50 50 50 50 50 |
|---|
| | 156 | 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 |
|---|
| | 157 | 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 |
|---|
| | 158 | 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 |
|---|
| | 159 | 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 |
|---|
| | 160 | 50 50 50 20 30 20 10 10 |
|---|
| | 161 | |
|---|
| | 162 | ----- |
|---|
| | 163 | |
|---|
| 131 | 164 | **Scf** |
|---|
| 132 | 165 | |
|---|
| … | … | |
| 141 | 174 | ----- |
|---|
| 142 | 175 | |
|---|
| | 176 | **Taxonomy** |
|---|
| | 177 | |
|---|
| | 178 | Tabular data containing at least 24 columns. You must manually select this 'File Format' when uploading the file. |
|---|
| | 179 | |
|---|
| | 180 | ----- |
|---|
| | 181 | |
|---|
| | 182 | **Txt** |
|---|
| | 183 | |
|---|
| | 184 | Any text file. |
|---|
| | 185 | |
|---|
| | 186 | ----- |
|---|
| | 187 | |
|---|
| 143 | 188 | **Txtseq.zip** |
|---|
| 144 | 189 | |
|---|
| … | … | |
| 151 | 196 | The wiggle format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track. |
|---|
| 152 | 197 | |
|---|
| 153 | | ----- |
|---|
| 154 | | |
|---|
| 155 | | **Other text type** |
|---|
| 156 | | |
|---|
| 157 | | Any text file |
|---|
| 158 | 198 | |
|---|
| 159 | 199 | </help> |
|---|
| r1292 |
r1327 |
|
| 175 | 175 | data = galaxy.datatypes.data:Data,application/octet-stream |
|---|
| 176 | 176 | fasta = galaxy.datatypes.sequence:Fasta |
|---|
| 177 | | gbrowsetrack = galaxy.datatypes.interval:GBrowseTrack |
|---|
| | 177 | fastq = galaxy.datatypes.sequence:Fastq |
|---|
| 178 | 178 | gff = galaxy.datatypes.interval:Gff |
|---|
| 179 | 179 | gff3 = galaxy.datatypes.interval:Gff3 |
|---|
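The comments in the `Fastq` class describe a rule of thumb: a quality character with an ASCII code above 90 suggests Solexa/Illumina scaling rather than Sanger. Since this changeset ships without a sniffer, the sketch below shows how that check might look. It is an illustration only, not code from the changeset. The function names are hypothetical, the threshold of 90 comes from the class comment, and the offset of 33 for Sanger Phred scores is an assumption not stated in the source.

```python
def looks_like_solexa(quality, threshold=90):
    """Apply the rule of thumb from the Fastq class comment.

    Returns True if any quality character has an ASCII code above
    `threshold`. This is a heuristic, not a definitive format test.
    """
    return any(ord(ch) > threshold for ch in quality)


def sanger_phred(quality, offset=33):
    """Decode a Sanger quality string, assuming an ASCII offset of 33."""
    return [ord(ch) - offset for ch in quality]


# Quality line taken from the FASTQ example in the help text.
help_example = ";;3;;;;;;;;;;;;7;;;;;;;88"

print(looks_like_solexa(help_example))  # False
print(looks_like_solexa("hhhhhhhh"))    # True ('h' is ASCII 104)
print(sanger_phred(";37"))              # [26, 18, 22]
```

A quality string that never exceeds code 90 does not prove the file is Sanger-scaled, so a check like this can only flag likely Solexa/Illumina files.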