Homo sapiens DNA GRCh37 55 Fasta Gzip Ensembl OpenSci

Torrent info

Name: Homo sapiens DNA GRCh37 55 Fasta Gzip Ensembl OpenSci

Total Size: 2.47 GB

Seeds: 1

Leechers: 1

Last Updated: 2010-09-17 10:08:33

Torrent added: 2009-09-20 23:33:09

Torrent Files List

Distributed by Mininova.txt (Size: 2.47 GB) (Files: 75)

Distributed by Mininova.txt (0.28 KB)

Remaining 74 files, sizes in listing order (file names are not shown):

6.06 KB, 62.52 MB, 36.52 MB, 36.43 MB, 36.12 MB, 26.74 MB, 24.56 MB, 22.77 MB,
21.74 MB, 21.30 MB, 20.86 MB, 14.86 MB, 66.39 MB, 16.53 MB, 9.81 MB, 9.60 MB,
54.28 MB, 52.30 MB, 49.49 MB, 46.64 MB, 42.95 MB, 39.76 MB, 33.47 MB, 682.42 KB,
551.80 KB, 1.15 MB, 1.80 MB, 1.64 MB, 1.63 MB, 1.55 MB, 1.67 MB, 1.65 MB,
5.33 KB, 41.85 MB, 6.36 MB, 1.60 MB, 807.91 MB, 35.37 MB, 21.32 MB, 20.42 MB,
20.13 MB, 15.73 MB, 14.04 MB, 13.25 MB, 12.47 MB, 12.53 MB, 12.42 MB, 7.57 MB,
38.78 MB, 9.40 MB, 5.79 MB, 5.62 MB, 30.82 MB, 29.25 MB, 28.14 MB, 26.95 MB,
24.51 MB, 22.47 MB, 19.03 MB, 512.00 KB, 551.81 KB, 883.15 KB, 1.20 MB, 1.12 MB,
1.11 MB, 1.09 MB, 1.13 MB, 1.11 MB, 5.34 KB, 19.46 MB, 2.88 MB, 799.66 KB,
457.77 MB, 4.00 KB

Announce URL: http://tracker.mininova.org/announce

Torrent description


Alternate Download: ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/
Source: Ensembl http://www.ensembl.org/
License: Creative Commons Zero Public Domain http://creativecommons.org/publicdomain/zero/1.0/
Release Group: OpenSci http://twitter.com/opensci
Distributor: Mininova http://www.mininova.org
Packager: Mike Chelen http://twitter.com/mikechelen
Retrieved: 2009-09-20

Wikipedia http://en.wikipedia.org/wiki/Fasta_format
"In bioinformatics, FASTA format (a.k.a. Pearson format) is a text-based format for representing either nucleotide sequences or peptide sequences, in which base pairs or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. The simplicity of FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages like Python, Ruby, and Perl."
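
As a rough illustration of how easily the format described above can be handled with a scripting language, here is a minimal Python sketch of a FASTA reader. The function name parse_fasta and the two example records are hypothetical and do not come from the torrent contents; the sketch assumes header lines start with '>' and that sequence lines may wrap.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from lines of FASTA text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical example text, made up for illustration only.
example = """>seq1 hypothetical example record
ACGTACGT
NNNNACGT
>seq2
TTGA
"""

records = list(parse_fasta(example.splitlines()))
for name, sequence in records:
    print(name, len(sequence))
```

For real work, an established library such as Biopython is probably a safer choice than a hand-written parser.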

About the Ensembl Project http://www.ensembl.org/info/about/intro.html
"The Ensembl project was started in 1999, some years before the draft human genome was completed. Even at that early stage it was clear that manual annotation of 3 billion base pairs of sequence would not be able to offer researchers timely access to the latest data. The goal of Ensembl was therefore to automatically annotate the genome, integrate this annotation with other available biological data and make all this publicly available via the web. Since the website's launch in July 2000, many more genomes have been added to Ensembl and the range of available data has also expanded to include comparative genomics, variation and regulatory data."

Ensembl Legal Notices http://www.ensembl.org/info/about/legal/index.html
"Ensembl imposes no restrictions on access to, or use of, the data provided and the software used to analyse and present it. Ensembl data generated by members of the project are available without restriction."

This release is not endorsed or approved by the Ensembl project.

From README ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/README

#### README ####

IMPORTANT: Please note you can download correlation data tables,
supported by Ensembl, via the highly customisable BioMart and
EnsMart data mining tools. See http://www.ensembl.org/biomart/martview or
http://www.ebi.ac.uk/biomart/ for more information.

Fasta DNA dumps

The files are consistently named following this pattern:
<species>.<assembly>.<release>.<sequence type>.<id type>.<id>.fa.gz

<species>: The systematic name of the species.
<assembly>: The assembly build name.
<release>: The release number.
<sequence type>:
* 'dna' - unmasked genomic DNA sequences.
* 'dna_rm' - masked genomic DNA. Interspersed repeats and low
complexity regions are detected with the RepeatMasker tool and masked
by replacing repeats with 'N's.
<id type>: One of the following:
* 'chromosome' - The top-level coordinate system in most species in Ensembl
* 'nonchromosomal' - Contains DNA that has not been assigned a chromosome
* 'seqlevel' - This is usually sequence scaffolds, chunks or clones.
-- 'scaffold' - Larger sequence contigs from the assembly of shorter
sequencing reads (often from whole genome shotgun, WGS) which could
not yet be assembled into chromosomes. Often more genome sequencing
is needed to narrow gaps and establish a tiling path.
-- 'chunk' - While contig sequences can be assembled into large entities,
they sometimes have to be artificially broken down into smaller entities
called 'chunks'. This is due to limitations in the annotation
pipeline and the finite record size imposed by MySQL which stores the
sequence and annotation information.
-- 'clone' - In general this is the smallest sequence entity. It is often
identical to the sequence of one BAC clone, or sequence region
of one BAC clone which forms the tiling path.
<id>: The actual sequence identifier. Depending on the <id type> the <id>
could represent the name of a chromosome, a scaffold, a contig, a clone, etc.
This field is empty for seqlevel files.
fa: All files in these directories represent FASTA database files
gz: All files are compacted with GNU Zip for storage efficiency.
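
The naming pattern above can be split into its fields with a regular expression. The following Python sketch is one possible way to do it, not something supplied by Ensembl. The names FILENAME_RE and parse_name are hypothetical, the two example file names are built from the pattern and the torrent title rather than copied from the file list, and the sketch assumes that species, assembly and id type contain no dots.

```python
import re
from typing import Dict, Optional

# <species>.<assembly>.<release>.<sequence type>.<id type>.<id>.fa.gz
# The <id> part is optional because it is empty for seqlevel files.
FILENAME_RE = re.compile(
    r"^(?P<species>[^.]+)\."
    r"(?P<assembly>[^.]+)\."
    r"(?P<release>\d+)\."
    r"(?P<sequence_type>dna_rm|dna)\."
    r"(?P<id_type>[^.]+)"
    r"(?:\.(?P<id>.+?))?"
    r"\.fa\.gz$"
)


def parse_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the fields of a dump file name, or None if it does not match."""
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None


# Illustrative names assembled from the pattern (assumed, not verified).
chromosome_fields = parse_name("Homo_sapiens.GRCh37.55.dna.chromosome.1.fa.gz")
seqlevel_fields = parse_name("Homo_sapiens.GRCh37.55.dna_rm.seqlevel.fa.gz")
print(chromosome_fields)
print(seqlevel_fields)
```

All fields are returned as strings, and the id field is None when the name has no identifier.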

These files contain the full sequence of the assembly in fasta format.
They contain one chromosome per file.
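
Because the files are gzip-compressed text, they can be read as a stream without unpacking them to disk first. The Python sketch below counts how much of a file consists of 'N' characters, which is how the 'dna_rm' files mark masked repeats. The function name masked_fraction and the in-memory demo record are hypothetical. Note that 'N' is also commonly used for assembly gaps, so the figure is not purely a measure of repeat content.

```python
import gzip
import io
from collections import Counter


def masked_fraction(path_or_file) -> float:
    """Return the fraction of sequence characters that are 'N'."""
    counts: Counter = Counter()
    # gzip.open accepts a file name or an already open binary file object.
    with gzip.open(path_or_file, "rt", encoding="ascii") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # header line, not sequence data
            counts.update(line.strip().upper())
    total = sum(counts.values())
    return counts["N"] / total if total else 0.0


# Build a tiny gzip-compressed FASTA in memory (made-up data) for the demo.
buffer = io.BytesIO()
with gzip.open(buffer, "wt", encoding="ascii") as out:
    out.write(">demo hypothetical record\nACGTNNNN\nNNAC\n")
buffer.seek(0)

fraction = masked_fraction(buffer)
print(fraction)
```

Reading line by line keeps memory use low even for a whole chromosome, since only the character counts are held in memory.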

The genomic sequence of human chromosome 1:

The masked version of the genome sequence on human chromosome 1
(contains '_rm' in the name):

Non-chromosomal assembly sequences:
e.g. mitochondrial genome, sequence contigs not yet mapped on chromosomes

These files are fasta file dumps of the assembly at the sequence level.

Format: <species>.<assembly>.<release>.<sequence type>.seqlevel.fa.gz

Unmasked sequence file name example (until release 39):

Repeat masked file example (contains '_rm' in the file name) (until release 39):

Now all of these contain 'seqlevel' in the file names e.g.

Note that the type of sequence container varies in different species:
contigs in human, chunks in Anopheles, scaffolds in Fugu.
