Frequently Asked Questions

BLAST

What version of BLAST is implemented in the Mnemiopsis BLAST tool?
What BLAST programs are available?
What BLAST databases are available?
What do the boxes in the 'Links' column of the BLAST output represent?

Genome Browser

What tracks are available for viewing in the Mnemiopsis genome browser?
How can I navigate the Mnemiopsis genome using JBrowse?
Is the Mnemiopsis genome browser searchable?
Is there a preferred Web browser I should use to view the Genome Browser?

KEGG Pathways

What KEGG pathways are available?
How can I search for a particular KEGG pathway of interest?
What do the rows in the pathways table represent?
What information does the 'Cluster' column in the pathways table provide?
What does the 'Ratio' column in the pathways table represent?

Pfam Domains

What options are available for searching the Mnemiopsis genome for Pfam-A domains?
How were Pfam domains identified in the Mnemiopsis genome?
What is the structure of the definition line when sequences are downloaded from the Pfam-A domains page?

View a Gene Page

How do I search for a gene using the Mnemiopsis gene Wiki pages?
What type of information is available in the Mnemiopsis gene Wiki pages?
How are the phylogenetically informed ortholog clusters determined?

Fetch a Scaffold

How do I download a single genomic scaffold sequence?
Is there a way to download a partial scaffold sequence?
What are the other options for fetching a scaffold sequence?

Temporal Developmental Expression Profiles

What is the source of the temporal developmental expression profiles data?
How can I search and view data for my gene-of-interest?
How are the graphs generated?

Single-Cell Expression

What is the source of the single-cell expression data?
How can I find genes that cluster with my gene-of-interest?
Can I download all the sequences from each cluster?

In situ Images

How do I search for an in situ image?
Can I download an original in situ image?

Literature Search

How do I search for a Mnemiopsis manuscript-of-interest?
Can I go directly from a literature search page to PubMed?
How is this different from searching for a paper in PubMed?

Download Sequences

Where can I download the full Mnemiopsis genome assembly?
What is the convention used for naming scaffolds?
How are the gene identifiers generated?
What is an unfiltered protein model?

Miscellaneous

How should data derived from the Mnemiopsis Genome Project Portal be cited?
How can I contact the Web site administrator regarding technical issues?
Where can I get additional information about the Mnemiopsis Genome Project?
How frequently are the data and portal tools updated?
Where can I download the supplemental Perl scripts for customizing JBrowse and MediaWiki input files?
Where can I obtain a Mnemiopsis genomic DNA sample?

What version of BLAST is implemented in the Mnemiopsis BLAST tool?

We use SequenceServer (version 1.0.11), a Web interface to the NCBI BLAST+ 2.2.31 command-line applications. SequenceServer is provided by the Wurm Lab and is subject to all Terms & Conditions set forth by its developers.

What BLAST programs are available?

blastn: Compares a nucleotide query sequence against a nucleotide sequence database.
blastp: Compares an amino acid query sequence against a protein sequence database.
blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database.
tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
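The choice among these programs depends only on whether the query and the database are nucleotide or protein sequences. The hypothetical helper below (not part of SequenceServer or BLAST+) is a minimal Python sketch that encodes the descriptions above:

```python
def choose_blast_program(query_type: str, db_type: str, translated: bool = False) -> str:
    """Return the BLAST+ program for a query/database combination (hypothetical helper).

    query_type and db_type are "nucleotide" or "protein". `translated` only
    matters for nucleotide-vs-nucleotide searches: tblastx compares six-frame
    translations instead of the nucleotide sequences themselves.
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or db_type not in valid:
        raise ValueError("query_type and db_type must be 'nucleotide' or 'protein'")
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translated else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide":  # nucleotide query, protein database
        return "blastx"
    return "tblastn"  # protein query, nucleotide database


if __name__ == "__main__":
    # A protein query against the Main Scaffolds (nucleotide) database:
    print(choose_blast_program("protein", "nucleotide"))  # tblastn
```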

What BLAST databases are available?

Nucleotide Sequence Databases:

Main Scaffolds: A BLAST database containing all 5100 Mnemiopsis genomic scaffolds.
Gene Models 2.2: A BLAST database consisting of consensus Mnemiopsis gene prediction models.
Unfiltered Gene Models (unincorporated predictions): A BLAST database consisting of unincorporated Mnemiopsis gene prediction models.
Public ESTs: A BLAST database containing all publicly available Mnemiopsis ESTs and mRNAs from GenBank.
Mitochondrial genome: A BLAST database containing the Mnemiopsis mitochondrial genome.
Cufflinks-assembled transcripts: A BLAST database of Mnemiopsis Cufflinks-assembled RNA-seq transcripts.
Trinity-assembled transcripts: A BLAST database of Mnemiopsis Trinity-assembled RNA-seq transcripts.

Protein Sequence Databases:

Proteome 2.2: A BLAST database of translated proteins derived from the Mnemiopsis Gene Models 2.2 consensus gene prediction models.
Unfiltered Protein Models (unincorporated proteins): A BLAST database of unincorporated Mnemiopsis proteins derived from unincorporated gene prediction models.
Mitochondrial proteins: A BLAST database of computationally derived Mnemiopsis mitochondrial proteins.

What do the boxes in the 'Links' column of the BLAST output represent?

The 'Links' column provides clickable hyperlinks from each significant BLAST alignment to the corresponding entry in the following resources:

B: Mnemiopsis JBrowse genome browser.
S: Scaffold Fetch tool.
G: WikiGene pages.
U: Unfiltered prediction models.
C: Cufflinks-assembled transcripts.
N: Public ESTs or the Mitochondrial genome.
T: Trinity-assembled transcripts. A Trinity transcript may align to multiple places in the genome; only one of these alignments is linked from the BLAST output.

What tracks are available for viewing in the Mnemiopsis genome browser?

Embryonic 00-20 h: Aligned RNA-seq data from time-course (0-20 hours) developmental gene expression studies (GSE60478; GSE111748).
HISAT2: RNA-seq data (SRR1971491) derived from Mnemiopsis embryos aligned using HISAT2.
CL2: RNA-seq reads derived from Mnemiopsis embryos (between 15 and 30 hours post-fertilization) assembled into transcripts using Cufflinks.
StringTie: Aligned RNA-seq reads (from HISAT2 above) assembled into transcripts using StringTie.
TRN15-30hpf: RNA-seq reads derived from Mnemiopsis embryos (between 15 and 30 hours post-fertilization) assembled into transcripts using Trinity.
RACE: Experimentally verified Mnemiopsis RACE (Rapid Amplification of cDNA Ends) transcripts.
EST: Publicly available Mnemiopsis ESTs from GenBank.
GBNT: Publicly available Mnemiopsis mRNAs from GenBank.
2.2: Consensus Mnemiopsis gene prediction models.
2.2UF: Unincorporated Mnemiopsis gene prediction models.
PFAM2.2: Non-redundant Mnemiopsis protein domains identified by Pfam hmmscan searches of the 2.2 and 2.2UF datasets and of the six-frame translations of the Mnemiopsis genome.
MASK: Genomic regions that have been repeat-masked using Vmatch are highlighted in light blue.
SCF: Assembled genomic scaffolds (SCF) appear as solid black tracks, with intermittent gaps shaded bright pink.

How can I navigate the Mnemiopsis genome using JBrowse?

A user can zoom in or out on a region of interest, or scroll left or right, by clicking the appropriate icons in the center of the blue toolbar. Individual tracks can be shown or hidden by clicking the corresponding track label in the left sidebar of the browser window.

Is the Mnemiopsis genome browser searchable?

The genome browser is currently searchable using Mnemiopsis scaffold (e.g., ML0001) or gene (e.g., ML00011a) identifiers.

Is there a preferred Web browser I should use to view the Genome Browser?

The Mnemiopsis leidyi Genome Project Web site was developed and tested using Firefox. Other Web browsers, including Google Chrome, Safari and Internet Explorer, are expected to display the site correctly, but this is not guaranteed.

What KEGG pathways are available?

All human KEGG pathways that contain at least one human gene with a Mnemiopsis homolog, as determined by our clustering analysis, are searchable.

How can I search for a particular KEGG pathway of interest?

KEGG pathways can be selected from the drop-down menus by KEGG identifier, KEGG pathway name, or gene symbol. Pathways are also keyword searchable: select the appropriate search field, then enter a KEGG identifier, pathway name, or gene symbol in the search text box.

What do the rows in the pathways table represent?

Each row represents one cluster from our clustering analysis. At least one human protein in that cluster belongs to the selected KEGG pathway.

What information does the 'Cluster' column in the pathways table provide?

The 'Cluster' column indicates the phylogenetic clade that encompasses all of the proteins in the particular cluster (e.g., 'Metazoa' indicates that the cluster contains sequences from a variety of metazoan species, and that the cluster cannot be characterized by a less inclusive clade like Bilateria).
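Conceptually, the 'Cluster' label is the least inclusive clade that contains every species represented in the cluster. The sketch below illustrates that idea on a toy, heavily simplified taxonomy; the PARENT table is hypothetical and is not the phylogenetic tree used by the portal:

```python
# Toy taxonomy (hypothetical and heavily simplified): child -> parent.
PARENT = {
    "Homo sapiens": "Bilateria",
    "Drosophila melanogaster": "Bilateria",
    "Nematostella vectensis": "Cnidaria",
    "Mnemiopsis leidyi": "Ctenophora",
    "Bilateria": "Metazoa",
    "Cnidaria": "Metazoa",
    "Ctenophora": "Metazoa",
    "Metazoa": None,  # root of the toy tree
}


def lineage(taxon):
    """Return the path from `taxon` up to the root, least inclusive first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def cluster_clade(species):
    """Return the least inclusive clade containing every species in a cluster."""
    species = list(species)
    if not species:
        raise ValueError("a cluster must contain at least one species")
    shared = set(lineage(species[0]))
    for sp in species[1:]:
        shared &= set(lineage(sp))
    # Walk up from the first species; the first shared node is the answer.
    for node in lineage(species[0]):
        if node in shared:
            return node
    raise ValueError("species do not share a root")


if __name__ == "__main__":
    print(cluster_clade(["Homo sapiens", "Drosophila melanogaster"]))  # Bilateria
    print(cluster_clade(["Homo sapiens", "Mnemiopsis leidyi"]))  # Metazoa
```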

What does the 'Ratio' column in the pathways table represent?

The 'Ratio' column gives the number of human proteins in the cluster that belong to the corresponding pathway (numerator) over the total number of human proteins in that cluster (denominator).
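As a worked illustration with hypothetical protein names, a cluster containing three human proteins, two of which are in the pathway, would show a ratio of 2/3. A minimal sketch of the calculation, keeping the ratio unreduced so that numerator and denominator stay visible:

```python
def pathway_ratio(cluster_human_proteins, pathway_proteins):
    """Return 'in_pathway/total' for the human proteins of one cluster.

    Numerator: human proteins in the cluster that are in the pathway.
    Denominator: all human proteins in the cluster.
    """
    cluster = set(cluster_human_proteins)
    if not cluster:
        raise ValueError("the cluster contains no human proteins")
    in_pathway = cluster & set(pathway_proteins)
    return f"{len(in_pathway)}/{len(cluster)}"


if __name__ == "__main__":
    # Hypothetical protein names: two of the cluster's three proteins are in the pathway.
    print(pathway_ratio({"P1", "P2", "P3"}, {"P1", "P2", "P9"}))  # 2/3
```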

What options are available for searching the Mnemiopsis genome for Pfam-A domains?

You can elect to search the Protein Models, Unfiltered Protein Models, or both ("All"). You may either select a domain name or domain accession number from the drop-down menus, or enter a domain name or five-digit Pfam accession number in the search box.

How were Pfam domains identified in the Mnemiopsis genome?

We used hmmscan from the HMMER suite to search the ML2.2 Protein Models and Unfiltered Protein Models for domains from the Pfam-A database (version 25). The gathering threshold (--cut_ga) option of hmmscan was used to ensure conservative domain prediction. For more information on Pfam, visit http://pfam.sanger.ac.uk/.
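For reference, a comparable HMMER 3 command line can be assembled as below. This is a sketch, not the portal's exact pipeline: the file names are hypothetical, and the Pfam-A.hmm database must first be indexed with hmmpress. The --cut_ga and --domtblout options are standard hmmscan options.

```python
import shlex


def hmmscan_command(pfam_hmm_db, protein_fasta, domtbl_out):
    """Build an hmmscan (HMMER 3) argument list that uses Pfam gathering thresholds."""
    return [
        "hmmscan",
        "--cut_ga",                 # use each model's curated gathering (GA) thresholds
        "--domtblout", domtbl_out,  # write a per-domain tabular summary to this file
        pfam_hmm_db,                # Pfam-A.hmm, previously indexed with hmmpress
        protein_fasta,              # query protein sequences in FASTA format
    ]


if __name__ == "__main__":
    # Hypothetical file names; run with subprocess.run(cmd, check=True) if HMMER is installed.
    cmd = hmmscan_command("Pfam-A.hmm", "proteins.fa", "proteins.domtblout")
    print(shlex.join(cmd))
```

Building an argument list rather than a single shell string avoids quoting problems when the list is passed to subprocess.run.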

What is the structure of the definition line when sequences are downloaded from the Pfam-A domains page?

If the "Full-length protein sequences, in FASTA format" option is chosen, the definition line will contain the gene identifier and query domain. If the "Pfam-A domains of selected proteins, in FASTA format" option is chosen, the definition line will contain the gene identifier, the coordinate range of the domain in that protein model, and the name of the query domain.
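A minimal sketch of how a domain-level FASTA record could be cut from a full-length protein, assuming 1-based, inclusive domain coordinates. The portal's exact defline layout is not specified here, so the '>GENE start-end DOMAIN' format, the protein sequence and the domain name in the example are all hypothetical:

```python
def domain_fasta(gene_id, protein_seq, start, end, domain_name, width=60):
    """Return a FASTA record for one domain (1-based, inclusive coordinates).

    The '>GENE start-end DOMAIN' defline layout is hypothetical.
    """
    if not 1 <= start <= end <= len(protein_seq):
        raise ValueError("domain coordinates fall outside the protein")
    domain = protein_seq[start - 1:end]  # convert to Python's 0-based, end-exclusive slice
    lines = [f">{gene_id} {start}-{end} {domain_name}"]
    lines += [domain[i:i + width] for i in range(0, len(domain), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up protein sequence and domain name, for illustration only.
    print(domain_fasta("ML00011a", "MKTAYIAKQR", 3, 6, "DOMAIN_X"))
```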

How do I search for a gene using the Mnemiopsis gene Wiki pages?

The Mnemiopsis gene Wiki pages are accessible from the left sidebar of the home page and are searchable by entering a Mnemiopsis gene identifier (e.g., ML00011a) in the search box and clicking ‘Go’.

What type of information is available in the Mnemiopsis gene Wiki pages?

Each record in the gene Wiki pages represents a single Mnemiopsis gene and provides the following annotation: nucleotide and protein sequences; genomic coordinates of the coding exons; pre-computed BLAST results against numerous organisms, showing the top hits for each protein; Pfam domains; functional annotation (Gene Ontology terms derived from Argot2 and Blast2GO); related human disease genes from the Online Mendelian Inheritance in Man (OMIM) database; ortholog clusters formed by phylogenetically informed clustering methods; temporal developmental expression profiles; single-cell clusters; and in situ images.

How are the phylogenetically informed ortholog clusters determined?

Sets of putatively orthologous genes are computed from BLASTP sequence similarities and from each gene's position in a predetermined phylogenetic tree. Each pair of genes is assigned a score by summing the bit scores of the initial BLASTP high-scoring segments found between the two genes that occur in consistent order and overlap one another by less than five percent (with bit scores penalized in proportion to the amount of overlap).

Orthologous sets of genes are then computed at each node of the tree in two steps. First, where a gene or set from one child of the node is in a mutual best hit relation with a gene or set from the other child, the two are combined into a new set. Second, all hits within the node's subtree, and between the subtree and all outgroup genes, are considered in descending order of score. A hit to an outgroup gene blocks any further merging of that gene or set until another tree node is visited, while a hit between two genes or sets within the subtree, neither previously blocked, merges them into a new set.

This orthology computation is based on that described in Putnam et al. (2007), with further refinement of the blocking rules.
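The first step at each node, detecting mutual best hits between the two children, can be sketched as below. This is an illustration only, not the portal's implementation: it assumes the pairwise scores have already been computed, ignores the outgroup blocking rules of the second step, and breaks ties by keeping the first pair seen. Item names are hypothetical labels for genes or previously merged sets.

```python
def mutual_best_hits(scores):
    """Return (left, right) pairs that are each other's best-scoring hit.

    scores maps (left_item, right_item) -> pairwise hit score, where left items
    come from one child of a tree node and right items from the other child.
    Ties are broken by keeping the first pair encountered.
    """
    best_for_left = {}
    best_for_right = {}
    for (left, right), score in scores.items():
        if left not in best_for_left or score > scores[(left, best_for_left[left])]:
            best_for_left[left] = right
        if right not in best_for_right or score > scores[(best_for_right[right], right)]:
            best_for_right[right] = left
    # Keep only pairs where the preference is reciprocal; each such pair would be
    # combined into a new orthologous set at this node.
    return sorted(
        (left, right)
        for left, right in best_for_left.items()
        if best_for_right.get(right) == left
    )


if __name__ == "__main__":
    # Hypothetical scores between genes a1, a2 (one child) and b1, b2 (the other).
    toy_scores = {("a1", "b1"): 100.0, ("a1", "b2"): 50.0,
                  ("a2", "b1"): 80.0, ("a2", "b2"): 60.0}
    print(mutual_best_hits(toy_scores))  # [('a1', 'b1')]
```

In this example a2 and b2 are not mutual best hits (a2 prefers b1, which prefers a1), so they would be left for the second, score-ordered step.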

How do I download a single genomic scaffold sequence?

A user can enter a ScaffoldID (e.g., ML0001) in the "Fetch Scaffold" textbox to return a single FASTA-formatted Mnemiopsis scaffold sequence.

Is there a way to download a partial scaffold sequence?

A partial scaffold sequence can be retrieved by entering a ScaffoldID (e.g., ML0001) while also specifying the relative beginning and ending coordinates in the "Fetch Scaffold" textbox.
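A minimal sketch of the coordinate arithmetic, assuming 1-based, inclusive coordinates (the usual genome-browser convention; the Fetch Scaffold tool's own convention is an assumption here):

```python
def fetch_region(scaffold_seq, begin, end):
    """Return scaffold_seq[begin..end] using 1-based, inclusive coordinates."""
    if not 1 <= begin <= end <= len(scaffold_seq):
        raise ValueError(
            f"coordinates must satisfy 1 <= begin <= end <= {len(scaffold_seq)}"
        )
    return scaffold_seq[begin - 1:end]  # Python slices are 0-based and end-exclusive


if __name__ == "__main__":
    print(fetch_region("ACGTACGTAA", 2, 5))  # CGTA
```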

What are the other options for fetching a scaffold sequence?

A user can optionally retrieve either a reverse complement or the six-frame translation of a scaffold or partial scaffold by selecting the appropriate "Fetch Scaffold" search option.
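The reverse complement and six-frame translation options can be reproduced locally with the sketch below. It assumes the standard genetic code (NCBI translation table 1), which does not apply to the mitochondrial genome, translates ambiguous codons as X, and numbers the reverse-strand frames -1 to -3 starting from the 5' end of the reverse complement; other tools may number frames differently.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; stop codons appear as '*'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # ambiguous codons (e.g., containing N) -> X
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```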

What is the source of the temporal developmental expression profiles data?

The developmental expression data are derived from GEO accession GSE111748, which complements GSE60478.

How can I search and view data for my gene-of-interest?

Time-course distribution plots of developmental gene expression data can be searched and viewed by entering an ML gene identifier (e.g., ML00011a) in the search box and clicking ‘Go’.

How are the graphs generated?

The developmental expression graphs are implemented using Andrew Sielen's Violin Plot + Box Plot v2 and show, for each gene, normalized expression levels derived from mapped reads, in transcripts per million (TPM), as a function of hours post-fertilization.
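For readers unfamiliar with the unit, TPM is conventionally computed by dividing each gene's read count by its length in kilobases and then scaling these rates so that they sum to one million within a sample. The sketch below shows that general calculation with hypothetical gene names; it is not necessarily how the portal's values were quantified (quantification tools typically use effective rather than raw transcript lengths).

```python
def tpm(read_counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    read_counts: gene -> number of reads assigned to the gene
    lengths_bp:  gene -> transcript length in base pairs
    """
    if set(read_counts) != set(lengths_bp):
        raise ValueError("read_counts and lengths_bp must cover the same genes")
    # Step 1: reads per kilobase, which corrects for transcript length.
    rates = {g: read_counts[g] / (lengths_bp[g] / 1000) for g in read_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads were assigned to any gene")
    # Step 2: rescale so the values in one sample sum to one million.
    return {g: rate / total * 1_000_000 for g, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical genes: equal read counts, but geneB is twice as long.
    print(tpm({"geneA": 100, "geneB": 100}, {"geneA": 1000, "geneB": 2000}))
```

Because of the length correction, geneA ends up with twice the TPM of geneB even though both received the same number of reads.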

What is the source of the single-cell expression data?

Single-cell cluster data were published in Sebé-Pedrós et al. (2018), "Early metazoan cell type diversity and the evolution of multicellular gene regulation." The experimental Mnemiopsis metacell (cluster) files are available for download here: http://www.wisdom.weizmann.ac.il/~/arnau/Single_cell_datasets/Mnemiopsis/

How can I find genes that cluster with my gene-of-interest?

Single-cell clusters can be searched by gene identifier, cell type or cluster ID. To find genes that cluster with a gene-of-interest, select its GeneID from the pull-down menu, or enter it in the search box, and click ‘Go’. The results table lists the cell type and cluster ID of every single-cell cluster that contains the queried gene, with your gene-of-interest shown in bold text.

Can I download all the sequences from each cluster?

Nucleotide and protein sequences can be downloaded in FASTA format from the single-cell search results page by clicking either the DNA (blue) or protein (green) button.

How do I search for an in situ image?

In situ images can be searched by selecting from a list or entering a gene symbol, gene identifier, gene accession or submitter in the specified search box. Users can also view all in situ images by clicking the ‘View All Images’ button at the top of the in situ images search page.

Can I download an original in situ image?

Original in situ images can be obtained by clicking the ‘PMID’ link on a search results page, which redirects you to the PubMed entry for the manuscript in which the image was originally published.

How do I search for a Mnemiopsis manuscript-of-interest?

Mnemiopsis literature can be searched by selecting or typing a Gene Symbol, Author or Keyword term into the search box. A complete list of all Mnemiopsis literature in PubMed can be retrieved by clicking the ‘View All Literature’ box.

Can I go directly from a literature search page to PubMed?

All search results contain a hyperlinked PubMed identifier (PMID) that redirects the user directly to the appropriate manuscript entry at PubMed.

How is this different from searching for a paper in PubMed?

The ‘Literature Search’ page was designed as a quick, one-step tool that searches and returns all primary Mnemiopsis manuscripts in PubMed directly from the Mnemiopsis Genome Project Portal. Search terms consist of both MeSH (MH) and author term (OT) fields extracted from PubMed, Entrez Gene and gene2pubmed annotation files.
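For comparison, a similar search can be run directly against PubMed through the NCBI E-utilities esearch service. The sketch below only builds the request URL; the query term is a hypothetical example, and fetching the URL (for example with urllib.request.urlopen) returns the matching PubMed IDs as JSON.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, retmax=100):
    """Build an NCBI E-utilities esearch URL that returns PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query combining the genus name with a keyword.
    print(pubmed_search_url("Mnemiopsis AND Wnt"))
```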

Where can I download the full Mnemiopsis genome assembly?

The full genome assembly, consisting of 5100 scaffolds, is available for download from the "Download sequences > Genome" option in the sidebar.
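After downloading the assembly, a single scaffold can also be extracted locally. A minimal sketch, assuming a standard multi-FASTA file (optionally gzipped) whose header lines begin with the scaffold ID, such as >ML0001:

```python
import gzip


def read_fasta(path):
    """Yield (identifier, sequence) pairs from a FASTA file, optionally gzipped."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        ident, chunks = None, []
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if ident is not None:
                    yield ident, "".join(chunks)
                fields = line[1:].split()
                ident = fields[0] if fields else ""  # first word of the header is the ID
                chunks = []
            else:
                chunks.append(line)
        if ident is not None:
            yield ident, "".join(chunks)


def get_scaffold(path, scaffold_id):
    """Return the sequence of one scaffold (e.g., ML0001) from a multi-FASTA file."""
    for ident, seq in read_fasta(path):
        if ident == scaffold_id:
            return seq
    raise KeyError(f"{scaffold_id} not found in {path}")
```

For example, get_scaffold("genome.fa", "ML0001") scans a (hypothetically named) genome.fa file and returns the ML0001 sequence as a single string.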

What is the convention used for naming scaffolds?

Scaffolds (e.g., MLXXXX) are named as follows:

ML = Mnemiopsis leidyi (the L is capitalized so that it is clearly an 'L' rather than a '1' or an 'i').
XXXX = zero-padded integer between 0001 and 5100. There are 5100 scaffolds.
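The naming rule can be expressed as a short sketch (a hypothetical helper, not portal code):

```python
MAX_SCAFFOLD = 5100  # number of scaffolds in the assembly, per the text above


def scaffold_name(number):
    """Return the scaffold ID (e.g., ML0001) for a scaffold number (hypothetical helper)."""
    if not 1 <= number <= MAX_SCAFFOLD:
        raise ValueError(f"scaffold number must be between 1 and {MAX_SCAFFOLD}")
    return f"ML{number:04d}"  # zero-pad to four digits


if __name__ == "__main__":
    print(scaffold_name(42))  # ML0042
```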

How are the gene identifiers generated?

Genes (e.g., MLXXXXNNNa) are named as follows:

MLXXXX = corresponds with the scaffold where the gene is located.
NNN = integer without zero-padding, unique in combination with the scaffold ID. Numbers usually follow the order of each gene's most 5' position on the scaffold, but this is not required: newly added genes receive the next highest unused integer regardless of their position.
a = the isoform. The first reported isoform is 'a', the second is 'b', and so on.
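Because the scaffold part of a gene identifier always has a fixed width, the identifier can be split into its components with a regular expression. A minimal sketch; allowing more than one isoform letter is an assumption, since the convention above does not say what follows 'z':

```python
import re
from typing import NamedTuple

# ML + four-digit scaffold number, then the gene number, then the isoform letter(s).
GENE_ID_PATTERN = re.compile(r"^(ML\d{4})(\d+)([a-z]+)$")


class GeneId(NamedTuple):
    scaffold: str     # e.g., "ML0001"
    gene_number: int  # unique within the scaffold
    isoform: str      # "a" for the first reported isoform, "b" for the second, ...


def parse_gene_id(gene_id):
    """Split a gene identifier such as ML00011a into its components."""
    match = GENE_ID_PATTERN.match(gene_id)
    if match is None:
        raise ValueError(f"not a Mnemiopsis gene identifier: {gene_id!r}")
    scaffold, number, isoform = match.groups()
    return GeneId(scaffold, int(number), isoform)


if __name__ == "__main__":
    print(parse_gene_id("ML00011a"))  # GeneId(scaffold='ML0001', gene_number=1, isoform='a')
```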

What is an unfiltered protein model?

An unfiltered protein model is a protein derived from an unincorporated Mnemiopsis gene prediction model.

How should data derived from the Mnemiopsis Genome Project Portal be cited?

Please cite this Web site:

https://research.nhgri.nih.gov/mnemiopsis/

How can I contact the Web site administrator regarding technical issues?

Please send any Web site usability or technical correspondence to bioinformatics@nhgri.nih.gov.

Where can I get additional information about the Mnemiopsis Genome Project?

For additional information, comments, or questions regarding the Mnemiopsis Genome Project, please contact Dr. Baxevanis directly, at andy@nhgri.nih.gov.

How frequently are the data and portal tools updated?

Changes to the Web site and underlying data are documented in the Release History.

Where can I download the supplemental Perl scripts for customizing JBrowse and MediaWiki input files?

The supplemental Perl scripts for converting sequence and annotation data for JBrowse and MediaWiki can be downloaded here.

Where can I obtain a Mnemiopsis genomic DNA sample?

A Mnemiopsis genomic DNA sample has been deposited in the Ocean Genome Legacy (OGL) repository at Northeastern University. Information about obtaining a sample is available by searching the OGL Specimen Catalog for the OGL accession ID number S24180:

http://www.northeastern.edu/ogl/catalog/

NHGRI Division of Intramural Research

Mnemiopsis Genome Project Portal
