Frequently Asked Questions


BLAST
Genome Browser
KEGG Pathways
Pfam Domains
View a Gene Page
Fetch a Scaffold
Temporal Developmental Expression Profiles
Single-Cell Expression
In situ Images
Literature Search
Download Sequences
Miscellaneous



What version of BLAST is implemented in the Mnemiopsis BLAST tool?

We use SequenceServer, which implements the NCBI BLAST+ 2.2.31 command line applications (referred to as the BLAST+ applications). SequenceServer (version 1.0.11) is provided by the Wurm Lab and subject to all Terms & Conditions set forth by the developers.


What BLAST programs are available?

What BLAST databases are available?

Nucleotide Sequence Databases:
Protein Sequence Databases:

What do the boxes in the 'Links' column of the BLAST output represent?

For each significant BLAST alignment, the 'Links' column provides clickable hyperlinks to the corresponding entry in the Mnemiopsis JBrowse genome browser (B), the scaffold Fetch tool (S), the WikiGene pages (G), the unfiltered (U) prediction models, the Cufflinks-assembled transcripts (C), the Public ESTs (N), the Mitochondrial genome (N), and the Trinity-assembled transcripts (T). A Trinity-assembled transcript may align to multiple places in the genome; only one of the alignments is linked from the BLAST output.


What tracks are available for viewing in the Mnemiopsis genome browser?

How can I navigate the Mnemiopsis genome using JBrowse?

A user can zoom in or out and left or right on a region of interest by clicking on the appropriate icons centered on the blue toolbar. In addition, available tracks can be viewed (or hidden) by clicking on the appropriate track label on the left sidebar of the browser window.


Is the Mnemiopsis genome browser searchable?

The genome browser is currently searchable using Mnemiopsis scaffold (e.g., ML0001) or gene (e.g., ML00011a) identifiers.


Is there a preferred Web browser I should use to view the Genome Browser?

The Mnemiopsis leidyi Genome Project Web site was developed and tested using Firefox. Other Web browsers, including Google Chrome, Safari and Internet Explorer, are expected to render the site correctly, but this is not guaranteed.


What KEGG pathways are available?

All human KEGG pathways containing human genes with a Mnemiopsis homolog, as determined by our clustering analysis, are searchable.


How do I search for a particular KEGG pathway of interest?

KEGG pathways can be selected from the drop-down menus using KEGG identifiers, KEGG pathway names, or gene symbols. Pathways are also keyword searchable by entering a KEGG identifier, pathway name, or gene symbol in the search text box while selecting the appropriate field table.


What do the rows in the pathways table represent?

Each row represents a cluster from our clustering analysis. At least one human protein from a particular cluster is in the corresponding KEGG pathway.


What information does the 'Cluster' column in the pathways table provide?

The 'Cluster' column indicates the phylogenetic clade that encompasses all of the proteins in the particular cluster (e.g., 'Metazoa' indicates that the cluster contains sequences from a variety of metazoan species, and that the cluster cannot be characterized by a less inclusive clade like Bilateria).
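
The "least inclusive clade" idea can be illustrated with a minimal sketch. The tiny taxonomy below is a simplified, hypothetical stand-in (intermediate ranks are omitted), and the function names are illustrative only; this is not the portal's actual implementation.

```python
# Simplified, hypothetical taxonomy: each taxon maps to its parent clade.
PARENT = {
    "Homo sapiens": "Bilateria",
    "Drosophila melanogaster": "Bilateria",
    "Bilateria": "Metazoa",
    "Mnemiopsis leidyi": "Ctenophora",
    "Ctenophora": "Metazoa",
    "Metazoa": None,
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon itself."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def least_inclusive_clade(species):
    """Return the smallest clade in PARENT that contains every listed species."""
    species = list(species)
    common = set(lineage(species[0]))
    for name in species[1:]:
        common &= set(lineage(name))
    # Walking upward from the first species, the first shared taxon is the
    # least inclusive clade that encompasses all of them.
    for taxon in lineage(species[0]):
        if taxon in common:
            return taxon


print(least_inclusive_clade(["Homo sapiens", "Drosophila melanogaster"]))
print(least_inclusive_clade(["Homo sapiens", "Mnemiopsis leidyi"]))
```

In this toy tree, a cluster with only human and fly proteins would be labeled 'Bilateria', while adding a Mnemiopsis protein widens the label to 'Metazoa'.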


What does the 'Ratio' column in the pathways table represent?

The 'Ratio' column represents the number of human proteins in the particular cluster that are in the corresponding pathway (numerator) over the number of total human proteins in that cluster (denominator).
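
As a small worked example of the ratio described above, the sketch below counts how many human proteins in a cluster belong to a pathway. The protein identifiers and the function name are hypothetical and used for illustration only.

```python
def pathway_ratio(cluster_human_proteins, pathway_proteins):
    """Return (numerator, denominator) as described for the 'Ratio' column.

    numerator:   human proteins in the cluster that are also in the pathway
    denominator: all human proteins in the cluster
    """
    cluster = set(cluster_human_proteins)
    in_pathway = cluster & set(pathway_proteins)
    return len(in_pathway), len(cluster)


# Hypothetical identifiers, not real accessions.
cluster = ["HUMAN_P1", "HUMAN_P2", "HUMAN_P3"]
pathway = ["HUMAN_P1", "HUMAN_P3", "HUMAN_P9"]

numerator, denominator = pathway_ratio(cluster, pathway)
print(f"{numerator}/{denominator}")  # prints 2/3
```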


What options are available for searching the Mnemiopsis genome for Pfam-A domains?

You can elect to search the Protein Models, Unfiltered Protein Models, or both ("All"). You may either select a domain name or domain accession number from the drop-down menus, or enter a domain name or five-digit Pfam accession number in the search box.


How were Pfam domains identified in the Mnemiopsis genome?

We used hmmscan from the HMMER suite to search the ML2.2 Protein Models and Unfiltered Protein Models for domains from the Pfam-A database (version 25). For more information on Pfam, visit http://pfam.sanger.ac.uk/. The gathering threshold (cut_ga) option of hmmscan was used to ensure conservative domain prediction.
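
A comparable search could be assembled roughly as follows. This is a sketch only: the file names are assumptions, the exact options used for the portal beyond the gathering threshold are not documented here, and the `--domtblout` output option is added purely for illustration.

```python
import shlex


def build_hmmscan_command(pfam_hmm, protein_fasta, domtbl_out):
    """Build an hmmscan argument list that applies Pfam gathering thresholds.

    --cut_ga    use each model's curated gathering (GA) cutoffs
    --domtblout write a per-domain tabular summary to the given file
    """
    return [
        "hmmscan",
        "--cut_ga",
        "--domtblout", domtbl_out,
        pfam_hmm,        # HMM database; must be prepared with hmmpress first
        protein_fasta,   # protein models to search
    ]


# File names below are hypothetical placeholders.
command = build_hmmscan_command("Pfam-A.hmm", "protein_models.fa", "pfam_domains.tbl")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where HMMER is installed.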


What is the structure of the definition line when sequences are downloaded from the Pfam-A domains page?

If the "Full-length protein sequences, in FASTA format" option is chosen, the definition line will contain the gene identifier and query domain. If the "Pfam-A domains of selected proteins, in FASTA format" option is chosen, the definition line will contain the gene identifier, the coordinate range of the domain in that protein model, and the name of the query domain.


How do I search for a gene using the Mnemiopsis gene Wiki pages?

The Mnemiopsis gene Wiki pages are accessible from the home page left sidebar and are searchable by entering a Mnemiopsis gene identifier (e.g., ML00011a) in the search box and clicking ‘Go’.


What type of information is available in the Mnemiopsis gene Wiki pages?

Each record in the gene Wiki pages represents a single Mnemiopsis gene and provides the following annotation: nucleotide and protein sequences, coding exonic genomic coordinates, pre-computed top BLAST hits for each protein against numerous organisms, Pfam domains, functional annotation (gene ontology derived from Argot2 and Blast2GO), related human disease genes from the Online Mendelian Inheritance in Man (OMIM) database, ortholog clusters formed by phylogenetically informed clustering methods, temporal developmental expression profiles, single-cell clusters, and in situ images.


How are the phylogenetically informed ortholog clusters determined?

Sets of genes with putative orthology are computed based on BLAST (BLASTP) sequence similarities and relative position in a predetermined phylogenetic tree. Hits between each pair of genes are assigned bit scores by summing those for initial BLASTP high-scoring segments found on the same pair of genes, in consistent order, and overlapping less than five percent (with bit scores penalized in proportion to the amount of overlap). Orthologous sets of genes are computed at each tree node in two steps. First, where a set or gene from one child of the node is in a mutual best hit relation with a set or gene from the other child, they are combined into a new set. Second, all hits within this node's subtree and between the subtree and all outgroup genes are considered in descending order of score. A hit to an outgroup gene blocks any further merging of a gene or set (until we visit another tree node), while a hit between two sets or genes within the subtree, neither previously blocked, results in their being merged into a new set. (This orthology computation is based on that described in Putnam et al. (2007), with further refinement of the blocking rules.)
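
The first step, finding mutual best hits between the two children of a tree node, can be sketched as below. This is an illustration under simplifying assumptions, not the pipeline's code: it takes already-summed bit scores as input, handles single genes rather than sets, ignores the outgroup blocking step, and uses hypothetical gene names.

```python
def mutual_best_hits(scores):
    """Return gene pairs that are each other's best-scoring hit.

    scores maps (left_gene, right_gene) -> summed bit score, where left and
    right genes come from the two children of a tree node. On ties, the
    first pair encountered is kept.
    """
    best_for_left = {}
    best_for_right = {}
    for (left, right), score in scores.items():
        if left not in best_for_left or score > best_for_left[left][1]:
            best_for_left[left] = (right, score)
        if right not in best_for_right or score > best_for_right[right][1]:
            best_for_right[right] = (left, score)

    pairs = []
    for left, (right, _) in best_for_left.items():
        # Keep the pair only if the relation holds in both directions.
        if best_for_right[right][0] == left:
            pairs.append((left, right))
    return sorted(pairs)


# Hypothetical genes and scores.
example_scores = {
    ("A1", "B1"): 300.0,
    ("A1", "B2"): 120.0,
    ("A2", "B1"): 250.0,
    ("A2", "B2"): 90.0,
}
print(mutual_best_hits(example_scores))  # prints [('A1', 'B1')]
```

Here A2's best hit is B1, but B1's best hit is A1, so only A1 and B1 would be combined into a new set at this step.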


How do I download a single genomic scaffold sequence?

A user can enter a ScaffoldID (e.g., ML0001) in the "Fetch Scaffold" textbox to return a single FASTA-formatted Mnemiopsis scaffold sequence.


Is there a way to download a partial scaffold sequence?

A partial scaffold sequence can be retrieved by entering a ScaffoldID (e.g., ML0001) while also specifying the relative beginning and ending coordinates in the "Fetch Scaffold" textbox.


What are the other options for fetching a scaffold sequence?

A user can optionally retrieve either a reverse complement or the six-frame translation of a scaffold or partial scaffold by selecting the appropriate "Fetch Scaffold" search option.
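
The operations offered by the Fetch tool (partial sequence, reverse complement, six-frame translation) can be reproduced offline on a downloaded scaffold with a short sketch like the one below. It assumes 1-based, inclusive coordinates and the standard genetic code; both are assumptions, as are the function names, and the tool's own conventions may differ.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def subsequence(seq, start, end):
    """Return seq[start..end], assuming 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return translations of the three forward and three reverse frames."""
    strands = (seq, reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


toy = "ATGGCCTAA"  # toy sequence, not a real scaffold
print(subsequence(toy, 4, 6))      # prints GCC
print(reverse_complement(toy))     # prints TTAGGCCAT
print(six_frame_translation(toy))  # prints ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```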


What is the source of the temporal developmental expression profiles data?

The developmental expression data are derived from GEO accession GSE111748 and complement GSE60478.


How can I search and view data for my gene-of-interest?

Time-course distribution plots of developmental gene expression data can be searched and viewed by entering an ML gene identifier (e.g., ML00011a) in the search box and clicking ‘Go’.


How are the graphs generated?

The developmental expression graphs are implemented using Andrew Sielen's Violin Plot + Box Plot v2, and represent normalized mapped-read counts (transcripts per million, TPM) as a function of hours post-fertilization for each gene.


What is the source of the single-cell expression data?

Single-cell cluster data were published in Sebe-Pedros et al. (2018), Early metazoan cell type diversity and the evolution of multicellular gene regulation. The experimental Mnemiopsis metacell (cluster) files are available for download here: http://www.wisdom.weizmann.ac.il/~/arnau/Single_cell_datasets/Mnemiopsis/


How can I find genes that cluster with my gene-of-interest?

Single-cell clusters can be searched by selecting a gene identifier, cell type or cluster ID. To find genes that cluster with a gene-of-interest, select the GeneID from the pull-down menu or enter it in the search box and click ‘Go’. The results table will list the cell type, cluster ID and all single-cell clusters that contain your queried gene-of-interest (in bold text).
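
The same kind of lookup can be done on downloaded cluster data with a few lines of code. The sketch below assumes the clusters have already been loaded into a dictionary; the data layout, cluster IDs, cell-type labels and all gene identifiers other than ML00011a are hypothetical.

```python
def clusters_containing(gene_id, clusters):
    """Return (cell_type, cluster_id, genes) for every cluster holding gene_id.

    clusters maps cluster_id -> {"cell_type": str, "genes": list of gene IDs}.
    """
    return [
        (info["cell_type"], cluster_id, info["genes"])
        for cluster_id, info in sorted(clusters.items())
        if gene_id in info["genes"]
    ]


# Hypothetical cluster table for illustration only.
example_clusters = {
    "C1": {"cell_type": "cell type A", "genes": ["ML00011a", "ML00012a"]},
    "C2": {"cell_type": "cell type B", "genes": ["ML00013a"]},
    "C3": {"cell_type": "cell type A", "genes": ["ML00014a", "ML00011a"]},
}

for cell_type, cluster_id, genes in clusters_containing("ML00011a", example_clusters):
    print(cell_type, cluster_id, ", ".join(genes))
```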


Can I download all the sequences from a particular single-cell cluster?

Nucleotide and protein sequences can be downloaded in FASTA-format from the single-cell search results page by clicking on either the DNA (blue) or protein (green) button.


How do I search for an in situ image?

In situ images can be searched by selecting from a list or entering a gene symbol, gene identifier, gene accession or submitter in the specified search box. Users can also view all in situ images by clicking the ‘View All Images’ button at the top of the in situ images search page.


Can I download an original in situ image?

Original in situ images can be downloaded by clicking on the ‘PMID’ link from a search result page. Users are redirected to the appropriate PubMed manuscript entry where the in situ image was originally published.


How do I search for a Mnemiopsis manuscript-of-interest?

Mnemiopsis literature can be searched by selecting or typing a Gene Symbol, Author or Keyword term into the search box. A complete list of all Mnemiopsis literature in PubMed can be retrieved by clicking the ‘View All Literature’ box.


Can I go directly from a literature search page to PubMed?

All search results contain a hyperlinked PubMed identifier (PMID) that redirects the user directly to the appropriate manuscript entry at PubMed.


How is this different from searching for a paper in PubMed?

The ‘Literature Search’ page was designed as a quick one-step search tool to search and return all primary ‘Mnemiopsis’ manuscripts from PubMed directly from the Mnemiopsis Genome Project Portal. Search terms consist of both MeSH (MH) and author term (OT) fields extracted from PubMed, Entrez Gene and gene2pubmed annotation files.


Where can I download the full Mnemiopsis genome assembly?

The full genome assembly, consisting of 5100 scaffolds, is available for download from the sidebar "Download sequences-->Genome" search option.


What is the convention used for naming scaffolds?

Scaffolds (e.g., MLXXXX) are named as follows:


How are the gene identifiers generated?

Genes (e.g., MLXXXXNNNa) are named as follows:


What is an unfiltered protein model?

An unfiltered protein model is a protein derived from an unincorporated Mnemiopsis gene prediction model.


How should data derived from the Mnemiopsis Genome Project Portal be cited?

Please cite this Web site:

https://research.nhgri.nih.gov/mnemiopsis/


How can I contact the Web site administrator regarding technical issues?

Please send any Web site usability or technical correspondence to bioinformatics@nhgri.nih.gov.


Where can I get additional information about the Mnemiopsis Genome Project?

For additional information, comments, or questions regarding the Mnemiopsis Genome Project, please contact Dr. Baxevanis directly, at andy@nhgri.nih.gov.


How frequently are the data and portal tools updated?

Changes to the Web site and underlying data are documented in the Release History.


Where can I download the supplemental perl scripts for customizing JBrowse and MediaWiki input files?

The supplemental perl scripts for converting sequence and annotation data for JBrowse and MediaWiki can be downloaded here.


Where can I obtain a Mnemiopsis genomic DNA sample?

A Mnemiopsis genomic DNA sample has been deposited to the Ocean Genome Legacy repository at Northeastern University. Further information about obtaining a sample is available by searching the OGL Specimen Catalog using the OGL accession ID number S24180:

http://www.northeastern.edu/ogl/catalog/
