Help

CyanoLyase is a manually curated sequence database gathering members of the phycobilin lyase family and related sequences.

Browsing the database
Accessing the raw data
Analysing your sequences
Adding information to the database
Citing CyanoLyase
Getting the code

1. Browsing the database

All the sequences stored in this database can be accessed by several means. This section describes each of them.

1.1 By genome

In CyanoLyase, each sequence is attached to a genome. The genomes page lists every genome in which at least one phycobilin lyase or related protein sequence has been identified.

You can sort the list by clicking on any column header and filter it by typing in the 'Search' box.

The listing contains information about each genome, including taxonomy, phylogenetic classification, habitat, pigmentation, sequencing center, and sequencing status.

By clicking on a strain name, you can display more details about a specific genome. All strain pages contain the complete taxonomy (from NCBI Taxonomy) and, if available, bibliographical references. If the genome sequence has been submitted to RefSeq, a link to the record on the NCBI website is provided.

This page also provides the full list of phycobilin lyase or related protein sequences for each strain. Each line of the list represents a lyase gene, with the family in which it has been classified, a link to the gene page (see §1.2), and a link to the corresponding GenBank record (if available). Like all the lists available on CyanoLyase, it is sortable and filterable.

1.2 By gene

By clicking on a gene name, you can access a page displaying some details about the selected gene, including:

the genome where the gene is located
the family to which it belongs
a link to a corresponding GenBank record (if available)
a link to the genomic context, i.e. a representation of the gene and its neighbours on the genome (provided by the NCBI Sequence Viewer)
the position of the gene in the genome
the sequence in fasta format.
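
The fasta text shown on a gene page (and in the [fasta] downloads described in §2) can be read with a few lines of code. The following is a minimal Python sketch, not part of CyanoLyase: the function name parse_fasta and the example headers and sequences are illustrative assumptions, and the exact header layout used by CyanoLyase is not described here.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a mapping from header (without '>') to sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may span several lines
    return records


# Hypothetical records, for illustration only.
example = """>hypothetical_gene_1
MKTAYIAK
QRQISFVK
>hypothetical_gene_2
MSLNAEQ
"""
records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Headers are used as dictionary keys, so this sketch assumes they are unique within a file.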

1.3 By family

The families page shows a hierarchical classification of the different phycobilin lyases and related protein sequences. It is based on previous classifications (see e.g. Schluchter et al., 2010) as well as on phylogenetic (using Blast), motif (using Protomata) and structural (using Phyre2) analyses that we performed on the sequences of the CyanoLyase database.

By clicking on a clan, subclan, family or subfamily name, a page describing the corresponding sequence group is displayed. It contains a description of the group and some bibliographical references if the group has been described in the literature.

Some groups have a "Motif" section: in this section, you can download a Protomata motif representing the family, and you can launch a sequence analysis based on this motif. See §3 for more details on this topic.

Some groups have a "Phylogeny" section: in this section, you can download multiple alignments and phylogenetic trees corresponding to the family. See §3.3 for more details on this topic.

At the bottom of the page, a list of all the genes included in the group is displayed, with links to the corresponding gene pages, NCBI records (when available) and genome pages.

1.4 Phyletic profile

With the phyletic profile page, you can get an overview of the presence or absence of each gene family (or subfamily) in each genome. This is particularly useful if you want to see whether the presence of a particular gene is associated with the presence of another one, or whether a family is associated with a specific habitat, for example.

The list of genomes is sortable and filterable. You can also choose to hide some columns if you want to focus on specific phycobilin lyase groups.
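
The kind of question the phyletic profile answers (is one family found together with another?) can be made concrete by counting co-occurrences in a presence/absence table. The sketch below is only an illustration under stated assumptions: the strain and family names are placeholders rather than CyanoLyase records, and the function cooccurrence is hypothetical, not a feature of the website.

```python
from typing import Dict, Set

# Hypothetical presence/absence data: genome -> set of families present.
# Strain and family names are placeholders, not CyanoLyase records.
profile: Dict[str, Set[str]] = {
    "strain_A": {"family_1", "family_2"},
    "strain_B": {"family_1", "family_2", "family_3"},
    "strain_C": {"family_3"},
    "strain_D": {"family_1"},
}


def cooccurrence(profile: Dict[str, Set[str]], fam_x: str, fam_y: str) -> Dict[str, int]:
    """Count genomes that carry both, only one, or neither of two families."""
    counts = {"both": 0, "only_x": 0, "only_y": 0, "neither": 0}
    for families in profile.values():
        has_x, has_y = fam_x in families, fam_y in families
        if has_x and has_y:
            counts["both"] += 1
        elif has_x:
            counts["only_x"] += 1
        elif has_y:
            counts["only_y"] += 1
        else:
            counts["neither"] += 1
    return counts


counts = cooccurrence(profile, "family_1", "family_2")
print(counts)
```

The four counts form a 2x2 contingency table, which is the usual starting point for a statistical test of association.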

2. Accessing the raw data

All the sequences stored in the database can be fetched in fasta format directly from the different pages where the sequences are displayed.

If you want to download all the phycobilin lyase (and related protein) sequences from a specific genome, go to the corresponding strain page and click on the [fasta] link at the top of the page.

The same thing can be done for each family and each gene stored in the database.

An important feature of the sequences is their reliability. Each sequence has an associated status: the curator of the database marks each sequence as 'sure' or 'unsure'. A sequence is marked as 'sure' if the curator has enough evidence (e.g. strong homology/orthology) to assume that it belongs to a specific group of sequences. 'Unsure' sequences generally correspond to those that are very distant from other sequences of the same group and that have not been included in the set of sequences used for motif design with Protomata.

On each genome and family page, you can choose to restrict the displayed sequences to only the 'sure' or the 'unsure' ones. By default, all of them are displayed, and unsure ones are shown in italics.

When clicking on a [fasta] link, the same filter is applied. This is useful for example if you want to fetch all the 'sure' sequences of a family.

3. Analysing your sequences

CyanoLyase is not only a list of lyase genes: you can also easily launch Blast and Protomata analyses directly from the website.

3.1 Blast interface

By clicking on the Blast link at the top of the screen, you get to a Blast form from which you can perform blastp or blastx requests.

With this form, you can search for similarities between your own sequences and the manually curated sequences of CyanoLyase. This form uses the latest Blast+ release (2.2.25+ at the time of writing).
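
If you run a comparable search locally with BLAST+ and request tabular output (-outfmt 6), the hits can be filtered programmatically. The sketch below assumes the standard 12-column tabular layout of BLAST+ (query, subject, percent identity, ..., e-value, bit score); the output format of the CyanoLyase web form is not described here, and the sample lines, the Hit type and the parse_tabular function are invented for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity (column 3)
    evalue: float    # column 11
    bitscore: float  # column 12


def parse_tabular(lines: List[str], max_evalue: float = 1e-5) -> List[Hit]:
    """Parse BLAST+ '-outfmt 6' lines and keep hits with evalue <= max_evalue."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        hit = Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


# Invented sample lines, for illustration only.
sample = [
    "my_protein\thypothetical_lyase_1\t62.5\t200\t75\t0\t1\t200\t3\t202\t1e-80\t250",
    "my_protein\thypothetical_lyase_2\t28.0\t90\t60\t2\t5\t94\t10\t99\t0.5\t30.1",
]
hits = parse_tabular(sample)
for hit in hits:
    print(hit.subject, hit.identity, hit.evalue)
```

The default threshold of 1e-5 is an arbitrary example value, not a recommendation from the CyanoLyase curators.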

3.2 Protomata interface

Protomata is a software suite designed for motif discovery in protein sequences.

It was used on most family or subfamily sequence sets available in CyanoLyase to generate motifs ('protomata') representing these groups. You can view a graphic representation of these protomata on each family page by clicking on the 'Display the motif (logo)' link.

You can use these motifs to scan your own protein sequences (from a new genome for example) and see if they can be classified into a specific group of sequences. This can be done using the Protomatch interface available from the top of every CyanoLyase page. In this case, you will be able to enter your own sequences, or to choose a databank of protein sequences like Pfam, NR or even the genomes stored in CyanoLyase.

It is also possible to only scan the sequences with the motif of a specific family. To do so, go to the family page and click on the 'Scan protein sequences with this motif' link.

For more information on the options and the interpretation of the results, see the official Protomata documentation.

3.3 Phylogeny

A phylogeny program has been integrated into CyanoLyase. For each family, it automatically i) generates multiple alignments with Muscle, ii) eliminates gaps and low-quality regions with trimAl and iii) generates ML trees with PhyML. The multiple alignments and phylogenetic trees (Newick format) are available on each family page in the 'Phylogeny' section. Radial representations of the phylogenetic trees are also available in this section.
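
The three steps can be pictured as a chain of command lines. The exact options used by CyanoLyase are not documented here, so the following is only a sketch under assumptions: it uses the MUSCLE 3.x syntax (-in/-out), trimAl's -automated1 heuristic with PHYLIP output, and PhyML with amino-acid data. All file names and the build_pipeline function are hypothetical.

```python
from typing import List


def build_pipeline(family_fasta: str, prefix: str) -> List[List[str]]:
    """Build (but do not run) the three command lines for one family."""
    alignment = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trimmed.phy"
    return [
        # i) multiple alignment (MUSCLE 3.x syntax; MUSCLE 5 uses other flags)
        ["muscle", "-in", family_fasta, "-out", alignment],
        # ii) remove gap-rich, poorly aligned columns; write PHYLIP for PhyML
        ["trimal", "-in", alignment, "-out", trimmed, "-automated1", "-phylip"],
        # iii) maximum-likelihood tree on amino-acid data
        ["phyml", "-i", trimmed, "-d", "aa"],
    ]


commands = build_pipeline("family.fasta", "family")
for cmd in commands:
    print(" ".join(cmd))
    # To actually run a step: subprocess.run(cmd, check=True)
```

Each command is kept as a list of arguments so that it can be passed to subprocess.run without shell quoting issues, provided the three programs are installed.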

For other types of visualization of phylogenetic trees (hierarchical, axial, etc.), users can download the Newick files and open them in a tree viewer.
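
Newick files are plain text, so simple information can also be extracted without a viewer. The sketch below lists the leaf labels of a tree with a regular expression. It is a deliberately limited illustration: it assumes unquoted labels and a tree with at least one pair of parentheses, and the example tree and the leaf_names function are invented, not taken from CyanoLyase.

```python
import re
from typing import List


def leaf_names(newick: str) -> List[str]:
    """Extract leaf labels from a Newick string (unquoted labels only)."""
    # A leaf label directly follows '(' or ',' and runs until ':', ',', ')' or ';'.
    # Labels that follow ')' are internal node labels or support values
    # and are therefore not matched.
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


# Invented example tree with branch lengths and one support value.
tree = "((gene_A:0.1,gene_B:0.2)0.95:0.05,(gene_C:0.3,gene_D:0.1):0.2);"
names = leaf_names(tree)
print(names)
```

For anything beyond this (rerooting, drawing, quoted labels), a dedicated phylogenetics library or tree viewer is the better choice.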

4. Adding information to the database

All the annotations in this database are manually curated by the administrators and updated on a regular basis. If you want to add or modify some information, feel free to contact the authors.

5. Citing CyanoLyase

Please cite: Bretaudeau A, Coste F, Humily F, Garczarek L, Six L, Ratin M, Collin O, Schluchter WA & Partensky F. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions. Submitted to NAR Databases.

6. Getting the code

This website was written using the Symfony 2 PHP framework. Some of the code is available under a free license: see the GenOuest platform GitHub account for more information.