guide

Overview

The following diagram is a flowchart of a typical FastGroupII experiment. You can find the PCR and cloning protocols we use for prokaryotic 16S rDNA studies on this page. You can also find information on how to use FastGroupII as well as some useful links that might help for later analyses.

Flowchart of a typical FastGroupII experiment
Online user's guide

1. DNA isolation from environmental samples

Samples are collected from reef-building corals. Each sample is first washed with 0.2 µm-filtered, autoclaved seawater to remove any loosely associated microbes. Each coral sample is then airbrushed with 10X TE to remove the tissue and associated microbes.

Total DNA is prepared from the samples using the UltraClean Soil DNA kit (MoBio).

 

2. PCR amplification with 27F & 1492R primers

The DNA is then subjected to PCR (polymerase chain reaction) using a touchdown protocol. The primers 27F and 1492R, which are specific to bacterial 16S rDNA, are used. Phosphorylated primers are used in preparation for the next step, cloning.

 

3. Cloning and sequencing

The resulting PCR products are cloned using the PCR-SMART Proofreaders Cloning Kit (Lucigen). Inserts from the library are sequenced with the 27F primer.

 

4. FastGroupII

The ends of the sequences are trimmed first. There are two trimming methods: 1) ambiguous bases (e.g. "N") can be removed from the ends of a sequence; 2) the part of a sequence that lies 5' or 3' of a defined site can be removed. The program lists some of the most commonly used highly conserved sites of Bacteria and Archaea. The PSI (percentage sequence identity) for a user-specified conserved site can also be adjusted by the user.
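The two trimming options can be sketched in Python as follows. This is only an illustration, not FastGroupII's own code (which is written in Perl). The function names, the rule of stripping every non-ACGT character from the ends, the exact-match search for the site, and the example site are all assumptions made for this sketch.

```python
import re


def trim_ambiguous_ends(seq):
    """Strip runs of non-ACGT characters (e.g. 'N') from both ends.

    Assumption: only the terminal runs are removed; ambiguous bases
    inside the sequence are left untouched.
    """
    return re.sub(r"^[^ACGT]+|[^ACGT]+$", "", seq.upper())


def trim_at_site(seq, site, side="5prime"):
    """Remove the part of `seq` that lies 5' (or 3') of `site`.

    Assumption: the site is located by exact string match and is kept
    in the output. If the site is absent, the sequence is returned
    unchanged.
    """
    seq = seq.upper()
    pos = seq.find(site.upper())
    if pos == -1:
        return seq
    if side == "5prime":
        return seq[pos:]               # drop everything upstream of the site
    return seq[:pos + len(site)]       # drop everything downstream of the site


if __name__ == "__main__":
    print(trim_ambiguous_ends("NNACGTNACGNN"))
    print(trim_at_site("TTTGGATCCAAA", "GGATCC", side="5prime"))
```

A real implementation would probably tolerate mismatches when locating the site, which is where an adjustable PSI for the conserved site would come in.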

Four grouping algorithms are implemented in FastGroupII: PSI (Percentage Sequence Identity), PSI with Gaps, Seq-Match (Sequence Match in RDP) and a tree-parsing method. The first two methods use the same algorithms as FastGroup1.0 (Seguritan et al 2001). The Seq-Match method is modified from the algorithm used by Sequence Match in the RDP (Ribosomal Database Project), developed at Michigan State University. The fourth method uses the information in the guide tree built from a ClustalW alignment. Note that the percent identity field has decimal precision.

All of the methods except the tree-parsing method use the same scheme to group the sequences.

In the initial step, the first sequence in the library is read in, placed in a new group, and made the representative sequence of that group. The next sequence (the query sequence) is then retrieved, its ends are trimmed, and it is compared to the representative sequences of all existing groups. If the query sequence is similar to a representative sequence according to the user-specified criteria, it is placed in the same group as that representative sequence. If it is not, it is placed in a new group and becomes the representative sequence of that group.
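The grouping scheme described above can be sketched in Python. This is a minimal illustration, not FastGroupII's code. The ungapped, position-by-position `percent_identity` function stands in for whichever similarity measure the user selects, and placing a query in the first group whose representative passes the threshold is an assumption (the text does not say how ties between several matching groups are resolved). All names are hypothetical.

```python
def percent_identity(a, b):
    """Ungapped identity over the shorter sequence (a stand-in measure)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def group_sequences(library, threshold=97.0, identity=percent_identity):
    """Group (name, sequence) pairs around representative sequences.

    Returns a list of groups; each group is a list of names whose first
    element is the representative of that group.
    """
    representatives = []   # sequences of the group representatives
    groups = []            # names per group, parallel to `representatives`
    for name, seq in library:
        for i, rep_seq in enumerate(representatives):
            if identity(seq, rep_seq) >= threshold:
                groups[i].append(name)     # similar enough: join this group
                break
        else:
            representatives.append(seq)    # no match: start a new group
            groups.append([name])
    return groups


if __name__ == "__main__":
    library = [
        ("a", "ACGTACGTAC"),
        ("b", "ACGTACGTAC"),
        ("c", "TTTTTTTTTT"),
        ("d", "ACGTACGTAA"),
    ]
    print(group_sequences(library, threshold=90.0))
```

With a 90% threshold, sequence "d" (9 of 10 positions identical to "a") joins the group represented by "a", while "c" starts a group of its own.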

One source of bias in these methods is the selection of the representative sequence. However, there is no accurate method for choosing a representative sequence. For each method, the order of the sequences in the library can therefore be shuffled randomly, as many times as the user specifies.
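The effect of sequence order can be seen in a small Python sketch. It is illustrative only: the ungapped identity measure, the function names and the `seed` parameter are assumptions, not part of FastGroupII.

```python
import random


def percent_identity(a, b):
    """Ungapped identity over the shorter sequence (a stand-in measure)."""
    n = min(len(a), len(b))
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def count_groups(seqs, threshold):
    """Number of groups formed when sequences are processed in the given order."""
    reps = []
    for seq in seqs:
        if not any(percent_identity(seq, r) >= threshold for r in reps):
            reps.append(seq)           # no similar representative: new group
    return len(reps)


def shuffled_group_counts(seqs, threshold, n_shuffles, seed=None):
    """Group count for each of `n_shuffles` random orderings of the library."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_shuffles):
        order = list(seqs)
        rng.shuffle(order)
        counts.append(count_groups(order, threshold))
    return counts


if __name__ == "__main__":
    # A and C are 80% identical, but both are 90% identical to B.
    A, B, C = "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"
    print(count_groups([B, A, C], 90.0))   # B read first
    print(count_groups([A, B, C], 90.0))   # A read first
    print(shuffled_group_counts([A, B, C], 90.0, n_shuffles=5, seed=1))
```

When B is read first it represents all three sequences; when A is read first, C is too distant from A and starts a second group. Repeating the run over several shuffled orders shows how much the group count depends on this choice.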

After the sequences in a library are dereplicated, the grouping information is generated. 'General group information' lists the name of each representative sequence and the number of sequences in its group. The representative sequence of each group is output as requested by the user. 'Detailed group information' lists each representative sequence together with the names of all sequences in the same group. 'Group statistics' reports the coverage of each group, calculated by the method of Good.
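As a point of reference, Good's coverage estimator in its usual library-wide form is C = 1 - n1/N, where n1 is the number of groups containing a single sequence and N is the total number of sequences. The Python sketch below computes that form; how FastGroupII reports coverage per group is not specified here, and the function name is hypothetical.

```python
def goods_coverage(group_sizes):
    """Good's coverage, C = 1 - n1/N, from a list of group sizes.

    n1 is the number of singleton groups and N the total number of
    sequences in the library.
    """
    total = sum(group_sizes)
    if total == 0:
        raise ValueError("the library contains no sequences")
    singletons = sum(1 for size in group_sizes if size == 1)
    return 1.0 - singletons / total


if __name__ == "__main__":
    # Four groups holding 5, 3, 1 and 1 sequences: two singletons out of ten.
    print(goods_coverage([5, 3, 1, 1]))
```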

Rarefaction curves and rank-abundance curves are also plotted for later analyses.
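The data behind both kinds of curve can be derived from the group sizes alone. The Python sketch below uses the standard analytical rarefaction formula, E[S_n] = sum over groups of (1 - C(N - N_i, n) / C(N, n)); whether FastGroupII uses this formula or random resampling is not stated here, so treat the choice as an assumption. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def rank_abundance(group_sizes):
    """Group sizes sorted from the most to the least abundant group."""
    return sorted(group_sizes, reverse=True)


def expected_groups(group_sizes, n):
    """Expected number of groups in a random subsample of n sequences."""
    total = sum(group_sizes)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the library size")
    denom = comb(total, n)
    # comb(total - size, n) is 0 when the subsample cannot miss the group.
    return sum(1 - comb(total - size, n) / denom for size in group_sizes)


def rarefaction_curve(group_sizes):
    """(subsample size, expected number of groups) for every subsample size."""
    total = sum(group_sizes)
    return [(n, expected_groups(group_sizes, n)) for n in range(total + 1)]


if __name__ == "__main__":
    sizes = [1, 3]
    print(rank_abundance(sizes))
    for n, expected in rarefaction_curve(sizes):
        print(n, round(expected, 3))
```

The curve starts at zero groups for an empty subsample and reaches the observed number of groups when the subsample equals the whole library.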

For details please see how to use FastGroupII.

5. Community indices calculation

Standard diversity and richness indices, including the Shannon-Wiener diversity index and the Chao1 richness index, are calculated from the dereplication results.
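Both indices can be computed directly from the group sizes. The Python sketch below uses the textbook definitions: Shannon-Wiener H' = -sum(p_i ln p_i), and Chao1 = S_obs + F1^2 / (2 F2), where F1 and F2 are the numbers of groups with one and two sequences. Falling back to the bias-corrected form S_obs + F1 (F1 - 1) / 2 when there are no doubletons is an assumption of this sketch; the variant used by FastGroupII is not stated here.

```python
from math import log


def shannon_wiener(group_sizes):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over non-empty groups."""
    total = sum(group_sizes)
    if total == 0:
        raise ValueError("the library contains no sequences")
    return -sum((s / total) * log(s / total) for s in group_sizes if s > 0)


def chao1(group_sizes):
    """Chao1 richness estimate from singleton and doubleton counts."""
    observed = sum(1 for s in group_sizes if s > 0)
    f1 = sum(1 for s in group_sizes if s == 1)   # singleton groups
    f2 = sum(1 for s in group_sizes if s == 2)   # doubleton groups
    if f2 == 0:
        # Bias-corrected form, used here to avoid dividing by zero.
        return observed + f1 * (f1 - 1) / 2.0
    return observed + f1 * f1 / (2.0 * f2)


if __name__ == "__main__":
    sizes = [3, 1, 1, 2, 2]
    print(shannon_wiener(sizes))
    print(chao1(sizes))
```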

 

6. RDP Classifier

The RDP Classifier was developed by the RDP team at Michigan State University. It is used to assign 16S rRNA sequences to the taxonomic hierarchy proposed in Bergey's Manual of Systematic Bacteriology, 2nd Ed. (vetted sequences, upcoming release 2004). The assignment to hierarchical taxa is based on a naïve Bayesian rRNA classifier.
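To give a feel for how a naïve Bayesian sequence classifier works, here is a toy Python sketch that scores a query by the short words (k-mers) it contains. It is a generic illustration and not the RDP Classifier's code: the word size, the smoothing constants, the taxon labels and all function names are assumptions made for this example.

```python
from collections import defaultdict
from math import log


def words(seq, k):
    """Set of overlapping k-letter words in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training, k=4):
    """Count, overall and per taxon, how many sequences contain each word.

    `training` is a list of (taxon, sequence) pairs.
    """
    word_count = defaultdict(int)
    taxon_size = defaultdict(int)
    taxon_word_count = defaultdict(lambda: defaultdict(int))
    for taxon, seq in training:
        taxon_size[taxon] += 1
        for w in words(seq, k):
            word_count[w] += 1
            taxon_word_count[taxon][w] += 1
    return {
        "k": k,
        "total": len(training),
        "word_count": dict(word_count),
        "taxon_size": dict(taxon_size),
        "taxon_word_count": {t: dict(c) for t, c in taxon_word_count.items()},
    }


def classify(model, seq):
    """Return the taxon with the highest summed log-probability of the query words."""
    best_taxon, best_score = None, None
    for taxon, size in model["taxon_size"].items():
        counts = model["taxon_word_count"][taxon]
        score = 0.0
        for w in words(seq, model["k"]):
            # Smoothed overall word frequency, used as a pseudocount so
            # that unseen words never give a probability of zero.
            prior = (model["word_count"].get(w, 0) + 0.5) / (model["total"] + 1)
            score += log((counts.get(w, 0) + prior) / (size + 1))
        if best_score is None or score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


if __name__ == "__main__":
    training = [
        ("TaxonA", "AAAAACCCCC"),
        ("TaxonA", "AAAACCCCCC"),
        ("TaxonB", "GGGGGTTTTT"),
        ("TaxonB", "GGGGTTTTTT"),
    ]
    model = train(training, k=4)
    print(classify(model, "AAACCC"))
    print(classify(model, "GGGTTT"))
```

The "naïve" part is the assumption that words occur independently of one another, which is what allows their log-probabilities to be added.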

 

7. DOTUR (Distance Based OTU and Richness Determination)

DOTUR is a computer program developed by Schloss et al. It also dereplicates 16S rDNA libraries using a distance matrix as input. Interested users can find detailed information about DOTUR here.
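Grouping from a distance matrix can be illustrated with a short Python sketch. This is not DOTUR's code: furthest-neighbor (complete-linkage) clustering is chosen here as an assumption, and the function name, the input layout (a symmetric list of lists) and the example distances are hypothetical.

```python
def cluster_otus(dist, cutoff):
    """Furthest-neighbor clustering of sequences at a distance cutoff.

    `dist` is a symmetric square matrix (list of lists) of pairwise
    distances. Returns a list of clusters, each a sorted list of
    sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Distance between clusters = largest pairwise distance.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break                      # closest pair is already too distant
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]                # y > x, so index x stays valid
    return clusters


if __name__ == "__main__":
    dist = [
        [0.00, 0.01, 0.02, 0.30],
        [0.01, 0.00, 0.04, 0.30],
        [0.02, 0.04, 0.00, 0.30],
        [0.30, 0.30, 0.30, 0.00],
    ]
    print(cluster_otus(dist, cutoff=0.03))
    print(cluster_otus(dist, cutoff=0.05))
```

Unlike the representative-based scheme of FastGroupII, this approach compares every pair of sequences, so the result does not depend on the order of the library.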

 

8. Other tools in FastGroupII

A Java program called Converter can be downloaded here. It converts raw sequences, stored in separate files as obtained directly from sequencing, into a FASTA format file. To run the program, download the .jar file and choose 'javaw' to open it. The instructions in the program will guide you through the conversion.

Some users may find that clicking the link to the Converter brings up a page full of unreadable code. This happens when the browser does not recognize the .jar file type. In that case, right-click the link and choose 'Save Target As' to download the file.
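For readers who prefer scripting, the kind of conversion the Converter performs can be approximated in a few lines of Python. This sketch is not the Converter itself. It assumes that each input file holds the bases of a single sequence and nothing else, and it uses the file name (without extension) as the FASTA header; the function name and the 60-column line width are likewise assumptions.

```python
from pathlib import Path


def files_to_fasta(paths, out_path, width=60):
    """Write one FASTA record per input file to `out_path`.

    Assumption: each input file contains only the bases of one
    sequence, possibly spread over several lines.
    """
    with open(out_path, "w") as out:
        for path in map(Path, paths):
            # Join all lines and drop whitespace to get the bare sequence.
            seq = "".join(path.read_text().split()).upper()
            out.write(f">{path.stem}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    import sys
    # Usage: python files_to_fasta.py library.fasta clone1.seq clone2.seq
    if len(sys.argv) >= 3:
        files_to_fasta(sys.argv[2:], sys.argv[1])
```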

 

9. Source code and test dataset

FastGroupII was developed using Perl 5.8 on Linux. The web interface was developed using the Perl CGI module (CGI.pm). The program is released under the GPL and is available for download from its SourceForge webpage.

A library containing bacterial 16S rDNA sequences from four species of corals was used for testing FastGroupII. The sequences are stored in a FASTA format file and can be downloaded here.