The SILVA ribosomal RNA gene database project: improved data processing and web-based tools

Quast, Christian; Pruesse, Elmar; Yilmaz, Pelin; Gerken, Jan; Schweer, Timmy; Yarza, Pablo; Peplies, Jörg; Glöckner, Frank Oliver

doi:10.1093/nar/gks1219

Abstract

SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive web resource for up to date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. The referred database release 111 (July 2012) contains 3 194 778 small subunit and 288 717 large subunit rRNA gene sequences. Since the initial description of the project, substantial new features have been introduced, including advanced quality control procedures, an improved rRNA gene aligner, online tools for probe and primer evaluation and optimized browsing, searching and downloading on the website. Furthermore, the extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.

INTRODUCTION

Sequencing the ribosomal RNA gene (rRNA) is the method of choice for nucleic acid-based detection and identification of microbes, their taxonomic assignment, phylogenetic analysis and the investigation of microbial diversity. Consequently, vast amounts of rRNA gene sequence data—more than 3.5 million sequences (July 2012)—have been accumulated and are publicly available via the International Nucleotide Sequence Database Collaboration (INSDC) databases (1). While the quantity of data further increases the relevance of rRNA genes for marker gene studies for all domains of life, it also creates significant challenges for data management and curation. For optimal utility, the sequences must be extracted and checked for quality, the annotations must be updated and extended to reflect current understanding and finally all the data must be prepared in a coherent, easily accessible manner. These tasks are beyond the scope of the INSDC databases and therefore performed by domain-specific databases. The Ribosomal Database Project (RDP-II) (2,3) and greengenes (4) both cover the domains Archaea and Bacteria for small subunit rRNA gene (SSU) sequences. The SILVA project also includes the Eukaryota, thus covering all three domains of life. Furthermore, SILVA offers databases for both the SSU and the large subunit rRNA gene (LSU).

The SILVA databases are made available as releases, rather than being updated continuously, to enhance the comparability of the studies employing these databases. Each release is numbered according to the EMBL-Bank release from which the sequence data were extracted and is permanently available for download via the SILVA website. Best efforts are made to provide two full releases per year. The database releases are structured into two datasets for each gene: SILVA Parc and SILVA Ref. The Parc datasets comprise the entire SILVA databases for the respective gene, whereas the Ref datasets represent a subset of the Parc comprising only high-quality nearly full-length sequences.

All SILVA datasets contain a rich set of contextual and sequence-associated information. This includes taxonomic classifications from several taxonomy providers, a multiple sequence alignment, type strain information and the latest valid nomenclature. All sequences are quality checked. The corresponding data are made available as ARB files (5) as well as in FASTA and comma-separated value (CSV) formats. Finally, they can be browsed directly via the SILVA website. The combination of SILVA datasets with the ARB software suite provides an advanced workbench for researchers to perform in-depth sequence analysis and phylogenetic reconstructions, as well as manual curation of rRNA gene datasets. The flat-file exports of the SILVA datasets make it easy to integrate SILVA as a source for reference data in next-generation sequencing (NGS) analysis pipelines such as MOTHUR (6), QIIME (7) or MG-RAST (8).

Since its first release in 2007, 16 full releases have been published by the SILVA project. Many improvements to both the release preparation process and the features offered by the project website (http://www.arb-silva.de) have been made. The group of SILVA users has grown to include thousands of researchers worldwide who visit the website regularly to obtain the recent database releases and to employ the SILVA online services in their work. In the first part of this update paper, we describe the most significant changes to data processing within SILVA. The second part outlines the new or improved functions available on the SILVA website.

DATABASES

rRNA gene prediction

The selection of rRNA gene candidate sequences based on annotations in the EMBL-Bank source database is now complemented by hidden Markov model-based rRNA gene prediction. All sequences in EMBL-Bank are scanned for rRNA gene sequences using the models and parameters from the RNAmmer software package (9), HMMER2 (http://hmmer.janelia.org/) and a custom pipeline component. The gene boundaries of the predicted rRNA gene are determined during sequence alignment. Conflicts between EMBL-Bank annotations and predictions are resolved by giving priority to the EMBL-Bank annotation. The source for the SILVA annotation is documented in the field ann_src_slv. As of release 111, the SILVA SSU database contains 53 950 sequences detected solely by the RNAmmer models and 1 537 342 sequences that were both annotated as rRNA and detected by the RNAmmer models. The LSU database contains 17 828 sequences detected solely by RNAmmer models and 17 563 sequences both annotated as rRNA and detected by the RNAmmer models.

Quality control/quality assurance

The quality criteria employed by SILVA to ensure that only reliable sequence information is included in the SILVA databases have been improved and fine-tuned. The sequence alignment is now used to determine which parts of EMBL-Bank annotated rRNA gene sequences extend beyond the boundaries of the SSU or LSU gene. The fraction of ambiguous bases

and the fraction of bases comprising long (>4 bp) homopolymers

are now confined to the region within the respective rRNA gene. Vector contaminations are now only searched for outside the rRNA gene boundaries of a sequence, with

giving the length of the vector contaminant relative to the number of in-gene bases. All three values are reported in percent. The overall ‘Sequence Quality’ value

gives the averaged fraction to which the thresholds for each criterion were expended in percent. Having

as the respective thresholds,

is calculated as follows:

Sequences with or for any criterion are rejected.

The ‘Alignment Quality’ of a sequence is determined by three values reported by the SINA alignment software (10): the alignment score, the base pair score and, as of release 111, the alignment identity (for details, see ‘Aligner’ section). Sequences are rejected based on these values to achieve specificity of the SILVA databases, correcting over-prediction by the RNAmmer models as well as removing sequences wrongly annotated as rRNA genes. The alignment quality thresholds for the different datasets can be found in Table 1.

Table 1.

List of alignment quality thresholds used to exclude sequences from the different SILVA datasets

	LSU Parc	LSU Ref	SSU Parc	SSU Ref
Alignment length (bp)	300	1900	300	Bacteria/Eukaryota: 1200 Archaea: 900
Alignment identity (%)	40	60	50	70
Alignment score (quality)	30	30	40	50
Base pair score	30	30	30	30

	LSU Parc	LSU Ref	SSU Parc	SSU Ref
Alignment length (bp)	300	1900	300	Bacteria/Eukaryota: 1200 Archaea: 900
Alignment identity (%)	40	60	50	70
Alignment score (quality)	30	30	40	50
Base pair score	30	30	30	30

Open in new tab

Table 1.

List of alignment quality thresholds used to exclude sequences from the different SILVA datasets

	LSU Parc	LSU Ref	SSU Parc	SSU Ref
Alignment length (bp)	300	1900	300	Bacteria/Eukaryota: 1200 Archaea: 900
Alignment identity (%)	40	60	50	70
Alignment score (quality)	30	30	40	50
Base pair score	30	30	30	30

	LSU Parc	LSU Ref	SSU Parc	SSU Ref
Alignment length (bp)	300	1900	300	Bacteria/Eukaryota: 1200 Archaea: 900
Alignment identity (%)	40	60	50	70
Alignment score (quality)	30	30	40	50
Base pair score	30	30	30	30

Open in new tab

The thresholds upon which sequences are rejected are based on the statistical analyses performed in Schweer (11). They were selected as a conservative balance between rejecting too many valid sequences and keeping too many questionable sequences. A common threshold of 2% was found to be best for the sequence quality metrics.

Prediction of potentially anomalous sequences, such as chimeras, remains unchanged with respect to the original SILVA publication. No filtering is performed using this metric due to the difficulty of clearly differentiating between artefacts of the sequence acquisition process and unusual yet natural evolutionary events. The correct choice—whether and at which threshold potentially anomalous sequences need to be excluded—should be made in light of the specific research question and the experimental setup. We must therefore relegate such filtering to the individual researchers.

SILVA taxonomy

A substantial revision of the classification of all bacterial and archaeal sequences in the Ref datasets was first published with SILVA release 100. Based on the ‘guide trees’, all taxonomic assignments are manually curated and follow the Bergey’s Manual of Systematic Bacteriology (12). Specifically, Archaea, Cyanobacteria, Chloroflexi and Chlorobi are based on volume 1; Proteobacteria on volume 2; Firmicutes on volume 3; Bacteroidetes, Spirochaetes, Tenericutes (Mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae and Planctomycetes on volume 4 and finally Actinobacteria on volume 5. Since taxonomy and species are dynamic entities with rapid turnover, name changes and taxonomic outlines are also adapted from List of Prokaryotic Names with Standing in Nomenclature (LPSN) (13).

Although the classification is mainly based on these authoritative resources, deviations from their recommendations do exist: the classification is a phylogenetic tree-based process and differences from the original description and classifications are to be expected. For example, the genus Ahrensia (type species accession: D88524) is classified under family Rhodobacteraceae of Alphaproteobacteria; however, in the SSU Ref guide tree, this genus is grouped together with members of family Phyllobacteriaceae. Normally, such discrepancies are accommodated by introducing polyphyletic groups; however, in this case genus Ahrensia is classified within Phyllobacteriaceae due to high sequence identities (>94%) observed with other members of this family.

The LPSN resource is further used to track down names without standing in nomenclature (not-validly published taxa) and Candidatus taxa. The inclusion of the two latter categories is a unique feature of the SILVA taxonomy. Furthermore, collaborations with domain experts have been established to annotate uncultured clades. A number of examples are the OCS116 clade (14), the SAGMC and SAGME groups (15) and the termite clusters (16).

For an improved and unified taxonomy for Eukaryota based on 18S rRNA gene sequences, the Eukaryotic Taxonomy Working Group (ETWG) has been founded in October 2011. The first version of the new eukaryotic taxonomy was deployed with SILVA release 111. Specifically, the taxonomy of protist lineages has been reconciled with the International Society of Protistologists (ISOP) publication (17). An early draft from the ISOP committee (Adl et al. 2012, manuscript in preparation) was used to further improve protist classification where possible. Higher-level ranks have been revised for higher plants, fungi and animals. By implementing the classification of ISOP, their concept of ‘rankless’ taxa was introduced to the SILVA taxonomy. That is, the position of a taxon in the taxonomic hierarchy does no longer necessarily imply a rank. Although this concept is biologically sound, we recognize the difficulties that this may bring in computational analyses. Therefore, a file containing classification rank mappings is provided with the new eukaryotic taxonomy. These mappings assign reasonable ranks to taxa in order to make the different levels comparable.

Third-party contextual data

Several fields containing additional contextual data have been added to the SILVA databases over the last years. Basic fields include organism name, author, title, publication ID, collection, submission and modification dates as well as latitude/longitude, depth, habitat and taxonomic classifications by various other databases. Tables detailing the fields available in the current release can be found in the Frequently Asked Questions (FAQ) section of the webpage (http://www.arb-silva.de/documentation/background/faqs/).

The ‘strain’ field, carrying the strain data imported from EMBL-Bank, is now extended with information from third-party sources. In addition to the tag ‘(T)’ used by EMBL-Bank to mark a sequence as type strain, the following tags are used by SILVA:

the label ‘e[G]’ is added if an entry is part of the list of genomes offered by the EMBL-Bank,
the label ‘l[T]’ is added if the entry is part of the type strain datasets of ‘The All-Species Living Tree‘project (18,19),
the label ‘s[T]’ is added if an entry is listed as a type strain by the StrainInfo project (20),
the label ‘s[C]’ is added if an entry is a cultured strain according to the StrainInfo project and
the label ‘r[T]’ is added if an entry is listed as a type strain by the RDP-II project.

Furthermore, manually curated habitat descriptors and other contextual information are incorporated where available based on information provided by the megx.net project (21).

Datasets

Ref

The basic criteria for inclusion of sequences in the high-quality full-length Ref datasets have remained unchanged since 2007. Briefly, for SSU Ref archaeal sequences must have at least 900 bases length, bacterial and eukaryotic sequences must have at least 1200 bases length and all sequences must have an alignment score of at least 50. Since release 111, sequences must also have an alignment identity score of at least 70. Furthermore, sequences from large-scale submissions such as made by the mouse wound microbiome project, the human skin microbiome project or the Guerrero Negro hypersaline microbial mat project are removed from the SSU Ref and provided in a separate dataset. Please refer to the SILVA website for information on which projects were removed from each respective database release. Criteria for LSU Ref datasets can be found in Table 1.

SSU Ref NR

For users interested in a representative collection of SSU rRNA gene sequences, the SILVA project offers a non-redundant (NR) version of the SSU Ref dataset. The Ref NR dataset is created by clustering at 99% (up to SILVA 108) and 98% (SILVA 111) sequence identity using UCLUST (22). Of each cluster, only the longest sequence is kept. Sequences from cultivated species including type strains and multiple operons are preserved in all cases to serve as an anchor for taxonomy. The resulting SSU Ref NR dataset is significantly smaller (25% of the Ref dataset as of release 111) than the full Ref dataset and has a more even phylogenetic distribution of sequences. We recommend this dataset to be used as the standard SILVA reference dataset for rRNA gene-based classification, phylogenetic analysis and probe design.

Living tree project

The ‘All-Species’ Living Tree Project (LTP) is a multi-partner initiative coordinated by the Journal Systematic and Applied Microbiology (Elsevier publishers) in cooperation with the ARB, LPSN and SILVA projects and promoted by the SILVA web resource. Its main objective is to provide highly curated ribosomal 16S and 23S RNA sequence datasets of all type strains representing the up-to-date described bacterial and archaeal diversity. The LTP database is kept updated according to the changes in nomenclature and new descriptions of taxa that are effectively published in the International Journal of Systematic and Evolutionary Microbiology. New type-strain sequences are carefully examined by means of their sequence quality, associated (meta) data and manual inspection of the alignment. This process results in: (i) finding the best available SSU/LSU entry that may represent a species; (ii) providing corrected organism-name information plus other LTP-specific (meta) data and (iii) propagating the alignment improvements to the SILVA seed alignment. These very small but taxonomically ‘comprehensive’ datasets are frequently used for taxonomic and classification purposes and are useful as test datasets for developers.

Further developments

Data retrieval

The semantic interpretations of ‘gene’, ‘product’ and ‘note’ feature qualifiers were modified to avoid overlapping/duplicate entries in the SILVA database due to insufficiently annotated EMBL-Bank entries.

Aligner

The SILVA database preparation pipeline now employs an updated version (1.2.10) of SINA to compute the multiple sequence alignment provided with the databases. Please refer to Pruesse et al. (10) for a detailed description of SINA. Briefly, SINA is a reference-based alignment tool, designed to maintain high alignment accuracy while allowing for volume sequence processing. For each sequence, SINA selects a set of similar sequences from the given reference alignment and constructs a directed acyclical graph (DAG) representation of the alignment of these reference sequences to be used as alignment template. The computed alignment of the query sequence and the DAG template is optimal under the constraint that no columns may be added to the alignment.

The base pair score reported by SINA is an indicator for the degree to which the expected secondary structure is met by the aligned sequence. The SILVA alignments each include a global secondary structure, defining which pairs of alignment columns are expected to bond. The SINA ‘bp score’ is calculated as the average binding score of the column pairs covered by the aligned sequence. The alignment identity (SINA version 1.2.10 and above) reports the highest fractional identity between the query sequence and any sequence in the reference alignment.

Internal reference datasets

Several reference datasets are used during database preparation. These are the alignment SEED, a vector sequence database and a collection of non-chimeric sequences. Each of these datasets was continuously improved and extended with trusted data. The vector sequence database is available for download via the website archive. The alignment SEED and the collection of non-chimeric sequences derived from the SEED, however, must remain undisclosed. While we would prefer to make this dataset available, we are not free to do so as it contains sequences obtained under the promise of confidentiality. The SEED is extended with additional sequences when a specific phylogenetic branch is found to require a more detailed alignment. The sequences leading to such findings are typically novel and entailed much effort by the respective author to ensure their validity, making it impossible for us to also ask for publication rights. However, once those sequences do become published, they will automatically appear in the next SILVA release. Obtaining the intersection between the SEED and the Ref can be done via the field ‘align_log_slv’. SINA guarantees that sequences identical to (or sub-sequences of) reference sequences retain the exact alignment of the respective reference sequence and marks the sequences accordingly. The dataset used to evaluate SINA was prepared by selecting those sequences from the Ref 108 and is available for download on the SILVA website. However, for optimal alignment and classification results we recommend to use the Ref NR dataset as reference dataset. While this dataset is about five times larger than the SEED and is not purely comprised of manually curated sequences, it has by way of its construction a relatively even sequence distribution and good phylogenetic completeness.

SILVA WEBSITE

The SILVA website comprises core database access features, several online tools and an extensive, regularly updated set of documentation pages. The documentation pages provide tutorials for all SILVA tools and functionalities, FAQs and a detailed documentation page for each released database. The website also hosts information for partner projects and collaborations such as LTP and ETWG.

Core database access features

The SILVA databases can be accessed online via the Taxonomy Browser and Search pages. The Browser implements a hierarchical view on the database contents, similar to a file browser, visualizing any of the taxonomies included with SILVA (SILVA, LTP, RDP-II, greengenes and EMBL-Bank). The Search page supports keyword matches on a variety of fields (publication details, organism name, DOI/PubMed identifiers, etc.) as well as range filters on numeric descriptors (sequence length, quality values and dates). Multiple accession numbers can be matched by pasting comma separated lists or ranges of accession numbers in the respective field. Furthermore, searches can be constrained to the Ref, Ref NR or LTP datasets and restricted to the contents of the Cart.

The Cart system connects the different tools on the SILVA website by storing a set of sequences of interest. The Cart’s contents can be modified and displayed in both the Browser and the Search. When sequences are added to the Cart, these sequences will be highlighted by the Browser and sequence counts will be shown for each displayed taxonomic group. In line with the Cart metaphor, it is also possible to have the Cart’s contents prepared for download. Export files can be generated in ARB and FASTA formats, with and without alignment and optionally compressed.

Within the Search, the Cart allows to express complex questions as a series of simple queries. For example, searching for all sequences marked as type strain by StrainInfo but not by RDP-II can be achieved by first adding all sequences that contain the strain tag ‘s[T]’ to the cart and then removing those sequences for which strain field is marked with ‘r[T]’.

Alignment, sequence-based search and classification

The Aligner page allows submitting sequence data for processing with SINA. FASTA files containing up to 1000 sequences (≤6000 nt each) can be uploaded, and the sequences will be aligned using the same reference alignments that are employed to prepare the SILVA databases. While all SINA parameters are configurable in the web form, the parameters will default to the same values that were used to prepare the most recent SILVA release.

Optionally, the aligned sequences can be passed through the search stage of SINA to find closely related sequences in the Parc, Ref or Ref NR datasets. The query sequences are compared with the sequences in the selected dataset based on the SINA alignment. The search results can then be added to the Cart, and thereby accessed from the other components of the website, for example to prepare a download or to inspect the taxonomic groups containing sequences similar to the submitted query sequences.

It is also possible to classify the submitted sequences based on the search results. For each query sequence, the classifications assigned to the matched sequences are consolidated using a lowest common ancestor approach. The resulting classification is included with the aligned sequence data.

Probe and primer evaluation

Signature sequences are essential to many methods employed in the investigation of microbial communities. Since these signatures are derived from previously characterized sequences, they can only be accurate to the degree to which diversity was covered by available data at the time of their design. As more data become available over time, signature sequences must be regularly re-evaluated and their suitability re-affirmed. In order to make this process as simple as possible, the TestProbe and TestPrime tools are now offered on the SILVA website. Users can base their calculations on the entire SSU or LSU Parc datasets, or on the Ref or Ref NR subsets. The results can be downloaded as CSV files and matched sequences can be added to the Cart for subsequent download. Both TestProbe and TestPrime are cross-linked with the oligonucleotide signature database probeBase (23).

TestProbe

The probe match and evaluation tool tests and visualizes in silico target group coverage of rRNA gene-targeting probes and single primers, optionally with ambiguity codons, against the SILVA datasets. The tool can be configured to allow up to five mismatches between the probe and target sequences. Mismatches can also be weighted. The results are shown in three tables: an overview of the number and position of mismatches, a per-taxon summary table and a third table listing individual matching sequences with sequence names, accession numbers and a graphical representation of the probe’s binding site within all matches.

TestPrime

Similar to the TestProbe tool, TestPrime allows searching for all sequences within the SILVA databases which are targeted by a pair of primers (optionally with ambiguity codons). The maximal number of allowed base mismatches can be configured, as well as a ‘zero tolerance zone’ at the 3′-end of the primers. Coverage and match, mismatch and no data information are shown in overview pie charts (Figure 1). The graphical results are complemented by two tables similar to TestProbe.

Figure 1.

Open in new tab Download slide

Screenshot of the Taxonomy Browser showing TestPrime results for two universal primers for evaluation.

The in silico stage of TestProbe and TestPrime is built upon the ARB ‘PT server’. First, the signature sequences are resolved into sets of ambiguity-free oligonucleotides. Each oligonucleotide is then analyzed by the PT server according to the configured stringency parameters. The results are sorted into three groups: match, mismatch and no data. The last group contains all sequences for which no clear decision could be made. Most commonly, this is the case when the sequence in question does not cover the oligonucleotide match position (short or partial sequences).

SILVA direct link API

Direct linking into the browser is supported by URLs of the form: http://www.arb-silva.de/browser/{lsu,ssu}/<INSDC accession number>.

Direct linking into the search is supported by URLs of the form: http://www.arb-silva.de/search/show/{lsu,ssu}/<search field>/<search term>[<search field>/<search term>[…]].

Up to four pairs of search fields and search terms are allowed. A description of the available search fields is documented on the website. They include ‘acc’ for INSDC accession numbers, ‘name’ for organism name and ‘pubid’ for PubMedID or DOI.

SILVA entries are already directly linked from various sources, including the EMBL-Bank, the GenBank, probeBase and StrainInfo.net.

OUTLOOK

The full impact of the ‘data deluge’ originating by the advent of NGS technologies has not yet influenced the amount of long, assembled sequence data such as full-length rRNA gene sequences. However, the growth of rRNA gene data has already rendered many comparative analysis methods impossible to be applied on comprehensive datasets.

We expect that tree reconstruction for the complete SSU Ref datasets will become infeasible in the near future. Taxonomy curation will then be based on the smaller SSU Ref NR dataset. Classifications for all other sequences in the SILVA databases will be created with a high-throughput approach using this curated Ref NR taxonomy as a reference.

Our databases are already popular in high-throughput analysis pipelines and we expect the importance of these applications to further increase in the future. We are, therefore, committed to enhancing the usability of our reference datasets for these applications. The work on our taxonomy, in particular on its eukaryotic branch, will be of direct and transparent benefit to analysis procedures relying on the SILVA databases.

FUNDING

Max Planck Society; Deutsche Forschungsgemeinschaft [GL 553/4-1]. Funding for open access charge: Deutsche Forschungsgemeinschaft.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors would like to thank Wolfgang Ludwig and Ralf Westram for expert assistance with the ARB software suite, the alignments and phylogeny. They greatly appreciate the help the SILVA users have rendered with critical evaluation and feedback on the SILVA databases and tools. They would also like to thank the RDP-II, StrainInfo and probeBase teams, as well as our taxonomy collaboration partners for their support and many fruitful discussions.

REFERENCES

1

Cochrane

G

,

Karsch-Mizrachi

I

,

Nakamura

Y

.

The international nucleotide sequence database collaboration

,

Nucleic Acids Res.

,

2011

, vol.

39

(pg.

D15

-

D18

)

2

Cole

JR

,

Chai

B

,

Farris

RJ

,

Wang

Q

,

Kulam-Syed-Mohideen

AS

,

McGarrell

DM

,

Bandela

AM

,

Cardenas

E

,

Garrity

GM

,

Tiedje

JM

.

The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data

,

Nucleic Acids Res.

,

2007

, vol.

35

(pg.

D169

-

D172

)

3

Cole

JR

,

Wang

Q

,

Cardenas

E

,

Fish

J

,

Chai

B

,

Farris

RJ

,

Kulam-Syed-Mohideen

AS

,

McGarrell

DM

,

Marsh

T

,

Garrity

GM

, et al.

The ribosomal database project: improved alignments and new tools for rRNA analysis

,

Nucleic Acids Res.

,

2009

, vol.

37

(pg.

D141

-

D145

)

4

DeSantis

TZ

,

Hugenholtz

P

,

Larsen

N

,

Rojas

M

,

Brodie

EL

,

Keller

K

,

Huber

T

,

Dalevi

D

,

Hu

P

,

Andersen

GL

.

Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB

,

Appl. Environ. Microbiol.

,

2006

, vol.

72

(pg.

5069

-

5072

)

5

Ludwig

W

,

Strunk

O

,

Westram

R

,

Richter

L

,

Meier

H

,

Yadhukumar

,

Buchner

A

,

Lai

T

,

Steppi

S

,

Jobb

G

, et al.

ARB: a software environment for sequence data

,

Nucleic Acids Res.

,

2004

, vol.

32

(pg.

1363

-

1371

)

6

Schloss

PD

,

Westcott

SL

,

Ryabin

T

,

Hall

JR

,

Hartmann

M

,

Hollister

EB

,

Lesniewski

RA

,

Oakley

BB

,

Parks

DH

,

Robinson

CJ

, et al.

Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities

,

Appl. Environ. Microbiol.

,

2009

, vol.

75

(pg.

7537

-

7541

)

7

Caporaso

JG

,

Kuczynski

J

,

Stombaugh

J

,

Bittinger

K

,

Bushman

FD

,

Costello

EK

,

Fierer

N

,

Pena

AG

,

Goodrich

JK

,

Gordon

JI

, et al.

QIIME allows analysis of high-throughput community sequencing data

,

Nat. Methods

,

2010

, vol.

7

(pg.

335

-

336

)

8

Meyer

F

,

Paarmann

D

,

D'Souza

M

,

Olson

R

,

Glass

EM

,

Kubal

M

,

Paczian

T

,

Rodriguez

A

,

Stevens

R

,

Wilke

A

, et al.

The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes

,

BMC Bioinformatics

,

2008

, vol.

9

pg.

386

9

Lagesen

K

,

Hallin

P

,

Andreas Rodland

E

,

Staerfeldt

H-H

,

Rognes

T

,

Ussery

DW

.

RNAmmer: consistent and rapid annotation of ribosomal RNA genes

,

Nucleic Acids Res

,

2007

, vol.

35

(pg.

3100

-

3108

)

10

Pruesse

E

,

Peplies

J

,

Glöckner

FO

.

SINA: accurate high throughput multiple sequence alignment of ribosomal RNA genes

,

Bioinformatics

,

2012

, vol.

28

(pg.

1823

-

1829

)

11

Schweer

T

.

Qualitätsmanagement ribosomaler RNA sequencen in der SILVA datenbank

,

Thesis

,

2011

Germany

University of Applied Sciences Bingen

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

12

Garrity

GM

,

Jonson

KL

,

Bell

J

,

Searles

DB

. ,

Taxonomic Outline of the Prokaryotes

,

2002

New York

Springer-Verlag

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

13

Euzéby

JP

.

List of bacterial names with standing in nomenclature: a folder available on the internet

,

Int. J. Syst. Bacteriol.

,

1997

, vol.

47

(pg.

590

-

592

)

14

Morris

RM

,

Vergin

KL

,

Cho

JC

,

Rappe

MS

,

Carlson

CA

,

Giovannoni

SJ

.

Temporal and spatial response of bacterioplankton lineages to annual convective overturn at the Bermuda atlantic time-series study site

,

Limnol. Oceanogr.

,

2005

, vol.

50

(pg.

1687

-

1696

)

Google Scholar

Crossref

WorldCat

15

Takai

K

,

Moser

DP

,

DeFlaun

M

,

Onstott

TC

,

Fredrickson

JK

.

Archaeal diversity in waters from deep South African gold mines

,

Appl. Environ. Microbiol.

,

2001

, vol.

67

(pg.

5750

-

5760

)

16

Köhler

T

,

Stingl

U

,

Meuser

K

,

Brune

A

.

Novel lineages of planctomycetes densely colonize the alkaline gut of soil-feeding termites (Cubitermes spp.)

,

Environ. Microbiol.

,

2008

, vol.

10

(pg.

1260

-

1270

)

17

Adl

SM

,

Simpson

AGB

,

Farmer

MA

,

Andersen

RA

,

Anderson

OR

,

Barta

JR

,

Bowser

SS

,

Brugerolle

GUY

,

Fensome

RA

,

Fredericq

S

, et al.

The new higher level classification of eukaryotes with emphasis on the taxonomy of protists

,

J. Eukaryot. Microbiol.

,

2005

, vol.

52

(pg.

399

-

451

)

18

Yarza

P

,

Richter

M

,

Peplies

J

,

Euzeby

J

,

Amann

R

,

Schleifer

KH

,

Ludwig

W

,

Glöckner

FO

,

Rossello-Mora

R

.

The all-species living tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains

,

Syst. Appl. Microbiol.

,

2008

, vol.

31

(pg.

241

-

250

)

19

Munoz

R

,

Yarza

P

,

Ludwig

W

,

Euzeby

J

,

Amann

R

,

Schleifer

KH

,

Glockner

FO

,

Rossello-Mora

R

.

Release LTPs104 of the all-species living tree

,

Syst. Appl. Microbiol.

,

2011

, vol.

34

(pg.

169

-

170

)

20

Dawyndt

P

,

Vancanneyt

M

,

De Meyer

H

,

Swings

J

.

Knowledge accumulation and resolution of data inconsistencies during the integration of microbial information sources

,

IEEE T Knowl. Data En.

,

2005

, vol.

17

(pg.

1111

-

1126

)

Google Scholar

Crossref

WorldCat

21

Kottmann

R

,

Kostadinov

I

,

Duhaime

MB

,

Buttigieg

PL

,

Yilmaz

P

,

Hankeln

W

,

Waldmann

J

,

Glöckner

FO

.

Megx.net: integrated database resource for marine ecological genomics

,

Nucleic Acids Res.

,

2010

, vol.

38

(pg.

D391

-

D395

)

22

Edgar

RC

.

Search and clustering orders of magnitude faster than BLAST

,

Bioinformatics

,

2010

, vol.

26

(pg.

2460

-

2461

)

23

Loy

A

,

Maixner

F

,

Wagner

M

,

Horn

M

.

probeBase–an online resource for rRNA-targeted oligonucleotide probes: new features 2007

,

Nucleic Acids Res.

,

2007

, vol.

35

(pg.

D800

-

D804

)

Author notes

The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

Download all slides

Month:	Total Views:
November 2016	6
December 2016	35
January 2017	103
February 2017	217
March 2017	270
April 2017	181
May 2017	229
June 2017	225
July 2017	163
August 2017	205
September 2017	209
October 2017	170
November 2017	263
December 2017	435
January 2018	527
February 2018	465
March 2018	495
April 2018	564
May 2018	661
June 2018	637
July 2018	617
August 2018	749
September 2018	626
October 2018	687
November 2018	660
December 2018	551
January 2019	616
February 2019	563
March 2019	783
April 2019	725
May 2019	769
June 2019	819
July 2019	1,124
August 2019	1,180
September 2019	1,181
October 2019	1,046
November 2019	904
December 2019	907
January 2020	962
February 2020	1,238
March 2020	1,155
April 2020	688
May 2020	843
June 2020	1,428
July 2020	1,356
August 2020	993
September 2020	1,016
October 2020	944
November 2020	1,041
December 2020	767
January 2021	965
February 2021	786
March 2021	1,053
April 2021	1,090
May 2021	1,028
June 2021	908
July 2021	977
August 2021	982
September 2021	1,157
October 2021	1,122
November 2021	1,236
December 2021	1,319
January 2022	1,276
February 2022	1,084
March 2022	1,632
April 2022	1,400
May 2022	1,403
June 2022	1,149
July 2022	1,096
August 2022	1,007
September 2022	1,164
October 2022	1,170
November 2022	1,184
December 2022	1,234
January 2023	1,449
February 2023	1,368
March 2023	1,525
April 2023	1,465
May 2023	1,447
June 2023	1,325
July 2023	1,325
August 2023	1,295
September 2023	1,380
October 2023	1,442
November 2023	1,375
December 2023	1,476
January 2024	1,615
February 2024	1,603
March 2024	1,841
April 2024	841

Article Contents

The SILVA ribosomal RNA gene database project: improved data processing and web-based tools

Abstract

INTRODUCTION

DATABASES

rRNA gene prediction

Quality control/quality assurance

SILVA taxonomy

Third-party contextual data

Datasets

Ref

SSU Ref NR

Living tree project

Further developments

Data retrieval

Aligner

Internal reference datasets

SILVA WEBSITE

Core database access features

Alignment, sequence-based search and classification

Probe and primer evaluation

TestProbe

TestPrime

SILVA direct link API

OUTLOOK

FUNDING

ACKNOWLEDGEMENTS

REFERENCES

Author notes

Comments

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

The SILVA ribosomal RNA gene database project: improved data processing and web-based tools

Abstract

INTRODUCTION

DATABASES

rRNA gene prediction

Quality control/quality assurance

SILVA taxonomy

Third-party contextual data

Datasets

Ref

SSU Ref NR

Living tree project

Further developments

Data retrieval

Aligner

Internal reference datasets

SILVA WEBSITE

Core database access features

Alignment, sequence-based search and classification

Probe and primer evaluation

TestProbe

TestPrime

SILVA direct link API

OUTLOOK

FUNDING

ACKNOWLEDGEMENTS

REFERENCES

Author notes

Comments

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only