AllTheBacteria

All bacterial genomes assembled, available and searchable

AllTheBacteria is a large-scale, collaborative, open-access project that assembles, annotates, and catalogs millions of bacterial and archaeal genomes from public sequencing data.

2,440,377 assemblies in total (as of August 2024)

Designed to support the global microbiology community, AllTheBacteria provides standardized, high-quality genomic data. Some key analyses and annotations (e.g. gene annotation and AMR gene detection) are run consistently across the whole dataset, while other, more targeted tools are run on specific species.

We provide several types of search tools, e.g. sketchlib and sourmash for finding genomes similar to a query, and LexicMap for BLAST-style alignment against the full dataset.
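The idea behind sketch-based similarity search can be illustrated with a toy example. The Python below is a minimal sketch assuming a FracMinHash-style scheme (keep only the canonical k-mer hashes that fall below a threshold set by a `scaled` factor), which is the general approach sourmash takes. It is not the sourmash or sketchlib API: the function names, the BLAKE2b hash, and the parameter values are illustrative choices only.

```python
import hashlib

# Complement table used to build reverse complements of DNA k-mers.
_COMP = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield the canonical form (lexicographic min of k-mer and its reverse complement)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMP)[::-1]
        yield min(kmer, rc)


def hash64(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (illustrative hash choice, not sourmash's)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def frac_sketch(seq: str, k: int = 21, scaled: int = 1000) -> set:
    """Keep only hashes below 2**64 / scaled, i.e. roughly 1/scaled of all k-mers."""
    threshold = (2**64) // scaled
    return {h for h in map(hash64, canonical_kmers(seq, k)) if h < threshold}


def jaccard(a: set, b: set) -> float:
    """Estimated Jaccard similarity of the two underlying k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query: set, ref: set) -> float:
    """Estimated fraction of the query's k-mers that are present in the reference."""
    return len(query & ref) / len(query) if query else 0.0


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(20000))
    fragment = genome[5000:10000]  # a query taken from the genome itself
    g = frac_sketch(genome, scaled=10)
    f = frac_sketch(fragment, scaled=10)
    print(f"containment of fragment in genome: {containment(f, g):.2f}")
    print(f"Jaccard fragment vs genome: {jaccard(f, g):.2f}")
```

Containment is usually the more informative score when a small query (e.g. a partial assembly) is compared against much larger genomes, because Jaccard is dragged down by the size difference alone.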

Please feel free to use the data, and to join us if there are analyses of your favourite species that would be valuable to everyone.