BioDataome: a collection of uniformly preprocessed and automatically annotated datasets for data-driven biology

    Recent News

    BioDataome Database Update 5/6/2020

    A total of 690 new datasets are now fully available in the BioDataome archive. These data were retrieved from GEO, processed with the latest BioDataome pipeline (see "Documentation"), and manually annotated by an expert biologist. The newly added molecular datasets are human microarray profiles acquired on three platforms: GPL570, GPL6244, and GPL96.


BioDataome is a database of uniformly preprocessed and disease-annotated genomic and epigenomic data that aims to promote and accelerate the reuse of public data. We followed the same preprocessing pipeline for each biological mart (microarray gene expression, RNA-Seq gene expression, DNA methylation) to produce datasets that are ready for downstream analysis, and automatically annotated them with Disease Ontology terms. We also flag datasets that share common samples and automatically identify control samples in case-control studies. Currently, BioDataome includes 7345 datasets and 318144 samples spanning 847 diseases, and can readily be used in large-scale experiments and meta-analyses. All datasets are publicly available for querying and downloading.
Samples by species:
Homo sapiens: [count missing] out of 318144
Mus musculus: 28141 out of 318144
GSE Species Entity Technology Type Samples Duplicates Disease ParentNode ChildNode Analyses Annotation Version Release Date
Download metadata
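
As an illustration of how the downloadable metadata might be queried for a meta-analysis, the following is a minimal sketch in Python. It assumes the metadata is available as a comma-separated file whose columns match some of the table headers above (GSE, Species, Technology, Samples, Disease); the actual file format is not documented here, so that is an assumption. The rows, the `GSE_EXAMPLE_*` identifiers, the `GPL_PLACEHOLDER` platform, and the helper names `load_rows` and `select_datasets` are hypothetical placeholders, not real BioDataome records or part of any BioDataome API.

```python
import csv
import io

# Hypothetical metadata excerpt. Column names follow the table header above;
# the rows are made-up placeholders, NOT real BioDataome records.
SAMPLE_METADATA = """GSE,Species,Technology,Samples,Disease
GSE_EXAMPLE_1,Homo sapiens,GPL570,120,placeholder disease A
GSE_EXAMPLE_2,Mus musculus,GPL_PLACEHOLDER,24,placeholder disease B
GSE_EXAMPLE_3,Homo sapiens,GPL96,60,placeholder disease A
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def select_datasets(rows, species=None, disease=None, min_samples=0):
    """Keep rows matching the given species and disease (if provided)
    and having at least min_samples samples."""
    selected = []
    for row in rows:
        if species is not None and row["Species"] != species:
            continue
        if disease is not None and row["Disease"] != disease:
            continue
        if int(row["Samples"]) < min_samples:
            continue
        selected.append(row)
    return selected


rows = load_rows(SAMPLE_METADATA)
human = select_datasets(rows, species="Homo sapiens", min_samples=50)
print([row["GSE"] for row in human])
```

With a real metadata file, the in-memory string would be replaced by the downloaded file's contents, and the column names adjusted to whatever the file actually uses.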