
PhenDB is an automated pipeline for the prediction of microbial phenotypes based on comparative genomics.


First, Prodigal (Hyatt et al., 2010) predicts genes in the DNA sequences of the uploaded metagenomic bins or genomes and translates them into protein sequences.
HMMER (Eddy, 2001) then assigns these protein sequences to orthologous groups using the eggNOG database (Huerta-Cepas et al., 2015).
Finally, PICA (Feldbauer et al., 2015) uses support vector machine (SVM)-based models (calculated from genomes with known phenotypes) and the list of orthologous groups of proteins present in a bin/genome to predict whether the organism possesses a particular trait or not.
Currently, predictions for 44 different traits are calculated.
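
The final classification step can be illustrated with a toy example. The sketch below is not PICA's code; it only assumes a linear SVM-style decision function evaluated on the set of orthologous groups found in a bin/genome. The function name, the group identifiers, the weights, and the bias are all hypothetical.

```python
def predict_trait(present_groups, weights, bias):
    """Return (has_trait, score) for a linear decision function.

    present_groups: set of orthologous-group IDs found in the bin/genome
    weights: dict mapping group ID -> weight (hypothetical values)
    bias: intercept of the hypothetical model
    """
    # Each group that is present contributes its weight; absent groups contribute 0.
    score = bias + sum(w for og, w in weights.items() if og in present_groups)
    return score > 0, score


# Hypothetical model for an invented trait; IDs and numbers are illustrative only.
toy_weights = {"OG_A": 1.5, "OG_B": -0.5, "OG_C": 0.75}
toy_bias = -1.0

has_trait, score = predict_trait({"OG_A", "OG_C"}, toy_weights, toy_bias)
print(has_trait, score)
```

A positive score is read as "trait present" and a non-positive score as "trait absent"; real models are trained on genomes with known phenotypes rather than written by hand.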

Please note that PhenDB is still in an early stage of development, and for several models we are still working on improving the training data.
Please therefore pay attention to the "balanced_accuracy" values reported with each prediction.

How to Use PhenDB

You may either upload a single metagenomic bin/genome in FASTA format (.gz-compressed or uncompressed), or an archive (.tar.gz or .zip) containing several bins/genomes.
Alternatively, you may upload the proteome of a metagenomic bin/genome as a protein FASTA file, or an archive of such files.

Please note that the archive must have a flat file structure; nested directories cannot be processed.
The current maximum upload size is 1 GB, and the maximum file size per bin is 30 MB.
Empty sequence files and duplicate sequence files (determined by file content) are silently dropped from the analysis.
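
Because problematic files are dropped without notice, it can help to check an archive locally before uploading. The following is a minimal sketch, not part of PhenDB: the function name check_zip is hypothetical, comparing file content by SHA-256 digest is an assumption about how "duplicate" could be tested, and 30 MB is interpreted here as 30 * 1024 * 1024 bytes.

```python
import hashlib
import io
import zipfile

MAX_BIN_BYTES = 30 * 1024 * 1024  # 30 MB per bin; binary megabytes are an assumption


def check_zip(archive):
    """Return (accepted, problems) for a .zip archive given as a path or file object."""
    accepted, problems, seen = [], [], set()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = info.filename
            # Zip member names use "/" as separator, so any "/" means nesting.
            if info.is_dir() or "/" in name:
                problems.append((name, "nested path"))
                continue
            if info.file_size == 0:
                problems.append((name, "empty file"))
                continue
            if info.file_size > MAX_BIN_BYTES:
                problems.append((name, "larger than 30 MB"))
                continue
            # Files with identical content share the same digest.
            digest = hashlib.sha256(zf.read(name)).hexdigest()
            if digest in seen:
                problems.append((name, "duplicate content"))
                continue
            seen.add(digest)
            accepted.append(name)
    return accepted, problems


# Build a small in-memory archive to demonstrate the checks.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("bin1.fasta", ">c1\nACGT\n")
    zf.writestr("bin1_copy.fasta", ">c1\nACGT\n")
    zf.writestr("empty.fasta", "")
    zf.writestr("sub/bin2.fasta", ">c2\nGGCC\n")
buf.seek(0)

accepted, problems = check_zip(buf)
print(accepted)
print(problems)
```

The sketch covers .zip archives only; a .tar.gz archive would need the same checks written against the tarfile module.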

Balanced accuracy is a confidence measure computed from the completeness and contamination of the uploaded bin and the predictive power of the model. Predictions with a balanced accuracy below the chosen cutoff (range: 0.5 to 1) are omitted from the results.
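
The effect of the cutoff can be sketched as a simple filter. This is an illustration under assumptions, not PhenDB's implementation: the function name, the record layout, the trait names, and the default cutoff of 0.75 are hypothetical; only the "balanced_accuracy" field name and the 0.5 to 1 range come from the text above.

```python
def filter_predictions(predictions, cutoff=0.75):
    """Keep only predictions whose balanced accuracy is at or above the cutoff."""
    if not 0.5 <= cutoff <= 1.0:
        raise ValueError("cutoff must lie between 0.5 and 1")
    # Predictions below the cutoff are omitted; those equal to it are kept.
    return [p for p in predictions if p["balanced_accuracy"] >= cutoff]


# Hypothetical prediction records for one bin.
example = [
    {"trait": "trait_1", "prediction": "YES", "balanced_accuracy": 0.92},
    {"trait": "trait_2", "prediction": "NO", "balanced_accuracy": 0.61},
    {"trait": "trait_3", "prediction": "YES", "balanced_accuracy": 0.75},
]

kept = filter_predictions(example, cutoff=0.75)
print([p["trait"] for p in kept])
```

Raising the cutoff trades coverage for confidence: fewer traits are reported, but each reported prediction comes with a higher balanced accuracy.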

After submission, your job is queued and waits for completion of any previously submitted jobs.
Once computation starts, expect about 1 to 1.5 minutes of calculation time per bin.
To receive a notification upon job completion, enter an email address during your submission.
Alternatively, you may save the URL of your submission to retrieve your results later.
Results are stored by PhenDB for 30 days, after which all user-uploaded data and associated results are deleted.
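
The timing figures above translate into simple arithmetic. The sketch below is only a rough planning aid: both function names are hypothetical, queue waiting time is ignored, and counting the 30 days from the submission date is an assumption.

```python
from datetime import date, timedelta


def estimate_minutes(n_bins):
    """Return (low, high) estimated calculation time in minutes for n_bins bins."""
    # 1 to 1.5 minutes per bin, excluding any time spent waiting in the queue.
    return (1.0 * n_bins, 1.5 * n_bins)


def deletion_date(submitted):
    """Return the date 30 days after the given date (assumed start of the period)."""
    return submitted + timedelta(days=30)


low, high = estimate_minutes(20)
print(f"20 bins: about {low:.0f} to {high:.0f} minutes")
print(deletion_date(date(2024, 1, 1)))
```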


PhenDB provides the output files in a .zip compressed folder named after your Job ID (i.e. the key after ../results/ in the URL). This folder contains:


Hyatt, Doug, et al. "Prodigal: prokaryotic gene recognition and translation initiation site identification." BMC bioinformatics 11.1 (2010): 119.
Eddy, Sean R. "HMMER: Profile hidden Markov models for biological sequence analysis." (2001).
Huerta-Cepas, Jaime, et al. "eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences." Nucleic acids research 44.D1 (2015): D286-D293.
Feldbauer, Roman, et al. "Prediction of microbial phenotypes based on comparative genomics." BMC bioinformatics 16.14 (2015): S1.