nf-core/createtaxdb
Parallelised and automated construction of metagenomic classifier databases of different tools
database, database-builder, metagenomic-profiling, metagenomics, profiling, taxonomic-profiling
Version history
Added
- #169 Have KAIJU_MKFMI module also export relevant taxdump files for downstream processes (by @jfy133)
- #170 Publish the sometimes-generated `unmapped.txt` file for Kraken2 databases (❤️ to @sofstam for reporting, fix @jfy133)
- #178 Add additional validation checks for required MetaCache inputs (by @jfy133)
- #179 Add new parameter `--save_uncompressed_fastas` to only optionally save decompressed input files (fix @jfy133)
Fixed
- #158 Prevent sylph failing due to overly long commands when given many input genomes (by @sofstam, @jfy133)
- #160 Prevent sourmash failing due to overly long commands when given many input genomes (by @sofstam, @jfy133)
- #161 Force METACACHE_BUILD module to always use one CPU, as not multi-threaded, removing warning (by @jfy133)
- #162 Fix links to FAQ in parameter docs (by @jfy133)
- #163 Fix code block title in auxiliary files section of FAQ (by @jfy133)
- #165 Force use of KrakenUniq `--jellyfish-bin` to ensure more regular execution (by @jfy133)
- #173 Fix generated downstream samplesheet’s Bracken directory name being flipped (by @jfy133)
- #175 Fix MetaCache receiving wrong taxonomy file (was seq2map, should have been accession2taxid) (by @sofstam, @jfy133)
- #182 KMCP emits correct taxonomy files for downstream use (by @sofstam, @jfy133)
- #183 Fix KrakenUniq using incorrectly non-renamed seqid2map taxonomy file, resulting in no taxonomy info during classification (by @jfy133)
- #184 Stop generation of concatenated FASTA file of input files if not needed by selected tools (by @jfy133)
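
The fixes in #158 and #160 concern commands that become too long when many genomes are passed as individual arguments. The following is a minimal Python sketch of the general workaround (write the paths to a list file and hand the tool that one file), not the pipeline's actual implementation. The function names, the fallback limit of 131072 bytes, and the example `genomes.txt` filename are hypothetical.

```python
import os
from pathlib import Path


def command_length(args):
    """Approximate bytes needed for an argument vector (each argument plus a NUL)."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


def needs_list_file(args, limit=None):
    """Return True when the argument vector would exceed the given limit."""
    if limit is None:
        # SC_ARG_MAX is available on POSIX systems; otherwise use an assumed fallback.
        try:
            limit = os.sysconf("SC_ARG_MAX")
        except (AttributeError, ValueError, OSError):
            limit = 131072
    return command_length(args) > limit


def write_path_list(paths, list_file):
    """Write one path per line so a tool can read a single file instead of many arguments."""
    list_file = Path(list_file)
    list_file.write_text("".join(f"{p}\n" for p in paths))
    return list_file
```

A caller could check `needs_list_file(genome_paths)` and, if it returns `True`, call `write_path_list(genome_paths, "genomes.txt")` and pass that file through whichever list-file option the tool in question provides.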
Dependencies
| Tool | Old Version | New Version |
|---|---|---|
Added
- #117 - Updated to nf-core/tools template 3.5.1 (by @jfy133)
- #143 - Documented how to resolve KrakenUniq unbound variable jellyfish issue (❤️ to @flass for suggesting, added by @jfy133)
- #140 - Added MetaCache database building support (❤️ to @ChillarAnand for suggestion, added by @alxndrdiaz and @jfy133)
- #144 - Added tutorial on how to convert an NCBI `assembly_summary.txt` file to input samplesheet (❤️ to @dialvarezs for improvements, by @jfy133)
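
As a rough illustration of the kind of conversion described in #144 (the tutorial in the pipeline documentation remains the reference), the sketch below reads an NCBI `assembly_summary.txt` and writes a CSV. The output column names (`id`, `taxid`, `fasta_dna`, `fasta_aa`) and the `_genomic.fna.gz` suffix are assumptions that should be checked against the pipeline's usage documentation; the function names are hypothetical.

```python
import csv

# Assumed samplesheet columns; verify against the pipeline documentation.
FIELDNAMES = ["id", "taxid", "fasta_dna", "fasta_aa"]


def assembly_summary_to_rows(lines):
    """Yield samplesheet rows from the lines of an NCBI assembly_summary.txt file."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The last comment line holds the tab-separated column names.
            header = line.lstrip("#").strip().split("\t")
            continue
        if not line or header is None:
            continue
        record = dict(zip(header, line.split("\t")))
        ftp_path = record.get("ftp_path", "")
        if ftp_path in ("", "na"):
            continue  # no downloadable assembly for this entry
        basename = ftp_path.rstrip("/").rsplit("/", 1)[-1]
        yield {
            "id": record["assembly_accession"],
            "taxid": record["taxid"],
            # Assumed naming of the genomic FASTA inside the assembly directory.
            "fasta_dna": f"{ftp_path}/{basename}_genomic.fna.gz",
            "fasta_aa": "",
        }


def write_samplesheet(rows, handle):
    """Write rows as CSV with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
```

Typical use would be to open `assembly_summary.txt` for reading, pass the file object to `assembly_summary_to_rows`, and write the result to a file opened with `newline=""`.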
Fixed
Dependencies
| Tool | Old Version | New Version |
|---|---|---|
| nf-core | 3.4.1 | 3.5.1 |
| MetaCache | 2.5.0 | |
Added
- #108 - Document workaround for building databases from single FASTA files, e.g. from NCBI RefSeq (by @pcantalupo)
- #111 - Added sourmash reference building both for genomes and proteomes (by @Midnighter).
- #117 - Updated to nf-core/tools template 3.4.1 (by @jfy133)
- #124 - Added sylph reference building (by @jfy133 and @sofstam).
Fixed
- #110 - Corrected the documented structures of the grouped output from the `PREPROCESSING` subworkflow (by @Midnighter).
- #121 - Fix Kraken2 build failing with local files due to symlink-in-symlink mounting error with containers (❤️ to @ellmagu for reporting, fix by @jfy133)
- #122 - Update DIAMOND to support more recent versions of NCBI taxonomy (by @jfy133)
- #123 - Fix a MALT build validation check incorrectly assigned to `--build_krakenuniq` (by @jfy133)
- #133 - Fix Kaiju-compatible renamed FASTA files always being published even if `--kaiju_keepintermediates false` (by @jfy133)
Dependencies
| Tool | Old Version | New Version |
|---|---|---|
| sourmash | 4.9.4 | |
| sylph | 0.9.0 | |
| kraken2 | 2.1.5 | 2.1.6 |
| tar | 1.34 | |
| nf-core | 3.3.2 | 3.4.1 |
| DIAMOND | 2.1.12 | 2.1.16 |
| MultiQC | 1.31 | 1.32 |
| Nextflow | 24.04.2 | 25.04.2 |
Deprecated
Added
Initial release of nf-core/createtaxdb, created with the nf-core template.
Adds database building support for the following tools:
- (Primarily) nucleotide based
  - Bracken (added by @alxndrdiaz)
  - Centrifuge (added by @jfy133)
  - ganon (added by @jfy133)
  - KMCP (added by @alxndrdiaz)
  - KrakenUniq (added by @jfy133)
  - Kraken2 (added by @alxndrdiaz)
  - MALT (added by @jfy133 and @LilyAnderssonLee)
- (Primarily) protein based
Additional optimisation when running with a very large number of genomes by @BioWilko.
| Tool | Old Version | New Version |
|---|---|---|
| Bracken | 3.1 | |
| Centrifuge | 1.0.4.2 | |
| DIAMOND | 2.1.12 | |
| find | 4.6.0 | |
| ganon | 2.1.0 | |
| kaiju | 1.10.0 | |
| KMCP | 0.9.4 | |
| Kraken2 | 2.1.5 | |
| KrakenUniq | 1.0.4 | |
| MALT | 0.6.2 | |
| pigz | 2.8 | |
| seqkit | 2.9.0 | |
| tar | 1.34 | |
| MultiQC | 1.29 | |
| p7zip | 16.02 | |