The Hajibabaei lab at the Centre for Biodiversity Genomics, University of Guelph, recently published a tool called MetaWorks in PLoS ONE to facilitate and standardize the processing of large-scale genomics datasets used for biomonitoring. These datasets come from samples such as soil, water, and specimens collected in traps.
Such samples are used for next-generation biodiversity monitoring, which couples ‘boots on the ground’ collection with high-throughput DNA sequencing. However, sophisticated software tools are needed to ensure sequence quality and to assemble species lists for biodiversity assessments.
That need led to the creation of MetaWorks, a free and open-source software package that brings together a suite of tools to process signature DNA regions targeting groups such as bacteria, fungi, and macroinvertebrates.
“The need to make metabarcoding informatic processing both scalable and tractable within reasonable timeframes drove us to develop MetaWorks,” explained Dr. Mehrdad Hajibabaei, expert in molecular biodiversity and Integrative Biology professor at the College of Biological Science. “We want to support projects that cut across taxon lines.”
As datasets grow with improvements in sequencing technology, MetaWorks makes it possible to monitor a variety of organisms all at the same time. Next-generation monitoring will describe not only species distributions over space and time but also species interactions, by examining co-occurrences and trophic relationships.
How it Works:
MetaWorks runs at the command line on 64-bit Linux (linux-64) and comes with its own processing environment to facilitate reproducibility. It uses Snakemake, a Python-based workflow manager, to scale processing and make efficient use of computational resources.
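The general idea behind a workflow manager can be sketched in a few lines of Python. Each step declares its inputs and its output, and the manager works backward from the requested result, running only the steps whose outputs do not exist yet. This is a toy, in-memory illustration and not Snakemake's actual interface; the step names (`trimmed`, `unique`), the two-base primer, and the `build` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One workflow step: the named inputs it needs and the action that makes its output."""
    inputs: List[str]
    action: Callable[[List[list]], list]


def build(target: str, steps: Dict[str, Step], store: Dict[str, list], log: List[str]) -> list:
    """Produce `target`, running only the steps whose output is not already in `store`."""
    if target in store:                      # already available: nothing to do
        return store[target]
    step = steps[target]                     # KeyError if no step produces the target
    args = [build(name, steps, store, log) for name in step.inputs]
    store[target] = step.action(args)
    log.append(target)                       # record the order in which steps actually ran
    return store[target]


# Hypothetical two-step pipeline: strip an assumed 2-base primer, then keep unique reads.
steps = {
    "trimmed": Step(["raw"], lambda a: [read[2:] for read in a[0]]),
    "unique": Step(["trimmed"], lambda a: sorted(set(a[0]))),
}
store = {"raw": ["GGACGT", "GGACGT", "GGTTTT"]}
log: List[str] = []

result = build("unique", steps, store, log)
first_run = list(log)                        # steps executed on the first request
build("unique", steps, store, log)           # second request: everything is already built
```

On the second request nothing is re-executed, which is the property that lets a workflow manager resume interrupted runs and avoid repeating finished work.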
MetaWorks can generate taxonomically assigned exact sequence variants (ESVs) or operational taxonomic units (OTUs). In contrast with existing methods, MetaWorks has the flexibility to handle a variety of popular markers (e.g., 16S, ITS, COI), including the specialized steps needed to process ITS (removal of flanking rRNA gene sequences) and protein-coding markers (pseudogene filtering). Custom classifiers are also available to place sequences in a taxonomic framework, along with a measure of statistical support for each assignment.
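The difference between ESVs and OTUs can be illustrated with a short Python sketch. It is a simplification under stated assumptions and not how MetaWorks itself is implemented: exact variants are obtained here by simply collapsing identical reads (real pipelines also denoise reads to remove sequencing errors), identity is the fraction of matching positions between equal-length sequences (real tools use alignment), and the 97% threshold is a common convention assumed for the example. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def exact_variants(reads: List[str]) -> Counter:
    """Collapse identical reads into unique sequences with abundance counts."""
    return Counter(reads)


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(variants: Counter, threshold: float = 0.97) -> Dict[str, List[str]]:
    """Greedy clustering: visit variants from most to least abundant; each one
    joins the first existing centroid it matches at >= threshold, otherwise it
    becomes the centroid of a new OTU."""
    otus: Dict[str, List[str]] = {}
    for seq, _count in variants.most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


# Toy reads: an abundant sequence, a one-base variant of it, and an unrelated sequence.
base = "ACGT" * 10                 # 40 bases
near = base[:-1] + "A"             # differs from `base` at one position (97.5% identity)
far = "TTTT" * 10                  # 25% identity to `base`
reads = [base] * 5 + [near] * 2 + [far] * 3

variants = exact_variants(reads)   # three distinct sequences
otus = cluster_otus(variants)      # `near` is merged into the OTU centred on `base`
```

In this toy dataset the exact-variant view keeps all three sequences separate, while the OTU view merges the one-base variant with its abundant neighbour, leaving two units.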
“Although we’ve officially published MetaWorks, it remains in active development to keep up with improvements in the underlying programs and the growth of reference sequence databases,” said Dr. Teresita Porter, bioinformatician and Research Associate at the Centre for Biodiversity Genomics. MetaWorks aims to make working with large multi-marker DNA metabarcoding datasets more tractable, helping researchers address a range of current big-picture issues. With MetaWorks, we are one step closer to automating DNA-based biomonitoring to study ecosystems and environmental change over time.