Cloud-based platforms are transforming the way public health scientists and bioinformaticians conduct large-scale genomic analyses. Terra, developed by the Broad Institute, is a cloud-native platform designed to support scalable, secure, and reproducible biomedical research. This presentation highlights the power of Terra in facilitating pathogen genomics workflows, especially in the context…
Date:
30 July 2025
|
Running cloud-native pipelines with Dockstore and eLwazi Terra
Starting with a very brief review of Docker (containerization technology) and WDL (Workflow Description Language) as examples, learn how Dockstore facilitates reproducible science through the re-use and launch of tools and workflows across a variety of cloud platforms, including eLwazi Terra and local command-line tools usable in HPC environments. See a demo of how Dockstore integrates with…
Date:
23 July 2025
|
Introduction to the Workflow Description Language (WDL)
Reproducing a data analysis is a major challenge in the scientific community. In response to this issue, workflow languages, including the Workflow Description Language (WDL), have become a popular solution to reproduce both a compute environment and the analysis steps. The WDL structure defines each analysis task, how they work together, and how to scale the steps for large datasets.…
Date:
09 July 2025
|
The eLwazi Metadata Harmonisation Tool
The exponential growth of scientific data across disciplines has created a pressing need for robust, flexible tools that enable harmonisation and standardisation of metadata to ensure data interoperability and reusability. A major barrier to effective data sharing lies in the heterogeneity of data formats, vocabularies, and annotation practices, which can hinder discovery, integration, and…
Date:
11 June 2025
|
Introduction to Nextflow
With the increase in the rate at which raw sequencing data is produced due to improved technology and reduced cost of Next-Generation Sequencing (NGS), researchers in the field of bioinformatics and computational biology can perform “multi-omics” data analyses to answer many biological questions. However, analysis of such large datasets comes with a number of challenges, especially when it…
Date:
28 May 2025
|
Introduction to Containers
As a systems administrator supporting research environments, I’ve seen firsthand how tricky reproducibility can be when software and infrastructure vary across systems. In this talk, I’ll introduce containers—specifically Docker and Singularity—as practical tools for creating consistent, portable environments that support reproducible science. Whether you're running pipelines on local machines…
Date:
14 May 2025
|
Lessons from 20 years of Open Source Development
Computational biology has undergone a significant transformation since the advent of high-throughput sequencing, a pivotal breakthrough that democratized large-scale genomic analyses for researchers. However, even prior to this technological advancement, several of the most widely utilized tools in the field prioritized reliable software development practices, with data reproducibility being a…
Date:
23 April 2025
|
Pangenome-based structure deconvolution of the amylase locus
The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake, although evidence of recent selection is lacking. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately…
Date:
13 September 2024
|
Exploring the application of pangenome reference graphs to rare disease diagnosis
Although the CHM13 reference represents a complete human genome, it lacks the full diversity of human haplotypes present in Africa. Analysis pipelines which map sequencing reads to a single linear reference may suffer from “reference genome bias”, where unmapped reads bias downstream analysis. The impact of reference genome bias in the clinical evaluation of genome sequencing data from African…
Date:
10 September 2024
|
Introduction to the Gen3 Data Commons system
This workshop will provide an overview of the Gen3 Data Commons Framework. We will describe the core microservices used to create a data service that can harmonize data, facilitate access, and help new research projects identify relevant data. We will also give a demonstration using two of our current data commons.
Date:
23 June 2022
|
Introduction to Terra: A scalable platform for biomedical research
Terra is a cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate. This interactive workshop on Terra will teach you the skills you need to know to start working and collaborating securely in Terra. Specifically, you’ll learn about the architecture of Terra as it relates to cloud-based data sets, tools, and…
Date:
12 April 2022 - 13 April 2022
|