User Support & Training

Training

Our training upskills eLwazi project members and users in using the open data science platform and in any additional computational skills they may need. We work closely with the DS-I Africa consortium training working group, which is developing a data science curriculum.

Cloud-based pathogen genomics with Terra: tools, workflows, and real-world use cases
Cloud-based platforms are transforming the way public health scientists and bioinformaticians conduct large-scale genomic analyses. Terra, developed by the Broad Institute, is a cloud-native platform designed to support scalable, secure, and reproducible biomedical research. This presentation highlights the power of Terra in facilitating pathogen genomics workflows, especially in the context…
Date: 30 July 2025

Running cloud-native pipelines with Dockstore and eLwazi Terra
Starting with a very brief review of Docker (containerization technology) and WDL (Workflow Description Language) as examples, learn how Dockstore facilitates reproducible science through the re-use and launch of tools and workflows across a variety of cloud platforms, including eLwazi Terra and local command-line tools usable in HPC environments. See a demo of how Dockstore integrates with…
Date: 23 July 2025

Introduction to the Workflow Description Language (WDL)
Reproducing a data analysis is a major challenge in the scientific community. In response to this issue, workflow languages, including the Workflow Description Language (WDL), have become a popular solution to reproduce both a compute environment and the analysis steps. The WDL structure defines each analysis task, how they work together, and how to scale the steps for large datasets.…
Date: 09 July 2025

The eLwazi Metadata Harmonisation Tool
The exponential growth of scientific data across disciplines has created a pressing need for robust, flexible tools that enable harmonisation and standardisation of metadata to ensure data interoperability and reusability. A major barrier to effective data sharing lies in the heterogeneity of data formats, vocabularies, and annotation practices, which can hinder discovery, integration, and…
Date: 11 June 2025

Introduction to Nextflow
With the increase in the rate at which raw sequencing data is produced due to improved technology and reduced cost of Next-Generation Sequencing (NGS), researchers in the field of bioinformatics and computational biology can perform “multi-omics” data analyses to answer many biological questions. However, analysis of such large datasets comes with a number of challenges, especially when it…
Date: 28 May 2025

Leveraging publicly available DNA and RNA sequencing data for hypothesis-driven science in cancer biology and beyond
This seminar will focus on using publicly available DNA and RNA sequencing datasets to investigate biological hypotheses, taking breast cancer as a case example. To start, we will formulate the hypothesis for this case study, then discuss how to tailor the publicly available datasets for a preliminary exploration of the chosen topic and briefly review the relevant papers. We will walk through…
Date: 21 May 2025

Introduction to Containers
As a systems administrator supporting research environments, I’ve seen firsthand how tricky reproducibility can be when software and infrastructure vary across systems. In this talk, I’ll introduce containers, specifically Docker and Singularity, as practical tools for creating consistent, portable environments that support reproducible science. Whether you're running pipelines on local machines…
Date: 14 May 2025

Lessons from 20 years of Open Source Development
Computational biology has undergone a significant transformation since the advent of high-throughput sequencing, a pivotal breakthrough that democratized large-scale genomic analyses for researchers. However, even prior to this technological advancement, several of the most widely utilized tools in the field prioritized reliable software development practices, with data reproducibility being a…
Date: 23 April 2025

Pangenome-based structure deconvolution of the amylase locus
The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake, although evidence of recent selection is lacking. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately…
Date: 13 September 2024

Enhancing Complex Trait Mapping in Recombinant Inbred Rodent Strains through Pangenome Analysis
Date: 13 September 2024

Exploring the application of pangenome reference graphs to rare disease diagnosis
Although the CHM13 reference represents a complete human genome, it lacks the full diversity of human haplotypes present in Africa. Analysis pipelines which map sequencing reads to a single linear reference may suffer from “reference genome bias”, where unmapped reads bias downstream analysis. The impact of reference genome bias in the clinical evaluation of genome sequencing data from African…
Date: 10 September 2024

Introduction to Dockstore.org: An “app” store for bioinformatics workflows
Dockstore.org is a repository of reusable bioinformatics analyses shared by the research community. Tools and workflows registered in Dockstore are written using languages that combine containers to reproduce the exact compute environment with a precise description of each step of a computational analysis. Dockstore currently supports several workflow…

Introduction to the Gen3 Data Commons system
This workshop will provide an overview of the Gen3 Data Commons Framework. We will describe the core microservices used to create a data service that can be used to harmonize data, facilitate access, and assist new research projects to identify relevant data and will provide a demonstration using two of our current data commons.
Date: 23 June 2022

Introduction to Terra: A scalable platform for biomedical research
Terra is a cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate. This interactive workshop on Terra will teach you the skills you need to know to start working and collaborating securely in Terra. Specifically, you’ll learn about the architecture of Terra as it relates to cloud-based data sets, tools, and…
Date: 12 April 2022 - 13 April 2022
