User Support & Training

Human Pangenome bring your own data (BYOD) analysis workshop

Overview

The human reference genome provides a universal coordinate system that specifies a standardized reference sequence for genes and their annotation. This reference genome is used for the alignment of sequence reads for variant calling in newly sequenced genomes. However, the current reference is composed of a handful of individual genomes, which do not necessarily represent the genetic diversity across different world populations, and its use introduces reference allele bias. This is particularly pertinent for African populations, which are very genetically diverse. While the haploid, linear reference genome has formed the basis of all genetic variation studies, the availability of new technologies such as highly accurate long-read sequencing, coupled with the development of novel computational tools that allow for the efficient de novo assembly of full human genomes, now makes it possible to build a more representative human reference genome.

In order to capture the diversity that exists across genomes, rather than using a linear sequence, the variation can be expressed in terms of a mathematical graph structure with multiple overlapping sequence paths based on a collection of ethnically diverse genomes. Traversing through the graph structure would allow for variants observed within a group of individuals to be represented in the reference genome, hence allowing for the more accurate calling of both single nucleotide and structural variants.
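As a rough illustration of the idea above, the following minimal Python sketch models a toy sequence graph in which each genome is a path through shared nodes. All node IDs, sequences, and path names (`linear_reference`, `sample_A`, `sample_B`) are hypothetical and chosen only for illustration; real pangenome graphs are far larger and are built and queried with dedicated tools rather than hand-written code like this.

```python
from typing import Dict, List

# Hypothetical toy graph: each node ID maps to a short DNA segment.
NODES: Dict[int, str] = {
    1: "ACGT",
    2: "A",       # one allele at a single nucleotide variant site
    3: "G",       # alternative allele at the same site
    4: "TTC",
    5: "GGGAGG",  # segment carried by only some genomes (an insertion)
    6: "CA",
}

# Each genome is represented as a path (an ordered walk) through the nodes.
PATHS: Dict[str, List[int]] = {
    "linear_reference": [1, 2, 4, 6],
    "sample_A": [1, 3, 4, 6],
    "sample_B": [1, 2, 4, 5, 6],
}


def spell(path_name: str) -> str:
    """Return the sequence obtained by traversing the named path."""
    return "".join(NODES[node_id] for node_id in PATHS[path_name])


def nodes_off_reference(path_name: str, reference: str = "linear_reference") -> List[int]:
    """Return nodes visited by a path that the reference path never visits."""
    reference_nodes = set(PATHS[reference])
    return [node_id for node_id in PATHS[path_name] if node_id not in reference_nodes]


if __name__ == "__main__":
    for name in PATHS:
        print(name, spell(name), nodes_off_reference(name))
```

In this sketch, nodes that a sample visits but the linear reference does not correspond to variation that a single linear sequence cannot represent, while the graph holds all three genomes at once.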

We have used long-read sequencing data to generate high-quality de novo assemblies from a diverse set of African samples. We have combined these with other high-quality African genome assemblies based on long-read sequencing data to generate a genome graph structure that more accurately represents African genetic variation. While a population-specific genome reference graph is not representative of global genetic diversity, it can be useful for the exploration of population-specific genetic variation and can improve variant calling in closely related population samples.

We are currently inviting applications from African researchers interested in addressing specific scientific questions using pangenome graphs.

Training application

Competitive selection process

Skill level of training

Intermediate to advanced

Language

English

Type of training

Workshop

Venue

Southern Sun, Newlands, Cape Town, South Africa

Course date

21 October - 25 October 2024

Organisers

Tshinakaho Malesa, Karen Miga, Melissa Nel, Andrea Guarracino, Flavia Villani, Mohammed Farhat, Gerrit Botha, Kennedy Mwai Wambui, Chris Fields, Shaun Aron, Sumir Panji, Nicola Mulder.

Intended Audience

Africa-based researchers from funded African genomics projects, such as H3Africa, that are specifically focused on using NGS data for research and clinical applications within African populations.

Prerequisites

Nominees should meet the following prerequisites:

Have viewed the introductory lectures on pangenome graph building: https://www.youtube.com/playlist?list=PLcQ0XMykNhCSc8ucXrV1g70gRCXbdmiYU

Have viewed the second webinar series on practical applications of using a pangenome graph:

Be comfortable working on the command line in Unix/Linux and in an HPC-style environment

Be familiar with NGS file formats and tools, and with either variant calling or structural variant analysis, e.g. VCFtools, SAMtools

Have a research question they would like to learn how to address using the human pangenome graphs

Have a small human dataset they would like to use for the analysis workshop

Learning outcomes

Human Pangenome bring your own data (BYOD) analysis workshop outcomes:

Be familiar with the methods used for building human pangenome graphs

Be able to visualise and work with human pangenome graphs

Be able to call variants using human pangenome graphs with tools such as Minigraph-Cactus, Giraffe and vg

Utilise the human pangenome graphs to do some preliminary analysis of their data

To apply, please Click HERE