Leveraging publicly available DNA and RNA sequencing data for hypothesis-driven science in cancer biology and beyond

Overview

This seminar focuses on using publicly available DNA and RNA sequencing datasets to investigate biological hypotheses, with breast cancer as the case example. We will first formulate the hypothesis for the case study, then discuss how to tailor publicly available datasets for a preliminary exploration of the chosen topic, and briefly review the relevant papers. We will walk through how to navigate databases such as GEO, and how to use the online analysis tools of ImmGen, DepMap, cBioPortal and KMPlotter to explore the molecular and clinical relevance of single genes and groups of genes, whether drawn from a specific hypothesis or generated during preliminary data analysis. The importance of comparing against healthy tissues and cells will also be discussed. As an example analytical pipeline, the Seurat package for single-cell RNA sequencing will be presented, together with an introduction to deeper downstream analyses that can be integrated with it, such as inferCNV, CellChat, Monocle (pseudotime analysis) and SCENIC (gene regulatory network analysis). Afterwards, we will brainstorm how these analyses, done so far in silico on publicly available data, can inform the design of wet-lab experiments or retrospective clinical studies, and how to request further data, for example from tissue biobanks, for such validation. The accessibility of large human cohort datasets, such as the UK Biobank (through the publicly available tool Genebass) and NIH All of Us, will also be discussed. Time permitting, an additional case study provided by the group will be used to demonstrate these online resources in real time.

By the end of this session, attendees will have been introduced to the process for selecting and downloading publicly available sequencing data that fit a particular hypothesis, to in-depth analysis pipelines (using scRNA-seq as an example), and to ways of leveraging additional patient survival and in vitro cancer cell line data to validate and extend findings before planning follow-up experiments.

Type of training

Virtual Webinar

Date

21 May 2025

Intended Audience

Individuals interested in learning how to explore publicly available DNA and RNA datasets using various bioinformatics resources