User Support & Training

Large Language Model Pilot Project Call

Overview

The development of large language models (LLMs) has opened many new possibilities for the use of AI in analysing data, though there are many open questions, including:

accuracy and error
privacy
skills

However, despite these concerns, this technology is under-explored in DS-I Africa, and many groups would benefit from its use.

The use of public LLMs such as ChatGPT, Claude and DeepSeek is of course very important. However, it is not always possible to use them in our projects: when data is sensitive, providing it to a public LLM may be too risky, or obtaining regulatory approval to do so may be too complex. The alternative, running these models either in the cloud or locally, is very attractive.

Participation is open to data scientists and trainees in the DS-I Africa Consortium and partners. Participants must be competent programmers.

Skill level of training

Intermediate

Language

English

Credential awarded

Certificate of Attendance

Type of training

Workshop

Venue

Hackathon
Professional Development Hub (PDH), University of the Witwatersrand, Johannesburg, South Africa

Course date

10 February - 17 April 2026

Phase A (online):
Weekly on Tuesdays 10 February - 7 April as well as Thursdays 12 February and 26 March 15h00 – 16h30 CAT, subject to change.

Phase B (hackathon):
13-17 April 2026
08h30 – 16h30 CAT

Application opening date

Wednesday 10th of December 2025

Application closing date

Monday 19th of January 2026 - 23:59:59 CAT

Notification date for successful applicants

Monday 26th of January 2026

Organisers

Scott Hazelhurst, Sumir Panji, Michelle Skelton, Kerry Glover, Tshinakaho Malesa, Shaun Aron, Atwine Mugume, Helen Robertson, Ndivhuwo Makondo

Intended Audience

The course is aimed at graduate students and scientists who are currently working on data science projects in Africa, with preference given to DS-I Africa Consortium members and partners.

Prerequisites

Competent programmer
Your own laptop
Unix terminal or Windows Subsystem for Linux (WSL)
Command-line knowledge and experience working with LLMs
Project support and own project funding for in-person Hackathon travel (see below)

Funding

The Hackathon organisers will cover the venue, a return daily shuttle to/from Rosebank Holiday Inn, daily lunch, and refreshment breaks during the hackathon days.

All other expenses, including travel, accommodation, airport transfers, visas, and vaccinations, must be covered by your project/PI.

Project/PI support is required for your attendance.

Curriculum

The Phase A training component is virtual, although we encourage DS-I Africa projects and individuals to form in-person study groups that enable peer-to-peer learning and develop teamwork skills in virtual classrooms. We encourage the model of having a TA in each such classroom.

Phase A training will comprise 10 sessions, each 90 minutes long. Some of the sessions may have a practical component/project that participants are expected to complete.

Introduction to LLMs, overview of existing LLMs
Theory of LLMs: part 1
Theory of LLMs: part 2
Using an API to interact with an LLM
Programming using LLMs: best practices
Programming using LLMs: case study
Ethics and legal issues
  Bias, privacy
  Confidentiality and data leakage
Introduction to running LLMs locally
  Overview of different options
  Pros/cons of running locally versus cloud, LLM pragmatics
  Approaches to running locally (e.g., fine-tuning, RAG)
Running LLMs locally: practical exercise
Critical assessment and reflection

Learning outcomes

After this course participants should be able to:

Define the fundamental architecture and components of large language models, including transformers, attention mechanisms, and tokenisation
Compare and contrast various LLMs
Identify and assess the ethical considerations related to LLM use, including bias, privacy, confidentiality, and data leakage
Identify appropriate use cases for public LLMs versus locally run models based on data sensitivity, regulatory requirements, and cost
Implement API calls to interact with LLMs programmatically for data analysis tasks
Apply best practices for prompt engineering and programming with LLMs, including validation of results
Configure and deploy local LLM instances using appropriate tools and frameworks, including customisation and fine-tuning
Assess LLM outputs for accuracy, reliability, and potential errors in scientific data analysis contexts
Design and implement a complete LLM-based solution for a real-world data science problem in the DS-I Africa consortium
Develop custom workflows that integrate LLMs into existing data analysis pipelines
Critically advocate for responsible and ethical use of LLMs in African research contexts

Limitations

This course provides a foundation for continued learning in research that uses LLMs, a field in which current practices change rapidly.