Large Language Model Pilot Project Call

Overview

The development of large language models (LLMs) has opened many new possibilities for the use of AI in analysing data, though there are many open questions, including:

  • accuracy and error
  • privacy
  • skills

Despite these concerns, this technology is under-explored in DS-I Africa, and many groups would benefit from using it.

Public LLMs such as ChatGPT, Claude and DeepSeek are of course very important. However, they cannot always be used in our projects: when data are sensitive, providing them to a public LLM is either too risky or requires regulatory approval that is too complex to obtain. The alternative, running these models either in the cloud or locally, is very attractive.

Participation is open to data scientists and trainees in the DS-I Africa Consortium and partners. Participants must be competent programmers.

Skill level of training

Intermediate

Language

English

Credential awarded

Certificate of Attendance

Type of training

Workshop

Venue

Hackathon
Professional Development Hub (PDH), University of the Witwatersrand, Johannesburg, South Africa

Course date
10 February - 17 April 2026

Phase A (online):
Weekly on Tuesdays from 10 February to 7 April, as well as on Thursday 12 February and Thursday 26 March, 15h00 – 16h30 CAT (subject to change).

Phase B (hackathon):
13-17 April 2026
08h30 - 16h30 CAT

Application opening date

Wednesday 10th of December 2025

Application closing date

Monday 19th of January 2026 - 23:59:59 CAT

Notification date for successful applicants

Monday 26th of January 2026

Organisers

Scott Hazelhurst, Sumir Panji, Michelle Skelton, Kerry Glover, Tshinakaho Malesa, Shaun Aron, Atwine Mugume, Helen Robertson, Ndivhuwo Makondo

Sponsors

MADIVA, eLwazi Open Data Science Platform, DS-I Africa Consortium

Intended Audience

The course is aimed at graduate students and scientists who are currently working on data science projects in Africa, with preference given to DS-I Africa Consortium members and partners.

Prerequisites

  • Competent programmer
  • Your own laptop
  • Unix terminal or Windows Subsystem for Linux (WSL)
  • Command-line knowledge and experience working with LLMs
  • Support from your project/PI, including project funding for travel to the in-person Hackathon (see below)

Funding

The Hackathon organisers will cover the venue, a return daily shuttle to/from Rosebank Holiday Inn, daily lunch, and refreshment breaks during the hackathon days.

All other expenses, including travel, accommodation, airport transfers, visas, and vaccinations, must be covered by your project/PI.

Project/PI support is required for your attendance.

Curriculum

The Phase A training component is virtual. However, we encourage DS-I Africa projects and individuals to form in-person study groups that join the virtual sessions together, to enable peer-to-peer learning and develop teamwork skills. We encourage each such group to have a teaching assistant (TA).

Phase A training will comprise 10 sessions, each 90 minutes long. Some of the sessions may have a practical component/project that participants are expected to complete.

  1. Introduction to LLMs, overview of existing LLMs
  2. Theory of LLMs: part 1
  3. Theory of LLMs: part 2
  4. Using an API to interact with an LLM
  5. Programming using LLMs – best practices
  6. Programming using LLMs: case study
  7. Ethics and legal issues
    • Bias, privacy
    • Confidentiality and data leakage
  8. Introduction to running LLMs locally
    • Overview of different options
    • Pros/cons of running locally versus cloud, LLM pragmatics
    • Approaches to running locally (e.g., fine-tuning, RAG)
  9. Running LLMs locally: practical exercise
  10. Critical assessment and reflection

Learning outcomes

After this course participants should be able to:

  • Define the fundamental architecture and components of large language models, including transformers, attention mechanisms, and tokenisation
  • Compare and contrast various LLMs
  • Identify and assess the ethical considerations related to LLM use, including bias, privacy, confidentiality, and data leakage
  • Identify appropriate use cases for public LLMs versus locally-run models based on data sensitivity, regulatory requirements, and cost
  • Implement API calls to interact with LLMs programmatically for data analysis tasks
  • Apply best practices for prompt engineering and programming with LLMs, including validation of results
  • Configure and deploy local LLM instances using appropriate tools and frameworks, including customisation and fine-tuning
  • Assess LLM outputs for accuracy, reliability, and potential errors in scientific data analysis contexts
  • Design and implement a complete LLM-based solution for a real-world data science problem in the DS-I Africa Consortium
  • Develop custom workflows that integrate LLMs into existing data analysis pipelines
  • Critically advocate for responsible and ethical use of LLMs in African research contexts

Limitations

This course provides a foundation for continued learning in research using LLMs. It covers current practices, which change rapidly.