Large Language Model Pilot Project Call
Overview
The development of large language models (LLMs) has opened many new possibilities for the use of AI in analysing data, though there are many open questions, including:
- accuracy and error
- privacy
- skills
Despite these concerns, this technology is under-explored in DS-I Africa, and many groups would benefit from its use.
Public LLMs such as ChatGPT, Claude and DeepSeek are of course very important. However, they cannot always be used in our projects: the data are often sensitive enough that providing them to a public LLM is too risky, or obtaining regulatory approval is too complex. The alternative – to run these models either in the cloud or locally – is very attractive.
Participation is open to data scientists and trainees in the DS-I Africa Consortium and partners. Participants must be competent programmers.
Skill level of training
Intermediate
Language
English
Credential awarded
Certificate of Attendance
Type of training
Workshop
Venue
Hackathon
Professional Development Hub (PDH), University of the Witwatersrand, Johannesburg, South Africa
Course date
- 10 February - 17 April 2026
Phase A (online):
Weekly on Tuesdays from 10 February to 7 April, as well as Thursday 12 February and Thursday 26 March, 15h00 – 16h30 CAT (subject to change).
Phase B (hackathon):
13-17 April 2026
08h30 – 16h30 CAT
Application opening date
Wednesday 10th of December 2025
Application closing date
Monday 19th of January 2026 - 23:59:59 CAT
Notification date for successful applicants
Monday 26th of January 2026
Organisers
Scott Hazelhurst, Sumir Panji, Michelle Skelton, Kerry Glover, Tshinakaho Malesa, Shaun Aron, Atwine Mugume, Helen Robertson, Ndivhuwo Makondo
Sponsors
MADIVA, eLwazi Open Data Science Platform, DS-I Africa Consortium
Intended Audience
The course is aimed at graduate students and scientists who are currently working on data science projects in Africa, with preference given to DS-I Africa Consortium members and partners.
Prerequisites
- Competent programmer
- Your own laptop
- Unix terminal or Windows Subsystem for Linux (WSL)
- Command-line knowledge and experience of working with LLMs
- Project support and own project funding for in-person Hackathon travel (see below)
Funding
The Hackathon organisers will cover the venue, a return daily shuttle to/from Rosebank Holiday Inn, daily lunch, and refreshment breaks during the hackathon days.
All other expenses, including travel, accommodation, airport transfers, visas, and vaccinations, must be covered by your project/PI.
Project/PI support is required for your attendance.
Curriculum
The Phase A training component is virtual, although we encourage DS-I Africa projects and individuals to form in-person study groups, which enable peer-to-peer learning and help develop teamwork skills in virtual classrooms. We encourage the model of having a TA in each such classroom.
Phase A training will comprise 10 sessions, each 90 minutes long. Some of the sessions may have a practical component/project that participants are expected to complete.
- Introduction to LLMs, overview of existing LLMs
- Theory of LLMs: part 1
- Theory of LLMs: part 2
- Using an API to interact with an LLM
- Programming using LLMs: best practices
- Programming using LLMs: case study
- Ethics and legal issues
  - Bias, privacy
  - Confidentiality and data leakage
- Introduction to running LLMs locally
  - Overview of different options
  - Pros/cons of running locally versus cloud, LLM pragmatics
  - Approaches to running locally (e.g., fine-tuning, RAG)
- Running LLMs locally: practical exercise
- Critical assessment and reflection
Learning outcomes
After this course participants should be able to:
- Define the fundamental architecture and components of large language models, including transformers, attention mechanisms, and tokenisation
- Compare and contrast various LLMs
- Identify and assess the ethical considerations related to LLM use, including bias, privacy, confidentiality, and data leakage
- Identify appropriate use cases for public LLMs versus locally-run models based on data sensitivity, regulatory requirements, and cost
- Implement API calls to interact with LLMs programmatically for data analysis tasks
- Apply best practices for prompt engineering and programming with LLMs, including validation of results
- Configure and deploy local LLM instances using appropriate tools and frameworks, including customisation and fine-tuning
- Assess LLM outputs for accuracy, reliability, and potential errors in scientific data analysis contexts
- Design and implement a complete LLM-based solution for a real-world data science problem in the DS-I Africa Consortium
- Develop custom workflows that integrate LLMs into existing data analysis pipelines
- Critically advocate for responsible and ethical use of LLMs in African research contexts
Limitations
This course provides a foundation for continued learning in research using LLMs. Current practices in this field change rapidly.