Data Harmonisation

Data & Data Support

The eLwazi Data Support WG supports DS-I Africa research hubs and other eLwazi users in sharing their data through the eLwazi platform, enabling greater data reuse across the consortium and the wider research community. The WG defines data sharing and reuse requirements in line with shared policies and consent, focusing on making data FAIR (findable, accessible, interoperable and reusable). Data are made findable and accessible through a fit-for-purpose data catalogue that provides metadata search and summaries. Data are made interoperable and reusable by identifying suitable data standards or models for use across the consortium and by facilitating data harmonisation and integration efforts within and across DS-I Africa.

Since the DS-I Africa research hubs use data from numerous sources, data harmonisation is a crucial focus of the Data Support WG. The WG identifies suitable data models and tools to facilitate harmonisation and, where applicable, develops tools and resources to support metadata and data harmonisation.

Data harmonisation is the process of aligning heterogeneous datasets collected and curated within the same domain and integrating them into a single, homogenised dataset. It is typically carried out at two levels. At the metadata level, the metadata of each dataset are harmonised and aligned so that the datasets can be compared and their commonalities identified. At the data level, data from multiple datasets are transformed and merged into a single dataset containing the harmonised variables established during metadata harmonisation. Data harmonisation is crucial for large-scale data analyses: it supports efficient data processing and helps maintain data quality, ultimately facilitating and accelerating research discovery.
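The two levels of harmonisation can be illustrated with a minimal, standard-library Python sketch. The study names, variable names, units and toy records below are hypothetical and do not come from any DS-I Africa dataset. The metadata level is represented by a hand-written mapping from each study's source columns to the agreed harmonised variables; the data level is represented by a function that applies those mappings and merges all records into one dataset.

```python
from typing import Callable

# Hypothetical harmonised (target) variables agreed at the metadata level.
TARGET_VARIABLES = ["participant_id", "sex", "age_years", "weight_kg"]

# Hypothetical metadata-level mappings: for each source study, which source
# column feeds each target variable and how its values are converted.
# Unexpected categorical codes raise KeyError so they surface for review.
MAPPINGS: dict[str, dict[str, tuple[str, Callable]]] = {
    "study_a": {
        "participant_id": ("pid", str),
        "sex": ("gender", lambda v: {"M": "male", "F": "female"}[v]),
        "age_years": ("age", int),
        "weight_kg": ("weight_kg", float),
    },
    "study_b": {
        "participant_id": ("subject", str),
        "sex": ("sex", lambda v: {"1": "male", "2": "female"}[v]),
        "age_years": ("age_months", lambda v: int(v) // 12),  # months -> whole years
        "weight_kg": ("weight_lb", lambda v: round(float(v) * 0.45359237, 1)),  # lb -> kg
    },
}


def harmonise(datasets: dict[str, list[dict[str, str]]]) -> list[dict]:
    """Data level: transform every source record and merge into one dataset."""
    merged = []
    for study, records in datasets.items():
        mapping = MAPPINGS[study]
        for record in records:
            row = {"source_study": study}  # keep provenance of each row
            for target in TARGET_VARIABLES:
                source_col, convert = mapping[target]
                row[target] = convert(record[source_col])
            merged.append(row)
    return merged


# Hypothetical toy records, one per study.
datasets = {
    "study_a": [{"pid": "A01", "gender": "F", "age": "34", "weight_kg": "61.5"}],
    "study_b": [{"subject": "B07", "sex": "1", "age_months": "500", "weight_lb": "170"}],
}

for row in harmonise(datasets):
    print(row)
```

Keeping the mappings separate from the transformation code mirrors the split between the two levels: the mappings can be reviewed and agreed on before any data are transformed.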

The eLwazi Data Support WG supports the implementation of several data harmonisation resources, including:

HE2AT Metadata Harmonisation Tool
This simple Streamlit application facilitates matching the variables of incoming study datasets to your project's codebook, whose variables are referred to as target variables. The current release also includes an example template used by the HE2AT Centre's data harmonisation team. In addition, the tool provides the CINECA synthetic cohort Africa H3ABioNet v1 as an example. The general workflow comprises one foundational configuration step followed by five downstream steps. For more details, please refer to the tool's GitHub page.
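The idea of matching incoming variables to target variables can be sketched in a few lines of Python. This is not the HE2AT tool's own matching logic, which is not described here; it swaps in plain string similarity from the standard-library `difflib` module to suggest a target variable for each incoming one. The variable names and the 0.6 similarity cutoff are hypothetical choices for illustration, and any variable without a confident suggestion is left as `None` for manual review.

```python
import difflib
import re


def normalise(name: str) -> str:
    """Lower-case a variable name and turn runs of separators into single spaces."""
    return re.sub(r"[\W_]+", " ", name).strip().lower()


def suggest_matches(incoming: list[str], targets: list[str], cutoff: float = 0.6) -> dict:
    """Suggest the most similar target variable for each incoming variable.

    Returns None for an incoming variable when no target reaches the
    similarity cutoff, flagging it for manual mapping.
    """
    norm_targets = {normalise(t): t for t in targets}
    suggestions = {}
    for var in incoming:
        hits = difflib.get_close_matches(normalise(var), list(norm_targets), n=1, cutoff=cutoff)
        suggestions[var] = norm_targets[hits[0]] if hits else None
    return suggestions


# Hypothetical codebook (target variables) and incoming study variables.
targets = ["participant_id", "date_of_birth", "systolic_bp", "diastolic_bp", "bmi"]
incoming = ["ParticipantID", "Systolic BP", "DOB"]

print(suggest_matches(incoming, targets))
```

Abbreviations such as `DOB` defeat pure string similarity, which is why a human review step remains necessary in any matching workflow of this kind.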
H3ABioNet Data Transformation Tool
This tool was originally developed at the H3Africa Cardiovascular Disease Data Jamboree in 2019. It is designed to automate the transformation of a single dataset to meet the requirements of a harmonised dataset. The tool lets you generate a spreadsheet that facilitates mapping the dataset's metadata to a harmonised codebook. It ingests two files, a .xlsx metadata mapping file and a .csv dataset to be transformed, and optionally a third, a .csv data dictionary from the clinical dataset. For more details, please refer to the tool's GitHub page.
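The core of such a transformation, applying a metadata mapping to a dataset, can be sketched with Python's standard `csv` module. This is an illustration of the general approach, not the H3ABioNet tool's code: the mapping column names (`source_variable`, `target_variable`, `source_value`, `target_value`) and the sample data are hypothetical, and the mapping is read as CSV text rather than the tool's .xlsx file so that the sketch needs no third-party libraries. The optional data dictionary is not modelled.

```python
import csv
import io

# Hypothetical mapping: rename source variables and recode categorical values.
# Rows with an empty source_value only rename the variable.
MAPPING_CSV = (
    "source_variable,target_variable,source_value,target_value\n"
    "ID,participant_id,,\n"
    "SEX,sex,1,male\n"
    "SEX,sex,2,female\n"
    "AGE,age_years,,\n"
)

# Hypothetical clinical dataset; SMOKER is not in the codebook.
DATASET_CSV = "ID,SEX,AGE,SMOKER\nP1,1,45,Y\nP2,2,52,N\n"


def load_mapping(mapping_csv: str) -> tuple[dict, dict]:
    """Build a rename table and a value-recoding table from the mapping."""
    renames, recodes = {}, {}
    for row in csv.DictReader(io.StringIO(mapping_csv)):
        src, tgt = row["source_variable"], row["target_variable"]
        renames[src] = tgt
        if row["source_value"]:
            recodes[(src, row["source_value"])] = row["target_value"]
    return renames, recodes


def transform(dataset_csv: str, mapping_csv: str) -> list[dict[str, str]]:
    """Keep only mapped variables, renamed and recoded to the harmonised codebook."""
    renames, recodes = load_mapping(mapping_csv)
    out = []
    for record in csv.DictReader(io.StringIO(dataset_csv)):
        out.append({
            tgt: recodes.get((src, record[src]), record[src])  # unrecoded values pass through
            for src, tgt in renames.items()
        })
    return out


for row in transform(DATASET_CSV, MAPPING_CSV):
    print(row)
```

Unmapped columns such as `SMOKER` are dropped, so the output contains only variables defined in the harmonised codebook.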
Data Model Application - OMOP
The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) started as an open community data standard, established to standardise the structure and content of observational research data, and has been embraced as the data standardisation model of choice for collaborative work. The OMOP CDM is maintained by the Observational Health Data Sciences and Informatics (OHDSI) community, which also provides the standardised vocabularies and the standard analytical methods and tools that accompany it. The OHDSI vocabularies allow medical terms to be standardised across organisations in a way that can be used across the OMOP clinical domains. More information about the OMOP CDM can be found on its webpage and GitHub page.
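How the standardised vocabularies are used can be illustrated by converting a source record into a row of the OMOP CDM PERSON table. The sketch below is a simplification under stated assumptions: it fills only a subset of PERSON columns, the source field names (`pid`, `sex`, `dob`) are hypothetical, and a hard-coded lookup stands in for a query against the OHDSI vocabulary tables. The concept IDs used are the standard OMOP gender concepts 8507 (MALE) and 8532 (FEMALE), and 0 is the OMOP convention for "No matching concept".

```python
from dataclasses import dataclass

# Standard OMOP gender concepts from the OHDSI vocabularies.
# A real ETL would look these up in the vocabulary tables instead.
GENDER_CONCEPTS = {"male": 8507, "female": 8532}
NO_MATCHING_CONCEPT = 0  # OMOP convention for unmapped source values


@dataclass
class PersonRow:
    """A subset of the OMOP CDM PERSON table columns."""
    person_id: int
    gender_concept_id: int
    year_of_birth: int
    person_source_value: str
    gender_source_value: str


def to_person(person_id: int, source: dict[str, str]) -> PersonRow:
    """Map a hypothetical source record onto the PERSON table subset."""
    gender = source["sex"]
    return PersonRow(
        person_id=person_id,
        gender_concept_id=GENDER_CONCEPTS.get(gender.strip().lower(), NO_MATCHING_CONCEPT),
        year_of_birth=int(source["dob"][:4]),  # assumes ISO dates (YYYY-MM-DD)
        person_source_value=source["pid"],
        gender_source_value=gender,  # original value kept for traceability
    )


print(to_person(1, {"pid": "A01", "sex": "Female", "dob": "1988-03-14"}))
print(to_person(2, {"pid": "B07", "sex": "U", "dob": "1975-11-02"}))
```

Storing both the standard concept ID and the original source value reflects the OMOP pattern of keeping analyses on standard concepts while preserving the source data for auditing.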