EXAMINING THE INFRASTRUCTURE READINESS FOR RESEARCH DATA MANAGEMENT PRACTICES IN HEALTH INSTITUTIONS IN UGANDA
CHAPTER ONE
1.0 Introduction
The purpose of this study is to examine the infrastructure readiness for research data management practices in health institutions in Uganda, with a view to proposing interventions that improve the findability, accessibility, interoperability and reusability of research data. Health research institutions and funding agencies are spending large sums of money to develop new infrastructure and maintain existing infrastructure in order to maximize the creation, management, sharing and preservation of research data (Edwards, et al., 2009; Whyte, 2012). As massive volumes of research data are produced, active data management, curation and stewardship have become imperative, since the data are of critical value to both current and future researchers (Scott, 2014; Wiley and Burnette, 2019). Managing research data increases opportunities for findability, accessibility, interoperability and reusability (FAIR) of data, which enhances scholarship and is a matter of sound stewardship of research resources (Mladovsky, Mossialos, & McKee, 2015; Simons, 2016; Vasilevsky, et al., 2017).
1.1 Background to the study
This section discusses the historical background, theoretical background, contextual background and conceptual background.
1.1.1 Historical Background
The collection, management and storage of data to aid clinical treatment are probably as old as medicine itself (Ayodele, 2011). As early as the 17th century, health data were considered to play an important role in planning, management and decision-making. In 1662 John Graunt published a landmark analysis of mortality data; this publication was the first to quantify patterns of birth, death and disease occurrence, noting disparities between males and females, high infant mortality, urban/rural differences and seasonal variations. Much earlier, around 400 BC, Hippocrates attempted to explain disease occurrence from a rational rather than a supernatural viewpoint, basing his explanations on information from the study of patterns (Sekitoleko, 2017).
In the 1800s William Farr built upon Graunt’s work by systematically collecting and analyzing Britain’s mortality statistics. Farr established many of the basic practices used today in biostatistics and disease classification. He focused his efforts on collecting vital statistics, evaluating those data, and reporting to the responsible health authorities and the general public (Lukumar, 2006).
In the 1970s, 1980s and 1990s, hospital research data management systems in Europe evolved as part of health information systems, with the hospital as the health care environment. System developers initially focused on small applications in special departments of the hospital, for example the laboratory, radiology or the administration department. Information systems gradually moved to data processing for the hospital as a whole. This helped to ease records work in most hospitals worldwide (Kalega, 2015). At the beginning, computer-supported health research data management systems were largely intended to support health care professionals, mainly doctors and nurses, as well as administrative staff in hospitals (Haux, 2016).
Nicol et al. (2013) indicate that research data have not been openly available to researchers who are not directly associated with the research. In some research communities the practice has been to use research data within research units and share them with a selected group of trusted colleagues (Koopman & De Jager, 2016). The idea of research data being widely and openly shared is a growing phenomenon which requires understanding the practices and cultures of different research communities (Lammerhirt, 2016). However, the means and methods are underdeveloped, fragmented and lacking (O’Reilly, Johnson, & Sanborn, 2012). Thus, understanding individual researchers’ data behaviors forms the basis for developing the necessary infrastructure and services to comprehensively support and improve productivity (Barsky, 2017; Mladovsky et al., 2015).
Data is the fuel for research. Therefore, managing research data is core to the research enterprise and its sustainability. RDM may be costly, but it represents a sound investment, since the value of data in terms of its potential for re-use exceeds the cost of its acquisition (Ashley, 2012). Governments and funding organizations are increasingly requiring researchers to store and share data properly (Buys & Shaw 2015; Kennan & Markauskaite 2015). Research data management concerns the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results. It aims to ensure reliable verification of results, and permits new and innovative research built on existing information (Whyte & Tedds, 2011). Good data management is important because it facilitates verification of research results, thereby making it easier for other researchers to build on existing research (Corti, Van den Eynden, Bishop & Woollard 2011). However, Kennan & Markauskaite (2015) went further to suggest that the data may not necessarily be used for research alone, since the data include administrative records, log files of learning management systems and web portals, other behavioural traces used in learning analytics, and traces of individual lives available from social media.
In Africa, a study by Anane-Sarpong et al. (2017) on research data management practices in the health sciences observes that progress in research data management is slow and unsatisfactory, compounded by financial constraints. Other challenges confronting research data management in Africa include a lack of data sharing skills, poor or absent data sharing policies, and poor data infrastructure (Chigwada, Chiparausha, & Kasiroori, 2017). Wyk (2018) further notes that, in Africa, only South Africa has increased awareness of RDM in various sectors, and the country is making significant and sustained strides towards the inclusion of RDM across its research ecosystem. A number of initiatives are already in place to ensure that research data from different fields are systematically managed and preserved for both present and future use by researchers. In Kenya and Malawi, two studies have so far been carried out, respectively to examine RDM in agricultural research institutions and to investigate a wide range of issues in the emerging domain of RDM in public universities (Ng’eno, 2018; Chawinga, 2019). A number of initiatives and research endeavours have also been under way in Kenya, geared towards understanding research data and its access, utilization and application for its accruing benefits (Jao, et al., 2015).
Research data management (RDM) and its accompanying services and infrastructures are still predominantly in a state of infancy in many African countries, and little is known about the RDM habits of researchers from these areas (Patterton, 2018). The government of Kenya created the “Kenyan National Health Management Information System department” in 1975 as the first facility for the analysis and management of health data on the continent. It was created to enable Kenya to manage its research data well and to provide information that is critical for decision making in health research (Odhiambo, 2005).
Providing proof of research data management (RDM) when applying for research grants is a relatively new requirement for research funding in South Africa. A good example of this development is the decision taken by the National Research Foundation (a main contributor to publicly-funded research in South Africa) that recipients of National Research Foundation (NRF) grants would need to indicate how the data generated by the research will be made publicly accessible. In addition, the NRF requires data supporting publications resulting from funding to be deposited in an “accredited Open Access repository” and for a Digital Object Identifier (DOI) for future citation and referencing to be provided (National Research Foundation 2015).
RDM and institutional RDM support at many South African research institutes is currently still at an early stage. Terms describing this state of affairs, such as “haphazard” (Kahn et al. 2014) or “slow” (Van Deventer & Pienaar 2015), feature strongly in South African RDM discussions. Even though articles and reports are being published on the topic, the exploratory nature of the current South African RDM community is evident. Apart from the decade-long data curation service at the Human Sciences Research Council (HSRC), a frontrunner in the local data management scene (Lötter & van Zyl 2015), most institutes are not yet at a stage where infrastructure, services and staff form part of an established institutional RDM regime. As a result, current RDM-related literature tends to report on investigative surveys that have been conducted, pilot projects that have taken place, or tools and software that are being tested. Examples of this trend include reports of RDM surveys at the University of Pretoria and the Council for Scientific and Industrial Research (CSIR) (Van Deventer & Pienaar 2015), pilot projects at the Cape Peninsula University of Technology (Chiware & Mathe 2015) and the six-month nationwide digital repository testing of Figshare and Islandora, spearheaded by the Data Intensive Research Initiative of South Africa (DIRISA 2018). An encouraging activity which is indicative of national RDM interest and progress is the establishment of the Network of Data and Information Curation Communities (NeDICC), a South African RDM community of practice, and its short, yet productive, contributions in terms of RDM-centred workshops and meetings (NeDICC 2017).
According to Kennan & Markauskaite (2015), research data, just like data sources, are heterogeneous because they take many forms depending on their origins, the research problem addressed and the discipline of the researcher. The authors note that in the life and physical sciences, researchers gather and produce data mostly through observations, experiments and computer modelling, whilst in the social sciences researchers gather and produce data from interviews, surveys, questionnaires and observations.
Research data management should address which data will be generated during research; metadata, standards and quality assurance measures; modalities for sharing and securing data; ethical and legal issues relating to data sharing, including copyright and intellectual property rights in data; data storage and backup; resources and costs associated with data management; and data management roles and responsibilities (Corti, et al., 2011).
According to the World Bank’s Statistical Capacity Indicator, Nigeria scored 67.8 per cent for statistical capacity (World Bank, 2016). This score is higher than the average for sub-Saharan Africa and for other International Development Association (IDA) eligible countries. Despite this relatively high statistical capacity, data collection and processing in Nigeria over the decades since independence in 1960 have faced enormous challenges in terms of their use in achieving developmental objectives. The data community has been undermined by inadequate financial and institutional resources, weakening the data value chain from production, management and dissemination through to archiving and use. Some of these challenges have been tackled, while many persist and remain major hindrances to the way data are harnessed to inform development decision-making.
Recognizing the importance of data in achieving developmental objectives, the federal government of Nigeria established the National Statistical System (NSS) under the Statistical Act of 2007, a repeal of the Statistical Act of 1990 (Government of Nigeria, 2007).
Despite efforts to promote FAIR data in order to advance knowledge, expand research opportunities and improve health services, different studies have found challenges including a lack of supportive infrastructure for research data management, inability to access data, restrictions on usage applied by publishers or data providers, and publication of data that is difficult to reuse (Molloy, 2011). Research culture and incentives also often make researchers unable or unwilling to make their data accessible (Savage & Vickers, 2009; Vines et al. 2013). Furthermore, insufficient infrastructure, resource constraints, disciplinary differences, policy and legal constraints, and lack of awareness have been pointed out (National Academies of Science, 2018). In addition, most research data is in the custody of researchers, who have different perspectives on managing their research data, are poor stewards of their data, particularly over the long term, and end up creating unique, ad hoc approaches to organizing their data, making it inaccessible and putting it at risk of loss (Chigwada, Chiparausha, & Kasiroori, 2017; Van Tuyl & Michalek, 2015; Vines et al., 2014; Jones, Ball, & Ekmekcioglu, 2008). Governments have done little to develop the necessary policies and the technical, social and organizational components required for RDM, despite heavy investment by funders in research data collection. As a result, research institutions and researchers are losing the opportunity to harness the potential that exists within research data resources, which affects their scholarly productivity (Barsky, 2017; Mladovsky et al., 2015).
In Uganda, Tomusange, Yoon and Mukasa (2017) carried out a study on the emerging discussion of data sharing and reuse in the public sector, with the aim of understanding the relevant stakeholders’ perceptions of data sharing and reuse practices and services. Preliminary findings showed that a data sharing and reuse culture had not been fully developed, noting numerous barriers which inhibited the practices. Sharing and reuse practices ought, however, to be understood from the data management perspective, since the ultimate goal of RDM is ensuring that data is accessible, sharable and reusable. Yet research data management itself has not been studied, despite the increased need and demand for data to be available to others. Health research institutions are nonetheless increasingly establishing databases, biobanks and repositories as part of global health research trends, to ensure the availability of health research data for long-term preservation, access and reuse (Nnamuchi, 2016). Most of these initiatives are standalone, which may have long-term repercussions for RDM and the promotion of FAIR data.
Repositories such as MalariaGEN, 1000Genomes, PANGEA-HIV and H3Africa are operational and offer opportunities for health research data to be accessible and reusable by a wider scientific research community (Kaleebu, 2017). Additionally, there is increased research collaboration between researchers and institutions from developed countries and Uganda, resulting in immense benefits but also in challenges that need to be understood and mitigated (Kamya, 2017). However, there seems to be incomplete knowledge about the underlying technical, social and organizational components required for RDM to promote FAIR data in health research institutions in Uganda, which affects the long-term value of the research data and researchers’ productivity (Barsky, 2017; Mladovsky et al., 2015).
Theoretical Review
The systems theory developed by Ludwig von Bertalanffy in the 1930s holds that nothing can be understood by isolating merely one part of what plays a significant role in a system.
Kerzner (1987) further indicated that systems theory, when applied to organizations, considers an organization to be made up of different sub-systems which are integrated into a whole, for example the sub-systems in an information system, to meet set organizational goals and fulfil customer expectations and requirements.
Korzner (1997) and Borciej et al. (2006) advance that systems operate as open systems, with a dynamic interplay among their sub-components and with the environment from which they obtain resources for production. A system is thus influenced by other systems through the interaction that occurs at the interface, and maintaining stability during these interactions is crucial.
Systems theory focuses on the relations between the different parts which make up an entity. Rather than reducing an entity to its parts, systems theory focuses on the arrangement of and relations between the parts and how they work together as a whole. The theory further indicates that the way the parts are organized and how they interact with each other determine the properties of the system.
A system is defined as a complex and highly interlinked network of parts exhibiting synergistic properties, in which the whole is greater than the sum of its parts (Flood and Jackson, 1991). According to Schoderbek et al. (1985) and Checkland (1981), however, a system is a set of objects together with the relationships between the objects and between their attributes, related to each other and to the environment so as to form a whole.
From the above definitions, it follows that research data management practices in health institutions in Uganda are affected by several factors, and that no single factor is responsible for effective data management.
In relation to this study, systems theory points out that an organization is made up of sub-systems, each of which has a different effect on the entire entity. This indicates that data management practices in Uganda are affected by many independent factors that together shape data management. A study in Uganda by Tomusange, Yoon and Mukasa (2017) indicates that a data sharing and reuse culture had not been fully developed, noting numerous barriers which inhibited the practices. However, sharing and reuse practices ought to be understood from the data management perspective, since the ultimate goal of RDM is ensuring that data is accessible, sharable and reusable.
According to Mladovsky et al. (2015) and Tenopir et al. (2015), the challenges of data management are compounded by data storage problems, duplication of databases, disorganized research resources and the absence of a mandatory local research data repository, in addition to limited skills to assess data quality for possible reuse. This indicates that no single factor explains data management as a whole; numerous factors are involved.
Conceptual Background
The term research data was defined by Rice (2009) as data ‘collected, observed or created for the purposes of analyzing to produce original research results’. However, Kennan & Markauskaite (2015) went further to suggest that the data may not necessarily be used for research alone, since the data include administrative records, log files of learning management systems and web portals, other behavioural traces used in learning analytics, and traces of individual lives available from social media.
According to Whyte & Tedds (2011), ‘Research data management concerns the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results. It aims to ensure reliable verification of results, and permits new and innovative research built on existing information.’
In a more technical sense, data is a set of values of qualitative or quantitative variables about one or more persons or objects.
Data management includes all aspects of data planning, handling, analysis, documentation and storage, and takes place during all stages of a study. The objective is to create a reliable database containing high-quality data. Data management is a too often neglected part of study design and includes planning the data needs of the study, data collection, data entry, data validation and checking, data manipulation, data file backup and data documentation (Sylvia, 2018).
1.2 Problem statement
Research data are being generated faster than the technical, social and organizational components, knowledge and skills necessary to manage them are being developed (Whitmire, Boock, & Sutton, 2015). Research data are also crucial in discovering new knowledge to advance medical practice, to understand health and disease, and to improve healthcare and the health of populations (Bull, 2016; Denny, et al., 2015; Guy & Ploeger, 2015). Thus, research data should be managed and preserved for ongoing use and for possible future reuse (Perrier et al., 2017; Tripathi, et al., 2016). The problem is that research data cannot readily be found, accessed, made interoperable and reused, and where data are available they are of poor quality, incomplete or missing, which makes them unusable (Luyirika, 2019; Stover, et al., 2018; UNCST, 2014). This may be attributed to incoherent and inadequate RDM practices, which pose a threat to academic research in Uganda. Indeed, health research institutions in Uganda are conducting diverse and cutting-edge research which is generating enormous volumes of research data. However, they are operating in silos with tailored resources and no common standards for interoperability across institutions, nations and wide geographical areas (Star and Ruhleder, 1995). Niwagaba (2017) further indicates that Uganda, like other Sub-Saharan African countries, still uses paper records of patients as a form of medical record, and even the summaries sent to Ministries of Health are in hard-copy form. A few hospitals have started digitizing sections of their health data, but the road to a paperless medical data system remains riddled with encumbrances; this has further led to poor record management systems in the different hospitals in Uganda.
Nonetheless, the established technical, social and organizational components drive researchers to create, organize, share, preserve and reuse research data. Although there are growing international demands for FAIR data, Uganda has not adequately responded to this requirement, which may ultimately disadvantage local researchers, health research institutions and society as a whole, since it continues to be a challenge to locate, access, interoperate and reuse research data within and outside the research institutions (Vasilevsky, Minnier, Haendel, & Champieux, 2017; Vines et al., 2014; Jones, Ball, & Ekmekcioglu, 2008). Research institutions in Uganda include TASO, the Uganda National Council for Science and Technology, the Uganda National Health Research Organization (UNHRO) and UVRI, which is supported by and collaborates with both local and international partners to undertake health research in Uganda and is recognized as one of the few centers of excellence in health research in the region. However, although the institution is mandated to carry out health research, data management in the institution is currently reported to be poor.
Expensively collected research data is used only once, kept privately, shipped to repositories of collaborating institutions in developed countries, or discarded and abandoned to obscurity (Mladovsky et al., 2015; Tenopir et al., 2015). The challenges are further compounded by data storage problems, duplication of databases, disorganized research resources and the absence of a mandatory local research data repository, in addition to limited skills to assess data quality for possible reuse.
Existing research data resources are significant assets with limitless opportunities, and realizing them requires adopting and implementing RDM practices for the benefit of researchers, society and health research institutions today and in the future. It is therefore essential to understand how researchers create, organize, share, preserve and reuse research data for benefit and to add value that impacts practices, policies and scientific knowledge (Shen, 2017).
It is against this background that this theory will be used to examine the infrastructure readiness for research data management practices in health institutions in Uganda.
1.3 Purpose of the study
To explore RDM practices and the underlying systems that support creation, organization, storage, preservation and reuse of research data in health research institutions in Uganda.
1.4 Objectives
- i) To assess the status of RDM practices in health institutions in Uganda
- ii) To examine the technical components, social norms and organizational practices associated with RDM practices in health institutions in Uganda
- iii) To explore health researchers’ perceptions towards adoption of RDM practices in health institutions in Uganda
- iv) To identify the challenges of adopting RDM practices in health institutions in Uganda
- v) To propose an RDM infrastructure framework for health institutions in Uganda
1.5 Research Questions
- i) What is the current status of RDM practices in health institutions in Uganda?
- ii) What are the technical components, social norms and organizational practices associated with RDM practices in health institutions in Uganda?
- iii) What are the researchers’ perceptions towards adopting RDM practices in health institutions in Uganda?
- iv) What are the challenges experienced in adopting RDM practices in health institutions in Uganda?
- v) What nature of RDM infrastructure framework can be adapted for health institutions in Uganda?
1.6 Scope of the Study
The scope of the study defines its contextual, geographical and time frame.
Contextual: the study focuses on investigating RDM in health research institutions in Uganda. This shall entail examining the current state of research data management; describing the infrastructure systems and how they influence RDM practices; understanding how activities, organization and material aspects shape researchers’ RDM practices; and analyzing human relations and how they influence RDM practices in health research institutions in Uganda.
1.7 Key outputs
The key outputs of this study include:
- A research thesis on Research Data Management in health research institutions in Uganda.
- A number of published peer reviewed articles in academic journals on the subject
- A policy brief on Research Data Management in health research institutions in Uganda
1.7.1 Expected output
- A completed research thesis
- Papers published in peer reviewed journals
- A policy brief about Research Data Management in health research institutions in Uganda
1.8 Conceptual framework
Infrastructure readiness of health institutions in adopting RDM practices in Uganda.
Independent variable (IV): Infrastructure readiness of health institutions
- Infrastructure systems: Internet connection; computers and laptops; intranet connection
- Data storage: computerised storage; manual storage
- Knowledge and skill: employee knowledge; employee qualification
Dependent variable (DV): RDM practices
- Data validation and checking
- Data entry (use of Microsoft Excel, EpiData)
- Data manipulation
- Data files backup (flash disks, memory cards, computer hard disks)
- Data documentation
Intervening variable
- Data life cycle
- Organizational policy
- Managerial policy
- The country’s policy on health data
Source:
The conceptual framework above indicates that the infrastructure readiness of health research institutions, the independent variable, is measured along three dimensions: infrastructure systems, which include Internet connection, computers and laptops, and intranet connection; data storage, which is either computerized or manual; and the knowledge and skill of employees, measured by employee knowledge and qualification. These dimensions influence the research data management process in different ways, for example through data validation and checking, data entry, data manipulation, data files backup and data documentation. The conceptual framework further indicates that both the independent and dependent variables are affected by the intervening variables, which include the data life cycle, organizational policy, managerial policy and the country’s policy on health data.
1.9 Significance of the study
Although there is a growing volume of literature about RDM, little has been written from the perspective of infrastructure theory, and RDM has limited theoretical grounding as a subject of study. RDM lacks generic theories and most studies are based on models; where theories have been used, they have been borrowed from other fields. Following that trend, applying the ecology of infrastructure theory shall contribute toward a deeper understanding of the role of different stakeholders in RDM practices for FAIR data. In addition, the research intends to shed light on the application of infrastructure theory to explain the experiences of researchers with RDM practices in institutions operating in resource-constrained environments.
RDM practices have gained significant acknowledgement from funders, publishers and a growing number of scholars due to their contribution towards FAIR data. Assessing researchers’ knowledge, culture and attitudes against global standards shall help in identifying knowledge gaps that need to be bridged for local institutions to respond effectively to the demands of the global research ecosystem.
The study intends to analyse the local legal and policy framework for RDM and its implications for FAIR data practices from an infrastructure theory perspective. Identified knowledge gaps shall require recommendations based on the empirical findings of the current study. Thus, a policy shall be derived that addresses the key findings of the study, for purposes of mainstreaming RDM for FAIR data across health research institutions in Uganda.
The study is bound to contribute to knowledge about RDM practices for FAIR data across health research institutions in Uganda and about the application of the ecology of infrastructure theory to research data from a developing country’s perspective.
1.10 Justification of the study
Definition of key terms
Research Data: The OECD (2005) defines research data broadly as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings.
Research Data Management: Research Data Management (RDM) is about developing and implementing practices, procedures and policies to protect, validate and describe data in a world where data value is outliving its outputs (Consortium of European Social Science Data Archives, 2015).
Health Research: UNHRO (2012) defines health research as encompassing the spectrum from the biomedical sciences to health policy and systems research, social sciences, traditional and complementary medicine, political sciences, health economics, behavioural and operational research, and research into the relationship between health and the cultural, economic, physical, political, social and policy environments.
Health Research Institution
Research data management, put simply, refers to the effective handling of information that is created in the course of research. Managing research data is usually an integral part of the research process and extends over the entire life cycle of the data, from the point of creation through to dissemination and archiving, and will usually continue long after the initial research project has concluded. Data management typically involves:
- Planning for and creating data
- Organizing, structuring, and documenting data
- Backing up and storing data
- Preparing data for analysis, to share with others or to preserve for the long-term
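As an illustration only, some of the activities listed above (checking data, backing up and storing data, and documenting data) could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a procedure taken from any of the cited sources: the field names (participant_id, visit_date), the file names and the function names are all hypothetical.

```python
import csv
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def validate_rows(rows, required_fields):
    """Split rows into valid rows and a list of (row number, missing fields)."""
    valid, problems = [], []
    for number, row in enumerate(rows, start=1):
        # A field counts as missing when it is absent, None, or blank.
        missing = [f for f in required_fields if not (row.get(f) or "").strip()]
        if missing:
            problems.append((number, missing))
        else:
            valid.append(row)
    return valid, problems


def sha256_of(path):
    """Checksum used to confirm that a backup copy matches the original file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(data_file, backup_dir):
    """Copy the data file, verify the copy, and write a small documentation record."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy_path = backup_dir / Path(data_file).name
    shutil.copy2(data_file, copy_path)
    checksum = sha256_of(data_file)
    if sha256_of(copy_path) != checksum:
        raise OSError("backup copy does not match the original")
    # Hypothetical documentation record stored next to the backup copy.
    record = {"file": copy_path.name, "sha256": checksum}
    (backup_dir / (copy_path.name + ".json")).write_text(json.dumps(record, indent=2))
    return record


if __name__ == "__main__":
    # Hypothetical example records; the second one lacks a participant identifier.
    rows = [
        {"participant_id": "P001", "visit_date": "2019-03-01"},
        {"participant_id": "", "visit_date": "2019-03-02"},
    ]
    valid, problems = validate_rows(rows, ["participant_id", "visit_date"])
    with tempfile.TemporaryDirectory() as work:
        data_file = Path(work) / "visits.csv"
        with data_file.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=["participant_id", "visit_date"])
            writer.writeheader()
            writer.writerows(valid)
        print(back_up(data_file, Path(work) / "backup"))
    print(problems)
```

The checksum comparison is one simple way to confirm that a backup on a flash disk or hard disk is a faithful copy; an institution would substitute its own required fields, storage locations and documentation standards.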