LIST OF ABBREVIATIONS
AIDS Acquired Immune Deficiency Syndrome
CCMF Community Capability Model Framework
CODATA International Science Council Committee on Data
COVID-19 Coronavirus Disease 2019
DAF Data Asset Framework
DCCM Digital Curation Centre Lifecycle Model
DDI Data Documentation Initiative
DMP Data Management Plans
DO Data Officer
EPSRC Engineering and Physical Sciences Research Council
EU European Union
FAIR Findability, Accessibility, Interoperability, Reusability
HIV Human Immunodeficiency Virus
ICSU International Council for Science
ICT Information and Communication Technology
ISO International Organization for Standardization
IT Information Technology
JCRC Joint Clinical Research Centre
LGs Local Governments
LIS Library and Information Science
MDAs Ministries, Departments, and Agencies
MOH Ministry of Health
MoICT Ministry of Information and Communications Technology
MRC Medical Research Council
MRO Medical Records Officers
NGO Non-Governmental Organization
NIH National Institutes of Health
NITA-U National Information Technology Authority Uganda
NSF National Science Foundation
OECD Organisation for Economic Co-operation and Development
QC&QIO Quality Control and Quality Improvement Officer
RA Research Administrator
RDA Research Data Alliance
RDM Research Data Management
RENU Research and Education Network Uganda
SA System Administrator
SDGs Sustainable Development Goals
SOPs Standard Operating Procedures
SPSS Statistical Package for the Social Sciences
UCC Uganda Communications Commission
UCI Uganda Cancer Institute
UK United Kingdom
UN United Nations
UNCST Uganda National Council for Science and Technology
UNGA United Nations General Assembly
UNHRO Uganda National Health Research Organization
USA United States of America
UTAUT Unified Theory of the Acceptance and Use of Technology
UVRI Uganda Virus Research Institute
WDS World Data System
WHO World Health Organization
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
- Appendix 1 Questionnaire for Researchers
- Appendix 2a Interview Guide for Research Administrators
- Appendix 2b Interview Guide for System Administrators/Data Officers
- Appendix 2c Interview Guide for Librarians/Medical Records Officers
- Appendix 3 Document Review guide
- Appendix 4 Consent form
- Appendix 4a Letters of Introduction from East African School of Library and Information Science to the Executive Director, Uganda Cancer Institute
- Appendix 4b Letters of Introduction from East African School of Library and Information Science to the Executive Director, Joint Clinical Research Centre
- Appendix 4c Letters of Introduction from East African School of Library and Information Science to the Executive Director, Uganda Virus Research Institute
- Appendix 4d Letters of Introduction from East African School of Library and Information Science to the Chairperson, Research and Ethics Committee, School of Health Science
- Appendix 5 Letter of Ethical clearance from the School of Health Science Makerere University
- Appendix 6 Letter of Ethical clearance from the Uganda National Council of Science and Technology
- Appendix 7 Letter of Administrative Clearance from Uganda Cancer Institute
- Appendix 8 Letter of Administrative Clearance from Joint Clinical Research Centre
- Appendix 9 Letter of Administrative Clearance from Uganda Virus Research Institute
CHAPTER ONE
INTRODUCTION AND BACKGROUND TO THE STUDY
Research Data Management (RDM) is a general term defining the processes, services, and policies covering how data generated and used for research are created, organized, described, stored, and preserved to ensure their continuous access, sharing, and reuse, in addition to guaranteeing their security and long-term value to their holders (Science Europe, 2018; Schöpfel, et al., 2018). RDM has been embraced as a pragmatic solution to manage research data, ensure its quality and integrity, and make it accessible to researchers now and in the future (Heuer, 2020; Wilms, et al., 2020). It is recognized globally as a best practice in research and a standard that gives a competitive advantage to research institutions supporting findable, accessible, interoperable, and reusable (FAIR) research data (Fuhr, 2019; Perrier et al., 2017).
Research data management (RDM), in addition, defines both the activities and practices undertaken in the process of creating, organizing, storing, sharing, preserving, and reusing data generated across the research life cycle. RDM is promoted by policies emerging from developed nations (Lämmerhirt, 2016) and is progressively being endorsed by governments, funders, publishers, and research councils across the different regions of the world (Perrier et al., 2017b; Tam, et al., 2014). However, in low developing countries, this demand is pressing research institutes and researchers to adopt the practices without sufficient support systems and infrastructure, even though FAIR data is increasingly required as a best practice, a research standard, and a basis for possible funding and publishing of results (Wilms, et al., 2020b). Thus the need to comply seems to be the primary factor for research institutes to engage in RDM practices. Whereas funders are enforcing data management plans in funding considerations as a way of optimally harnessing research resources for the public good, publishers are concerned with access to data for purposes of validating research findings. Yet the idea of research data being FAIR is a phenomenon enforced in developed rather than developing countries. Nonetheless, RDM is being adopted across the world, is slowly taking shape in developing countries, and is of critical importance in increasing integration of the global research enterprise through collaborations (Patterton, 2016).
On the other hand, research data are valuable commodities managed and preserved for current use and also for future benefits to researchers, institutions, and society (Matlatse, 2016). The current international recognition of research data as a resource of value has caused a paradigm shift where both funders and publishers are emphasizing the incorporation of data management plans in funded research projects. The plans articulate how research data shall be managed and made available to other researchers and the inquisitive society, and they allow wider use than the originators could have envisaged (Park, 2018; Renaut, et al., 2018; Sa and Dora, 2019). Therefore, RDM is recognized as an effective approach to managing research data generated in different disciplines and institutions. Where it has been successfully adopted as a practice, it is commended for mitigating the chaos that could have resulted from the ever-growing and diverse volumes of research data generated across institutions (Berman and Cerf, 2013; Borgman, 2012; Choi and Lee, 2020).
In health and biomedical science, research is a process of systematic collection, description, analysis, and interpretation of data that can be used to improve the health of individuals or groups (Fathalla and Fathalla, 2004). The UNHRO Act (2011) defines health research as using scientific methods to generate new knowledge to deal with an identified health problem or curiosity. In the context of health research, research data management entails effective handling of information resources created in the course of research, which is often integral to the research process and extends over the entire research life cycle. RDM is commended for increasing efficiency in research collaborations by ensuring that valuable research data is protected, preserved, accessible, usable, and reusable for as long as it remains relevant to society. RDM protects research integrity, saves time, prevents errors, and increases the quality of data for analysis. It is also commended for easing and increasing access to original research data for validating and replicating findings. Undertaking research in health and biomedical science is an expensive venture: creating and collecting research data requires considerable resources, which could be optimized through RDM because it contributes towards data sharing and reuse. This increases the rate of knowledge generation and enhances research productivity. It also allows valuable discoveries by non-associated researchers who ask new questions of existing research data (Park, 2018). It ensures compliance with standards in data management and documentation (Ray, 2014) and enforces ethical codes, data protection laws, journal requirements, and funders’ policies, which increases the competitiveness of institutes by attracting research grants and reducing loss of valuable data (Chawinga and Zinn, 2020a).
A classic example of the importance of RDM in health research is the recent early adoption of RDM practices across biomedical research, which is contributing to the evolving understanding of COVID-19 disease, the development of preventive measures, treatment and care regimes, and the speedy development of vaccines. This has further justified RDM and its importance in international research collaborations, scientific transparency, and data sharing for robust evidence and informed health and medical care decision-making mechanisms (Bjormmaln, 2020).
Globally, research data management is commended for improving the quality of research data across disciplines. Unfortunately, policies to guide RDM are yet to be agreed upon and put in place (Liu, et al., 2020), although several initiatives have been established to promote RDM and ensure its adoption across disciplines and nations (CODATA, 2019; RDA, 2013). International organizations, specifically health and biomedical research funders and publishers, have continued to issue policy statements and establish guidelines for managing research data and systems for hosting it (European Commission, 2016; International Council for Science (ICSU), 2014; Zhang, et al., 2021). The Committee on Data for Science and Technology (CODATA), an interdisciplinary organization launched by the International Council for Science (ICSU) in 1966, is working to support and improve the reliability, quality, accessibility, and management of research data. In 2019 CODATA called for new policies and principles to be implemented for research data and for the associated infrastructures, tools, services, and practices to be put in place. The Beijing CODATA 2019 declaration further called on the world to adopt RDM in order to leverage the quality and reuse of research data across different disciplines. On the other hand, the Research Data Alliance (RDA) is spearheading the global scientific community’s effort, through different programs, to address barriers and make access to research data a universal reality (RDA, 2019). The Research Data Alliance is working to build technical and social bridges to facilitate the open sharing and exchange of data by researchers and innovators, irrespective of the technologies used, across disciplines and countries. This has also led to the wider adoption of the Findable, Accessible, Interoperable, and Reusable (FAIR) principles for data, initially proposed in 2014 as measures to achieve balanced access to research data (Wilkinson, 2016).
Despite the global efforts, research data continues to be affected by a lack of standards, insufficient guidelines, and inadequate support services, which hinder the adoption of RDM in low developing countries (Fuhr, 2019). Research data management has been given lukewarm attention in low developing countries and, as a consequence, it remains in its formative years, fragmented, and lacking (Patterton, 2016; Patterton, et al., 2018). This may be attributed to continued research data loss, mishandling, misuse, and inaccessibility when needed (Chawinga and Zinn, 2020a). Studies carried out in the Republic of South Africa, Kenya, Tanzania, Malawi, and Zimbabwe revealed several challenges obstructing RDM practices (Chawinga, 2019; Chiparausha and Chigwada, 2019; Mushi, et al., 2020; Ng’ Eno, 2018). The challenges identified included lack of legal and policy frameworks and standards to guide the research life cycle processes, absence of technological infrastructure and related services, a diverse range of types of data, and low-quality data associated with inconsistencies in collection methods (Fuhr, 2019; Antell, et al., 2014). Other challenges noted were limited funding, lack of training and leadership, and the absence of a proactive role by funders in managing research data better (Ashiq, et al., 2020; Carter, 2020). Though there is plenty of literature about RDM in the developed nations, supportive literature in low developing nations is scarce and only emerging due to participation in international research collaborations (Mohammed and Ibrahim, 2019; Mushi, et al., 2020; Tripathi, et al., 2017).
Research aims and guiding questions
The questions that guided the research were:
- What kind of research data are created and held in selected health institutes in Uganda?
- How is research data managed in selected health institutes in Uganda?
- What is the research data management readiness of selected health institutions in Uganda?
- What are the challenges affecting the adoption and uptake of RDM practices in selected health institutes in Uganda?
- What interventions are required to improve RDM practices in selected health institutes in Uganda?
Research Data Management-related studies have been conducted from 2010 onwards across many countries, continents, and institutions (Perrier et al., 2017a). The studies have investigated RDM practices in specific disciplines (Gowen and Meier, 2020) and in single or multiple institutes (Lipton, 2020; Liu, et al., 2020). Studies have also been conducted at faculty and discipline-specific levels in higher education institutions (Borda, et al., 2020). However, most of the studies have been surveys describing practices and investigating researchers’ behaviors in institutions (Curty, 2016). Surveys have been commonly used because they are less resource-intensive to conduct and supply a level of data across the institution, rather than focusing on a few cases (Cox and Williamson, 2015). Most of the early studies, which covered various subject disciplines, were interested in researchers’ awareness of RDM and in the attitudes and data sharing behaviors of scientists (Zimmerman, 2003; OECD, 2007). RDM as a practice involves different categories, but these had not been investigated during those early days.
Nature of research
Structure of Uganda’s Health Systems
Uganda’s health system is comprised of decentralized healthcare services (MoLG, 2013). Healthcare is delivered in the form of prevention, promotion, treatment, rehabilitation, and palliation services (W.H.O., n.d.). The healthcare services are overseen by district health teams across 147 districts and coordinated centrally by the Ministry of Health. The health system consists of 2 national referral hospitals, 19 regional referral hospitals, 147 district hospitals, 193 health center IVs, 1250 health center IIIs, and 3610 health center IIs (NDP, 2019). These are key institutions in the delivery of health services across the country. In addition, there is a total of about 76 institutions involved in health-related research across the country.
| Nature of Research Institution | Number |
| --- | --- |
| Universities | 10 |
| Non-Governmental Organizations | 3 |
| Research Institutes | 24 |
| Other Health Training Institutions | 20 |
| Hospitals | 19 |
| Total | 76 |
The Uganda Health Research Organization (UHRO) (2000) found a reasonable number of organizations devoted to health research. Most of the institutions that carry out health-related research, including universities, were also involved in teaching. The report identified 59 categories of health-related researchers, ranging from basic scientists, clinicians, epidemiologists, and social or behavioral scientists to economists and social anthropologists. Persons involved in health-related research are diverse because of the nature of health as a discipline. Nakanjako, et al. (2017) categorized researchers based on level of seniority as junior faculty, mid-level faculty, and senior faculty. However, other studies identified researchers and categorized them based on their role within the research project, such as Principal Investigators, Data Managers, and Quality Control Officers, among others. In the current study, the focus is on role function within the health research institute, since it was assumed to better describe the actual duties and responsibilities of the holder.
Regarding the type of research generated, Nakanjako, et al. (2017) found that over the period 2000 to 2015, research carried out at the Makerere University College of Health Sciences was, by health specialty, 52% medicine, 28% public health, 15% biomedical science, and 5% health science. The leading research areas under medicine were identified as infectious diseases (57%), non-communicable diseases (NCDs) (20%), and non-communicable maternal and child health illnesses (11%). Furthermore, it was established that the majority of studies (60%) were hospital-based and 40% were population-based. This shows that many of the research institutes run medical clinics where routine data on health vitals is collected to analyze trends, prevalence, and geographical areas where infection is concentrated. It also indicates that much of the data collected resides within institutes, which act as data custodians both for sponsored research projects and for the routine data collected from patients.
Stover, et al. (2019), reporting on the methods, tools, and lessons learned from a large public health data collection project in Uganda, noted that large data collection efforts in limited-resource settings are common, but researchers have not published much on how they have managed these projects. The study found that older data tended to be less well kept. However, given that the country has no substantive framework for research data management, there is no reference point, and each institute applies different approaches, even across different projects in the same institute. The absence of a framework for RDM practices may be the reason for the abandonment of, and loss of access to, huge volumes of research data, which are disappearing never to be recovered. It is important to know what data exists and where it is currently hosted, and to create a mechanism for the identification and archiving of such valuable data for potential future use (Smale, et al., 2018).
Uganda’s health and biomedical research is highly dependent on foreign funding. Funding is diverse, and much of it comes from the National Institutes of Health (USA), European Union agencies, the Medical Research Council (UK), and global pharmaceutical companies (Nakanjako, et al., 2017). Health and biomedical research at different levels of completion is conducted in the country by different institutions. Most of the studies are sponsored by developed nations’ governments and institutions. Surprisingly, the data collected is usually shipped to advanced storage facilities, and copies are archived under restrictive regulations agreed upon in a memorandum of understanding. This makes research data inaccessible even to local researchers (Kamya, 2017). It is also critical to note that undertaking health and biomedical research in Uganda requires satisfying the existing legal regime. The framework calls for the submission of a completed proposal to a health-related research ethics committee. Once approved, the completed proposal is also submitted to the Uganda National Council for Science and Technology, which grants administrative clearance and monitors the research progress to its conclusion. The approved protocols should be adhered to and, in cases of adjustment, approval should be sought before implementation, failure of which may lead to cancellation or suspension of the research project. Where the research involves the administration of drugs, the proposal is also submitted to the National Drug Authority, which is responsible for monitoring and approving drugs for use in the country.
Whatever the case, research data remains a responsibility of the individual researchers and/or the research institution. The existing policy emphasizes the need to keep the data for at least five years at the point of collection after the research has been completed (UNCST, 2014). It also lays down the requirements for the export of bio-samples. However, the regulation remains elusive when it comes to research data, which in most cases is automatically remitted to sponsors’ repositories in North America or Europe. The policy is also silent on access to research data by non-associated researchers. This could be contributing to the ongoing hindrance to optimizing research data resources in the country. Uganda National Health Research Organization (UNHRO) policy requires the establishment of a Data and Safety Monitoring Board (DSMB) for each health and biomedical research project being undertaken. Such a board should be composed of an independent group of experts established by study sponsors to review data safety during clinical trials. It also ensures that a study is conducted and data are handled following the provisions of the research protocol, and it monitors adverse events and safety data.
The focus of this study is premised on the fact that the health of the Ugandan population is central to the socio-economic transformation of the country (MoH, 2015). Health is a key ingredient for improving the quality of life and enhancing the productivity and social wellbeing of the population (National Development Plan, 2020). In Uganda, healthcare services are delivered through a network consisting of 2 national referral hospitals, 19 regional referral hospitals, 147 district hospitals, 193 health center IVs, 1250 health center IIIs, and 3610 health center IIs, increasing access to and utilization of health services across the country. However, healthcare service delivery is faced with several bottlenecks that reduce the functionality of many health facilities at all levels. The standards of healthcare are still wanting and require continuous research to inform policy and practice geared towards improving prevention, care, and curative services.
The development of Information and Communication Technology (ICT) saw hospital information management systems evolving in the late twentieth century. These focused on specialized applications in laboratories, radiology, and administration. ICT systems have since advanced to include documenting, analyzing, and transmitting information to support healthcare at different service levels (Tierney, 2010). ICT has further moved into data processing in hospitals, not only easing documentation work and making data and information available from almost any location in the world, but also changing healthcare delivery (Evans, 2016). Health information management systems are currently the basis for evidence-based decision-making by healthcare professionals, which enhances the outcomes of medical treatment, prevention, and care services (Haux, 2016). They are also expanding further to incorporate research data management that supports data generation, access, processing, storage, preservation, and reuse, which increases the speed of data analysis to curb diseases and improve healthcare services.
Research data management gained importance around the 1980s, particularly in cross-disciplinary scientific research with funders requiring data management and data sharing plans (Brinet, 2015; Furh, 2019). Researchers explored the exponential of research data in the early twenty-first century with digital data becoming increasingly common and taking center stage in research. Well-managed research data lead to discoverable data for re-use, increasing; reproducibility, productivity, and optimizing the use of research resources (Beitze, 2013). Until then, research data had not been openly available to other researchers who were not directly associated with the research (Nicol et al., 2013). In many research communities, the practice had been to utilize research data within research units and share it with selected groups of trusted colleagues (Koopman & De Jager, 2016). This trend is drastically changing due to the open science movement which has become popular across health and biomedical funders, publishers, and research councils around the world (Ashroofa, 2015).
Open Science is the movement to make scientific research and data accessible to all. It has great potential for advancing science (Mwangi, et al., 2021). At its core, it includes (but is not limited to) open access, open data, and open research. Open Science fosters transparency, collaboration, and accessibility in the conduct of research. The concept of open science includes open data, open access publishing, open peer review, open notebook, open education, collaborative research, and citizen science (European Commission, 2017b). Open science is to make scientific research at all levels and formats available to everyone so that learning, research, the creation of additional knowledge, collaborations, and innovations can take place. This has become critical as scientific research is becoming more and more data-driven and the need to share data and knowledge to curb wasteful duplication and scale discoveries and economies continues to increase. Open science advocates for unfettered access to research data, however, studies show institutions operate systems of managed data access in which access is governed by legal and ethical agreements between stewards of research datasets and researchers wishing to make use of them. This has further led to the adoption of FAIR data principles which are reinforcing the demands for research data management practices of researchers (Bishop and Borden, 2020). RDM is an important prerequisite and enabler for the fulfillment of FAIR data principles for a robust and reusable science. The publication of the FAIR data principles in 2016 and the adoption of an open data policy in Horizon 2020 have made universities, research institutes individual scientists adopt RDM as a best research practice. In addition, the open science movement has expanded and is pushing researchers to embrace new ways of working to make research data open and reusable (Gomez-Sanchez and Iriarte, 2017). 
The open science movement is improving the availability and usability of data for all areas of research. It mainly focuses on strategizes for reuse, reproducibility, and transparency in research. It also emphasizes open data, data that are freely and openly available to the general public widely used in scholarly communication, governmental, and industrial sectors (Park, 2018). The sustainability of open science is dependent on maximizing data reuse rather than the mere sharing of data in repositories (Curty, 2015) since data reuse promotes data sharing (Niu, 2009).
Uganda embraced health and biomedical research way back in 1936 when Uganda Virus Research Institute (UVRI) was established (Uganda VirusResearch Institute, 2018). Over the years, the institute has made considerable discoveries of global importance that have had an impact on health and biomedical science policies and practices. Currently, the institute is a World Health Organization (WHO) designated Regional Center for Arboviruses Reference and Research and a center of excellence. Earlier, in 1922, the now Makerere University College of Health Sciences had been established to train health and medical professionals. This university is training human resources in different health and biomedical science specialties. This has also increased the country’s competitive edge in Sub Sahara Africa in offering the required infrastructure for quality health and biomedical research as indicative of its research productivity for the period up to 2015 (Ultham, 2015).
This has also resulted in establishing several research institutions pursuing to bridge knowledge and practices gaps and to improve healthcare service delivery. The research institutes and or individual researchers usually undertake research projects in partnership with international organizations, foreign universities, and governments. The nature of research undertaken also varies from a specific country or multi-country studies, hospital, and or field-based research. Collaborative research has been acknowledged in providing opportunities to move further and faster by working with leading personalities in the different fields across the world, broadening insights, ideas, and contributing to building strong research infrastructure and culture in the countries (Kaleebu, 2017; Kamya, 2017). Some of the long-term collaborative partners are World Health Organization, United States-Centres for Disease Control, Medical Research Council UK, and International AIDS Vaccine Initiative. These, to a degree, have shaped the local research initiatives and contributed immensely to the development of the local research competencies, infrastructure and improving research productivity (Kamya, 2017).
The country has also put in place a robust research system to protect human participants by emphasizing informed consent, privacy, and confidentiality of respondents and the generated research data. However, the growing volume of research data is being produced within inadequate supportive services, infrastructures, and policies (Vasilevsky, et al., 2017). The existing policy requires keeping research data for at least five years at the point of the collection after the research project (UNCST, 2014). However, funders through the memorandum and research protocols usually demand that generated research data is archived based on a time frame that is pre-defined in the protocols. Massive research data produced currently and over the years resides within hospital/medical facilities and data from population-based studies is residing with individual researchers and different research institutes (UNCST, 2014). Nonetheless, the locally generated research data is given utmost care up to the level of analysis and thereafter it’s either abandoned to obscurity or discarded which rises the risks of research duplication (Ecuru et al., 2008; Ssebulime, 2017). The existing statutory instrument also lays down the procedures for research data and biosamples export but is silent about how they could be accessed and or reused by non-associated researchers. This is one of the many factors attributed to the underutilization of the existing diverse research data currently residing in institutions across the country.
The Medical School Library at Mulago National Referral Hospital Complex is said to be holding original handwritten medical archives of Sir Albert Cook dating back to 1900. These are some of the oldest medical records in the country and could be vital resources in understanding some of the current ailments. However, it is yet to be established how they are accessed and used by could be interested researchers. The National Records and Archive Centre did not hold any medical research data as archives since most of the data is held at respective research institutes under the individual research funders’ protocols agreed upon at the time of executing research projects. There is no specific legislation on research data management practices neither at the national nor at institutional levels. The practices are not uniform but rather dictated by the different research protocols being implemented. In addition, there are no programs geared towards developing the required knowledge and skills to empower researchers in RDM competencies. Thus, health institutions have continued to establish databases, biobanks, and repositories as part of adapting to global trends. However, most of these are standalone and haphazardly implemented which in the long term may affect access and reuse of the research data holdings (Nnamuchi, 2016).
The country’s ICT sector is organized along three functional levels, namely policy, regulatory, and service provision. At the policy level, the Ministry of Information and Communications Technology (MoICT) was established in 2016 to provide the required framework in collaboration with the regulatory bodies: the Uganda Communications Commission (UCC) (1997) and the National Information Technology Authority Uganda (NITA-U) (2009). At the service provision level, Ministries, Departments, and Agencies (MDAs); Local Governments (LGs); academia; and the private sector face challenges that hinder optimal use of available ICT resources. To mitigate some of the challenges, the Research and Education Network Uganda (RENU) was established in 2007 to provide affordable internet connectivity for research and educational institutions in Uganda. RENU supports shareable ICT resources and services for teaching, learning, and research and also provides the technological infrastructure backbone across Uganda.
The Ministry of Health’s information management division (2018) has also noted how ICT has changed health care delivery and the running of health systems. In health systems, information and communication technologies are being used to improve the timeliness and accuracy of public health reporting and to facilitate disease monitoring and surveillance. However, it is noted that the technologies have been adopted in a haphazard manner, hindering interoperability, which affects access and reuse. There is also limited network coverage, poor-quality services, high cost of end-user devices and services, inadequate ICT knowledge and skills, limited innovation capacity, and intermittent power supply, which deny many the benefits accruing from information resources (Buwule, 2019). This also highlights the problems of access to health-related information and data, which limit medical knowledge generation (Omona and Ikoja-Odongo, 2006). The situation is further complicated by the adverse and fragmented landscape of ICT pilot projects, numerous databases, and health information system (HIS) silos, which are barriers to the effective sharing of information and data (Ministry of Health, 2016). It is also noted that the existing systems are inadequate and affected by power shortages, limited bandwidth, poor internet connectivity, and a lack of state-of-the-art and sufficient ICT equipment, limiting access to and use of information resources (Buwule, 2019).
Uganda’s health research is also dependent on foreign funding particularly from the National Institutes of Health and European Union agencies as the leading funders (Nakanjako et al., 2017). Donors’ contribution to the total health budget increased from 14% to 42% within three financial years (Ministry of Health’s financing strategy, 2016). Most of the funding is attributed to research institutions executing diverse research projects. Therefore, there is an urgent need to expedite the processes of adopting policies that advocate for the management of research data as a strategic resource of long-term benefits and as measures to ensure the continued flow of research funds to support the country’s health agenda.
The Medical Research Council/Uganda Virus Research Centre (MRC/UVRC) in July 2016 passed the only existing policy on research data sharing in Uganda. The policy applies only to bona fide researchers, and its implications for non-associated researchers are yet to be studied. The Ministry of Information Communication and Technology (ICT) released the Open Data Policy for Government Ministries and Departments in May 2017. However, the policy lacks specific clauses to address research data or its management. In addition, the country has no locally certified research data repository (https://www.re3data.org). This implies that research data is not formally handled for possible sharing and is managed only through internal mechanisms that are bound to be diverse and uncoordinated. It also works against researchers and institutes that may be interested in depositing their data for long-term preservation, public access, and possible reuse.
Studies carried out by Verhulst & Young (2017), MOH, Uganda (2016), and the World Bank (2015) examined Uganda’s preparedness for the data revolution. The studies highlighted several constraints that limit the full utilization of data across disciplines. The challenges pointed out included legal and ethical concerns, inadequate infrastructure, technological and institutional limitations, and the treatment of research data as the property of individual researchers (Uganda Health Research Organization, 2012). Furthermore, Tomusange, et al. (2017), in a study of data sharing and reuse in the public sector, sought to understand the relevant stakeholders’ perceptions of data sharing and reuse practices and services. That study, however, focused on data generated from organizational activities rather than on research data, which is the focus of the current study. Its preliminary findings showed that a data sharing and reuse culture had not been fully developed because of the barriers already articulated. As elsewhere in developing countries, RDM practices in Uganda seem to be poorly understood, elusive, and evolving within inadequate competencies and inadequate legal and infrastructure frameworks (Kaplan, 2014; Mogire & Wafula, 2016; Patterton, et al., 2018).
This is further compounded by the absence of a baseline study exploring the state of research data and its management in health institutes in Uganda. This limits authoritative understanding of RDM practices and readiness. Thus research data, though abundant within institutions, remains invisible and is among the most untapped and underutilized resources of considerable value and potential. This has serious repercussions for scholarly productivity and funding for future research (Carr & Littler, 2015; Koopman & De Jager, 2016; Mladovsky et al., 2015). It is, therefore, justifiable to seek to understand research data management practices and readiness in order to harmonize Uganda’s RDM frameworks with the global context. This requires empirical evidence, a gap that should be addressed if Uganda is to benefit from the enormous research data resources existing within its different institutes.
Significance of research
The proposed study’s significance lies in its potential contribution to:
- Library and Information Science (LIS) knowledge and practices. Evolving trends consider RDM to be integral to the research process, though it is independent and emerging out of knowledge management (Makani, 2015; Patterton, 2016). Accordingly, RDM practices overlap with the traditional library practices of organizing, preserving, and disseminating information (Der, 2015) and lie on the far end of an evolving continuum of library services. The study intends to contribute knowledge about RDM in a Ugandan context with a focus on Health Institutes, as well as to contribute to practical work by researchers, librarians, and information professionals in research data practices.
- Knowledge regarding the application of two substantive RDM models and the Data Asset Framework methodology in a single study, which is intended to provide a rich analytical base to explain RDM practices and readiness in selected health institutes in Uganda. This is bound to contribute substantively to the development of a specific RDM model for health institutes that could be applicable in developing countries.
- Research data management practices focus on the data lifecycle and are acknowledged as a best practice in research. They are becoming mandatory and enforceable by funders, publishers, and several research councils across the world (Makani, 2015). When research data gathered at considerable cost are made available for reuse, the return on investment improves and knowledge generation speeds up, as non-associated researchers use the same data for new research insights (Der, 2015).
- The study will analyze the RDM practices and readiness of institutes focusing on research data lifecycle, human, technical and environmental factors and their implications. The identified challenges will form a basis for making empirical recommendations for interventions to fill the existing knowledge and practice gaps. Consequently, the key findings of the study will be used to propose a policy brief on RDM practices in health institutes in Uganda geared towards improving: finding, accessing, interoperating, and reusing research data.
Justification of the study
- Researchers and institutions, particularly in health and biomedical research, are currently under pressure to comply with international regulatory and legal/policy frameworks for funding. In many funding calls, researchers are required to include a research data management plan, which local researchers have limited knowledge to prepare. Researchers are also increasingly required to submit their original research data to open data repositories that are not readily available locally, or to provide a link to a repository where the research data is stored for possible validation before the journal article is published. This requires data preparation before submission, making RDM practices urgently needed as integral and beneficial to research processes that researchers and institutions can no longer afford to ignore (European Commission, 2016).
- Understanding how researchers in health institutes approach research data management will contribute towards designing a discipline-specific intervention tailored for better RDM support services for health and biomedical researchers in selected health institutes.
- The cost of collecting primary research data from subjects is huge and growing. In many instances, the cost is not affordable to early career researchers, who are hardly ever entrusted with funding for research projects. This limits career growth and is a major challenge hindering the development of required competencies in health and biomedical science across developing countries. Thus, by proposing a possible RDM framework, access to original, quality research data can be accelerated, providing opportunities for early career researchers and non-associated researchers to ask new questions of existing research data. This would speed up the rate of knowledge generation, reduce duplication of research data collection, increase efficiency and effectiveness through research data reuse, and consequently improve the required professional competencies.
- Research data represent a unique or historical value that can never be captured or created again from anywhere else. Such data require prudent management to ensure their preservation and continuous reuse, and there is a growing volume of such data in different research institutions (Womack, 2015). The proposed interventions will contribute to the sustainable management of such data in health institutes.
- Health research is expensive, and the resources required to undertake such projects are scarce in developing countries. However, the research data generated can become a crucial resource that can be used and reused to generate new knowledge and to respond quickly to emergencies that require diverse knowledge and skills from researchers across wide geographical areas (Tripathi, et al., 2017). Attaining this requires deliberate RDM practices across disciplines and research institutions at both national and international levels.
- The study will contribute to Sustainable Development Goals (SDGs) specifically goal number three (3) which addresses good health and wellbeing (United Nations Development Program, 2016). Good health is dependent on speedy research outcomes that improve human health and wellbeing. RDM practices are crucial in providing a basis for using large health datasets to help in understanding, treatment, and prevention of diseases (Blaveri, 2017). By improving RDM practices and readiness, the rate of research productivity in health and biomedical research will increase contributing to policy and best practices. This also will contribute to Uganda’s Vision 2040 and the National Development Plan III 2020/21 – 2024/25 aimed at improving the quality of life (National Planning Authority, 2020).
Organization of the thesis
Chapter One – Introduction and background to the study: This chapter provides a general introduction to the study. It includes background information, a statement of the problem, research objectives, research questions, significance of the study, preliminary literature, a brief outline of the theories used to underpin the study, and an introduction to research methodology.
Chapter Two – Literature review: This chapter presents a review of related empirical and theoretical literature in journals, reports, books, conference proceedings, theses, and others on the main variables of the study, as well as gaps in the literature and how this research bridges them. The different models used to underpin this study are the Data Asset Framework, the Digital Curation Centre Lifecycle Model, and the Community Capability Model Framework, among others.
Chapter Three – Research methodology: This chapter describes the research paradigm, research approaches, research designs, population, sample size, sampling and sampling techniques, data collection methods and tools, data presentation and analysis, validity and reliability of data, ethical considerations, and gender considerations.
Chapter Four – Data presentation, analysis, and discussion of findings: This chapter presents the results of the study. In this chapter, qualitative results are presented thematically, while quantitative results are presented using frequencies, charts, figures, tables, and narrations. The chapter also discusses and interprets the results of the study using existing literature and the models that guided the study. The originality and contributions of the study to theory, practice, policy, and methodology are adduced.
Chapter Five – Summary, Conclusion and Recommendations: This chapter provides the summary, conclusion, and recommendations. It also highlights areas for further research.
CHAPTER TWO-BACKGROUND (WHY? WHO? WHERE?)
Research data management practice is a real-life social issue that can be explained using different theoretical frames. However, for the purpose of answering the research question, a pragmatic philosophical stance is adopted. This stance is concerned with what works and provides solutions to an identified problem (Creswell, 2013; Patton, 2002). Pragmatism allows the researcher to emphasize the research problem and use all approaches available to address it. It is an approach that uses mixed methods. How “research data is managed” and why “there has been a slow readiness to adopt and take up RDM practices”, as constructed by respondents, and the implications of both are explored. Thus pragmatism gives the researcher the freedom to choose the methods, techniques, and procedures of research that best meet the needs and purpose of the study (Creswell, 2013b).
The research design associated with the pragmatic paradigm involves mixed methods (Creswell, 2012). The current study adopted a concurrent parallel design combining the survey design applied within a case study. The mixing of the two designs provided a better understanding of the research problem since it utilized and built upon the strengths of both quantitative and qualitative data (Creswell, 2008; Saunders, et al., 2012).
It was imperative to analyze the current data management practices.
Current Data Management practices
Different disciplines have specific data management practices, although these may not necessarily adhere to best practices (Borgman, 2012; Der, 2015). Best practices ensure good use of public funds, adherence to standards that make experiments and studies replicable, and research data and results that are as open as possible and as closed as necessary (Bishop & Borden, 2020). Failure to adhere to best practices may lead to complete loss of data and a waste of research effort (Briney, 2015). Schouppe & Burgelman (2018) noted the varying data management practices of different research communities, observing that research data infrastructures usually generate complex ecosystems of poorly interoperable data. The resulting data silos slow down the flow of knowledge and prevent the exchange of data in interdisciplinary research across different regions of the world. Researchers and institutions continue to manage research data according to individual research project protocols, which define the different approaches through which data is created, organized, documented, accessed, preserved, stored, and reused (Borghi et al., 2018). In health institutions, research data is highly regulated and controlled by legal, regulatory, and confidentiality requirements, which shape its management (Knight, 2015; Marutha, 2020). The different legal frameworks under which research data is managed create discrepancies which in most cases prevent better data management practices (Manurung, 2019; Wiley, 2020).
Existing practices around data management vary across disciplines and institutions. Differing data formats, access methods, security systems, and intellectual property restrictions are in place. The unnecessary differences in practices and data characteristics within different research groups and institutions leave researchers to develop ad hoc measures for managing research data, and there is no substantive evidence of how this is accomplished (Wallis, Rolando & Borgman, 2013). Previous work on RDM practices was built around a variety of approaches (Tuyl and Michalek, 2015), most of which relied on self-reports (Perrier et al., 2017) that were sensitive and confidential, and many have remained unpublished (Patterton, Bothma, and Deventer, 2018), making data management multi-faceted, diverse, and complex (Cox, et al., 2014).
At the global level, most studies about RDM practices originated from the USA, the UK, and Europe, with increasing research output from Australia, Canada, and China. Together these have generated over 60% of the total RDM-related literature globally (Patterton, 2016). Much of the literature is open and easily accessible due to increasing pressure from international research funding agencies (European Commission, 2018; Tenopir, et al., 2015; Wellcome Trust, 2015). The exponential growth of literature related to RDM practices dates from under a decade ago, when studies focused mainly on different aspects of RDM. In Africa, RDM studies have emerged from the Republic of South Africa (Patterton, 2016; Patterton, et al., 2018; van Wyk, 2018), Kenya (Bull et al., 2015; Ng’eno, 2018; Ng’eno and Mutula, 2018), Tanzania (Mushi et al., 2020), Malawi (Chawinga, 2019a; Chiware, 2020), and Zimbabwe (Chigwada, et al., 2017b), but no substantive study has been conducted in Uganda to date (March 2021).
Respondents in previous studies have been drawn from a range of positions, responsibilities, and levels of experience. Although there may be differences in behavior and practice between the different groups of respondents, no study has shown differences among the different categories of respondents, and no explanations have been given, though in most cases respondents were heterogeneous and linked by a community of practice.
In most RDM studies, sample sizes have been prone to variance depending on the population, its type, and the scope of the study. Online questionnaires/web surveys, interviews, document review, cross-sectional or case studies, and focus groups were methods applied in many RDM studies (Perrier, et al., 2017). In some studies more than one method was used; however, online questionnaires and personal interviews were the most commonly used methods (Patterton, 2016). Given the experience of past studies, the current study adopts questionnaires, interviews, and document reviews as data collection methods.
According to Vision 2040 and the National Development Plan III, Uganda is desirous of improving health services delivery (National Planning Authority, 2020). Yet the existing health information systems and infrastructure are not well organized and aligned to the country’s health needs. Even the frameworks to speed up knowledge generation to improve health and biomedical care are inadequate (Ministry of Health e-health Strategy, 2018). Although the country is participating in health research of global importance, it remains highly dependent on donor funding and on collaborative research that is dictated by donors and is not necessarily a priority in the country’s research agenda. Health and biomedical disciplines are at the forefront of producing large numbers of journal articles and are well recognized globally. However, although research data management (RDM) practices are increasingly becoming mandatory research standards espoused in policy statements by a growing number of international funders and publishers (National Science Foundation (NSF), 2011; Medical Research Council, UK, 2013; The Gates Foundation, 2014; Wellcome Trust and seven UK research councils, Hahnel, 2015), the country has not put in place a framework to guide researchers and institutions on the best way forward to remain competitive (Ministry of Health, 2017; UNCST, 2014). This may have a long-term impact on local health and biomedical research, affecting Uganda’s participation in this important global enterprise.
The existing research data represents a significant asset with opportunities to benefit researchers, institutions, and society today and in the future. Unfortunately, an enormous amount of research data remains inaccessible, as volumes are stored under different conditions, protocols, and technologies. Research data in electronic formats suffers bit rot, while data in physical formats may be lost, misplaced, locked away, or abandoned in storage facilities where it deteriorates into oblivion (Joint Clinical Research Centre, 2017; Stover, 2019; Uganda National Council of Science and Technology, 2014). The general research terrain seems to be characterized by limited awareness of existing research data, data locations, storage, and preservation measures, and by a lack of public knowledge of how such data could be accessed, shared, and reused. This is further complicated by limited competencies required for effective RDM, the absence of supportive technical infrastructure, and the absence of comprehensive national legal frameworks supporting research data management. This may be contributing to difficulties in finding, accessing, using, or reusing data, which results in a growing volume of dark data across health institutes (Stover, et al., 2019). Consequently, research data is lost or rendered inaccessible during or after research projects (Ministry of Health National eHealth Policy, 2016). Across health institutes, much of the existing research data is remotely owned by donors who dictate any possible access and reuse. The current practice, in which research data is remitted directly to the funders’ repositories, leaves local researchers with limited opportunities to access and use the data they helped to generate. Even the publications generated from such data recognize only a few individuals in senior positions, leaving out the majority of participants in the research processes.
Due to limited analytical skills in research data and the required rigorous processes for ethical clearance, the majority of local research project participants hardly benefit from the enormous research data generated. This continues to pose challenges to scholarly productivity and limits overall benefits from the accumulated research data currently existing in health institutes across the country.
There is, therefore, a need to understand research data management practices and the readiness of health institutes. This is intended to form a basis for proposing a possible alignment of Uganda’s RDM frameworks to the global context in fulfillment of funders’ and publishers’ requirements. In addition, the study is intended to establish a framework for complying with the demands for open science and open data which could offer wider opportunities to the research community worldwide (Strecker, et al., 2021). Therefore, the key research question is twofold: how is research data managed and what is the research data management readiness in health institutes in Uganda?
Application of theories/models in Doctoral Research
The application of either theories or models in doctoral studies is meant to guide the explanation of the phenomenon under investigation. Theories or models provide the progressive evidence showing that research does not appear suddenly but is based on scientific clarity (Swanson and Chermack, 2013). Theories are defined as constructs and propositions that collectively present a logical, systematic, and coherent explanation of a phenomenon of interest within stated assumptions and boundary conditions (Bacharach, 1989; Kerlinger, 1979). Theories explain why things happen and provide an understanding of cause-effect relationships. Whetten (1989) argued that whereas the constructs of a theory capture the concepts that are important in explaining a phenomenon, the propositions capture how concepts relate to each other, the logic represents why concepts are related, and the boundary assumptions examine the “who, when, where and what” circumstances under which the concepts and their relationships work. Thus theories explain why the research problem exists and identify variables that are relevant for the investigation. Researchers should not only state but also present the reasoning behind the theoretical choices made for their studies (Bacharach, 1989). This is especially important to show the connection between the chosen theory or theories and the research phenomenon, which may not be immediately evident.
On the other hand, models are theoretical images of the study objects. A model is a graphic representation and explanation of the interrelationship of the ideas and key concepts of the study. However, the term should be used in a more constrained sense to indicate a set of related concepts. With the help of arrows, a model shows the relationship between various types of variables. Models help a researcher to work decisively and provide the logical basis for analyzing the findings. In the current study, models are applied due to limitations in the current theories. Research data management is an evolving phenomenon that has been subjected to several theories and models to understand its connotations. However, there is no substantive theory on which to base an explanation of the RDM phenomenon; rather, theories have been borrowed from other fields and applied in related studies. Examples include Awre et al. (2015), who studied RDM as a wicked problem.
Theories and Models applied in RDM studies
Research data management is an evolving phenomenon that has been subjected to several theories and models to understand its connotations. However, there is no generic theory to explain this phenomenon; rather, theories have been borrowed from other fields and applied. It is worth noting that most of the studies focused on understanding researchers’ behaviors and not necessarily RDM practices in reality (Berman, 2017; Curty, 2016; Kim & Stanton, 2016). Theories applied included the wicked problem concept (Rittel & Webber, 1973) in Cox, Pinfield, Smith (2014); grounded theory (Glaser & Strauss, 1967); the theory of planned behavior (Ajzen, 2012); institutional theory (Scott, 2008); and the Capability, Opportunity, Motivation-Behavior system (Michie, et al., 2011). Curty (2014) applied the Unified Theory of Acceptance and Use of Technology (UTAUT) to study research data reuse in the social sciences. Llebot & Rempel (2021) examined the pressures and factors that affect research teams in undertaking RDM; they adopted the UTAUT model (Venkatesh, Morris, Davis, & Davis, 2003) to explain the variables that can influence whether new and better data management practices could be adopted by a research group.
Models underpinning the current study
The key models used in the current study guide the answering of the research questions. The models applied are the Data Asset Framework (Jones, et al., 2009), the Digital Curation Centre Lifecycle Model (Higgins, 2008), and the Community Capability Model Framework (Lyon, et al., 2012). The three models are intended to complement each other in addressing the key research questions. The purpose of the study is to understand research data management practices and readiness as a basis for proposing interventions to align existing research data frameworks to the global context in fulfillment of funders’ and publishers’ requirements within selected health institutes in Uganda.
The proposed interventions are bound to improve the adoption and uptake of RDM practices to make inaccessible and existing volumes of research data stored under different conditions, protocols, and technologies: findable, accessible, interoperable, and reusable by both the current and future researchers.
Therefore, the research questions to be addressed are:
- i) What kind of research data are created and held in selected Health Institutes in Uganda?
- ii) How is research data managed in selected Health Institutes in Uganda?
- iii) What is the research data management readiness in selected Health Institutes in Uganda?
- iv) What are the challenges affecting the adoption and uptake of RDM practices in selected Health Institutes in Uganda?
- v) What interventions are required to improve RDM practices in selected Health Institutes in Uganda?
Given the nature of the research questions, there is no single theory or model that could ably explain the issues surrounding RDM practices and readiness in a single study. As explained earlier, theories have been applied to study researchers’ behaviors, data sharing, and reuse, but not necessarily RDM practices or readiness. Whereas research data management is an evolving phenomenon, the theoretical aspects have in most cases been missing. The current study therefore applies three models, DAF, DCCM, and CCMF, concurrently to explain RDM practices and readiness in depth. This could be the first time the three models complement each other in a single study as an analytical tool to explore RDM practices and readiness, particularly in health institutes in Uganda. The Data Asset Framework (Jones, et al., 2009) is a flexible methodology applied in case studies to gather comprehensive information describing research data sets and local research data practices (Cox and Williamson, 2015). The Digital Curation Centre Lifecycle Model (DCCM) is applied to understand the research data lifecycle processes, and the Community Capability Model Framework is applied to understand the environment in which research data can be managed effectively and to assess both the researchers’ and the institutions’ readiness for RDM practices.
Digital Curation Centre Lifecycle Model
The Digital Curation Centre Lifecycle Model is “a generic, curation–specific tool used in conjunction with relevant standards, to plan curation and preservation activities at different levels of granularity” (Higgins, 2008, p. 134). The model was developed by the UK Digital Curation Centre to train creators, curators, and users to organize resources and to plan and implement the preservation of digital assets. The discrete functions outlined in the model are grouped into three categories: full lifecycle, sequential, and occasional actions. The model is commended for promoting a holistic approach to the management of digital resources throughout their lifecycle, from initial conceptualization to long-term preservation and disposal (Higgins, 2008). It provides a structure in which several operations are performed on a data record throughout its lifecycle (Ball, 2012), and it aligns the curation tasks to the lifecycle stages of a digital object.
The model has been applied by; Pinfield, Cox, & Smith, (2014) and Higman & Pinfield, (2015) to understand the data lifecycle in research-intensive institutions. Though the model has been subjected to several reviews comparing it with other lifecycle models (Ball, 2012; Crowston and Qin, 201), there are still limited practical applications. Nonetheless, the model has the potential to explain the research data lifecycle processes (Ball, 2012; Crowston and Qin, 201). This model was found appropriate to explain some of the research questions of the current study.
Key variables of the Model
The Digital Curation Centre Lifecycle Model expounded by Higgins (2008) defines a step-by-step approach to managing data, from creation, organization, preservation, storage, access, and sharing to research data reuse. The model components are explained further in the sections that follow.
Research Data Creation
In health and biomedical research, data creation is diverse and can be accomplished by both humans and non-human resources through experimentation, observation, interviews, surveys, and repurposing of existing data (Berman and Cerf, 2013b). Research data is created as raw, abstracted or analyzed, experimental or observational data derived from laboratory notebooks, field notebooks, questionnaires, audiotapes, videotapes, photographs, specimens, samples, and artifacts, among others (Borgman, 2012). How data is created determines the infrastructure required for its processing and eventual preservation for future access and reuse. Data-creating entities such as humans may create data directly or be responsible for a non-human data creator that creates the data (Hartig, 2009). Data flow through a continuum from one level to the next during the preparation process and pass through many hands as particular datasets are evaluated or cleaned (NAS, 2002). The process of creating research data also produces administrative, descriptive, and structural metadata, which are important for long-term research data preservation. Metadata is key to data discovery, retrieval, and reuse, and considerable effort and skill are required to create appropriate metadata based on specific community standards (Zimmerman, 2003).
Data Organization
Data organization is defined as the process through which the data produced is named and filed to form a complete data set (Peng and Parker, 2016). It is also based on the temporal and spatial extent, file size and data volume, variable attributes, and data latency and frequency, as well as data sources, retrieval or processing algorithms and steps, and error sources and uncertainty estimates. Patterton, Bothma, & Van Deventer (2018) noted that, as part of data organization, researchers experienced difficulties when deciding on naming conventions, did not have sufficient backup knowledge, and had no experience in adding metadata. Data organization therefore ought to be clear, descriptive, and unique, with documented naming conventions that support data discoverability and accessibility. It should be based on the directory structure, folder and file naming conventions, file versioning, and file formats. Data originators should adopt community-standard, self-describing, and machine-independent data formats to enhance accessibility, usability, and interoperability. Concerns were also raised about the integrity of data collection methods and about quality control in the absence of RDM practices, which compromises institution-wide data organization. How these are accomplished in health research institutes in Uganda needs to be understood in order to ensure long-term findability and access for reuse by interested researchers.
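To illustrate what a documented file naming and versioning convention could look like in practice, a minimal sketch is given below. The pattern used (project, description, date, and two-digit version separated by underscores) and the function names are assumptions made for illustration only; they are not drawn from the cited studies or from any institutional standard.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9-]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def slug(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, created, version, extension):
    """Compose a descriptive, unique file name that follows the convention."""
    ext = extension.lower().lstrip(".")
    return f"{slug(project)}_{slug(description)}_{created:%Y%m%d}_v{version:02d}.{ext}"


def follows_convention(filename):
    """Return True when an existing file name matches the documented pattern."""
    return NAME_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    name = build_filename("HIV Cohort", "Baseline Survey", date(2021, 3, 15), 2, ".CSV")
    print(name)
    print(follows_convention(name))
    print(follows_convention("Final data (2).xlsx"))
```

A convention of this kind places the date and version in the name itself, so that successive versions of a data file sort together and can be told apart without opening them.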
Data Preservation
Data preservation entails migrating data to the best format and suitable medium, creating metadata and documentation for easy discoverability, backing up and storing, and finally archiving research data (Eynden, 2018). It means protecting data in a secure environment for long-term access and reuse. It requires regular auditing of the environment and all activities to guarantee data integrity, appropriate metadata to ensure discoverability, and monitoring of access control measures to meet privacy, licensing, and intellectual property restrictions (Treloar and Wilkinson, 2008). A range of actions is associated with data preservation, including selecting data for long-term curation, choosing a data service, licensing data, taking measures for discoverability, and preparing data access statements. Smit et al. (2011) present several preservation strategies: using standardized file formats, regular data refreshing to protect against data degradation, and transformation or migration of data from older formats into the best format and suitable medium. Preservation may also involve the emulation of older software so that future generations can access obsolete data formats. Preservation should be guided by a documented mixture of policies, processes, and resources, including staff and technologies, to create an enabling environment for access, reuse, and sharing of research data (Scott, 2014). Data preservation is a key function of RDM: it keeps authoritative research data authentic, reliable, and usable over the long term, and it is crucial in health institutes where research data is a strategic resource (Xing, 2019).
How the massive research data generated and stored in health institutes are preserved needs to be understood to make the necessary recommendations for its long-term availability for possible future reuse.
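The regular integrity auditing described above is commonly supported by checksum (fixity) checks. The sketch below is one possible way to do this with the Python standard library, assuming that a manifest of SHA-256 checksums is recorded when data is deposited and compared against the stored files at each audit. The function names and the manifest structure are illustrative assumptions, not a procedure prescribed by the cited authors.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path to its checksum at deposit time."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def audit(folder, manifest):
    """Compare the stored files with the manifest and report discrepancies."""
    current = build_manifest(folder)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

An audit that returns three empty lists indicates that the preserved files are unchanged since deposit; any entry under "missing" or "changed" would call for restoring the file from a backup copy.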
Data Storage
Data storage takes cognizance of location, format, and security (Chu, et al., 2018). It is a key function in RDM aimed at FAIR data. Researchers continue to store research data on personal computers, tablets, and external storage drives (Majid, et al., 2018). The storage environment should be actively managed, secure, and reliable over time, and should provide the level of access control required by the researchers and by others who may be interested in accessing, using, and reusing the data (Chawinga, 2019; Ng’Eno, 2018b). Data management practices, including documentation, preservation, and dissemination, are highly dependent on how data is stored. It is therefore ideal for institutions to provide data storage facilities (Shakeri, 2013). However, data storage practices should be diversified across institution-wide network storage; free-standing devices, non-networked devices, and accounts; and institutional storage without remote network access (Schumacher et al., 2014). Cox & Pinfield (2013) noted that the pervasive use of powerful computing technology has increased the number of researchers generating and using large datasets, which makes it challenging to store these data in a form that can be easily accessed, processed, and analyzed. Scott (2014) pointed out that computer-based storage requires both hardware and software, with criteria such as capacity, speed, reliability, and cost to be considered. Data storage and security are crucial in ensuring data integrity; however, institutions have continued to store a lot of data that remains unused most of the time, resulting in higher data management costs that should be addressed.
However, the investment and skills required to manage such facilities may not be available, which hinders effective storage of research data and its reuse. Data storage in health institutes in Uganda has not been assessed: how it is approached, the nature of hardware and software capabilities, and whether it supports the FAIR data principles remain unknown.
Access to Research Data
Access to research data requires standardized regimes, which include technological, institutional and managerial, financial and budgetary, legal and policy, and cultural and behavioral regimes. These regimes are relatively stable over time, and change, when it occurs, can be discerned through changes in the institutions. The World Health Organization (WHO) and partners agree that timely access to research data in health should become a global norm to expedite research and discoveries (Johansson, et al., 2018; Rani, & Buckley, 2012; Shen, 2016; Tripathi, et al., 2017; Wellcome Trust, 2016). However, substantial challenges exist, and the requisite infrastructure and mechanisms are not in place to ease access to research data (Chan, 2019; Taichman et al., 2017). Most research data is under the custody of the researchers, and data under the custody of institutes is subjected to highly restrictive access regulations (Chigwada, et al., 2017). This is further compounded by researchers’ lack of skills, time, and funds to undertake RDM practices and by the absence of an enforcement mechanism (Savage and Vickers, 2009; Tripathi, et al., 2017). Tripathi, et al. (2017) recommended developing standards and rules at the international level to facilitate easy and immediate access to and sharing of research data, particularly during medical emergencies. The rise of the internet and web services and the declining cost of computing and storage hardware have significantly lowered the barriers to publishing or obtaining data and have brought us closer to the reality of broadly sharing scientific data to improve society and people’s lives (CODATA, 2019).
Data Sharing
Data sharing is the practice of making one’s research data available for use by others (Bangani & Moyo, 2019; Michener, 2015). The precursor to data sharing is good data management and stewardship, yet what constitutes good data management is varied and largely undefined (Wiley, 2020). Wiley (2020) noted that the current literature shows data sharing to be infrequent despite the recommendations and mandates of funders and publishers. Data sharing takes into account the legal and ethical implications associated with sharing. EPSRC (2018) and OECD (2007) have documented several benefits of data sharing, which include increased visibility of research data for citation and reuse by other researchers; reinforcing scientific inquiry by ensuring that enthusiasts and skeptics alike can test, validate, and replicate research results; promoting new research and different ways of testing and analyzing research data; discouraging unnecessary duplication of effort by researchers; saving financial and other resources that are wasted when similar data sets are created by different researchers; and enabling discoveries from old research data sets. Data sharing fosters a culture of transparency, facilitates inter- and multi-disciplinary research collaborations (Bangani and Moyo, 2019), decreases research duplication, serves as a justification for research spending, and allows research data to be used in ways that were never envisaged by the original data collector (Vines et al., 2014a).
However, a growing concern about data sharing is its effect on novel health research if secondary data users can “free-ride” on the efforts of those collecting primary data (Castellani, 2013; Ross & Krumholz, 2013; Zarin, 2013). This calls for policies and processes that safeguard data sharing and that are developed in consultation with relevant stakeholders (Rani & Buckley, 2012; Vallance & Chalmers, 2013).
Data Reuse
Data reuse means that research data collected to answer one specific research question is accessed and used by other persons to answer new and different sets of questions that the primary data collector may not have envisaged (Park, 2018; Pasquetto, Randles, and Borgman, 2017; Curty, 2016). Research data collected and organized for access by researchers other than those who first collected it offers both opportunities and challenges to institutions and researchers. This calls for adequate institutional preparations and for the skills that researchers need in an environment that increasingly demands transparency and integrity in scholarly work (Lyon, 2016; Margolis et al., 2014). The cultural shift from approaches that kept data mostly private, with sharing acknowledged in the form of publications, to an information-based culture that engages the scientific community through active sharing of both data and publications makes access to health research data not only a new reality for health and biomedical science but also an imperative that must be understood in the quest for further knowledge and to foster discovery as a measure to improve health service delivery (Margolis et al., 2014).
Strengths, applications, and shortcomings of the model
The strength of this model is its ability to be customized to fit different situations. The model supports researchers and institutions in identifying and selecting best-practice activities that meet their curation needs. This implies that some activities or steps in the model may be added or eliminated in order to realize the curation processes. Another strength of the model is its ability to give insights into the roles and responsibilities of stakeholders in the curation process. According to the Digital Curation Centre (2008), the model provides for distinct stages that are carried out in sequence within a cycle, which makes it easier to identify data creators, users, and collaborators. The sequences also make it easy to document curation policy requirements and include them in the existing activities performed by the stakeholders. This in turn makes it easier to identify the tools and services required to perform data curation activities at the basic level of implementation.
The data curation activities carried out are key to maintaining the authenticity, reliability, integrity, and usability of digital material, which contributes to the quality of RDM (Higgins, 2008). The DCC model allows data curation activities to be planned in detail at basic levels of cooperation and also defines in detail the roles and responsibilities of stakeholders engaged in RDM practices (Higgins, 2008). Consequently, the model contributes to building a framework of standards, technologies, and processes that is documented in organizational policies, protocols, and guidelines. Pennock (2007), however, emphasized that although digital research data should be properly managed, stored, and preserved to maximize investment, the same standard should also be applied to physical research data. This process results in research data that remains reliable, available, and accessible for use for as long as it has value to the research community.
Several studies have used this model to examine RDM in different contexts. In some studies the model was used on its own, and in others it was integrated with other models and theories. In Africa, Ng’Eno (2018b) and Chawinga (2019b) used the model in an integrated approach with the Community Capability Model to study RDM practices in agricultural research institutes in Kenya and public universities in Malawi respectively. Brambilla (2015) used the model to analyze digital curation practices in academic libraries in Italy. Heidorn (2011) applied the model to identify the skill sets required by librarians to undertake successful data curation in the USA. Shakeri (2013) used the model to examine data preservation practices among researchers at Kent State University’s Liquid Crystal Institute. The studies above have verified and validated the model as sufficient for implementing data curation activities. Since the model supports data curation, it provides different activities that should be carried out for effective curation of research data and other related digital objects.
Nonetheless, Constantopoulos et al. (2009) criticized the model because it does not give guidelines for recording and maintaining statistics on the stored, curated, and preserved data that users have accessed through the query system. In addition, the model does not indicate actions related to adding new knowledge or to combining new knowledge with the primary resources or with the prior knowledge stored in digital repositories. The model does not provide baseline information about the characteristics, nature, and status of the data holdings, the institutional role, or the current data practices, which could be addressed by the Data Asset Framework. Another weakness is its inability to include the controlled vocabularies used in different fields of study, such as geographic names, historical periods, chemical molecules, and biological species (Constantopoulos et al., 2009). The model does not incorporate institutional readiness as a key contribution to data curation, although institutional readiness is important for the successful execution of RDM activities. Ng’Eno (2018b) pointed out that the model is also silent on the important aspects of technical infrastructure, skills and training, collaborative partnerships, and legal and policy issues, without which curation activities cannot be carried out successfully. The model thus operates in isolation from institutional readiness, even though readiness is crucial to research data management practices. It therefore needs to be complemented with the Community Capability Model Framework in order to assess the readiness to undertake the curation activities for RDM practices.
Therefore, the DCC model shall be applied to examine the research data management practices, awareness, and readiness in selected health institutes in Uganda, to explain the challenges, and to propose interventions to improve RDM practices in compliance with funders’ and publishers’ requirements. The aim is to ensure that the inaccessible and accumulated volumes of research data stored under different conditions, protocols, and technologies are findable, accessible, interoperable, and reusable by current and future researchers.
The Community Capability Model Framework
The Community Capability Model Framework (Lyon, et al., 2012) assists institutions, funders, and researchers in assessing the capability of communities to perform data-intensive research. The framework is used to profile the current readiness of the community, identify priority areas for improvement and investment, and help in proposing a roadmap intended to achieve a targeted state of readiness.
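The profiling step can be pictured with a small worked sketch. It assumes, purely for illustration, that each of the eight capability factors discussed in this section is rated on a 1 to 5 scale and that priority areas are those falling below a chosen target level. The scale, the target value, and the function name are hypothetical and are not part of the published framework.

```python
# Capability factors as discussed in this section of the study.
CAPABILITY_FACTORS = [
    "collaboration",
    "skills and training",
    "openness",
    "technical infrastructure",
    "common practices",
    "economic and business models",
    "legal and ethical issues",
    "academic culture",
]


def profile_readiness(scores, target=4):
    """Summarize readiness and rank the factors that fall below the target.

    scores: dict mapping every capability factor to a rating from 1 to 5
            (an assumed scale, used here only for illustration).
    """
    missing = [f for f in CAPABILITY_FACTORS if f not in scores]
    if missing:
        raise ValueError(f"No score given for: {missing}")
    gaps = {
        f: target - scores[f]
        for f in CAPABILITY_FACTORS
        if scores[f] < target
    }
    # Largest gap first; ties are broken alphabetically for a stable order.
    priorities = sorted(gaps, key=lambda f: (-gaps[f], f))
    mean_score = sum(scores[f] for f in CAPABILITY_FACTORS) / len(CAPABILITY_FACTORS)
    return {"mean_score": mean_score, "priorities": priorities}
```

Ranking the gaps in this way gives a simple starting point for a roadmap: the factors at the top of the list are those furthest from the targeted state of readiness.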
The Community Capability Model Framework (CCMF) by Lyon, et al. (2012) has been applied in several studies. Lyon, Patel, and Takeda (2014) assessed the requirement for research data management support in academic libraries, building on the concepts of maturity, capability, and readiness. Shen (2016) investigated faculty researchers’ current practices in organizing, describing, and preserving data and the emerging needs for services and education; the results showed the changing nature of faculty demands regarding data documentation, storage, and archiving. Qin, Crowston, and Lyon (2016) described the development of a capability maturity model (CMM) for RDM as a means of supporting the assessment and improvement of RDM practices and increasing their reliability. Such a model helps organizations to identify areas of strength and weakness and indicates where effort is needed to improve RDM practice.
The two models, DCCM and CCMF, have been applied concurrently in the same studies by Pinfield et al. (2014), Ng’Eno (2018), and Chawinga (2019), among others. The models, though evolving and applied to different aspects of RDM, are yet to be validated for generalization. In most instances, they have been applied in higher education and research-intensive institutions in developed nations. Their application in RDM-related studies in low developing countries is just emerging, for example in Ng’Eno (2018) and Chawinga (2019a).
RDM-related studies have also adopted multiple models and practical frameworks to study cases. Commonly used models include the Data Asset Framework (Digital Curation Centre, 2009), the Community Capability Model Framework (CCMF) (UKOLN, University of Bath and Microsoft Research Connections, 2013), and the Digital Curation Centre Lifecycle Model (Higgins, 2008). These models restate many of the standards that provide frameworks for the management of research data. Studies were also based on higher education institutions, university libraries, and a growing number of research-intensive institutions (Carter, 2020; EPSRC, 2014; Lancet, 2020; Liu et al., 2020; Whyte and Tedds, 2011; Wellcome Trust, 2015). Studies based on African institutions, however, have adopted mixed research methods, combining surveys as applied in case studies and using models as a basis for understanding the research data management phenomenon in different contexts. Among the pioneering studies are Chawinga (2019), Ng’Eno (2018b), and van Wyk (2018).
Key variables of the Model
According to Ashiq, et al. (2020), the growing importance of RDM and its recognition by researchers, though it is poorly practiced, is better observed in developed than in developing countries. RDM practices in low developing countries need a set of capabilities centered on human, technical, and environmental factors to advance data organization, processing, storage, preservation, access, sharing, reuse, and security (Qin, 2013; Vela and Shin, 2019). RDM requires multiple skills from different individuals, and with no training available in low developing countries, those implementing it rely on experiential knowledge. This has resulted in inconsistent standardization in RDM, hence the need for a documented legal and policy framework to guide and enforce uniformity (Ng’Eno, 2018).
Collaboration
Collaboration in research is a key indicator of researchers’ and institutions’ capability to perform data-intensive research. In health institutes, collaborations vary, ranging from international collaborations and collaborations between national institutions and support units to collaborations within individual research groups. Collaborations foster mentorship through knowledge sharing, promote a culture of transparency, facilitate inter- and multi-disciplinary research, spur data sharing, and improve research productivity (Bangani and Moyo, 2019).
Kamya (2017) noted the sharp rise in collaborative research in Uganda through partnerships between northern and southern institutions. He pointed out the major benefits for participating institutions and individuals, including access to the greater resources and facilities available in developed countries in terms of finance, technology, and diverse expertise. In addition, collaborations provide opportunities to move further and faster by working with other leading people in their fields, broadening insights and ideas, and contributing towards building strong research infrastructure in low developing countries. However, this was taking place in the absence of a legal framework or under outdated policies, and research institutes were unwilling to initiate a working model because such models are seen as cumbersome. This hampers vibrant and effective RDM at the national level.
Skills and Training
Chawinga (2019) found a knowledge gap in key aspects of RDM, attributed to the unavailability of skilling and training programs since RDM was considered a new concept. Apart from what is taught during research courses, no substantive training program is offered across the African continent. Schmidt and Shearer (2016) categorized the core competencies of RDM into three groups. The first is providing access to data, which includes knowledge of repositories, data discovery mechanisms, data manipulation and analysis techniques, and data licensing and intellectual property, together with skills in data organization. The second is advocacy and support for managing data, which includes knowledge of funders’ policies and requirements, data management plans, data publication requirements, data citation and referencing practices, and best practices for data formats, types, and metadata, together with skills in articulating the benefits of data sharing and reuse and in using data audit and assessment tools. The third is managing data collections, which includes knowledge of metadata standards and schemas, database design types and structures, and data repositories and storage platforms, together with skills in selecting and appraising datasets, undertaking digital preservation activities, and actively managing research data.
Openness
Openness refers to the degree to which research outputs are accessible to interested community members, with a particular focus on low developing countries. In the context of the International Council for Science (2014), openness includes research data and is concerned with openness in the research lifecycle, published literature, methodologies and workflows, and the reuse of existing data. Openness in research is credited for current scientific progress and for the benefits derived from addressing the complex research challenges currently experienced (Lyon et al., 2011; McKiernan, et al., 2016). The principles of openness are applicable at different levels of the research lifecycle and contribute towards value addition and validation of the holistic research process, in addition to promoting reproducibility and reusability of research data. These benefits have resulted in several international initiatives advocating for research communities to embrace openness and transparency, which are slowly being adopted (Besançon et al., 2021). Most of these initiatives emphasize that research data should be not only openly accessible but also discoverable and reusable (Guedon, 2015). However, the degree to which research data should be open is determined by effective management, which involves a broad range of administrative and technical activities (Berman et al., 2014).
The openness of the scientific literature has been a well-articulated concept in research since the 2000s. It is based on Open Science principles, which are known to increase the rigor, reliability, and reproducibility of scientific results in order to optimize research efficiency and improve health outcomes (Besançon et al., 2021). Manifesting as open access, open data, and other open scholarship practices, it is growing in popularity and necessity due to its benefits to the research community (McKiernan et al., 2016). The Budapest Open Access Initiative (2012), the Bethesda Statement on Open Access Publishing (2003), and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003) are global initiatives adopted to advance open access to published materials. These were later joined by the International Council for Science (2014), which expanded the scope of demands to include making research data universally accessible and its benefits sharable (Verhulst et al., 2017).
Technical Infrastructure
Technical infrastructure covers a wide range of technologies responsible for collecting, storing, processing, organizing, transmitting, and preserving data, as well as platforms for communication and collaboration (Qin, 2013; Smith, 2014). It includes networks, databases, web portals, repositories, web 2.0, social networks, authentication systems, RDM systems, and software applications. The technical infrastructure tools include computational tools and algorithms, tool support for data capture and processing, data storage, support for curation and preservation, data discovery and access, integration and collaboration platforms, visualizations and representations, and platforms for citizen science. The role of technical infrastructure is to provide uniform and equal access to a broad variety of research outputs by making data understandable, searchable, retrievable, available, accessible, sharable, and secure (Bigagli et al., 2013). It is therefore important to understand the technical infrastructure, since it is key in supporting researchers in RDM practices and facilitates RDM capability in institutions.
Common practices
In the Community Capability Model Framework, common practices produce standards, by design or de facto, to promote and facilitate the reuse and combination of data. Standards are, broadly, a consensus among different agents to carry out certain key activities according to agreed-upon rules (Ujil, 2015). Standards are diverse and should be agreed upon formally; for RDM, compatibility standards should be beneficial to all parties involved, balance their needs, and be intended for repeated use over a certain period. In research communities, common practices are thus standardized at institutional and national levels and, more broadly, within a specific field of practice. Common practices should be implemented through the use of uniform data formats, the same data collection methods, the same procedures for process workflows, data packaging and transfer protocols, data description, semantics, ontologies and vocabularies, and data identifiers (Lyon et al. 2011). Common practices should also be shared and understood by researchers within and across a particular field. Global initiatives are currently under way to forge common practices across different data lifecycles in order to reduce data silos and improve interoperability, access, sharing, and reuse. These efforts are further complicated by the lack of national standards, which leaves institutions struggling to keep data in compliance with funders’ mandates that may be at odds with the existing technical infrastructure.
Economic and business models
Data-intensive research requires a high degree of investment, and how data management should be funded must be considered in order to make a business case to funders. The business case can be justified by generating many research papers from a single investment and publishing them quickly. Lyon, et al. (2011) viewed data-intensive research as requiring investment in two major areas, research and infrastructure, which contribute substantially to the competitiveness of the research institute. Quality research is based on developing systems, procedures, and compliance checks aimed at generating credible data. The economic and business model also provides the basis for the sustainability of research projects and their outputs, the geographic scale and size of funding for research, and its infrastructure. It further covers the extent to which public-private partnerships are engaged in funding research, and the expected productivity and return on investment.
Legal and Ethical Issues
Legal and ethical issues involve legal and regulatory frameworks and the management of ethical responsibilities and norms. Legal and ethical frameworks guide the practices of research data creation, organization, access, sharing, preservation, storage, reuse, and disposal, as well as intellectual property rights and ethical issues. Most institutions have inadequate policies covering data management (Hartter, et al., 2013). An RDM legal and policy framework should address key practices, namely storage, security, preservation, quality, compliance, access, and sharing, to enhance the management and reuse of research data (Pinfield et al., 2014a; Higman and Pinfield, 2015). Smith (2014) argued that the legal environment surrounding research data lags behind, hindering the ability to develop best practices for data management. The RDM legal environment includes laws, regulations, and policies associated with data, as well as strategies for data quality control and management in research institutes. In this respect, an RDM legal framework ensures trust, since it adopts appropriate technical standards and practices for research data. Karick (2014) pointed out the need for the RDM framework to state the ownership of and rights associated with research data. Patel (2016) emphasized an institutional RDM legal and policy framework that spells out the purpose, scope, applicability, and guidelines for data contributors, licensing, metadata, data classification, copyright agreements and conditions, terms and conditions of the use of data, protection of the confidentiality of sensitive data, protection of data against security breaches, and intellectual property concerns.
Most research data is under the custody of the researchers, and the absence of a formal RDM policy puts research data at risk of being lost (Chigwada et al., 2017b). In addition, where legal frameworks exist, they are diverse, creating discrepancies in research data management that prevent better data management practices (Manurung, 2019; Wiley, 2020). This may affect the quality of research data and breed irresponsible behaviors and poor practices, posing a threat to the research enterprise and impairing its effective functioning (Dressel, 2017).
Academic culture
Academic culture defines entrepreneurship, innovation and risk, reward models for researchers, and quality and validation frameworks. It usually manifests itself through community norms that exist to support research and that determine the level of support a researcher can expect when moving into data-intensive research (Lyon et al., 2011). Data-intensive research flourishes in communities where data is highly valued, researchers are rewarded for their effort and contributions, and high standards are adhered to in the creation of quality data. A successful reward model, according to the CCMF, is one where all contributions by researchers are recognized and rewarded through established procedures. In addition, such an academic culture results in the generation of quality research data that can be reused for different purposes and for the validation of scientific claims (Briney, 2015).
Strengths, applications, and shortcomings of the model
The usefulness of the Community Capability Model Framework lies in addressing the key variables identified at each stage of the data lifecycle and assumed to be crucial in RDM. Cox and Pinfield (2014b) observed that openness, skills and training, technical infrastructure, legal and policy issues, and collaboration play a fundamental role in creating, appraising, describing, preserving, accessing, and reusing research data. The framework further provides details of the roles, responsibilities, and requirements of each capability for effective and efficient RDM practices. The CCM framework focuses on the adoption of ICT in every capability to cope with the exponential growth of data-intensive research.
However, the weakness of the model is that researchers may have problems depending solely on it to understand data management practices, since it does not address key variables of the data curation processes that are important for successful RDM practices. Data curation is a significant constituent of RDM, and its omission limits the application of the model to understanding RDM practices. By striking a balance between factors that facilitate and impede the creation and preservation of research data, the model also limits the scope of data preservation. This calls for combining the model with other relevant models that address the data curation processes, so that the variables of the current study can be handled effectively.
Challenges affecting Research Data Management Practices
Different authors in the literature have identified several challenges affecting RDM practices, RDM capability, and the uptake of RDM within research institutions.
Research Data Management Standards
Research Data Management practices are complex and require strict adherence to standardized procedures to ensure data integrity and availability. However, there is a general lack of standards in the face of excessive and uncontrolled growth of research data, which causes challenges in its management (Si et al., 2013). Standards are an important step toward managing data in a more consistent way, and their absence across institutions and nations results in data corruption, loss of information, and lack of interoperability (Baykoucheva, 2015). The absence of standards also inhibits the reproduction of research, because the lack of scripted guidelines, metadata, and documentation hinders data reuse and limits data integration (Clarkson and Clarkson, 2020). Gupta and Rani (2019) pointed out that handling data without enforcing standards results in failure to adhere to best practices, loss of relevant data or even complete loss of data, and a waste of research effort.
The literature is replete with evidence of significant research data loss due to a lack of research data management practices, poor data infrastructure, and inadequate knowledge and skills among researchers (Chigwada et al., 2017a). Moreover, the exponential growth of research data without corresponding growth in infrastructure, knowledge, and skills in RDM has left many institutions overwhelmed by the amount of data they must manage (Chiwanga and Zinn, 2020; Si et al., 2013).
Technical infrastructure
Technical infrastructure covers a wide range of technologies for collecting, storing, processing, organizing, transmitting, and preserving data, as well as platforms for communication and collaboration (Qin, 2013; Smith, 2014). Infrastructure should provide uniform and equal access to research data assets and facilitate actual research data use, reuse, and sharing (Ng'eno, 2018). According to Stephanidis and Salvendy (1998), technical infrastructure is necessary to derive maximum benefit from the data accessed and shared. The infrastructure should be robust enough for long-term use and appropriate for diverse uses. It must also be flexible enough to respond to continuous and rapid changes in scientific research technologies. In addition, the technical infrastructure should support interoperability through standardized and harmonized hardware, software, and peopleware, to allow effective access to and optimal use of research data within and across institutions.
Effective data management and use rely on effective technical tools, and advances in information technologies have had a profound impact on health research (Harris, 2017). The technical infrastructure should provide uniform and equal access to a broad variety of research outputs by making data understandable, searchable, retrievable, available, accessible, sharable, and secure (Bigagli et al., 2013). The enforcement mechanism should be embedded in, and backed up by, the technical infrastructure, including networks, databases, authentication systems, and software applications, all of which should be updated regularly (Antell et al., 2014; Qin, 2013). Unfortunately, the current technical infrastructure within and across health institutes is fragmented, which may to some extent be affecting RDM practices and capability.
Legal policy framework
RDM legal and policy frameworks have been evolving since the early 2000s, when international organizations, funders, and publishers started demanding research data plans and incorporated this requirement into their policy statements (National Science Foundation, 2003; Organization of Economic Co-operation and Development, 2003). However, RDM became a major concern for researchers and institutions only when funders, publishers, and research councils around the world started enforcing it as a requirement (Bishop and Borden, 2020; Makani, 2015). As early as 2012, funders had begun withholding funds if researchers did not meet this policy requirement (National Institute of Health, 2012; Van Noorden, 2014; Wellcome Trust, 2012). Although there is a strong policy push towards open access to research data at the global level (CODATA, 2019), the response in terms of RDM policy development in developing countries has been slow. Laws and policies governing data practices vary among countries, resulting in barriers to scientific cooperation and progress. Harmonized RDM laws, policies, and regulations are also absent at the institutional level. In addition, although RDM policies contribute towards improving the quality of research data, there is no clear-cut policy content to guide the process (Liu et al., 2020). The legal and policy framework should therefore take precedence and be complemented with guidelines and incentives for researchers (Piracha and Ameen, 2018).
Limited budget
Scientific data infrastructure requires continued and dedicated budgetary planning and appropriate financial support. The use of research data cannot be maximized if access, management, and preservation costs are just an add-on or an afterthought in research projects. In many areas of public research, there are indications of discrepancies between the funding of the specific research itself and the funding of the related data management requirements. Generally, research organizations fund the former well but pay scant attention to the latter. Without viable economic models, valuable research data may disappear, making it inaccessible to everyone and preventing many from making the most of research investments. Moreover, digital data is fragile and more prone to loss and corruption than physical data, so researchers need to be more thoughtful and proactive to ensure its preservation (Bracke, 2011).
Overall, limited funding for RDM requirements affects best practices (Choi and Lee, 2020). Sufficient funding is important for effective research data management, which also requires researchers to allocate time to process and document research data with the attendant metadata for reuse. However, extra resources are usually not allotted for this purpose because the value of research data receives little recognition in developing countries. Donors should therefore play a proactive role in improving the management of research data by ensuring that funding for research data appears as a specific item during budgeting (Ashiq et al., 2020).
Skills and Training
There are low levels of awareness and inadequate data management skills among researchers (Akers and Doty, 2013; Qin and Ph, 2017). There is a general lack of training in RDM as well as insufficient guidance and support (Sallans and Lake, 2014). Researchers lack the skills and knowledge required to manage their data effectively (Scaramozzino et al., 2012). Research data continues to be used by individuals outside the research community in which the data is generated. How individuals from different cultures and with varied knowledge and expertise find, understand, and reuse data is yet to be established (Faniel and Zimmerman, 2011). Training in RDM skills should therefore be embedded in the early education and continuing professional development of researchers (Lyon et al., 2012). The training should provide the skills needed to work effectively and efficiently with data throughout the research cycle (Tenopir et al., 2015; Vela and Shin, 2019). Nevertheless, the competencies required for research data management are not well articulated and need to be investigated so that appropriate skills and knowledge programs can be designed, especially in Uganda.
Governance and Leadership in RDM
Governance is based on people, technology, and processes that together result in the overall management of the availability, usability, integrity, and security of research data generated with the support of different organizational resources (N. Zhang, 2017). Governance ensures oversight of quality and compliance with the derived mandates of a governing body, a set of principles, a set of policies and procedures, a plan to execute those procedures, and a set of performance metrics to measure the results of good data governance (Tan, 2018). Governance creates a framework for the use of data that fits individual requirements, improves operational efficiency, and minimizes risk. It is operationalized through rules, regulations, and enforcement mechanisms. By implementing effective data governance practices, organizations can successfully manage and govern their data (N. Zhang, 2017).
Leadership is singled out as the area that requires the most urgent attention (Kahn et al., 2014). It is therefore imperative that leadership is established to champion RDM implementation within institutions. Indeed, leadership and governance have been recognized as important components of the health ecosystem, including RDM practices, that benefit researchers, institutions, and society (Measure Evaluation, 2019). Yet the leadership of RDM within institutions shows significant weaknesses and fragilities, stemming from the lack of a long-term, consistent, senior-level institutional approach. Petersen et al. (2020) observed that the development and implementation of policies are more advanced in institutions where management takes an active interest in RDM issues and allocates resources to the development of data management and plans. Wise et al. (2019) noted that good data governance, embedded and maintained within the organization, catalyzes change, which improves operational efficiency and application effectiveness and minimizes risk.
Interventions in RDM practices and RDM readiness at Institutes
The policy landscape at the international level has become more consistent, making the management of research data a priority (OECD, 2007; European Commission, 2013). Nevertheless, most research data management policies are driven by international funders, governments, and journal publishers from developed countries, with little regard for researchers and institutions in developing countries (Pampel and Dallmeier-tiessen, 2014). Existing policies and regulations seem to be aligned in favor of institutions and researchers in developed nations (Vasilevsky et al., 2017). As a result, the development of effective RDM support services locally has been significantly slow (Pinfield et al., 2014).
Research cultures and incentives often make researchers unable or unwilling to make their data accessible (Savage and Vickers, 2009; Vines et al., 2014). In addition, inadequate resources, disciplinary differences, policy and legal constraints, and lack of awareness hinder the uptake of RDM practices (National Academies of Science, 2018). Most research data continues to be under the custody of researchers, who have different perspectives about managing their data and are often poor stewards of data, particularly over the long term. They end up creating unique, ad hoc approaches to organizing data, which makes the data inaccessible and puts it at risk of loss (Chigwada et al., 2017; Van Tuyl and Michalek, 2015; Vines et al., 2014). Even governments have done little to support RDM practices despite their investment in research data collection (Barsky, 2017; Mladovsky et al., 2015).
Justification for the models adopted for the study
The study adopted three models, namely the Data Asset Framework (Jones et al., 2008), the Digital Curation Centre Lifecycle Model (Higgins, 2008), and the Community Capability Model Framework (Lyon et al., 2012). These complement each other in addressing the research questions. The three models were chosen for their ability to support each other by explaining the different facets of RDM practices and RDM readiness in selected health institutes. The quality of a theory is judged by its explanatory and predictive power and its scope (Schoenfeld, 1998; Vithal et al., 2013). There is also an element of pragmatism in adapting ideas from a range of theories. Creswell (2009) associated this with mixed methods research, which both tests and generates theories; in this case, it results from the application of the three models in a single study.
This study shall apply the Data Asset Framework to describe the research data assets created and held in health institutes. The framework shall also be used to explore how data are stored, managed, shared, and reused; to identify any risks of misuse, data loss, or irretrievability; to understand researchers' attitudes towards data sharing; and to suggest ways to improve overall research data management in health institutes.
The Digital Curation Centre Lifecycle Model shall be applied to explain the key issues necessary for the effective and successful execution of the data curation processes. The DCC Lifecycle Model describes the activities carried out in RDM, which are the capture, appraisal, preservation, access, and reuse of research data. These activities form the core functionalities of RDM. The model is useful in identifying issues of authenticity and integrity, as well as strategies for providing adequate knowledge representation and access, which support a predictable preservation lifecycle for assets. It also focuses attention on the interests of researchers' communities of practice.
The Community Capability Model Framework is applied to assess community and institutional capabilities. Its assessment considers the following attributes: skills and training, technical infrastructure, legal and policy issues, collaborative partnerships, and openness, all of which are important for successful RDM practices. Since these attributes are among the variables the study is investigating, the CCMF is well placed to be applied in the study. Observing the gaps in each of the models and their limited ability to explain fully all the key variables of the study, the researcher found it justifiable to adopt a complementary approach involving all three identified models.
Research gap
There is no study covering all the important aspects of RDM practices and RDM readiness in a health institution in Uganda. In addition, there is no single theory that could be applied to predict, validate, or explain research data management practices and readiness in health institutes locally. In reality, many institutions in developing countries are yet to implement the most basic data management services (Mushi et al., 2020). Studies carried out in Uganda (Verhulst and Young, 2017) focused on the country's preparedness for the data revolution in general terms and identified several constraints limiting the full utilization of data across disciplines. The models chosen for the study have been used in relatively different contexts, in developed nations and research-intensive institutes, and mostly independently of one another, although a tendency to combine the DCC and CCM has emerged. They have seldom been applied in developing countries, where RDM is yet to take root.
Applying the three models in a single study provides room to improve interpretation and to gain a clear understanding of what is taking place in institutions where RDM is yet to be formally established, which is both a gap and an opportunity for the current study. The challenges identified so far include legal and ethical concerns, infrastructural, technological, and institutional limitations, and the treatment of research data as the property of individual researchers (Uganda Health Research Organization, 2012). The knowledge gaps existing in RDM practices and RDM readiness in health institutes in Uganda present an opportunity for this study.
Summary of chapter two
The chapter reviewed the different theories and models so far applied in studying research data management in different contexts. Models have been identified as the most commonly applied platforms for understanding RDM practices, specifically digital curation. By systematically merging relevant strands of preservation activities into one, the models make it easier to identify and document the required processes. This effectively supports the management of research data for long-term value. The models also make it possible to identify the underlying problems that constantly create a deadlock in digital curation processes and to propose measures necessary to sustainably manage research data for both current and future user needs. In addition, they offer guidelines addressing issues associated with the creation, storage, long-term preservation, maintenance, use, and reuse of digital objects, which currently include research data. The assessment of each of the models found the CCMF and DCC to be the most suitable for application in the current study. However, it was also important to incorporate the DAF, which has been applied in several studies to provide baseline information on the data assets held by different universities and research institutes. The DAF is applied to find out the status and nature of the research data held, the institutional role in managing the research data, and researchers' data practices (Cox and Williamson, 2015). Thus, the three models were adopted to support and complement each other and to offer a strong analytical tool for understanding research data management practices and readiness. Together they provide the empirical evidence necessary to propose interventions to improve RDM practices and uptake in selected health institutes in Uganda.