Research consultancy
CHAPTER THREE
METHODOLOGY
3.0 Introduction:
This section describes in detail how the study will be carried out and how the necessary data will be collected. It covers the research design, study area, data sources, data processing, data analysis techniques and anticipated limitations of the study.
3.1 Research Design:
The study shall use quantitative research methods in order to obtain reliable data. These will be structured secondary data drawn from the records of the Ministry of Health.
3.2 Data Sources.
Secondary data will be obtained from the database, records, publications and journals of the Ministry of Health.
3.3 Data processing and Data analysis techniques.
Data processing will involve editing, to check for errors and omissions, and coding, to reduce the data to a meaningful pattern of responses. A model will be specified, and statistical software will be used to tabulate and process the findings, so that the data can be prepared and analyzed and a research report compiled.
The study will use time series analysis. Descriptive statistics, presented in the form of graphs and tables, will be used to describe the information obtained.
Data analysis will involve applying statistical techniques to the data for ease of presentation. It will include interpreting the research findings in the light of the research questions and objectives, to determine whether the results are consistent with those research questions.
3.3.1 Descriptive analysis.
3.3.2 Time series analysis
Because the data form a time series, time series methods will be applied. The analysis will concentrate on the trend and seasonality of HIV prevalence.
Assuming a multiplicative model, then 𝑌𝑡 = 𝑇𝑡 × 𝑆𝑡
Where 𝑌𝑡 is the HIV prevalence series, 𝑇𝑡 is the trend component and 𝑆𝑡 is the seasonal component.
The analysis will employ ARIMA modeling, preceded by the following data exploration techniques.
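As an illustration of the multiplicative model 𝑌𝑡 = 𝑇𝑡 × 𝑆𝑡, the sketch below estimates the trend with a centered moving average and the seasonal indices as average ratios of the series to that trend. It is only a minimal sketch under stated assumptions: the function names, the period of 4 (quarterly data) and the example series are hypothetical and are not taken from the study data.

```python
def centered_moving_average(y, period):
    """Trend estimate T_t: centered moving average over one full seasonal cycle."""
    half = period // 2
    trend = [None] * len(y)  # None where the window does not fit
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight (2 x period MA).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def seasonal_indices(y, period):
    """Seasonal component S_t: mean of Y_t / T_t for each position in the cycle."""
    trend = centered_moving_average(y, period)
    ratios = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            ratios[t % period].append(obs / tr)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)  # normalise so the indices average to 1
    return [s * scale for s in raw]


# Made-up quarterly series built as trend x season (assumption, not study data).
pattern = [0.8, 1.0, 1.2, 1.0]
series = [(100 + 2 * t) * pattern[t % 4] for t in range(24)]
print([round(s, 2) for s in seasonal_indices(series, period=4)])
```

Dividing the series by the estimated seasonal indices would then give a deseasonalised series on which the trend can be examined.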
- Graphical presentation
This will involve plotting the series 𝑌𝑡 against time t.
- Non-parametric tests for trend
Runs test: The runs test (Bradley, 1968) can be used to decide if a data set is from a random process.
A run is defined as a series of increasing values or a series of decreasing values. The number of increasing, or decreasing, values is the length of the run. In a random data set, the probability that the (i+1)th value is larger or smaller than the ith value follows a binomial distribution, which forms the basis of the runs test.
Testing procedure
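Ho: the HIV prevalence series is stationary (has no trend)
Ha: the HIV prevalence series is non-stationary (has a trend)
Test statistic
$Z = \dfrac{R - E(R)}{\sqrt{\mathrm{Var}(R)}}$
Where R is the observed number of runs, E(R) and Var(R) are the mean and variance of R under Ho, and m = number of pluses.
Decision rule (at α = 0.05)
The researcher will reject Ho if |Z| > Zα/2, i.e. if the absolute value of the computed Z statistic is greater than the tabulated value, and will then conclude with (1-α)*100% confidence that the series has a trend.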
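The runs test procedure can be sketched in code as follows. This is a minimal sketch that assumes the Wald-Wolfowitz form of the test, in which each observation is marked plus or minus according to whether it lies above or below the median, and a normal approximation is used for the number of runs. That variant, the function name and the example series are assumptions made for illustration; the definition of a run adopted in the study may lead to a different mean and variance.

```python
import math


def runs_test(series):
    """Runs test about the median; returns (number of runs, Z statistic)."""
    ordered = sorted(series)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    # True marks a plus (above the median); values equal to the median are dropped.
    signs = [x > median for x in series if x != median]
    m = sum(signs)           # number of pluses
    k = len(signs) - m       # number of minuses
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * m * k / (m + k) + 1.0
    variance = 2.0 * m * k * (2.0 * m * k - m - k) / ((m + k) ** 2 * (m + k - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, z


# Made-up, steadily rising series (assumption): few runs, so Ho is rejected.
runs, z = runs_test(list(range(1, 21)))
print(runs, round(z, 3), "reject Ho" if abs(z) > 1.96 else "do not reject Ho")
```

The value 1.96 is the two-sided standard normal critical value at α = 0.05.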
Test for seasonality
In this research, the researcher will use the Kruskal-Wallis test, a non-parametric alternative to the one-way analysis of variance, which applies where there are two or more independent groups to compare (Siegel & Castellan, 1988).
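The test is described below.
Ho: the series has no seasonality
Ha: the series has seasonality
Test statistic, H, to be compared with the Chi-square distribution:
$H = \dfrac{12}{N(N+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(N+1)$
Where
$n_i$ is the number of observations in the ith season
$N$ is the total number of observations across all seasons
$R_i$ is the sum of the ranks, rank($y_i$), of the observations $y_i$ in the ith season
$k$ is the number of seasons
Critical region
Reject Ho if $H > \chi^2_{\alpha,\,k-1}$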
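The Kruskal-Wallis statistic can be computed as sketched below, using the standard form H = 12/(N(N+1)) × Σ(Ri²/ni) − 3(N+1), where Ri is the sum of ranks in the ith season. This is a minimal sketch: the function name and the example groups are hypothetical, tied values receive average ranks, and no further tie correction is applied.

```python
def kruskal_wallis_h(groups):
    """H statistic from a list of groups (one list of observations per season)."""
    pooled = sorted((value, g) for g, obs in enumerate(groups) for value in obs)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j for tied values
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        i = j
    weighted = sum(r * r / len(obs) for r, obs in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Made-up observations for three seasons (assumption, not study data).
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# With k = 3 seasons there are k - 1 = 2 degrees of freedom;
# the Chi-square critical value at alpha = 0.05 is 5.991.
print(round(h, 3), "reject Ho" if h > 5.991 else "do not reject Ho")
```

With four seasons, the comparison would instead use 3 degrees of freedom, for which the critical value at α = 0.05 is 7.815.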
3.3.3 Autoregressive Integrated Moving Average (ARIMA)
This is also known as the Box-Jenkins model. This methodology will be used to forecast HIV prevalence for children aged below 15 years. The model is based on the assumption that the time series involved is stationary. Stationarity will first be checked; if the series is not stationary, it will be differenced d times to make it stationary, and then the Autoregressive Moving Average model, ARMA(p, q), will be applied.
The ARIMA procedure provides a comprehensive set of tools for univariate time series model identification, parameter estimation, and forecasting, and it offers great flexibility in the kinds of ARIMA models that can be analyzed. The ARIMA procedure supports seasonal, subset, and factored ARIMA models; intervention or interrupted time series models; multiple regression analysis with ARMA errors; and rational transfer function models of any complexity.
The Box-Jenkins methodology has four steps, which will be followed when forecasting HIV prevalence among children, as set out below.
Identification. This involves finding out the values of p, d, and q
where;
p is the number of autoregressive terms
d is the number of times the series is differenced
q is the number of moving average terms
Identification will be based on the correlogram, that is, the plots of the autocorrelation and partial autocorrelation functions. Where the partial autocorrelation function cuts off after a certain lag while the autocorrelation function dies out gradually, we conclude that the data follow an autoregressive model, and the order p is taken as the lag after which the partial autocorrelation cuts off. Conversely, an autocorrelation function that cuts off after a certain lag points to a moving average model, and that lag gives the order q. If the series is non-stationary, the number of times it is differenced to obtain stationarity is the value of d.
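The quantities used in the identification step can be sketched as follows: differencing the series d times, then computing the sample autocorrelations and partial autocorrelations that make up the correlogram. This is a minimal sketch with hypothetical function names; the partial autocorrelations are obtained with the Durbin-Levinson recursion, which is one standard way of computing them and is an assumption here, not a method stated in this chapter.

```python
def difference(y, d=1):
    """Difference the series d times: y_t - y_(t-1), repeated."""
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


def acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(y, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    phi_prev = []  # coefficients of the AR(k-1) fit
    out = [1.0]
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Made-up series with a quadratic trend (assumption): two differences remove it.
print(difference([1, 4, 9, 16, 25], d=2))
print([round(v, 3) for v in acf([1, 2, 3, 4, 5], 2)])
print([round(v, 3) for v in pacf([1, 2, 3, 4, 5], 2)])
```

In practice, a lag is usually treated as significant when its autocorrelation or partial autocorrelation falls outside approximately ±1.96/√n, where n is the length of the series.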
Estimation. This involves estimating the parameters of the autoregressive and moving average terms in the model. Nonlinear estimation will be used.
Diagnostic checking. Having chosen a particular ARIMA model, and having estimated its parameters, we now examine whether the chosen model fits the data reasonably well. A simple test of the chosen model is to see whether the residuals estimated from it are white noise. If they are, the fit can be accepted; if not, the process will have to be started over from the identification step.
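One common way of checking whether residuals are white noise is the Ljung-Box statistic, sketched below. The chapter does not name a specific test, so the choice of Ljung-Box, the function name and the example residuals are assumptions made for illustration.

```python
def ljung_box_q(residuals, lags):
    """Ljung-Box Q = n(n+2) * sum over k of r_k^2 / (n - k), for k = 1 .. lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Made-up residuals (assumption, not study output).
q = ljung_box_q([0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.1], lags=3)
print(round(q, 3))
```

For residuals from an ARMA(p, q) model, Q is compared with a Chi-square distribution with (lags − p − q) degrees of freedom; a value of Q below the critical value is consistent with white noise residuals.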
Forecasting. Exponential smoothing methods will be used for making forecasts. Exponential smoothing methods make no assumptions about correlations between successive values of the time series, but in some cases a better predictive model can be obtained by taking correlations in the data into account. Autoregressive Integrated Moving Average (ARIMA) models include an explicit statistical model for the irregular component of a time series, one that allows for non-zero autocorrelations in that component.
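The simplest exponential smoothing forecast can be sketched as follows. This is a minimal sketch of simple (single) exponential smoothing only; the function name, the smoothing constant alpha = 0.5 and the example values are hypothetical, and the study may use a different smoothing variant.

```python
def simple_exponential_smoothing(y, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series."""
    level = y[0]  # initialise the level at the first observation
    for obs in y[1:]:
        # New level = weighted average of the latest observation and the old level.
        level = alpha * obs + (1 - alpha) * level
    return level


# Made-up series (assumption, not study data).
print(simple_exponential_smoothing([10, 12, 14], alpha=0.5))
```

A value of alpha close to 1 gives most weight to recent observations, while a value close to 0 produces a smoother, slower-moving forecast.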
The forecast for the year 2016 will be obtained by regressing HIV prevalence against time.
Residence and region will be analyzed using the ANOVA test, by regressing HIV prevalence (the dependent variable) on dummies for place of residence and dummies for region in SPSS, since residence and region are both categorical independent variables.
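Regressing prevalence on time and extrapolating the fitted line to 2016 can be sketched as below. This is a minimal sketch assuming a simple linear trend fitted by ordinary least squares; the function name and the yearly figures are made up for illustration and are not Ministry of Health data.

```python
def fit_linear_trend(times, values):
    """Ordinary least squares fit of values = intercept + slope * time."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope


# Made-up yearly prevalence figures in percent (assumption, not study data).
years = [2010, 2011, 2012, 2013, 2014, 2015]
prevalence = [7.0, 6.9, 6.8, 6.7, 6.6, 6.5]
intercept, slope = fit_linear_trend(years, prevalence)
forecast_2016 = intercept + slope * 2016
print(round(forecast_2016, 2))
```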
Yt = β0 + β1DR + β2DC + β3DN + β4DE
Where Yt is the HIV/AIDS prevalence at time t in a given region and residence
DR is a dummy for rural
DC is a dummy for central
DN is a dummy for north
DE is a dummy for East
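The dummy variable regression can be illustrated with the sketch below, which builds the dummies DR, DC, DN and DE and estimates the coefficients by ordinary least squares. The analysis itself is to be run in SPSS, so this is only an illustration under stated assumptions: the function names, the baseline categories (urban residence and the west region) and the example data are hypothetical.

```python
def dummy_row(residence, region):
    """Design row [1, DR, DC, DN, DE]; baseline assumed to be urban and west."""
    return [1.0,
            1.0 if residence == "rural" else 0.0,
            1.0 if region == "central" else 0.0,
            1.0 if region == "north" else 0.0,
            1.0 if region == "east" else 0.0]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up prevalence values for each residence and region (assumption).
cells = [(res, reg) for res in ("urban", "rural")
         for reg in ("west", "central", "north", "east")]
rows = [dummy_row(res, reg) for res, reg in cells]
y = [5.0, 5.5, 4.5, 5.2, 6.0, 6.5, 5.5, 6.2]
print([round(b, 3) for b in ols(rows, y)])
```

Each coefficient on a dummy measures the difference in mean prevalence between that category and the baseline category, holding the other factor fixed.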
3.8 Ethical considerations:
The researcher will begin data collection by explaining the purpose of the research, which is basically to help decision makers at the Ministry of Health, Uganda, and users of the information in other health organizations and hospitals. Respondents will be informed that the information will be used strictly for academic purposes and that it will be treated with the highest level of confidentiality.