Research proposal
CHAPTER THREE:
RESEARCH METHODOLOGY
3.0 Introduction:
This section describes how the study will be carried out and how the necessary data will be collected. It covers the research design, study area, data sources, data processing, data analysis techniques and anticipated limitations of the study.
3.1 Research Design:
The study will use quantitative research methods to obtain the required data, drawing on structured secondary data held in the records of the Ministry of Health.
3.2 Data Sources.
Secondary data will be obtained from the databases, records, publications and journals of the Ministry of Health.
3.3 Data processing and Data analysis techniques.
Data processing will involve editing, to check for errors and omissions, and coding, to reduce the data to a meaningful pattern of responses. The model will then be specified, and statistical software will be used to tabulate and process the findings, analyze the data and compile the research report.
The study will use time series analysis, and descriptive statistics will be used to describe the data obtained; these will be presented in the form of graphs and tables.
Data analysis will involve applying statistical techniques to the data so that they can be presented clearly. It will include interpreting the research findings in light of the research questions and objectives, to determine whether the results are consistent with those research questions.
3.3.1 Descriptive analysis.
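The descriptive statistics can be illustrated with a short Python sketch. It assumes the Ministry of Health records have been extracted as a list of monthly prevalence figures; the numbers below are hypothetical placeholders, and the helper names `describe` and `annual_means` are illustrative, not part of any package.

```python
from statistics import mean, median, stdev


def describe(values):
    """Summary statistics for one prevalence series."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


def annual_means(monthly, start_year):
    """Average each complete block of 12 monthly values into one annual figure."""
    complete = len(monthly) - len(monthly) % 12  # ignore a trailing partial year
    return {start_year + i // 12: mean(monthly[i:i + 12]) for i in range(0, complete, 12)}


# Hypothetical monthly prevalence (%) for two years, NOT Ministry of Health data.
monthly = [12, 14, 18, 22, 20, 15, 11, 10, 12, 16, 19, 13,
           11, 13, 17, 21, 19, 14, 10, 9, 11, 15, 18, 12]
print(describe(monthly))
print(annual_means(monthly, 2014))
```

The annual means are one simple way to produce the tables and graphs mentioned above; the same summaries could be computed per region or per place of residence.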
3.3.2 Time series analysis
Because the data form a time series, time series analysis will be applied.
The analysis will concentrate on the trend and seasonality of malaria prevalence.
Assuming a multiplicative model, $Y_t = T_t \times S_t$,
where $Y_t$ is the malaria prevalence series, $T_t$ is the trend and $S_t$ is the seasonal component.
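As a sketch of how the trend and seasonal components might be separated under the multiplicative model, the classical ratio-to-moving-average method is shown below. It assumes monthly data (period 12) whose first observation falls in January; the function names and the synthetic series are hypothetical, and packages such as statsmodels (`seasonal_decompose` with `model="multiplicative"`) provide equivalent decompositions. Averaging the ratios $Y_t / T_t$ over several years also smooths out any irregular variation.

```python
def centred_moving_average(y, period=12):
    """2 x period centred moving average (period even); None where undefined."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]  # period + 1 values centred on t
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def seasonal_indices(y, trend, period=12):
    """Average ratio Y_t / T_t per season, rescaled so the indices average to 1.

    Assumes the first observation falls in season 0 (e.g. January).
    """
    ratios = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            ratios[t % period].append(obs / tr)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)
    return [s * scale for s in raw]


# Hypothetical example: a flat level of 10 multiplied by a known seasonal pattern.
true_s = [0.8, 0.9, 1.1, 1.3, 1.2, 1.0, 0.8, 0.7, 0.8, 1.0, 1.2, 1.2]
y = [10 * true_s[t % 12] for t in range(36)]
trend = centred_moving_average(y)
s_hat = seasonal_indices(y, trend)
deseasonalised = [obs / s_hat[t % 12] for t, obs in enumerate(y)]
```

Because the synthetic series has a constant level, the moving average returns exactly that level and the estimated indices reproduce the seasonal pattern used to build it.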
The analysis employs ARIMA modeling and includes the following data exploration techniques:
- Graphical presentation
This will involve plotting the series $Y_t$ against time $t$.
- Non-parametric tests for trend
Runs test: The runs test (Bradley, 1968) can be used to decide whether a data set comes from a random process.
A run is defined as a series of increasing values or a series of decreasing values, and the number of increasing or decreasing values is the length of the run. In a random data set, the probability that the (i+1)th value is larger or smaller than the ith value follows a binomial distribution, which forms the basis of the runs test.
Testing procedure
Ho: the malaria prevalence series is stationary
Ha: the malaria prevalence series is non-stationary.
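Test statistic
$Z = \dfrac{m - \frac{n-1}{2}}{\sqrt{\frac{n+1}{12}}}$
where $m$ is the number of pluses (positive differences $Y_{t+1} - Y_t$) and $n$ is the number of observations.
Decision rule: at $\alpha = 0.05$, the researcher will reject Ho if $|Z| > Z_{\alpha/2}$, i.e. if the computed Z statistic exceeds the critical value in absolute terms, and then conclude with $(1-\alpha) \times 100\%$ confidence that the series has a trend.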
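The plus-sign count m and its Z statistic can be computed as below. This is a minimal sketch that assumes the statistic takes the difference-sign form, in which the number of increases has mean (n − 1)/2 and variance (n + 1)/12 under randomness; ties between consecutive values are counted as non-increases. The function name `difference_sign_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_sign_test(y, alpha=0.05):
    """Return (m, Z, reject_H0) for the difference-sign test for trend."""
    n = len(y)
    m = sum(1 for a, b in zip(y, y[1:]) if b > a)  # number of plus signs
    z = (m - (n - 1) / 2) / sqrt((n + 1) / 12)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    return m, z, abs(z) > z_crit


# Hypothetical usage: a steadily rising series versus one that just alternates.
rising = list(range(20))
alternating = [1, 2] * 10
print(difference_sign_test(rising))
print(difference_sign_test(alternating))
```

A steadily rising series yields 19 pluses out of 19 differences and a clear rejection, whereas an alternating series yields 10 pluses, close to the expected 9.5, and no evidence of trend.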
Test for seasonality
In this research, the researcher will use the Kruskal-Wallis test, a non-parametric alternative to the parametric one-way analysis of variance for comparing two or more independent groups (Siegel & Castellan 1988). Here the groups are the seasons of the series.
The test is described below.
Ho: the series has no seasonality
Ha: the series has seasonality
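Test statistic
$H = \dfrac{12}{N(N+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(N+1)$, which is compared with the chi-square distribution with $k-1$ degrees of freedom,
where $k$ is the number of seasons, $n_i$ is the number of observations in the $i$th season, $N = \sum_{i} n_i$ is the total number of observations, and $R_i$ is the sum of the ranks of the observations in the $i$th season, each $Y_t$ being ranked within the pooled series.
Critical region
Reject Ho if $H > \chi^2_{\alpha,\,k-1}$.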
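A minimal Python sketch of the Kruskal-Wallis H statistic follows, assuming monthly data so that the seasons are the 12 calendar months. Tied values receive average ranks, but the usual tie correction to H is omitted for brevity. The resulting H would be compared with the chi-square critical value for k − 1 degrees of freedom (19.675 for 11 degrees of freedom at α = 0.05). The function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..N of the pooled values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without the tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this season
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def group_by_season(y, period=12):
    """Split a series that starts in season 0 into one list per season."""
    return [y[s::period] for s in range(period)]


# Toy usage with three completely separated groups.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For the three separated toy groups H is 7.2, which exceeds 5.991, the 5% chi-square critical value for 2 degrees of freedom.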
Test for stationarity.
Unit root tests will be carried out using the augmented Dickey-Fuller (ADF) test statistic, to check whether the series is stationary or integrated. This is necessary because standard inference procedures do not apply to regressions that contain an integrated dependent variable or integrated regressors. The test examines the null hypothesis that the time series has a unit root against the alternative that it has no unit root. The ADF statistic is a negative number; the more negative it is, the stronger the evidence for rejecting the null hypothesis of a unit root at a given level of significance.
Ho: There is a unit root
Ha: There is no unit root
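The test statistic is computed from the regression
$\Delta Y_t = \alpha + \gamma Y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta Y_{t-i} + \varepsilon_t$
as
$DF_\tau = \dfrac{\hat{\gamma}}{SE(\hat{\gamma})}$, where a unit root corresponds to $\gamma = 0$.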
The test statistic is compared with the critical value at the five percent level of significance. A test statistic less than (more negative than) the critical value leads to rejection of Ho, indicating that the series is stationary; otherwise the series is treated as non-stationary.
3.3.3 Autoregressive Integrated Moving Average (ARIMA)
This is also known as the Box-Jenkins methodology. It will be used to forecast malaria prevalence among children aged below 15 years. The model assumes that the time series is stationary. Stationarity will therefore be checked first; if the series is not stationary, it will be differenced d times to make it stationary, and an Autoregressive Moving Average ARMA(p, q) model will then be fitted to the differenced series. The ARIMA procedure provides a comprehensive set of tools for univariate time series model identification, parameter estimation and forecasting, and it offers great flexibility in the kinds of ARIMA models that can be analyzed. It supports seasonal, subset and factored ARIMA models; intervention or interrupted time series models; multiple regression analysis with ARMA errors; and rational transfer function models of any complexity. The Box-Jenkins methodology has four steps, which will be followed when forecasting malaria prevalence among children:
Identification. This involves finding the values of p, d and q,
where:
p is the number of autoregressive terms
d is the number of times the series is differenced
q is the number of moving average terms
Identification will be based on the correlogram, that is, the plots of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). If the PACF cuts off after lag p while the ACF tails off gradually, the data follow an autoregressive model of order p; if the ACF cuts off after lag q while the PACF tails off, the data follow a moving average model of order q; if both tail off, a mixed ARMA model is indicated. If the series is non-stationary, the number of times it must be differenced to achieve stationarity gives d.
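The quantities used in the identification step can be sketched in plain Python: differencing to obtain d, the sample ACF, and the sample PACF computed from the ACF by the Durbin-Levinson recursion. The approximate 95% bound ±1.96/√n is a common rule of thumb for judging where a correlogram cuts off. All function names here are hypothetical, and statistical software would normally produce these plots directly.

```python
from math import sqrt


def difference(y, d=1):
    """Apply first differencing d times."""
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


def acf(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 = 1)."""
    n = len(y)
    mu = sum(y) / n
    c0 = sum((v - mu) ** 2 for v in y) / n
    return [sum((y[t] - mu) * (y[t + k] - mu) for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def pacf(y, max_lag):
    """Sample partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    out, phi = [1.0], []  # phi holds phi_{k-1,1..k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significance_bound(n):
    """Approximate 95% bound for sample ACF/PACF values of a white-noise series."""
    return 1.96 / sqrt(n)


# Toy usage: a quadratic needs two differences to become constant.
print(difference([1, 4, 9, 16], d=2))
alt = [1.0, -1.0] * 10
print(acf(alt, 3), pacf(alt, 3), significance_bound(len(alt)))
```

In use, lags whose sample PACF (or ACF) values fall outside the bound are treated as significant when reading off p (or q).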
Estimation. This involves estimating the parameters of the autoregressive and moving average terms in the model. Nonlinear estimation will be used, because the moving average terms make the model nonlinear in its parameters.
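Because a moving average term makes the sum of squared residuals a nonlinear function of its parameter, estimation has to proceed numerically. The sketch below illustrates this for the simplest case only, an MA(1) model estimated by conditional sum of squares with a grid search over θ; it assumes the series is corrected by its sample mean and that the error before the first observation is zero. Real software uses more refined optimisers or exact maximum likelihood, and the function name and simulated data are hypothetical.

```python
import random


def css_ma1(y, grid_step=0.01):
    """Grid-search conditional-sum-of-squares estimate of theta in
    Y_t - mu = e_t + theta * e_{t-1}, with the initial error set to 0."""
    mu = sum(y) / len(y)
    x = [v - mu for v in y]
    steps = int(round(0.99 / grid_step))
    best_theta, best_ss = 0.0, float("inf")
    for i in range(-steps, steps + 1):
        theta = i * grid_step
        e_prev, ss = 0.0, 0.0
        for v in x:
            e = v - theta * e_prev  # recover each innovation recursively
            ss += e * e
            e_prev = e
        if ss < best_ss:
            best_theta, best_ss = theta, ss
    return mu, best_theta, best_ss / len(x)


# Simulated (hypothetical) MA(1) series with mean 5 and theta = 0.6.
rng = random.Random(7)
shocks = [rng.gauss(0, 1) for _ in range(2001)]
series = [5 + shocks[t] + 0.6 * shocks[t - 1] for t in range(1, 2001)]
mu_hat, theta_hat, sigma2_hat = css_ma1(series)
print(mu_hat, theta_hat, sigma2_hat)
```

With 2000 simulated observations the estimate of θ should land close to the true value of 0.6.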
Diagnostic checking. Having chosen a particular ARIMA model and estimated its parameters, we then examine whether the chosen model fits the data reasonably well. A simple test of the chosen model is to check whether the residuals estimated from it are white noise. If they are, the fit can be accepted; if not, the process will be started again from the identification step.
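One common way to test whether residuals are white noise is the Ljung-Box portmanteau test; the proposal does not name a specific test, so this choice is an assumption. The statistic Q = n(n + 2) Σ r_k²/(n − k), summed over lags k = 1..h, is compared with a chi-square critical value with h − p − q degrees of freedom. The constant below is the 5% critical value for 10 degrees of freedom, and the function name is hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n - k)."""
    n = len(residuals)
    mu = sum(residuals) / n
    c0 = sum((e - mu) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum((residuals[t] - mu) * (residuals[t + k] - mu) for t in range(n - k)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


CHI2_95_10DF = 18.307  # 5% critical value of chi-square with 10 degrees of freedom

# Strongly alternating "residuals" are clearly not white noise.
q = ljung_box_q([1.0, -1.0] * 50, max_lag=10)
print(q, q > CHI2_95_10DF)
```

A Q above the critical value means the residuals still contain autocorrelation, so the model would be revised.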
Forecasting. Exponential smoothing methods will be used for making forecasts. Although exponential smoothing methods make no assumptions about correlations between successive values of the time series, a better predictive model can sometimes be obtained by taking such correlations into account. ARIMA models do this by including an explicit statistical model for the irregular component of a time series that allows for non-zero autocorrelations in that component.
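As an illustration of exponential smoothing, the sketch below implements simple exponential smoothing, which suits a series without trend or seasonality; a seasonal series would call for a Holt-Winters variant. The smoothing constant α is chosen by minimising the sum of squared one-step-ahead errors over a grid. These helpers are hypothetical and do not reproduce the method of any particular package.

```python
def ses_forecast(y, alpha):
    """Simple exponential smoothing: the final level, used as a flat forecast."""
    level = y[0]  # initialise the level with the first observation
    for v in y[1:]:
        level = alpha * v + (1 - alpha) * level
    return level


def choose_alpha(y, grid=None):
    """Pick alpha minimising the sum of squared one-step-ahead errors."""
    grid = grid or [i / 100 for i in range(1, 100)]

    def sse(alpha):
        level, total = y[0], 0.0
        for v in y[1:]:
            total += (v - level) ** 2
            level = alpha * v + (1 - alpha) * level
        return total

    return min(grid, key=sse)


series = [10, 12, 14]  # hypothetical values
print(ses_forecast(series, 0.5), choose_alpha(series))
```

The forecast is the same for every future period, which is why the method is limited to series without systematic trend or seasonality.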
The forecast for the year 2016 will be made by regressing malaria prevalence on time.
Place of residence and region will be analyzed using ANOVA, by regressing malaria prevalence (the dependent variable) on dummies for place of residence and for region in SPSS, since residence and region are both categorical independent variables.
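Regressing prevalence on time can be sketched as an ordinary least squares fit of prevalence on year, extrapolated to 2016. The figures below are hypothetical, chosen only so that the arithmetic is easy to follow, and the function name is illustrative.

```python
def fit_linear_trend(years, values):
    """Ordinary least squares fit of values = b0 + b1 * year."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    sxx = sum((x - x_bar) ** 2 for x in years)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


years = [2010, 2011, 2012, 2013, 2014, 2015]
prevalence = [30, 28, 26, 24, 22, 20]  # hypothetical annual prevalence (%)
b0, b1 = fit_linear_trend(years, prevalence)
forecast_2016 = b0 + b1 * 2016
print(b1, forecast_2016)
```

With these made-up values the fitted slope is −2 percentage points per year and the 2016 forecast is 18%.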
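$Y_t = \beta_0 + \beta_1 D_R + \beta_2 D_C + \beta_3 D_N + \beta_4 D_E$
where $Y_t$ is the malaria prevalence at time $t$ for a given region and place of residence,
$D_R$ is a dummy for rural residence,
$D_C$ is a dummy for the central region,
$D_N$ is a dummy for the northern region,
$D_E$ is a dummy for the eastern region,
and $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4$ are the regression coefficients to be estimated.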
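The dummy-variable regression can be sketched as ordinary least squares on an indicator design matrix. The sketch assumes that urban residence and one further region (labelled `other` here, a hypothetical name) form the reference category, so β0 is the expected prevalence for urban residents of that region and each other coefficient is a difference from it. The data are synthetic, generated from known coefficients so that the fit can be checked; in the study itself SPSS would perform this estimation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def dummy_row(residence, region):
    """Design row [1, D_R, D_C, D_N, D_E]; urban and 'other' are the reference levels."""
    return [1.0,
            1.0 if residence == "rural" else 0.0,
            1.0 if region == "central" else 0.0,
            1.0 if region == "north" else 0.0,
            1.0 if region == "east" else 0.0]


def fit_dummy_regression(records):
    """OLS estimates of beta_0..beta_4 via the normal equations X'X b = X'y."""
    X = [dummy_row(res, reg) for res, reg, _ in records]
    y = [value for _, _, value in records]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic records (residence, region, prevalence) built from known coefficients.
true_beta = [10.0, 5.0, 2.0, -1.0, 3.0]
records = []
for res in ("rural", "urban"):
    for reg in ("central", "north", "east", "other"):
        row = dummy_row(res, reg)
        records.append((res, reg, sum(b * x for b, x in zip(true_beta, row))))

beta_hat = fit_dummy_regression(records)
print(beta_hat)
```

Because the synthetic data contain no noise, the fitted coefficients reproduce the ones used to generate them, which confirms the coding of the dummies.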