Data analysis
ANSWERS
Qn; i) Suggest the data collection instrument/s that could have been used to collect the data in the table above. Justify the selection of the instrument.
- i) Suggested Data Collection Instruments:
Survey/Questionnaire:
A structured survey or questionnaire would be the most appropriate instrument to collect the monthly income data. It allows for systematic collection of quantitative data from a large group, such as the 110 villagers in this case. Respondents can be asked directly about their monthly income after the poverty reduction project.
Justification: A questionnaire allows for consistency in data collection, because the same set of questions is posed to each participant. It can gather numerical information quickly and easily, which is ideal for collecting income data. Additionally, respondents can fill it out privately, which makes them more likely to provide accurate income information.
Interviews (structured or semi-structured):
This involves direct face-to-face or phone interviews where participants are asked about their monthly income. Interviews can be structured (following a fixed set of questions) or semi-structured (allowing for some flexibility).
Justification: Interviews may provide more detailed, accurate responses since interviewers can clarify questions or probe deeper if necessary. This is particularly useful in areas where literacy might be a barrier to using a written questionnaire. Structured interviews would ensure uniformity across all respondents.
Both methods could be used, depending on factors such as the literacy levels of the villagers, the required depth of information, and the resources available for data collection.
ANSWER B; ii) Beginning with the class of 10–14 and using equal class widths, construct a frequency distribution table for the data above.
To construct a frequency distribution table, we first need to determine an appropriate number of classes and the class width.
Determining the Range
The data ranges from a minimum of 10 to a maximum of 54.
Range=54−10=44
Deciding on the Number of Classes
Typically, the number of classes (k) is estimated using Sturges' formula:
$k = 1 + 3.322\log_{10}(n)$
where n = 110 (the number of data points).
$k = 1 + 3.322\log_{10}(110) = 1 + 3.322 \times 2.041 \approx 7.78 \approx 8$
Thus, Sturges' formula suggests about 8 classes.
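Sturges' formula can be checked in a few lines. This is a minimal sketch that assumes base-10 logarithms (the usual reading of the formula) and rounds k up to a whole number; the helper name `sturges_classes` is purely illustrative.

```python
import math


def sturges_classes(n: int) -> float:
    """Sturges' rule before rounding: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


n = 110  # number of villagers in the data set
k_raw = sturges_classes(n)
k = math.ceil(k_raw)  # round up to a whole number of classes
print(f"k = {k_raw:.2f}, rounded up to {k} classes")
```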
Determining the Class Width
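To find the class width, divide the range by the number of classes:
$\text{Class width} = \frac{\text{Range}}{k} = \frac{44}{8} = 5.5$
Since the question fixes the first class as 10–14, the class width is 5, so 5.5 is rounded down to 5. With a width of 5, nine classes are needed to cover the data from 10 to 54.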
Creating the Classes
We will begin with the class of 10–14, and since the class width is 5, the following classes will be:
- 10–14
- 15–19
- 20–24
- 25–29
- 30–34
- 35–39
- 40–44
- 45–49
- 50–54
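The list of classes follows mechanically from the starting class and the width. A small sketch, assuming whole-number class limits; `make_classes` is a hypothetical helper, not part of any library:

```python
def make_classes(start: int, width: int, max_value: int) -> list[tuple[int, int]]:
    """Build inclusive integer classes (lower, upper) until max_value is covered."""
    classes = []
    lower = start
    while lower <= max_value:
        classes.append((lower, lower + width - 1))
        lower += width
    return classes


classes = make_classes(start=10, width=5, max_value=54)
for lower, upper in classes:
    print(f"{lower}-{upper}")
print(len(classes), "classes")  # one more than Sturges' estimate of 8
```

This also shows why the fixed width of 5 produces nine classes rather than eight.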
Tallying the Frequencies
Next, count the data points that fall in each class.
Class | Frequency (f) |
10–14 | 13 |
15–19 | 14 |
20–24 | 20 |
25–29 | 15 |
30–34 | 12 |
35–39 | 14 |
40–44 | 10 |
45–49 | 8 |
50–54 | 4 |
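The tallying step can be sketched as follows, assuming incomes are recorded as whole numbers so that each value falls in exactly one inclusive class. The raw incomes are not reproduced in this section, so the sketch only checks the reported frequencies and derives the cumulative frequencies used later for the median; `tally` is an illustrative helper.

```python
from itertools import accumulate

CLASSES = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34),
           (35, 39), (40, 44), (45, 49), (50, 54)]


def tally(values, classes):
    """Count how many whole-number values fall in each inclusive class (lower, upper)."""
    counts = [0] * len(classes)
    for v in values:
        for i, (lower, upper) in enumerate(classes):
            if lower <= v <= upper:
                counts[i] += 1
                break
        else:
            raise ValueError(f"{v} falls outside every class")
    return counts


# Frequencies as reported in the table (raw data not reproduced here).
freq = [13, 14, 20, 15, 12, 14, 10, 8, 4]
assert sum(freq) == 110  # every villager is counted exactly once
cumulative = list(accumulate(freq))
print(cumulative)  # [13, 27, 47, 62, 74, 88, 98, 106, 110]
```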
Question three; iii) Using data in the frequency distribution table determine the mean, median, mode. Basing on the values you have determined for the averages, describe the nature of the monthly income distribution and suggest the most appropriate average for this distribution.
The mean of grouped data is
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
where $f_i$ is the frequency of class $i$ and $x_i$ is its midpoint:
$x_i = \frac{\text{Lower class limit} + \text{Upper class limit}}{2}$
Class | Frequency (f) | Cumulative frequency | Midpoint calculation | Midpoint (x) | f × x |
10–14 | 13 | 13 | (10+14)/2 | 12 | 156 |
15–19 | 14 | 27 | (15+19)/2 | 17 | 238 |
20–24 | 20 | 47 | (20+24)/2 | 22 | 440 |
25–29 | 15 | 62 | (25+29)/2 | 27 | 405 |
30–34 | 12 | 74 | (30+34)/2 | 32 | 384 |
35–39 | 14 | 88 | (35+39)/2 | 37 | 518 |
40–44 | 10 | 98 | (40+44)/2 | 42 | 420 |
45–49 | 8 | 106 | (45+49)/2 | 47 | 376 |
50–54 | 4 | 110 | (50+54)/2 | 52 | 208 |
Total | 110 | | | | 3145 |
Mean = 3145/110 = 28.59
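The grouped mean can be reproduced directly from the frequency table. A minimal sketch, assuming the class limits and frequencies shown above:

```python
CLASSES = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34),
           (35, 39), (40, 44), (45, 49), (50, 54)]
FREQ = [13, 14, 20, 15, 12, 14, 10, 8, 4]

# Midpoint of each class: (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in CLASSES]
sum_f = sum(FREQ)
sum_fx = sum(f * x for f, x in zip(FREQ, midpoints))
mean = sum_fx / sum_f
print(sum_f, sum_fx, round(mean, 2))  # 110 3145.0 28.59
```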
Finding the Median
$\text{Median} = L + \left(\frac{N/2 - F}{f}\right) \times w$
Where:
- L= lower boundary of the median class
- N = total frequency
- F = cumulative frequency of the class before the median class
- f = frequency of the median class
- w= class width
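First, locate the median class. N/2 = 110/2 = 55, and the cumulative frequency first reaches 55 in the class 25–29 (cumulative frequency 62). The median class is therefore 25–29, with L = 24.5, F = 47, f = 15 and w = 5.
$\text{Median} = 24.5 + \left(\frac{55 - 47}{15}\right) \times 5 = 24.5 + 0.533 \times 5 = 24.5 + 2.67$
Median = 27.17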
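The median interpolation can be written as a short function. This sketch assumes whole-number class limits, so the lower class boundary is taken as the lower limit minus 0.5 and the width as upper minus lower plus 1; `grouped_median` is an illustrative name.

```python
CLASSES = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34),
           (35, 39), (40, 44), (45, 49), (50, 54)]
FREQ = [13, 14, 20, 15, 12, 14, 10, 8, 4]


def grouped_median(classes, freq):
    """Median of grouped data: L + ((N/2 - F) / f) * w."""
    half = sum(freq) / 2
    cum_before = 0  # F: cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freq):
        if cum_before + f >= half:  # first class whose cumulative frequency reaches N/2
            boundary = lower - 0.5  # L: lower class boundary
            width = upper - lower + 1  # w: class width
            return boundary + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("empty distribution")


print(round(grouped_median(CLASSES, FREQ), 2))  # 27.17
```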
Finding the mode
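$\text{Mode} = L + \left(\frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}\right) \times w$
Where:
- L = lower boundary of the modal class
- $f_m$ = frequency of the modal class
- $f_{m-1}$ = frequency of the class before the modal class
- $f_{m+1}$ = frequency of the class after the modal class
- w = class width
The modal class is the class with the highest frequency. In this case, the class 20–24 has the highest frequency of 20, so the modal class is 20–24, with L = 19.5, $f_m$ = 20, $f_{m-1}$ = 14, $f_{m+1}$ = 15 and w = 5.
$\text{Mode} = 19.5 + \left(\frac{20 - 14}{(20 - 14) + (20 - 15)}\right) \times 5 = 19.5 + \frac{6}{11} \times 5 = 19.5 + 2.73 = 22.23$
Since mean (28.59) > median (27.17) > mode (22.23), the monthly income distribution is positively (right) skewed: most villagers earn less than the mean, and a smaller number of higher incomes pulls the mean upwards. The median is therefore the most appropriate average for this distribution.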
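The interpolation formula for the mode can be sketched in the same way. This assumes whole-number class limits and takes the first class with the highest frequency as the modal class; a missing neighbouring class is treated as having frequency 0. `grouped_mode` is an illustrative name.

```python
CLASSES = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34),
           (35, 39), (40, 44), (45, 49), (50, 54)]
FREQ = [13, 14, 20, 15, 12, 14, 10, 8, 4]


def grouped_mode(classes, freq):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * w."""
    m = max(range(len(freq)), key=lambda i: freq[i])  # index of the modal class
    f_m = freq[m]
    f_before = freq[m - 1] if m > 0 else 0
    f_after = freq[m + 1] if m + 1 < len(freq) else 0
    lower, upper = classes[m]
    d1 = f_m - f_before  # fm - f(m-1)
    d2 = f_m - f_after   # fm - f(m+1)
    return (lower - 0.5) + d1 / (d1 + d2) * (upper - lower + 1)


print(round(grouped_mode(CLASSES, FREQ), 2))  # 22.23
```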
In Public Administration, statistical tools such as the mean, median and mode play a crucial role in data analysis and decision-making. Their roles in decision-making are as follows:
- Mean (Average):
Resource Allocation: The mean is useful for assessing the average performance of public services (e.g., average response time of emergency services, average income levels in a region). This helps in making informed decisions about where resources should be allocated.
Policy Evaluation: It assists in determining the average impact of policies (e.g., the average income increase due to a welfare program).
Budgeting: Public administrators use the mean to understand average spending across departments and for various projects, aiding in more accurate budgeting.
- Median:
Equitable Policy Formulation: The median is particularly important when dealing with income distribution, where the mean can be misleading because of outliers (e.g., a few very high-income individuals pulling the mean upwards). Using the median gives a better picture of the typical citizen.
Public Service Efficiency: For understanding service distribution, such as healthcare access or educational attainment, the median can show where the bulk of the population lies, helping in crafting policies that target the majority effectively.
Income Inequality: Median household income is often more representative of the general public’s situation than the mean, making it a better guide for policy targeting income inequality.
- Mode:
Frequent Trends: The mode helps identify the most common occurrences within datasets, such as the most frequently reported crime, the most common age group applying for welfare, or the most used public services. This helps administrators understand the most typical scenarios they need to address.
Service Optimization: For example, understanding the mode of public transport usage times can help in scheduling services where demand is highest, improving efficiency.
Practical Examples in Public Administration:
Health Services: When analyzing waiting times at public hospitals, the mode might indicate the most common waiting time, the median could reflect the middle value (less sensitive to outliers), and the mean shows the overall average.
Public Budgeting: In budget allocation, the mean can provide a broad sense of average spending, but the median might reveal the true middle ground if some sectors have very high or very low expenses.
Crime Analysis: In policing, the mode can show the most frequent type of crime in an area, helping focus preventive measures.
Qn: What are the challenges of using averages in iii) above in decision making?
Median
Insensitive to Distribution Shape: While the median is less affected by outliers, it does not take into account the distribution shape, which can be a disadvantage in some cases where the overall shape of the data matters.
Inefficient for Large Datasets: Finding the median requires ordering the data, which takes more computation than the mean for very large datasets.
Not as Intuitive for Small Datasets: In small datasets, the median can fall between data points, which can feel less intuitive than using the mean.
Example: In a dataset like [1, 2, 3, 4], the median is 2.5, which is not a value actually present in the dataset.
Can Oversimplify: Like any single average, the median gives no insight into the spread or variability of the data; it describes only the middle value.
Mode
Limited Use with Continuous Data: For continuous numerical data, the mode is often not very useful because the likelihood of the exact same value repeating is low.
Example: In a dataset of measured heights, there may be no repeated values, so the mode doesn’t offer much insight.
Multiple Modes: Data can be bimodal or multimodal, meaning there may be more than one mode. This complicates the interpretation, as it can be unclear how to summarize a dataset with more than one central value.
Example: A dataset with the values [1, 1, 3, 3, 5] has two modes, 1 and 3, which could be confusing.
Not Always Representative: The mode can sometimes reflect a frequent value that doesn’t represent the overall dataset well. If one value happens to repeat, even if it’s much higher or lower than the other data points, the mode will prioritize that frequent value, possibly leading to misinterpretation.
Irrelevant for Uniform Distributions: In a dataset where all values occur with the same frequency, the mode is not meaningful because no single value occurs more frequently than others.
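The two small examples above can be checked with Python's standard `statistics` module. Note that `statistics.mode` silently returns only the first mode it meets (behaviour since Python 3.8), which illustrates how a multimodal dataset can be summarised misleadingly.

```python
import statistics

median_value = statistics.median([1, 2, 3, 4])       # 2.5, a value not present in the data
all_modes = statistics.multimode([1, 1, 3, 3, 5])    # [1, 3], two modes
single_mode = statistics.mode([1, 1, 3, 3, 5])       # 1, the second mode is ignored

print(median_value, all_modes, single_mode)
```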
- vi) Determine the standard deviation and coefficient of variation of the incomes. Comment on the usefulness of these two measures in Public Administration.
Class interval | Frequency (f) | Midpoint (x) | x − x̄ | (x − x̄)² | f × (x − x̄)² |
10–14 | 13 | 12 | 12 − 28.59 = −16.59 | 275.23 | 3577.99 |
15–19 | 14 | 17 | 17 − 28.59 = −11.59 | 134.33 | 1880.62 |
20–24 | 20 | 22 | 22 − 28.59 = −6.59 | 43.43 | 868.60 |
25–29 | 15 | 27 | 27 − 28.59 = −1.59 | 2.53 | 37.95 |
30–34 | 12 | 32 | 32 − 28.59 = 3.41 | 11.63 | 139.56 |
35–39 | 14 | 37 | 37 − 28.59 = 8.41 | 70.73 | 990.22 |
40–44 | 10 | 42 | 42 − 28.59 = 13.41 | 179.83 | 1798.30 |
45–49 | 8 | 47 | 47 − 28.59 = 18.41 | 338.93 | 2711.44 |
50–54 | 4 | 52 | 52 − 28.59 = 23.41 | 548.03 | 2192.12 |
Calculations
$\sum f(x - \bar{x})^2 = 3577.99 + 1880.62 + 868.60 + 37.95 + 139.56 + 990.22 + 1798.30 + 2711.44 + 2192.12 = 14196.80$
$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{14196.80}{110} = 129.06$
$\sigma = \sqrt{129.06} \approx 11.36$
Standard deviation ≈ 11.36
$CV = \frac{\sigma}{\bar{x}} \times 100 = \frac{11.36}{28.59} \times 100 \approx 39.7\%$
Coefficient of variation ≈ 39.7%
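The standard deviation and coefficient of variation can be computed directly from the grouped data. This sketch uses the population form (dividing by the sum of the frequencies), as in the hand calculation. Because the code keeps the mean unrounded, its intermediate totals can differ from a hand calculation in the second decimal place.

```python
import math

CLASSES = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34),
           (35, 39), (40, 44), (45, 49), (50, 54)]
FREQ = [13, 14, 20, 15, 12, 14, 10, 8, 4]

midpoints = [(lower + upper) / 2 for lower, upper in CLASSES]
n = sum(FREQ)
mean = sum(f * x for f, x in zip(FREQ, midpoints)) / n

# Sum of f * (x - mean)^2, then divide by the total frequency (population variance)
ss = sum(f * (x - mean) ** 2 for f, x in zip(FREQ, midpoints))
variance = ss / n
sd = math.sqrt(variance)
cv = sd / mean * 100  # coefficient of variation, as a percentage

print(round(ss, 2), round(variance, 2), round(sd, 2), round(cv, 1))
```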
- Standard Deviation (SD)
Definition: The standard deviation is a measure of the amount of variation or dispersion in a set of values. A low standard deviation indicates that the data points tend to be close to the mean (average), while a high standard deviation indicates that the data points are spread out over a wide range.
Usefulness of Standard Deviation:
Assessing data spread: SD tells you how much the values in a dataset differ from the mean, giving an understanding of the spread of the data.
Risk assessment: In finance, for instance, SD is used to assess the volatility of investments. A higher SD suggests greater risk because returns are more spread out.
Data consistency: A low SD implies that data points are clustered around the mean, indicating consistency. A high SD suggests inconsistency.
Comparing data sets: SD is commonly used to compare variability between two or more datasets (especially if the datasets have similar scales).
Normal distribution: In a normal distribution, about 68% of the data lies within one SD from the mean, 95% within two SDs, and 99.7% within three SDs. This is useful in understanding the probabilities of outcomes.
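The 68–95–99.7 figures can be verified with the standard normal distribution from Python's `statistics` module. This is a sketch of the general rule only; it does not assume anything about the income data.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD of the mean: {shares[k]:.1%}")
```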
- Coefficient of Variation (CV)
Definition: The coefficient of variation is the ratio of the standard deviation to the mean, often expressed as a percentage. It is a normalized measure of dispersion and is useful when you want to compare the degree of variation from one dataset to another, even if they have different units or means.
Usefulness of Coefficient of Variation:
Unit-less comparison: CV allows for comparison between datasets with different units or means. For example, comparing variability in the weight of animals with variability in their lifespan.
Relative variability: CV measures the relative variability in relation to the mean, so it is more informative in situations where the mean of the data plays a crucial role in the context of the analysis.
Assessing risk-to-return ratios: In finance, CV is used to compare the risk (SD) to the expected return (mean). A lower CV indicates a better risk-to-reward ratio, which is essential for portfolio management.
Comparing consistency: CV is particularly useful when comparing consistency between different processes or groups. For instance, a lower CV in manufacturing indicates that a process is more reliable, producing consistent results.
Key Differences:
SD is useful for measuring the absolute dispersion of data around the mean, making it best for datasets with similar means or units.
CV is ideal for comparing the relative dispersion between datasets with different scales or means, offering a way to compare consistency across different datasets.
Qn: Assuming that the monthly income of the selected villagers is normally distributed, test the hypothesis at 5 % level of significance that the project has worked.
To test the hypothesis that the poverty reduction project has worked, we will perform a one-sample t-test. The null hypothesis will state that the average monthly income after the program is equal to the baseline average income (UGX 28,600). The alternative hypothesis will state that the average monthly income after the program is greater than the baseline average.
Formulate the Hypotheses:
Null hypothesis: $H_0: \mu = 28{,}600$ (the average monthly income has not increased)
Alternative hypothesis: $H_1: \mu > 28{,}600$ (the average monthly income has increased)
Calculate the Sample Statistics:
Calculate the sample mean (x̄) and the sample standard deviation (s), and note the sample size, n = 110.
Calculate the t-statistic:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
where μ is the population mean under the null hypothesis.
Sample mean (x̄): UGX 30,109.09 (or Shs 30,109)
Sample standard deviation (s): UGX 11,485.41 (or Shs 11,485)
$t = \frac{30109.09 - 28600}{11485.41/\sqrt{110}} = \frac{1509.09}{1095.09} \approx 1.38$
Decision: with n − 1 = 109 degrees of freedom, the one-tailed critical value at the 5% level is about 1.66. Since 1.38 < 1.66, we fail to reject H0.
At the 5% significance level, there is insufficient evidence to conclude that the average monthly income of the villagers has increased after the poverty reduction program.
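The test statistic can be reproduced from the summary figures quoted above. This is a minimal sketch: the one-tailed critical value (about 1.659 for 109 degrees of freedom) is an assumed value read from a t-table rather than computed, since Python's standard library has no t-distribution; with SciPy installed, `scipy.stats.t.ppf(0.95, 109)` would give it directly.

```python
import math


def one_sample_t(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """One-sample t-statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Summary figures quoted in the text (UGX)
t_stat = one_sample_t(sample_mean=30_109.09, mu0=28_600, s=11_485.41, n=110)
T_CRIT = 1.659  # assumed: one-tailed 5% critical value for df = 109, from a t-table

decision = "reject H0" if t_stat > T_CRIT else "fail to reject H0"
print(f"t = {t_stat:.2f}: {decision}")  # t = 1.38: fail to reject H0
```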
Qn; write a brief report
- Standard Deviation: Indicates income variability among citizens, essential for assessing the effectiveness of poverty reduction programs.
- Coefficient of Variation: Provides a relative measure of income variability, useful for comparing different regions or demographic groups.
Both measures are vital for public administrators to identify areas needing targeted interventions and to assess the impact of policies over time.