Data analysis

CROSS TABULATION:

Cross tabulation shows how the categories of one variable are distributed across the categories of another, which helps us understand whether the two variables are related.

For example

Age   Education    Gender   District
22    Primary      Male     Tororo
22    Secondary    Male     Tororo
24    Tertiary     Female   Iganga
31    Primary      Male     Tororo
26    Primary      Female   Jinja
26    Secondary    Female   Tororo
31    University   Male     Iganga
26    Secondary    Male     Jinja
22    Primary      Male     Jinja

 

STEPS

  • Analyze
  • Descriptive Statistics
  • Crosstabs
  • Then select the variable(s) for rows
  • Select the variable(s) for columns
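For readers who want to check a cross tabulation outside SPSS, the following is a minimal Python sketch that counts Gender against District for the nine records in the example table. The choice of Gender for rows and District for columns, and all variable names, are assumptions made for illustration.

```python
from collections import Counter

# Records from the example table: (age, education, gender, district).
records = [
    (22, "Primary", "Male", "Tororo"),
    (22, "Secondary", "Male", "Tororo"),
    (24, "Tertiary", "Female", "Iganga"),
    (31, "Primary", "Male", "Tororo"),
    (26, "Primary", "Female", "Jinja"),
    (26, "Secondary", "Female", "Tororo"),
    (31, "University", "Male", "Iganga"),
    (26, "Secondary", "Male", "Jinja"),
    (22, "Primary", "Male", "Jinja"),
]

# Count every (gender, district) combination.
counts = Counter((gender, district) for _, _, gender, district in records)

genders = sorted({r[2] for r in records})
districts = sorted({r[3] for r in records})

# Print the table with row and column totals.
print(f"{'':8}" + "".join(f"{d:>8}" for d in districts) + f"{'Total':>8}")
for g in genders:
    row = [counts[(g, d)] for d in districts]
    print(f"{g:8}" + "".join(f"{c:>8}" for c in row) + f"{sum(row):>8}")
col_totals = [sum(counts[(g, d)] for g in genders) for d in districts]
print(f"{'Total':8}" + "".join(f"{c:>8}" for c in col_totals) + f"{sum(col_totals):>8}")
```

Each cell is simply the number of respondents who fall into that row category and that column category at the same time.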

DATA ANALYSIS

This involves 5 major steps

  1. Enter your data in the Data Editor.
  2. Select the procedure from the menu.
  3. Select the variables for the analysis.
  4. Examine the results in the output window.
  5. Interpret the results in a Word document.

NOTE

Before any data analysis is done, first identify whether each variable is quantitative or categorical.

Data analysis includes univariate, bivariate and multivariate analysis.

Univariate analysis: a single variable is analyzed at a time, e.g. what is the average age of students?

Bivariate analysis: two variables are analyzed together, e.g. does the income of the respondent depend on age?

Multivariate analysis: more than two variables are analyzed together, e.g. does the income of the respondent depend on age, education level and sex?

UNIVARIATE ANALYSIS

Descriptive statistics

These are computed only for Quantitative/ continuous variables such as age, height and weight.

Procedure

  • Analyze >> Descriptive Statistics >> Descriptives.
  • Move the variables from the left-hand box into the right-hand box.
  • Specify the particular statistics required by clicking the Options (or Statistics) button.
  • Press OK.

Interpret the results, i.e. mean, median, mode, frequency, quantiles, sum, variance, standard deviation, minimum, maximum, range, kurtosis and skewness.
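As a rough cross-check of what the Descriptives procedure reports, the sketch below computes several of these statistics for the Age column of the cross tabulation example using Python's standard library. Using that column here is an assumption for illustration; SPSS output may be laid out and rounded differently.

```python
import statistics

# Age column from the cross tabulation example table.
ages = [22, 22, 24, 31, 26, 26, 31, 26, 22]

summary = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "range": max(ages) - min(ages),
    "sum": sum(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "modes": statistics.multimode(ages),     # every value tied for most frequent
    "variance": statistics.variance(ages),   # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(ages),       # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:10} {value}")
```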

 

Example

To get the minimum, maximum, mean, standard deviation and variance, click

Analyze >> Descriptive Statistics >> Descriptives.

The requested statistics then appear in the output window.

Frequency Distribution

This is done for categorical/qualitative variables such as sex, marital status and age group.
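A frequency distribution can be sketched in a few lines of Python. The example below tabulates the Education column from the cross tabulation example, with counts, percentages and cumulative percentages; the column choice and the names are illustrative assumptions.

```python
from collections import Counter

# Education column from the cross tabulation example table.
education = [
    "Primary", "Secondary", "Tertiary", "Primary", "Primary",
    "Secondary", "University", "Secondary", "Primary",
]

freq = Counter(education)
total = sum(freq.values())

# Print category, frequency, percent and cumulative percent.
cumulative = 0.0
for level, count in freq.most_common():
    percent = 100 * count / total
    cumulative += percent
    print(f"{level:12}{count:>4}{percent:>8.1f}{cumulative:>8.1f}")
```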

GRAPHING

Steps

  • Graphs
  • Legacy Dialogs
  • Then choose the type of diagram you want.

A PIE CHART

This is done for categorical/qualitative variables such as sex and education level.

BOX PLOT

  • Graphs >> Legacy Dialogs >> Boxplot >> Simple
  • Select Summaries of separate variables
  • Define
  • Select the continuous variables to be charted
  • Press OK

HISTOGRAM

  • Graphs >> Legacy Dialogs >> Histogram
  • Select the variable
  • Select Display normal curve
  • Press OK

LINE GRAPH

  • Graphs >> Legacy Dialogs >> Line >> Simple
  • Select Values of individual cases
  • Define
  • Select the Y-axis and X-axis variables
  • Press OK

Multiple responses/open ended questions

This is the case where each respondent may give one or more answers to a particular question, e.g. what are the reasons for the high rate of children dropping out of school in some parts of Uganda?

Example

Opinion poll about Uganda elections

People were asked to give their opinion as to why Museveni won the recent presidential election. Below are the possible reasons and the responses given.

  1. Best candidate
  2. Rigged the election
  3. He is a dictator
  4. He was the only candidate

Response

  • 1
  • 2,3
  • 2,3,4
  • 3,4
  • 2,3,4
  • 2,4
  • 3,2

According to the survey, why did Museveni win the election?

Method 1 / Dichotomies method

STEPS

DATA ENTRY

Best candidate   Rigged election   Dictator   Only ……

(Data entry grid: one row per respondent, one column per reason.)

 

  • Analyze
  • Multiple Response
  • Define Variable Sets
  • Move the desired variables from the Set Definition box to the Variables in Set box
  • Select Dichotomies, Counted value
  • Enter 1
  • Under Name, type Reasons
  • Under Label, type why Museveni won
  • Click Add
  • Close
  • Go to Analyze
  • Multiple Response
  • Frequencies
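The tally behind a multiple response frequency table can be sketched in Python as follows, using the seven responses listed in the example. This is only an illustration of the counting logic: percentages computed from this list need not match the SPSS output that the interpretation quotes, because that output is not reproduced here.

```python
from collections import Counter

# Reason codes from the example.
reasons = {
    1: "Best candidate",
    2: "Rigged the election",
    3: "He is a dictator",
    4: "He was the only candidate",
}

# Each inner list holds the codes chosen by one respondent.
responses = [[1], [2, 3], [2, 3, 4], [3, 4], [2, 3, 4], [2, 4], [3, 2]]

# Count how many times each reason was mentioned across all respondents.
counts = Counter(code for answer in responses for code in answer)
total_responses = sum(counts.values())
n_cases = len(responses)

# Columns: reason, count, percent of responses, percent of cases.
for code, label in reasons.items():
    n = counts[code]
    print(f"{label:28}{n:>3}{100 * n / total_responses:>8.1f}{100 * n / n_cases:>8.1f}")
```

Percent of responses divides by the total number of answers given, while percent of cases divides by the number of respondents, so the second column can sum to more than 100.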

The frequency table then appears in the output window.

Interpretation

The above analysis shows that the main reasons for Museveni's win were that he rigged the election and that he is a dictator, each reported by 30.8% of the responses. This is followed by the reason that he was the only candidate, as reported by 15.4% of the responses. The other, minor reason was that he was the best candidate, reported by only 7.7% of the responses.

Method 2 / categories method

STEPS

Data Entry

Respondent  Best candidate  Rigged election  Dictator  Only candidate
1           1               -                -         -
2           -               2                -         -
3           -               2                3         -
4           -               -                3         4
5           -               2                3         4
6           -               2                3         4
7           -               2                -         4
8           -               2                3         -
9           -               -                -         -

 

  • Analyze
  • Multiple Response
  • Define Variable Sets
  • Move the desired variables from the Set Definition box to the Variables in Set box
  • Select Categories
  • Enter the range 1 through 4 (it depends on the number of reasons you have)
  • Under Name, type Reasons
  • Under Label, type why Museveni won
  • Click Add
  • Close
  • Go to Analyze
  • Multiple Response
  • Frequencies

The frequency table will appear.

NB: The interpretation is the same as for the table in Method 1.

Bivariate Analysis

This analyzes two variables at a time. The appropriate technique depends on whether the variables are quantitative or categorical.

CORRELATION ANALYSIS

This is a measure of the relationship between two variables.

It tells us how strong the correlation between the two variables is.

The relationship can be negative (-) or positive (+). If the correlation coefficient (r) = 1, there is perfect positive correlation between the variables, and if r = -1 there is perfect negative correlation between the variables.

If |r| > 0.5, there is a strong relationship between the variables.

If |r| = 0.5, the relationship is moderate.

If |r| < 0.5, there is a weak relationship between the variables.

If r is close to 0, the relationship is very weak or non-existent.

NOTE

In correlation analysis, we analyze the strength, direction and significance of the relationship.

DIRECTION

If the correlation coefficient is negative, the two variables move in opposite directions: as one variable increases, the other decreases. If the correlation coefficient is positive, the two variables move in the same direction: as one variable increases, the other also increases.

Significance of the relationship

If the p-value is less than the chosen level of significance, such as 0.05 or 0.01, then the relationship is significant; otherwise it is not significant.

Testing for correlation

  1. Graphical approach: a scatter plot is used.

The scatter plot illustrates the relationship between the variables, which can be positive, negative or non-existent.

  • Graphs >> Legacy Dialogs >> Scatter >> Simple >> Define
  • Select the Y-axis and X-axis variables
  • Press OK
  • To add the line of best fit, double click in the plot and click on add a reference line from equation.

The following are scatter plots for visual interpretation of the types of correlation

Example

Using the data

Y        X
2441.1   3776.3
2476.9   3843.1
2503.7   3760.3
2619.4   3906.6
2746.1   4148.5
2865.8   4279.8
2969.1   4404.5
3052.2   4539.9
3162.4   4718.6
3223.3   4838
3260.4   4877.5
3240.8   4821

 

Is there any relationship between the variables?
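One way to answer this numerically is to compute the Pearson correlation coefficient directly. The sketch below does so for the Y and X values above using only the Python standard library; the function name pearson_r is an illustrative choice and not part of SPSS.

```python
import math

# Y and X values from the example table.
y = [2441.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8,
     2969.1, 3052.2, 3162.4, 3223.3, 3260.4, 3240.8]
x = [3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
     4404.5, 4539.9, 4718.6, 4838, 4877.5, 4821]


def pearson_r(a, b):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

A value of r close to +1 indicates a strong positive linear relationship, which is what a scatter plot of these values would also suggest.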

Statistical tests

Pearson correlation coefficient

This is used for quantitative variables such as age and income.

For example

Is there any significant correlation between age and income of the respondent?

The Hypotheses are stated as follows

Ho: There is no significant correlation between age and income of the respondent.

Ha: There is a significant correlation between age and income of the respondent.

STEPS

  • Analyze >> Correlate >> Bivariate.
  • Move the variables from the left-hand box into the right-hand box.
  • Select Flag significant correlations.
  • Select the type of correlation coefficient: Pearson.
  • Press OK.
  • Interpret the result.

Example

Interpretation

The correlation coefficient is -0.259; this implies that there is a weak negative correlation between highest year of school completed and age of the respondent. The correlation is significant at the 1% level of significance since the p-value (0.000) < 0.01; thus the null hypothesis is rejected and the conclusion is that there is a significant relationship between highest year of school completed and age of the respondent.

  1. Spearman: used for ranked data.
  2. Kendall's tau: used for categorical variables with a natural order, such as education level.

Example

Using the

If the confidence interval does not include the hypothesized value of the population parameter, the null hypothesis is rejected; otherwise it is not rejected.

 

 

Chi-square test

It is a test of dependence or association between two variables, both of which must be categorical, such as marital status, education level and religion.

Example

Does the religion of the respondent depend on marital status?

Procedure

Analyze >> Descriptive Statistics >> Crosstabs

Select one variable for a row and another for a column.

Click Statistics >> ………..square >> Continue

Cells >> Row and Column percentages >> Continue

Press OK

NOTE:

First state the hypotheses

Ho: Religion of the respondent does not depend on marital status.

Ha: Religion of the respondent depends on marital status.
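To show what the chi-square statistic measures, the sketch below computes it by hand in Python. Because no religion by marital status data is given here, it uses the Gender by District counts from the cross tabulation example as a stand-in; that substitution is an assumption for illustration only, and such a small sample is far too small for a dependable test.

```python
# Observed counts (rows: gender, columns: district) from the cross tabulation example.
observed = {
    "Male": {"Tororo": 3, "Iganga": 1, "Jinja": 2},
    "Female": {"Tororo": 1, "Iganga": 1, "Jinja": 1},
}

rows = list(observed)
cols = list(observed["Male"])
row_totals = {r: sum(observed[r].values()) for r in rows}
col_totals = {c: sum(observed[r][c] for r in rows) for c in cols}
grand_total = sum(row_totals.values())

# Sum (observed - expected)^2 / expected over every cell, where the expected
# count under independence is row total * column total / grand total.
chi_square = 0.0
for r in rows:
    for c in cols:
        expected = row_totals[r] * col_totals[c] / grand_total
        chi_square += (observed[r][c] - expected) ** 2 / expected

df = (len(rows) - 1) * (len(cols) - 1)
CRITICAL_5_PERCENT_DF2 = 5.991  # chi-square critical value for 2 df at the 5% level

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject Ho" if chi_square > CRITICAL_5_PERCENT_DF2 else "do not reject Ho")
```

The larger the gap between observed and expected counts, the larger the statistic and the stronger the evidence that the two variables are associated.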

one-tailed test or a two-tailed test

Should you use a one-tailed test or a two-tailed test for your data analysis?

When creating your data analysis plan or working on your results, you may have to decide if your statistical test should be a one-tailed test or a two-tailed test (also known as “directional” and “non-directional” tests respectively). So, what exactly is the difference between the two? First, it may be helpful to know what the term “tail” means in this context.

The tail refers to the end of the distribution of the test statistic for the particular analysis that you are conducting. For example, a t-test uses the t distribution, and an analysis of variance (ANOVA) uses the F distribution. The distribution of the test statistic can have one or two tails depending on its shape (see the figure below). The black-shaded areas of the distributions in the figure are the tails. Symmetrical distributions like the t and z distributions have two tails. Asymmetrical distributions like the F and chi-square distributions have only one tail. This means that analyses such as ANOVA and chi-square tests do not have a “one-tailed vs. two-tailed” option, because the distributions they are based on have only one tail.

SAMPLE T-TESTS

These are used for testing means.

SAMPLE TESTS IN SPSS

  • One sample t-tests.
  • Paired sample t-test.
  • Independent sample t-test.
  • ANOVA test.

Please always remember that;

  • One sample t-test is used to compare the mean of one variable with a target value.
  • Paired sample t-test is used to compare the means of two variables for a single group.
  • Independent sample t-test is used to compare the means of two distinct groups of cases, e.g. alive or dead, on or off, men or women, etc.
  • ANOVA is used for testing several means.

One sample t-test

One sample t-test is performed when you want to determine whether the mean value of a target variable is different from a hypothesized value.

Examples

  • A researcher might want to test whether the average age of respondents differs from 52.
  • A researcher might want to test whether the average mark of students differs from 75.

Assumptions for the one sample t-test

  • The dependent variable is normally distributed within the population.
  • The data are independent.

 

Example 1

A study on the physical strength measured in Kilograms on 7 subjects before and after a specified training period gave the following results.

Subject   Before   After   Diff
1         100      115
2         110      125
3         90       105
4         110      130
5         125      140
6         130      140
7         105      125

 

Is there a difference in the physical strength before and after a specified training period?

  • State the hypotheses.
  • Use a t-test to test whether there is a mean difference in the physical strength before and after a specified training period.

Solution

First compute a new variable diff, the difference between the after value and the before value.

STEPS

  • Transform >> Compute Variable.
  • For Target Variable type diff; for Numeric Expression type after - before.
  • Click OK.
  • Analyze >> Compare Means >> One-Sample T Test.
  • Select diff as the test variable and set the test value to 0.
  • Click Options and set the confidence interval to 95%.
  • Under Missing Values select “Exclude cases analysis by analysis”.
  • Continue
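The same calculation can be sketched in Python as a check on the SPSS steps: compute diff, then test its mean against 0. The critical value 2.447 is the standard two-tailed 5% value for 6 degrees of freedom; the variable names are illustrative.

```python
import math
import statistics

# Physical strength (kg) for the 7 subjects in Example 1.
before = [100, 110, 90, 110, 125, 130, 105]
after = [115, 125, 105, 130, 140, 140, 125]

# Step 1: compute the new variable diff = after - before.
diff = [a - b for a, b in zip(after, before)]

# Step 2: one sample t-test of diff against the test value 0.
test_value = 0
n = len(diff)
mean_diff = statistics.mean(diff)
std_err = statistics.stdev(diff) / math.sqrt(n)  # standard error of the mean
t_stat = (mean_diff - test_value) / std_err
df = n - 1

T_CRITICAL_DF6 = 2.447  # two-tailed critical value at the 5% level, 6 df

print(f"mean diff = {mean_diff:.3f}, t = {t_stat:.3f}, df = {df}")
print("reject Ho" if abs(t_stat) > T_CRITICAL_DF6 else "do not reject Ho")
```

Because the paired t-test works on the same differences, running it on the before and after columns gives the same t statistic.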

Interpretation of the results

Ho: there is no significant mean difference in physical strength before and after a specified training period.

Ha: there is a significant mean difference in physical strength before and after a specified training period.

Since the P-value (0.000)<0.05, the null hypothesis is rejected implying that there is a significant mean difference in physical strength before and after a specified training period.

THE PAIRED T-TEST

  • The paired sample t-test procedure compares the means of two variables for a single group.
  • It computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0.
  • It is usually used in the mate….. pairs or case-control study.

STEPS

  • Analyze >> Compare Means >> Paired-Samples T Test.
  • Select a pair of variables as follows: click each of the two variables. The first variable appears in the Current Selections group as Variable 1 and the second appears as Variable 2.
  • After you have selected a pair of variables, click the arrow button to move the pair into the Paired Variables list, then click OK.
  • You may select more pairs of variables.
  • To remove a pair of variables from the analysis, select the pair in the Paired Variables list and click the arrow button.
  • Click Options to control the treatment of missing data and the level of the confidence interval.

 

Independent sample t-test

This is used for testing the means of a variable across two distinct groups of cases, e.g. is there a significant difference in income between male and female respondents?

Assumptions for the independent sample t-test

  • The variances of the dependent variable in the two populations are equal.
  • The dependent variable is normally distributed within the population.
  • The data are independent (scores of one participant are not dependent on scores of others).

The independent sample t-test procedure

  • It compares means for two groups of cases.
  • The subjects should be randomly assigned to two groups so that any difference in the response is due to the treatment or lack of treatment and not to other factors.
  • Always ensure that differences in other factors are not masking or enhancing a significant difference in means.

Example

  • The researcher is interested to see whether, in the population, men and women have the same scores in a test.
  • The researcher wants to know whether there is a difference in the highest year of school completed between males and females.

STEPS

  • Analyze >> Compare Means >> Independent-Samples T Test.
  • Select one or more quantitative test variables.
  • Select a single grouping variable.
  • Click Define Groups to specify the two codes for the groups you want to compare.
  • Click Options to control the treatment of missing data and the level of the confidence interval.

Example

Procedure

  • Analyze >> Compare Means >> Independent-Samples T Test.
  • Select highest year of school completed as the test variable.
  • Select sex as the grouping variable.
  • Click Define Groups, choose Use specified values, and enter 1 for group 1 and 2 for group 2. This is because 1 stands for male and 2 stands for female.

The results show two sets of test statistics

  • Equal variances assumed.
  • Equal variances not assumed.
  • If the F-statistic is significant (the null hypothesis of equal variances is rejected), we use the row of equal variances not assumed for interpreting the t-test.
  • If the F-statistic is not significant (the null hypothesis of equal variances is not rejected), we use the row of equal variances assumed for interpreting the t-test.

NOTE

Levene's test helps to determine which row to use when deciding whether to reject the null hypothesis.
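The two rows of the output correspond to two versions of the t statistic, which the sketch below computes side by side. Since the GSS data is not reproduced here, it uses age by gender from the cross tabulation example as a stand-in; that data choice and the variable names are assumptions for illustration, and Levene's test itself is not computed.

```python
import math
import statistics

# Ages by gender from the cross tabulation example table.
male_ages = [22, 22, 31, 31, 26, 22]
female_ages = [24, 26, 26]

n1, n2 = len(male_ages), len(female_ages)
m1, m2 = statistics.mean(male_ages), statistics.mean(female_ages)
v1, v2 = statistics.variance(male_ages), statistics.variance(female_ages)

# Row 1: equal variances assumed (pooled variance).
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Row 2: equal variances not assumed (Welch's approximation).
se_welch = math.sqrt(v1 / n1 + v2 / n2)
t_welch = (m1 - m2) / se_welch
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

print(f"equal variances assumed:     t = {t_pooled:.4f}, df = {df_pooled}")
print(f"equal variances not assumed: t = {t_welch:.4f}, df = {df_welch:.2f}")
```

The two rows differ only in how the standard error and degrees of freedom are calculated, which is why the choice between them matters most when the group variances are very different.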

Results are interpreted as below

Ho: there is no significant mean difference in the highest year of school completed between the male and female respondents.

Ha: there is a significant mean difference in the highest year of school completed between the male and female respondents.

Analysis of variance (ANOVA)

One way analysis of variance

  • The one way ANOVA procedure produces a one way analysis of variance for a quantitative dependent variable by a single factor (independent) variable.
  • Analysis of variance is used to test the hypothesis that several means are equal.
  • This technique is an extension of the two sample t-test.
  • In addition to determining that differences exist among the means, you may want to know which means differ.
  • Here we can use post hoc tests, which are run after the experiment has been conducted.

Assumptions

  • Independent random samples: the groups being compared are regarded as distinct populations, so samples from such populations are said to be independent.
  • The populations are normally distributed.
  • The population variances are equal.

 

STEPS FOR A ONE-WAY ANALYSIS OF VARIANCE

  • Analyze >> Compare Means >> One-Way ANOVA.
  • Select one or more dependent variables.
  • Select a single independent factor variable.
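The F statistic that the one way ANOVA table reports can be sketched by hand as follows. Since the GSS data is not reproduced here, the example compares age across the three districts in the cross tabulation example; that data choice is an assumption for illustration, and the groups are far too small for a dependable test.

```python
import statistics

# Ages grouped by district, from the cross tabulation example table.
groups = {
    "Tororo": [22, 22, 31, 26],
    "Iganga": [24, 31],
    "Jinja": [26, 26, 22],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = statistics.mean(all_values)

# Between-groups sum of squares: how far each group mean is from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-groups sum of squares: spread of the observations around their own group mean.
ss_within = sum(
    (v - statistics.mean(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.4f}")
```

A large F means the group means differ by more than the variation inside the groups would explain; a small F, as here, gives no evidence that the means differ.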

Post hoc

If we want to know which groups are significantly different from each other, click Post Hoc and select the Bonferroni test.

Which group would you recommend?

Under Options, select Means plot and Descriptive, then click Continue.

 

STEPS

  • Analyze >> Compare Means >> Means.
  • Select the dependent variable (continuous/quantitative).
  • Select the independent variable (categorical).
  • Options >> select Mean >> Continue >> OK.

Using the GSS 93 subset data,

  • Is there any significant difference in the highest year of school completed (education) by religion?
  • Which religion would you recommend to a respondent and why?
  • Which religions are significantly different from each other?

STEPS

  • Analyze >> Compare Means >> One-Way ANOVA.
  • Dependent List (educ).
  • Factor (relig).
  • Click Post Hoc and select Bonferroni.
  • Click Options and select Means plot and Descriptive.

Interpretation:

Ho: there is no significant mean difference in the highest year of school completed by different religions.

Ha: there is a significant mean difference in the highest year of school completed by different religions.

Since the p-value (0.000) < 0.05, the null hypothesis is rejected and the conclusion is that there is a significant mean difference in the highest year of school completed by different religions.

Answer to Question 2

This has two approaches, using Descriptive table and means plot.

Regression Analysis

In regression Analysis we consider two sets of variables.

  1. Independent variables.
  2. Dependent variables.

Regression is important in enabling the prediction of the dependent variable, given the values of the independent variables, using the fitted regression equation.

SIMPLE LINEAR REGRESSION MODEL

There are only two variables, which must be quantitative in nature.

It always takes the form Y = B0 + B1 X + ε, where ε is the error term.

Where Y = dependent variable / endogenous variable / regressand / explained variable.

X = independent variable / exogenous variable / regressor / explanatory variable.

B0 and B1 = coefficients of the regression.

NOTE:

  • Independent variables (X) are variables that drive or determine other variables or relationships.
  • Dependent variables (Y) are variables that are caused by or influenced by the independent variables.
  • B0 is the intercept: the value of Y when the independent variable is not in play or is exactly zero (when X = 0).
  • B1 is the change in Y for a one-unit change in X.
  • ε contains other factors that affect the dependent variable, e.g. seasonality, the product sold.

STEPS

  • Analyze >> Regression >> Linear.
  • In the Linear Regression dialog box, select a dependent variable: highest year of school completed.
  • Select an independent variable: age.
  • Interpret the result.
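To make the roles of B0, B1 and R-square concrete, the sketch below fits a simple linear regression by ordinary least squares in Python. Since the GSS data is not reproduced here, it uses the Y and X values from the scatter plot example; that substitution and the helper name predict are assumptions for illustration.

```python
# Y and X values from the scatter plot example.
y = [2441.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8,
     2969.1, 3052.2, 3162.4, 3223.3, 3260.4, 3240.8]
x = [3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
     4404.5, 4539.9, 4718.6, 4838, 4877.5, 4821]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Sums of squares and cross products around the means.
s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
s_xx = sum((a - x_mean) ** 2 for a in x)
s_yy = sum((b - y_mean) ** 2 for b in y)

b1 = s_xy / s_xx                       # slope: change in Y per unit change in X
b0 = y_mean - b1 * x_mean              # intercept: value of Y when X = 0
r_squared = s_xy ** 2 / (s_xx * s_yy)  # share of the variation in Y explained by X


def predict(value):
    """Predicted Y for a given X using the fitted equation."""
    return b0 + b1 * value


print(f"Y = {b0:.3f} + {b1:.4f} X, R-square = {r_squared:.4f}")
```

In a simple regression R-square is the square of the Pearson correlation coefficient, so a strong correlation always gives a good fit.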

Model summary

The R-square value = 0.067; this implies that 6.7% of the variation in the highest year of school completed can be explained by the age of the respondent, hence it is a poor fit.

Coefficient

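Note that since both variables are measured in the same units (years), the standardized coefficient is reported.

The coefficient (-0.259) shows that a unit increase in the age of the respondent (in standardized units) is associated with a decrease of 0.259 units in the highest year of school completed.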

MULTIPLE LINEAR REGRESSION MODELS

Under this regression, there is more than one independent variable. Both the independent and dependent variables are quantitative.

Model specification

Suppose one wants to predict a salesperson's total yearly sales (dependent variable, S) from independent variables such as years of education (Ed), age (A) and years of experience (Ex). The model is specified as:

S = f(Ed, A, Ex)

S = B0 + B1 Ed + B2 A + B3 Ex + ε

Example

Use the data below to answer the questions that follow.

Consumption   Income   Price   Age
88.9          57.5     91.7    26
88.9          59.3     92      36
89.1          62       93.1    25
88.7          56.3     90.1    12
88            52.7     82.3    36
85.9          44.4     76.3    25
86            43.8     78.3    14
87.1          47.8     84.3    52
85.4          52.1     88.1    63
88.5          58       88      45

 

  1. Regress consumption on income, price and age.
  2. Specify the model.
  3. Interpret all your results.

Ho 1: consumption does not depend on income.

Ha 1: consumption depends on income.

Ho 2: consumption does not depend on price.

Ha 2: consumption depends on price.

Ho 3: consumption does not depend on age.

Ha 3: consumption depends on age.

Procedure

  • Analyze >> Regression >> Linear.
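As a rough check on what the Linear Regression procedure estimates, the sketch below fits consumption on income, price and age by ordinary least squares using only the Python standard library. It assumes the table values are read as one decimal place per column, centres the variables before solving the normal equations, and uses a hypothetical helper named solve; it does not produce the t-tests and p-values that SPSS reports for each hypothesis.

```python
# (consumption, income, price, age) from the example table.
rows = [
    (88.9, 57.5, 91.7, 26), (88.9, 59.3, 92.0, 36), (89.1, 62.0, 93.1, 25),
    (88.7, 56.3, 90.1, 12), (88.0, 52.7, 82.3, 36), (85.9, 44.4, 76.3, 25),
    (86.0, 43.8, 78.3, 14), (87.1, 47.8, 84.3, 52), (85.4, 52.1, 88.1, 63),
    (88.5, 58.0, 88.0, 45),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


y = [r[0] for r in rows]
xs = [r[1:] for r in rows]
n, k = len(rows), 3

# Centre every variable so the intercept can be recovered afterwards.
y_mean = sum(y) / n
x_means = [sum(x[j] for x in xs) / n for j in range(k)]
xc = [[x[j] - x_means[j] for j in range(k)] for x in xs]
yc = [v - y_mean for v in y]

# Normal equations (X'X) b = X'y on the centred data.
xtx = [[sum(r[i] * r[j] for r in xc) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * v for r, v in zip(xc, yc)) for i in range(k)]
slopes = solve(xtx, xty)  # B1 (income), B2 (price), B3 (age)
intercept = y_mean - sum(b * m for b, m in zip(slopes, x_means))  # B0

fitted = [intercept + sum(b * v for b, v in zip(slopes, x)) for x in xs]
residuals = [obs - fit for obs, fit in zip(y, fitted)]
r_squared = 1 - sum(e * e for e in residuals) / sum(v * v for v in yc)

print("B0 =", round(intercept, 4))
for name, b in zip(("income", "price", "age"), slopes):
    print(f"B for {name} = {b:.4f}")
print(f"R-square = {r_squared:.4f}")
```

Each slope is read as the change in consumption for a one-unit change in that variable, holding the other two variables constant.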

 
