correlation analysis

CORRELATION ANALYSIS

Correlation is a measure of the relationship between two variables.

It tells us how strong the relationship between the two variables is and in which direction it runs.

The relationship can be negative (-) or positive (+). If the correlation coefficient (r) = 1, there is a perfect positive correlation between the variables; if r = -1, there is a perfect negative correlation; if r = 0, there is no linear relationship.

Strength is judged from the absolute value of the coefficient, |r|; the sign only gives the direction.

If |r| > 0.5, there is a strong relationship between the variables.

If |r| = 0.5, the relationship is moderate.

If |r| < 0.5, there is a weak relationship between the variables.

If |r| is close to 0, the relationship is very weak.

NOTE

In correlation analysis, we analyze the strength, direction and significance of the relationship.

DIRECTION

If the correlation coefficient is negative, it implies that the two variables move in opposite directions: as one variable increases, the other decreases. If the correlation coefficient is positive, it implies that the two variables move in the same direction: as one variable increases, the other also increases.
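The reading of direction and strength can be sketched as a small helper. This is only an illustration: the function name `describe_correlation` is hypothetical, the 0.5 cut-off is the rule of thumb used in these notes, and the sketch assumes that strength is judged on the absolute value of the coefficient while the sign gives the direction.

```python
def describe_correlation(r: float) -> str:
    """Describe the direction and strength of a correlation coefficient.

    Assumption: strength uses |r| with the 0.5 rule of thumb from the notes.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength ignores the sign
    if size == 1.0:
        strength = "perfect"
    elif size > 0.5:
        strength = "strong"
    elif size == 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


print(describe_correlation(-0.259))  # weak negative
print(describe_correlation(0.8))     # strong positive
```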

Significance of the relationship

If the P-value is less than the level of significance, such as 0.05 or 0.01, then the relationship is significant; otherwise it is insignificant.

Testing for correlation

  1. Graphical approach: a scatter plot is used.

The scatter plot illustrates the relationship between the variables, which can be positive, negative or non-existent.

  • Graphs >> Legacy Dialogs >> Scatter >> Simple >> Define
  • Select the Y-axis and X-axis variables
  • Press OK
  • To add the line of best fit, double-click in the plot and click on Add a reference line from equation.

The following are scatter plots for visual interpretation of the types of correlation

Example

Using the data

       Y         X
  2441.1    3776.3
  2476.9    3843.1
  2503.7    3760.3
  2619.4    3906.6
  2746.1    4148.5
  2865.8    4279.8
  2969.1    4404.5
  3052.2    4539.9
  3162.4    4718.6
  3223.3    4838
  3260.4    4877.5
  3240.8    4821

 

Is there any relationship between the variables?
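As a cross-check outside SPSS, the Pearson coefficient for this data can be computed directly from its definition. The sketch below is a minimal illustration using only the standard library; the helper name `pearson_r` is hypothetical, and the values are read from the table above on the assumption that each row holds a Y value followed by an X value.

```python
import math

# Values read from the table above (assumed layout: Y first, then X).
y = [2441.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8,
     2969.1, 3052.2, 3162.4, 3223.3, 3260.4, 3240.8]
x = [3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
     4404.5, 4539.9, 4718.6, 4838.0, 4877.5, 4821.0]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y over the product of their spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A value of r close to +1 would indicate a strong positive relationship between Y and X, which is what the scatter plot should also show.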

Statistical tests

Pearson correlation coefficient

This is used for quantitative variables such as age and income.

For example

Is there any significant correlation between the age and income of the respondent?

The Hypotheses are stated as follows

Ho: There is no significant correlation between age and income of the respondent.

Ha: There is a significant correlation between age and income of the respondent.

STEPS

  • Analyze >> Correlate >> Bivariate.
  • Move the variables from the left-hand box into the right-hand box.
  • Select Flag significant correlations.
  • Select Pearson as the type of correlation coefficient.
  • Press OK.
  • Interpret the result.
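The significance that SPSS flags for a Pearson coefficient rests on a t statistic with n - 2 degrees of freedom, t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below only illustrates that formula; the function name is hypothetical, the values r = 0.6 and n = 12 are made-up illustration values, and 2.228 is the tabulated two-tailed 5% critical value of t for 10 degrees of freedom.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing Ho: no correlation, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical illustration values, not taken from any SPSS output.
t = correlation_t_statistic(r=0.6, n=12)
T_CRITICAL = 2.228  # two-tailed 5% critical value of t with 10 degrees of freedom

print(f"t = {t:.3f}")
print("reject Ho" if abs(t) > T_CRITICAL else "do not reject Ho")
```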

Example

Interpretation

The correlation coefficient is -0.259, which implies that there is a weak negative correlation between highest year of school completed and age of the respondent. The correlation is significant at the 1% level of significance since the P-value (0.000) < 0.01; thus the null hypothesis is rejected and the conclusion is that there is a significant relationship between highest year of school completed and age of the respondent.

  1. Spearman’s: deals with ranked (ordinal) data.
  2. Kendall’s: for categorical variables with some order, such as education level.
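Spearman's coefficient can be understood as the Pearson coefficient computed on the ranks of the data rather than on the raw values. The following is a minimal sketch of that idea, assuming tied values share the average of their ranks; the helper names and the two short lists are hypothetical illustration values.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up ordinal scores for five respondents.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"rho = {rho:.2f}")
```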

Example

Using the

If the confidence interval does not include the hypothesized value of the population parameter, the null hypothesis is rejected; otherwise it is not rejected.

 

 

Chi-square test

It is a test of dependence or association between two variables, which must be categorical, such as marital status, education level, religion, etc.

Example

Does the religion of the respondent depend on marital status?

Procedure

Analyze >> Descriptive Statistics >> Crosstabs

Select one variable for the row and another for the column.

Click Statistics >>………..square >> Continue

Cells >> Row and Column percentages >> Continue

Press OK

NOTE:

First state the Hypothesis

Ho: Religion of the respondent does not depend on marital status.

Ha: Religion of the respondent depends on marital status.
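Behind the crosstabs output, the chi-square statistic compares each observed cell count with the count expected if the two variables were independent (row total × column total / grand total). The sketch below is only an illustration of that calculation: the function name is hypothetical, the 2 × 2 counts are made up and are not taken from any survey, and 3.841 is the tabulated 5% critical value of chi-square with 1 degree of freedom.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Made-up counts: rows are two marital-status groups, columns two religions.
table = [[10, 20],
         [20, 10]]
chi2, dof = chi_square_statistic(table)
CRITICAL_5PCT_DF1 = 3.841  # 5% critical value of chi-square with 1 degree of freedom

print(f"chi-square = {chi2:.3f}, df = {dof}")
print("reject Ho" if chi2 > CRITICAL_5PCT_DF1 else "do not reject Ho")
```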

one-tailed test or a two-tailed test

Should you use a one-tailed test or a two-tailed test for your data analysis?

When creating your data analysis plan or working on your results, you may have to decide if your statistical test should be a one-tailed test or a two-tailed test (also known as “directional” and “non-directional” tests respectively). So, what exactly is the difference between the two? First, it may be helpful to know what the term “tail” means in this context.

The tail refers to the end of the distribution of the test statistic for the particular analysis that you are conducting. For example, a t-test uses the t distribution, and an analysis of variance (ANOVA) uses the F distribution. The distribution of the test statistic can have one or two tails depending on its shape (see the figure below). The black-shaded areas of the distributions in the figure are the tails. Symmetrical distributions like the t and z distributions have two tails. Asymmetrical distributions like the F and chi-square distributions have only one tail. This means that analyses such as ANOVA and chi-square tests do not have a “one-tailed vs. two-tailed” option, because the distributions they are based on have only one tail.
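For a symmetrical distribution the difference between the two options is easy to see numerically. The sketch below uses the standard normal (z) distribution from the Python standard library as an illustration; the function name is hypothetical, and the one-tailed value assumes the observed effect lies in the hypothesized direction.

```python
from statistics import NormalDist


def z_test_p_values(z: float) -> dict:
    """One-tailed and two-tailed P-values for a z statistic."""
    upper_tail = 1 - NormalDist().cdf(abs(z))  # area in one tail beyond |z|
    return {
        "one_tailed": upper_tail,       # directional test (effect in the stated direction)
        "two_tailed": 2 * upper_tail,   # non-directional test uses both tails
    }


p = z_test_p_values(1.96)
print(f"one-tailed: {p['one_tailed']:.3f}, two-tailed: {p['two_tailed']:.3f}")
```

The two-tailed P-value is twice the one-tailed one, so the same test statistic can be significant in a directional test and not in a non-directional one.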

SAMPLE T-TESTS

This is used for testing means.

SAMPLE TESTS IN SPSS

  • One sample t-tests.
  • Paired sample t-test.
  • Independent sample t-test.
  • ANOVA test.

Please always remember that:

  • One sample t-test is used to compare the mean of one variable with a target value.
  • Paired sample t-test is used to compare the means of two variables for a single group.
  • Independent sample t-test is used to compare the means of two distinct groups of cases, e.g. alive or dead, on or off, men or women, etc.
  • ANOVA is used for testing several means.

One sample t-test

One sample t-test is performed when you want to determine if the mean value of a target variable is different from a hypothesized value.

Examples

  • A researcher might want to test whether the average age of respondents differs from 52.
  • A researcher might want to test whether the average mark of students differs from 75.

Assumptions for the one sample t-test

  • The dependent variable is normally distributed within the population.
  • The data are independent.

 

Example 1

A study on the physical strength measured in Kilograms on 7 subjects before and after a specified training period gave the following results.

Subject   Before   After   Diff
   1        100     115
   2        110     125
   3         90     105
   4        110     130
   5        125     140
   6        130     140
   7        105     125

 

Is there a difference in the physical strength before and after a specified training period?

  • State the hypotheses.
  • Use a t-test to determine whether there is a mean difference in the physical strength before and after a specified training period.

Solution

First compute a new variable diff, the difference between the after value and the before value.

STEPS

  • Transform >> Compute Variable.
  • For Target Variable type diff; for Numeric Expression type after - before.
  • Click OK.
  • Analyze >> Compare Means >> One-Sample T Test.
  • Select diff as the test variable and set the test value to 0.
  • Click Options and set the confidence interval to 95%.
  • Under Missing Values select “Exclude cases analysis by analysis”.
  • Click Continue.

Interpretation of the results

Ho: there is no significant mean difference in physical strength before and after a specified training period.

Ha: there is a significant mean difference in physical strength before and after a specified training period.

Since the P-value (0.000)<0.05, the null hypothesis is rejected implying that there is a significant mean difference in physical strength before and after a specified training period.
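The same test can be reproduced by hand as a check on the SPSS output. The sketch below computes diff = after - before and the one-sample t statistic against a test value of 0. The helper name is hypothetical, the numbers are read from the table above on the assumption that each row lists subject, before and after, and 2.447 is the tabulated two-tailed 5% critical value of t with 6 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Values read from the table above (assumed layout: subject, before, after).
before = [100, 110, 90, 110, 125, 130, 105]
after = [115, 125, 105, 130, 140, 140, 125]

# New variable diff = after - before, as in the Transform step.
diff = [a - b for a, b in zip(after, before)]


def one_sample_t(values, test_value=0.0):
    """Return (t statistic, degrees of freedom) for Ho: mean == test_value."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)  # stdev uses n - 1
    return (mean(values) - test_value) / standard_error, n - 1


t_stat, dof = one_sample_t(diff)
T_CRITICAL = 2.447  # two-tailed 5% critical value of t with 6 degrees of freedom

print(f"mean diff = {mean(diff):.3f}, t = {t_stat:.2f}, df = {dof}")
print("reject Ho" if abs(t_stat) > T_CRITICAL else "do not reject Ho")
```

Because the test is run on the within-subject differences, this is the same calculation that a paired t-test of after against before would perform.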

THE PAIRED T-TEST

  • The paired-sample t-test procedure compares the means of two variables for a single group.
  • It computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0.
  • It is usually used in mate….. pairs or case-control studies.

STEPS

  • Analyze >> Compare Means >> Paired-Samples T Test.
  • Select a pair of variables as follows: click each of the two variables. The first variable appears in the Current Selections group as Variable 1 and the second appears as Variable 2.
  • After you have selected a pair of variables, click the arrow button to move the pair into the Paired Variables list.
  • You may select more pairs of variables.
  • To remove a pair of variables from the analysis, select the pair in the Paired Variables list and click the arrow button.
  • Click Options to control the treatment of missing data and the level of the confidence interval.
  • Click OK.

 

Example (page 48)

Independent sample t-test

This is used for testing the means of a variable across two distinct groups of cases, e.g. is there a significant difference in income between male and female respondents?

Assumptions for the independent sample t-test

  • The variances of the dependent variable in the two populations are equal.
  • The dependent variable is normally distributed within the population.
  • The data are independent (scores of one participant are not dependent on scores of others).

The independent sample t-test procedure

  • It compares means for two groups of cases.
  • The subjects should be randomly assigned to two groups so that any difference in the response is due to the treatment or lack of treatment but not to other factors.
  • Always ensure that differences in other factors are not masking or enhancing a significant difference in means.

Example

  • The researcher is interested to see whether, in the population, men and women have the same scores in a test.
  • Whether there is a difference in the highest year of school completed between males and females.

STEPS

  • Analyze >> Compare Means >> Independent-Samples T Test.
  • Select one or more quantitative test variables.
  • Select a single grouping variable.
  • Click Define Groups to specify the two codes for the groups you want to compare.
  • Click Options to control the treatment of missing data and the level of the confidence interval.

Example

Procedure

  • Analyze >> Compare Means >> Independent-Samples T Test.
  • Select highest year of school completed as the test variable.
  • Select sex as the grouping variable.
  • Click Define Groups >> Use specified values; put 1 for group 1 and 2 for group 2. This is because 1 stands for male and 2 stands for female.

The results show two sets of test statistics

  • Equal variances assumed.
  • Equal variances not assumed.
  • If the F-statistic of Levene’s test is significant (the null hypothesis of equal variances is rejected), we use the row of equal variances not assumed for interpreting the t-test.
  • If the F-statistic is not significant (the null hypothesis is not rejected), we use the row of equal variances assumed for interpreting the t-test.

NOTE

Levene’s test helps to determine which row to use to make the decision of rejecting or not rejecting the null hypothesis.

Results are interpreted as below

Ho: there is no significant mean difference in the highest year of school completed between the male and female respondents.

Ha: there is a significant mean difference in the highest year of school completed between the male and female respondents.
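The two rows of the output correspond to two versions of the t statistic: the pooled-variance version (equal variances assumed) and the separate-variance version (equal variances not assumed, often called Welch's t-test). The following is a rough sketch of both, using only the standard library; the function name and the two small groups are made up for illustration and do not come from the GSS data.

```python
import math
from statistics import mean, variance


def independent_t(group1, group2):
    """t statistics and degrees of freedom with and without equal variances."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1)

    # Row "equal variances assumed": pooled variance, df = n1 + n2 - 2.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_equal = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    df_equal = n1 + n2 - 2

    # Row "equal variances not assumed": separate variances, adjusted df.
    se2 = v1 / n1 + v2 / n2
    t_unequal = (m1 - m2) / math.sqrt(se2)
    df_unequal = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    return {"equal": (t_equal, df_equal), "unequal": (t_unequal, df_unequal)}


# Made-up scores for two groups coded 1 and 2.
result = independent_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(result)
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, which is why the choice of row matters most when the groups are unbalanced.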

Analysis of variance (ANOVA)

One way Analysis of variance

  • The one way ANOVA procedure produces a one way analysis of variance for a quantitative dependent variable by a single factor (independent) variable.
  • Analysis of variance is used to test the hypothesis that several means are equal.
  • This technique is an extension of the two sample t-test.
  • In addition to determining that differences exist among the means, you may want to know which means differ.
  • Here we can use post hoc tests, which are run after the experiment has been conducted.

Assumptions

  • Independent random samples: the groups being compared are regarded as distinct populations, so samples from such populations are said to be independent.
  • The populations are normally distributed.
  • The population variances are equal.

 

STEPS FOR A ONE-WAY ANALYSIS OF VARIANCE

  • Analyze- compare means- one way ANOVA.
  • Select one or more dependent variables.
  • Select a single independent factor variable.
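The F statistic that the one-way ANOVA table reports is the ratio of the between-group mean square to the within-group mean square. The sketch below is a minimal illustration of that calculation; the function name and the three small groups are made up for the example.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values for three groups of the factor variable.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```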

Post hoc

If we want to know which groups are significantly different from each other, use Post Hoc and select the Bonferroni test.

Which group would you recommend?

Under Options, click Means plot and Descriptive >> Continue.
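The idea behind the Bonferroni post hoc comparison is to make each pairwise test stricter in proportion to the number of comparisons. One common way to express this, sketched below with hypothetical helper names and made-up P-values, is to multiply each pairwise P-value by the number of comparisons (capped at 1) before comparing it with the level of significance.

```python
def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k groups."""
    return k * (k - 1) // 2


def bonferroni_adjust(p_values):
    """Multiply each P-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up unadjusted P-values for the 3 pairs among 3 groups.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni_adjust(raw)
print(pairwise_comparisons(3), adjusted)
```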

 

STEPS

  • Analyze >> Compare Means >> Means.
  • Select the dependent variable (continuous/quantitative).
  • Select the independent variable (categorical).
  • Options >> select Mean >> Continue >> OK.

Using the GSS 93 subset data,

  • Is there any significant difference in the highest year of school completed (education) by religion?
  • Which religion would you recommend to a respondent and why?
  • Which religions are significantly different from each other?

STEPS

  • Analyze >> Compare Means >> One-Way ANOVA.
  • Dependent List (educ).
  • Factor (relig).
  • Select Post Hoc >> Bonferroni.
  • Select Options, click Means plot and Descriptive.

Interpretation:

Ho: there is no significant mean difference in the highest year of school completed by different religions.

Ha: there is a significant mean difference in the highest year of school completed by different religions.

Since the P-value (0.000) < 0.05, the null hypothesis is rejected and the conclusion is that there is a significant mean difference in the highest year of school completed by different religions.

Answer to Question 2

This has two approaches: using the Descriptives table and the means plot.

Regression Analysis

In regression Analysis we consider two sets of variables.

  1. Independent variables.
  2. Dependent variables.

Regression is important in enabling the prediction of the dependent variable given the values of the independent variables, using the formulated regression equation.

SIMPLE LINEAR REGRESSION MODEL

There are only two variables which must be quantitative in nature.

It always takes the form Y = B0 + B1X + ε, where ε is the error term.

Where Y = dependent variable / endogenous variable / regressand / explained variable.

X = independent variable / exogenous variable / regressor / explanatory variable.

B0 and B1 = coefficients of the regression.

NOTE:

  • Independent variables (X) are variables that drive or determine other variables or relationships.
  • Dependent variables (Y) are variables that are caused by or influenced by the independent variables.
  • B0 is the intercept: the value of Y when the independent variable is not in play or is explicitly zero (when X = 0).
  • B1 is the change in Y brought about by a unit change in X.
  • ε contains other factors that affect the dependent variable, e.g. seasonality, the product sold.
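The two coefficients of the simple model can be estimated by ordinary least squares: the slope is the co-variation of X and Y divided by the variation of X, and the intercept makes the line pass through the two means. The sketch below is a minimal illustration with a hypothetical function name and made-up data that lie exactly on the line Y = 1 + 2X.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (intercept, slope) for Y = B0 + B1*X + error."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # B1: change in Y per unit change in X
    intercept = mean_y - slope * mean_x    # B0: value of Y when X = 0
    return intercept, slope


# Made-up data lying exactly on Y = 1 + 2X, so the error term is zero.
b0, b1 = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(f"Y = {b0:.1f} + {b1:.1f} X")
```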

STEPS

  • Analyze >> Regression >> Linear.
  • In the Linear Regression dialog box, select a dependent variable (highest year of school completed).
  • Select an independent variable (age).
  • Interpret the result.

Model summary

The R-square value = 0.067, which implies that 6.7% of the variation in the highest year of school completed can be explained by the age of the respondent; hence it is a poor fit.
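In a simple linear regression with one independent variable, R-square is the square of the Pearson correlation coefficient between the two variables. As a quick arithmetic check, assuming the coefficient of -0.259 reported for these two variables in these notes:

```python
# Correlation between highest year of school completed and age, as reported in the notes.
r = -0.259

# With a single independent variable, R-square equals r squared.
r_squared = r ** 2

print(round(r_squared, 3))          # 0.067
print(f"{r_squared * 100:.1f}%")    # 6.7% of the variation explained
```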

Coefficient

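Note that since both variables are measured in the same units, the standardized coefficients are used.

The standardized coefficient (-0.259) shows that a one standard deviation increase in the age of the respondent is associated with a decrease of 0.259 standard deviations in the highest year of school completed.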

MULTIPLE LINEAR REGRESSION MODELS

Under this regression there is more than one independent variable. Both the independent and dependent variables are quantitative.

Model specification

Suppose one wants to predict a salesperson's total yearly sales (dependent variable) from independent variables such as years of education (Ed), age (A) and years of experience (Ex).

S = f(Ed, A, Ex)

S = B0 + B1Ed + B2A + B3Ex + ε

Example

Use the data below to answer the questions that follow.

Consumption   Income   Price   Age
    88.9       57.5     91.7    26
    88.9       59.3     92      36
    89.1       62       93.1    25
    88.7       56.3     90.1    12
    88         52.7     82.3    36
    85.9       44.4     76.3    25
    86         43.8     78.3    14
    87.1       47.8     84.3    52
    85.4       52.1     88.1    63
    88.5       58       88      45

 

  1. Regress consumption on income, price and age.
  2. Specify the model.
  3. Interpret all your results.

Ho 1: consumption does not depend on income.

Ha 1: consumption depends on income.

Ho 2: consumption does not depend on price.

Ha 2: consumption depends on price.

Ho 3: consumption does not depend on age.

Ha 3: consumption depends on age.

Procedure

  • Analyze >> Regression >> Linear.
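For readers who want to check the SPSS coefficients independently, a multiple regression can be estimated by solving the normal equations (X'X)b = X'y. The following is a rough sketch using only the standard library, not the method SPSS itself uses; the function names are hypothetical, and the data rows are read from the example table on the assumption that each row holds consumption, income, price and age in that order.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the system is singular (collinear variables?)")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [B0, B1, ..., Bk] for y = B0 + B1*x1 + ... + Bk*xk + error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Data read from the example table (assumed columns: income, price, age).
predictors = [(57.5, 91.7, 26), (59.3, 92.0, 36), (62.0, 93.1, 25),
              (56.3, 90.1, 12), (52.7, 82.3, 36), (44.4, 76.3, 25),
              (43.8, 78.3, 14), (47.8, 84.3, 52), (52.1, 88.1, 63),
              (58.0, 88.0, 45)]
consumption = [88.9, 88.9, 89.1, 88.7, 88.0, 85.9, 86.0, 87.1, 85.4, 88.5]

coefficients = fit_multiple_regression(predictors, consumption)
for name, value in zip(["B0", "B1 (income)", "B2 (price)", "B3 (age)"], coefficients):
    print(f"{name}: {value:.4f}")
```

The printed values should be compared with the unstandardized coefficients (B column) of the SPSS Coefficients table; the significance of each coefficient still has to be read from the SPSS output.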
