CROSS TABULATION:
Cross tabulation helps us understand whether and how one categorical variable is related to another.
For example
| Age | Education | Gender | District |
| 22 | Primary | Male | Tororo |
| 22 | Secondary | Male | Tororo |
| 24 | Tertiary | Female | Iganga |
| 31 | Primary | Male | Tororo |
| 26 | Primary | Female | Jinja |
| 26 | Secondary | Female | Tororo |
| 31 | University | Male | Iganga |
| 26 | Secondary | Male | Jinja |
| 22 | Primary | Male | Jinja |
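Outside SPSS, the same cross tabulation can be produced with a few lines of code. Below is a minimal Python sketch using pandas, based on the Education and District columns of the table above (the DataFrame construction is just one way of entering the data):

```python
import pandas as pd

# The nine survey records from the table above.
data = pd.DataFrame({
    "Education": ["Primary", "Secondary", "Tertiary", "Primary", "Primary",
                  "Secondary", "University", "Secondary", "Primary"],
    "District":  ["Tororo", "Tororo", "Iganga", "Tororo", "Jinja",
                  "Tororo", "Iganga", "Jinja", "Jinja"],
})

# Cross tabulation: counts of respondents in each Education x District cell.
table = pd.crosstab(data["Education"], data["District"])
print(table)
```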
STEPS
- Analyze
- Descriptive Statistics
- Crosstabs
- Select variables for rows
- Select variables for columns
CORRELATION ANALYSIS
Interpretation
The correlation coefficient is -0.259, which implies a weak negative correlation between highest year of school completed and age of the respondent. The correlation is significant at the 1% level of significance since the p-value (0.000) < 0.01; thus the null hypothesis is rejected and we conclude that there is a significant relationship between highest year of school completed and age of the respondent.
- Spearman: deals with ranked (ordinal) data.
- Kendall's tau: used for categorical variables with some order, such as education level.
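As an illustration, both coefficients can be computed in Python with scipy. The education codes below (1 = Primary, 2 = Secondary, 3 = Tertiary, 4 = University) are an assumed numeric coding of the table shown earlier, paired with the ages from that table:

```python
from scipy import stats

# Education coded 1-4 (assumed coding) and age, from the earlier table.
education = [1, 2, 3, 1, 1, 2, 4, 2, 1]
age       = [22, 22, 24, 31, 26, 26, 31, 26, 22]

rho, p_rho = stats.spearmanr(education, age)    # rank-based correlation
tau, p_tau = stats.kendalltau(education, age)   # ordinal (ordered categorical) correlation
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f})")
print(f"Kendall tau  = {tau:.3f} (p = {p_tau:.3f})")
```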
Example
Using the confidence interval: if the confidence interval does not include the hypothesized value of the population parameter, the null hypothesis is rejected; otherwise it is not rejected. For example, if a 95% confidence interval for a mean were (48.2, 55.6) and the hypothesized value were 52, the null hypothesis would not be rejected because 52 lies inside the interval.
Chi-square test
It is a test of dependence or association between two variables, which must be categorical, such as marital status, education level, religion, etc.
Example
Does religion of the respondent depend on marital status?
Procedure
Analyze >> Descriptive Statistics >> Crosstabs
Select one variable for a row and another for a column.
Click Statistics >> Chi-square >> Continue
Click Cells >> row and column percentages >> Continue
Press OK
NOTE:
First state the Hypothesis
Ho: religion of the respondent does not depend on marital status.
Ha: Religion of the respondent depends on marital status.
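For comparison, the same chi-square test of independence can be run in Python with scipy. The contingency table below is hypothetical, invented purely to show the mechanics:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: religion (rows) by marital status (columns).
observed = [[30, 15, 5],    # Religion A: married, single, divorced
            [20, 25, 10],   # Religion B
            [10, 10, 15]]   # Religion C

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p-value = {p:.4f}")
# If p < 0.05, reject Ho and conclude religion depends on marital status.
```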
SAMPLE T-TESTS
This is used for testing means.
SAMPLE TESTS IN SPSS
- One sample t-tests.
- Paired sample t-test.
- Independent sample t-test.
- ANOVA test.
Please always remember that;
- One sample t-test is used to compare the mean of one variable with a target value.
- Paired sample t-test is used to compare the means of two variables for a single group.
- Independent sample t-test is used to compare means of two distinct groups of cases, e.g. alive or dead, on or off, men or women, etc.
- ANOVA is used for testing several means.
One sample t-test
A one sample t-test is performed when you want to determine if the mean value of a target variable differs from a hypothesized value.
Examples
- A researcher might want to test whether the average age of respondents differs from 52.
- A researcher might want to test whether the average mark of students differs from 75.
Assumptions for the one sample t-test
- The dependent variable is normally distributed within the population.
- The data are independent.
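As a sketch of the first example above (testing whether the average age differs from 52), here is how a one sample t-test could be run in Python with scipy; the ages are invented for illustration:

```python
from scipy import stats

# Hypothetical ages of respondents (values are assumptions).
ages = [48, 55, 51, 60, 47, 52, 58, 49, 54, 61]

# Test whether the mean age differs from the hypothesized value 52.
t_stat, p_value = stats.ttest_1samp(ages, popmean=52)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
# If p < 0.05, reject Ho: the mean age differs from 52.
```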
SPSS
Example 1
A study of physical strength, measured in kilograms, on 7 subjects before and after a specified training period gave the following results.
| Subject | Before | After | diff |
| 1 | 100 | 115 | |
| 2 | 110 | 125 | |
| 3 | 90 | 105 | |
| 4 | 110 | 130 | |
| 5 | 125 | 140 | |
| 6 | 130 | 140 | |
| 7 | 105 | 125 | |
Is there a difference in the physical strength before and after a specified training period?
- State the hypothesis.
- Use t-test to show that there is no mean difference in the physical strength before and after a specified training period.
Solution
First compute a new variable diff: the difference between the after value and the before value.
STEPS
- Transform > Compute.
- For Target Variable type diff; for Numeric Expression type after - before.
- Click OK.
- Analyze > Compare Means > One-Sample T Test.
- Select diff as the test variable and set the Test Value to 0.
- Click Options and set the confidence interval to 95%.
- Under missing values select "Exclude cases analysis by analysis".
- Click Continue, then OK.
Interpretation of the results
Ho: there is no significant mean difference in physical strength before and after a specified training period.
Ha: there is a significant mean difference in physical strength before and after a specified training period.
Since the p-value (0.000) < 0.05, the null hypothesis is rejected, implying that there is a significant mean difference in physical strength before and after a specified training period.
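The same conclusion can be checked outside SPSS. Below is a minimal Python sketch using scipy and the before/after data from the table above; note that the one-sample test on diff with test value 0 is equivalent to a paired t-test on the two columns:

```python
from scipy import stats

# Strength measurements for the 7 subjects from the table above.
before = [100, 110, 90, 110, 125, 130, 105]
after  = [115, 125, 105, 130, 140, 140, 125]

# Equivalent of the SPSS one-sample test on diff = after - before with test value 0.
diff = [a - b for a, b in zip(after, before)]
t_stat, p_value = stats.ttest_1samp(diff, popmean=0)
print(f"one-sample test on diff: t = {t_stat:.3f}, p-value = {p_value:.5f}")

# The same result comes from a paired t-test on the two columns directly.
t_stat2, p_value2 = stats.ttest_rel(after, before)
print(f"paired t-test:           t = {t_stat2:.3f}, p-value = {p_value2:.5f}")
```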
THE PAIRED T-TEST
- The paired sample t-test procedure compares the means of two variables for a single group.
- It computes the difference between values of the two variables for each case and tests whether the average difference differs from 0.
- It is usually used in matched-pairs or case-control studies.
STEPS
- Analyze > Compare Means > Paired-Samples T Test.
- Select a pair of variables as follows: click each of the two variables. The first variable appears in the Current Selections group as Variable 1 and the second appears as Variable 2.
- After you have selected a pair of variables, click the arrow button to move the pair into the Paired Variables list, then click OK.
- You may select more pairs of variables.
- To remove a pair of variables from the analysis, select the pair in the Paired Variables list and click the arrow button.
- Click Options to control the treatment of missing data and the level of the confidence interval.
Independent sample t-test
This is used for testing means of a variable across two distinct groups of cases, e.g. is there a significant difference in income between male and female respondents?
Assumptions for the independent sample t-test
- The variances of the dependent variable in the two populations are equal.
- The dependent variable is normally distributed within the population.
- The data are independent (scores of one participant are not dependent on scores of others).
The independent sample t-test procedure
- It compares means for groups of cases.
- The subjects should be randomly assigned to two groups so that any difference in the response is due to the treatment or lack of treatment but not to other factors.
- Always ensure that differences in other factors are not creating or enhancing a significant difference in means.
Example
- A researcher is interested in whether, in the population, men and women have the same scores on a test.
- Whether there is a difference in the highest year of school completed between males and females.
STEPS
- Analyze > Compare Means > Independent-Samples T Test.
- Select one or more quantitative test variables.
- Select a single grouping variable.
- Click Define Groups to specify two codes for the groups you want to compare.
- Click Options to control the treatment of missing data and the level of the confidence interval.
Example
Procedure
- Analyze > Compare Means > Independent-Samples T Test.
- Select highest year of school completed for test variable.
- Select sex for grouping variable.
- Click Define Groups, use specified values, and put 1 for Group 1 and 2 for Group 2. This is because 1 stands for male and 2 stands for female.
The results show two sets of test statistics:
- Equal variance assumed.
- Equal variance not assumed.
- If the F-statistic is significant (null is rejected), we use the row for equal variance not assumed when interpreting the t-test.
- If the F-statistic is not significant (null is not rejected), we use the row for equal variance assumed when interpreting the t-test.
NOTE
Levene's test helps to determine which row to use when making the decision to accept or reject the null hypothesis.
Results are interpreted as below
Ho: there is no significant mean difference in the highest year of school completed between the male and female respondents.
Ha: there is a significant mean difference in the highest year of school completed between the male and female respondents.
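A minimal Python sketch of this workflow is shown below, using scipy; the scores for the two groups are invented for illustration. Levene's test is run first to decide which form of the t-test to use, mirroring the two rows of SPSS output:

```python
from scipy import stats

# Hypothetical highest year of school completed, by sex (values are assumptions).
male   = [12, 14, 10, 16, 12, 11, 13, 15]
female = [13, 12, 16, 14, 11, 15, 12, 14]

# Levene's test decides which t-test row to use (equal variances or not).
lev_stat, lev_p = stats.levene(male, female)
equal_var = lev_p >= 0.05   # not significant -> assume equal variances

t_stat, p_value = stats.ttest_ind(male, female, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.3f}, p-value = {p_value:.4f}")
# If p < 0.05, reject Ho: the group means differ significantly.
```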
Analysis of variance (ANOVA)
One way Analysis of variance
- The one way ANOVA procedure produces a one way analysis of variance for a quantitative dependent variable by a single factor (independent) variable.
- Analysis of variance is used to test the hypothesis that several means are equal.
- This technique is an extension of the two sample t-test.
- In addition to determining that differences exist among the means, you may want to know which means differ.
- Here we can use post hoc tests, which are run after the experiment has been conducted.
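As a sketch, a one way ANOVA can be run in Python with scipy; the three groups below are hypothetical:

```python
from scipy import stats

# Hypothetical scores for three independent groups (values are assumptions).
group_a = [75, 80, 72, 78, 74]
group_b = [68, 70, 65, 72, 69]
group_c = [82, 85, 79, 88, 84]

# One-way ANOVA tests Ho: all group means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}")
# If p < 0.05, at least one mean differs; a post hoc test (e.g. Tukey HSD)
# can then identify which pairs of means differ.
```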
REGRESSION ANALYSIS
LINEAR REGRESSION ANALYSIS USING SPSS STATISTICS
Linear regression is the next step up after correlation. It is used when we want to predict the value of a variable based on the value of another variable. The variable we want to predict is called the dependent variable (or sometimes, the outcome variable). The variable we are using to predict the other variable’s value is called the independent variable (or sometimes, the predictor variable). For example, you could use linear regression to understand whether exam performance can be predicted based on revision time; whether cigarette consumption can be predicted based on smoking duration; and so forth. If you have two or more independent variables, rather than just one, you need to use multiple regression analysis.
This “quick start” guide shows you how to carry out linear regression using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for linear regression to give you a valid result. We discuss these assumptions next.
SPSS Statistics
Assumptions
When you choose to analyse your data using linear regression, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using linear regression. You need to do this because it is only appropriate to use linear regression if your data “passes” seven assumptions that are required for linear regression to give you a valid result. In practice, checking for these seven assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these seven assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out linear regression when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let’s take a look at these seven assumptions:
Assumption one: Your dependent variable should be measured at the continuous level (i.e., it is either an interval or ratio variable). Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.
Assumption two: Your independent variable should also be measured at the continuous level (i.e., it is either an interval or ratio variable). See the bullet above for examples of continuous variables.
Assumption three: There needs to be a linear relationship between the two variables. Whilst there are a number of ways to check whether a linear relationship exists between your two variables, we suggest creating a scatterplot using SPSS Statistics where you can plot the dependent variable against your independent variable and then visually inspect the scatterplot to check for linearity. Your scatterplot may look something like one of the following:
If the relationship displayed in your scatterplot is not linear, you will have to either run a non-linear regression analysis, perform a polynomial regression or “transform” your data, which you can do using SPSS Statistics. In our enhanced guides, we show you how to:
(a) create a scatterplot to check for linearity when carrying out linear regression using SPSS Statistics;
(b) interpret different scatterplot results; and
(c) transform your data using SPSS Statistics if there is not a linear relationship between your two variables.
Assumption four: There should be no significant outliers. An outlier is an observed data point that has a dependent variable value that is very different to the value predicted by the regression equation. As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large residual, as highlighted below:
The problem with outliers is that they can have a negative effect on the regression analysis (e.g., reduce the fit of the regression equation) that is used to predict the value of the dependent (outcome) variable based on the independent (predictor) variable. This will change the output that SPSS Statistics produces and reduce the predictive accuracy of your results. Fortunately, when using SPSS Statistics to run a linear regression on your data, you can easily include criteria to help you detect possible outliers. In our enhanced linear regression guide, we: (a) show you how to detect outliers using “casewise diagnostics”, which is a simple process when using SPSS Statistics; and (b) discuss some of the options you have in order to deal with outliers.
Assumption five: You should have independence of observations, which you can easily check using the Durbin-Watson statistic, which is a simple test to run using SPSS Statistics. We explain how to interpret the result of the Durbin-Watson statistic in our enhanced linear regression guide.
Assumption six: Your data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line. Whilst we explain more about what this means and how to assess the homoscedasticity of your data in our enhanced linear regression guide, take a look at the three scatterplots below, which provide three simple examples: two of data that fail the assumption (called heteroscedasticity) and one of data that meets this assumption (called homoscedasticity):
Whilst these help to illustrate the differences in data that meets or violates the assumption of homoscedasticity, real-world data can be a lot messier and illustrate different patterns of heteroscedasticity. Therefore, in our enhanced linear regression guide, we explain: (a) some of the things you will need to consider when interpreting your data; and (b) possible ways to continue with your analysis if your data fails to meet this assumption.
Assumption seven: Finally, you need to check that the residuals (errors) of the regression line are approximately normally distributed (we explain these terms in our enhanced linear regression guide). Two common methods to check this assumption include using either a histogram (with a superimposed normal curve) or a Normal P-P Plot. Again, in our enhanced linear regression guide, we: (a) show you how to check this assumption using SPSS Statistics, whether you use a histogram (with superimposed normal curve) or Normal P-P Plot; (b) explain how to interpret these diagrams; and (c) provide a possible solution if your data fails to meet this assumption.
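For readers working outside SPSS, assumptions five and seven can also be checked numerically. The sketch below uses statsmodels and scipy on invented data: the Durbin-Watson statistic for independence of observations, and a normality test on the residuals to complement the histogram or Normal P-P Plot:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

# Invented data: a roughly linear relationship with random noise.
rng = np.random.default_rng(0)
x = np.arange(1, 21, dtype=float)
y = 2.0 * x + rng.normal(scale=1.5, size=x.size)

# Fit the ordinary least squares regression and extract the residuals.
model = sm.OLS(y, sm.add_constant(x)).fit()
residuals = model.resid

# Assumption five (independence): Durbin-Watson close to 2 suggests no autocorrelation.
print(f"Durbin-Watson = {durbin_watson(residuals):.2f}")

# Assumption seven (normal residuals): numeric check alongside the histogram/P-P plot.
stat, p = stats.normaltest(residuals)
print(f"residual normality test p-value = {p:.3f}")
```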
Example
A salesperson for a large car brand wants to determine whether there is a relationship between an individual’s income and the price they pay for a car. As such, the individual’s “income” is the independent variable and the “price” they pay for a car is the dependent variable. The salesperson wants to use this information to determine which cars to offer potential customers in new areas where average income is known.
SPSS Statistics
Setup in SPSS Statistics
In SPSS Statistics, we created two variables so that we could enter our data: Income (the independent variable), and Price (the dependent variable). It can also be useful to create a third variable, caseno, to act as a chronological case number. This third variable is used to make it easy for you to eliminate cases (e.g., significant outliers) that you have identified when checking for assumptions. However, we do not include it in the SPSS Statistics procedure that follows because we assume that you have already checked these assumptions. In our enhanced linear regression guide, we show you how to correctly enter data in SPSS Statistics to run a linear regression when you are also checking for assumptions. You can learn about our enhanced data setup content on our Features: Data Setup page. Alternately, see our generic, “quick start” guide: Entering Data in SPSS Statistics.
SPSS Statistics
Test Procedure in SPSS Statistics
The four steps below show you how to analyse your data using linear regression in SPSS Statistics when none of the seven assumptions in the previous section, Assumptions, have been violated. At the end of these four steps, we show you how to interpret the results from your linear regression. If you are looking for help to make sure your data meets assumptions #3, #4, #5, #6 and #7, which are required when using linear regression and can be tested using SPSS Statistics, you can learn more about our enhanced guides on our Features: Overview page.
Note: The procedure that follows is identical for SPSS Statistics versions 18 to 28, as well as the subscription version of SPSS Statistics, with version 28 and the subscription version being the latest versions of SPSS Statistics. However, in version 27 and the subscription version, SPSS Statistics introduced a new look to their interface called “SPSS Light”, replacing the previous look for versions 26 and earlier versions, which was called “SPSS Standard”. Therefore, if you have SPSS Statistics versions 27 or 28 (or the subscription version of SPSS Statistics), the images that follow will be light grey rather than blue. However, the procedure is identical.
- Click Analyze > Regression > Linear... on the top menu, as shown below:
You will be presented with the Linear Regression dialogue box:
- Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the Dependent: box. You can do this by either drag-and-dropping the variables or by using the appropriate buttons. You will end up with the following screen:
- You now need to check four of the assumptions discussed in the Assumptions section above: no significant outliers (assumption #4); independence of observations (assumption #5); homoscedasticity (assumption #6); and normal distribution of errors/residuals (assumption #7). You can do this by using the Statistics and Plots features, and then selecting the appropriate options within these two dialogue boxes. In our enhanced linear regression guide, we show you which options to select in order to test whether your data meets these four assumptions.
- Click on the OK button. This will generate the results.
THE OUTPUT OF SPSS ON REGRESSION ANALYSIS
SPSS Statistics will generate quite a few tables of output for a linear regression. In this section, we show you only the three main tables required to understand your results from the linear regression procedure, assuming that no assumptions have been violated. A complete explanation of the output you have to interpret when checking your data for the seven assumptions required to carry out linear regression is provided in our enhanced guide. This includes relevant scatterplots, histogram (with superimposed normal curve), Normal P-P Plot, casewise diagnostics and the Durbin-Watson statistic. Below, we focus on the results for the linear regression analysis only.
The first table of interest is the Model Summary table, as shown below:
This table provides the R and R² values. The R value represents the simple correlation and is 0.873 (the “R” column), which indicates a high degree of correlation. The R² value (the “R Square” column) indicates how much of the total variation in the dependent variable, Price, can be explained by the independent variable, Income. In this case, 76.2% can be explained, which is very large.
The next table is the ANOVA table, which reports how well the regression equation fits the data (i.e., predicts the dependent variable) and is shown below:
This table indicates that the regression model predicts the dependent variable significantly well. How do we know this? Look at the “Regression” row and go to the “Sig.” column. This indicates the statistical significance of the regression model that was run. Here, p < 0.0005, which is less than 0.05, and indicates that, overall, the regression model statistically significantly predicts the outcome variable (i.e., it is a good fit for the data).
The Coefficients table provides us with the necessary information to predict price from income, as well as determine whether income contributes statistically significantly to the model (by looking at the “Sig.” column). Furthermore, we can use the values in the “B” column under the “Unstandardized Coefficients” column, as shown below:
to present the regression equation as:
Price = 8287 + 0.564(Income)
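For comparison, a simple linear regression of this kind can be fitted in Python with scipy. The income and price values below are invented and will not reproduce the coefficients above, but the output has the same structure (intercept, slope, R, R² and a p-value):

```python
from scipy import stats

# Hypothetical income and car-price data (values are assumptions).
income = [15000, 22000, 31000, 45000, 52000, 60000, 75000, 90000]
price  = [16000, 20500, 26000, 33500, 38000, 42000, 50500, 59000]

# Fit the simple linear regression Price = intercept + slope * Income.
result = stats.linregress(income, price)
print(f"Price = {result.intercept:.0f} + {result.slope:.3f}(Income)")
print(f"R = {result.rvalue:.3f}, R^2 = {result.rvalue**2:.3f}, p = {result.pvalue:.4f}")
```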
CORRELATIONS
CORRELATION ANALYSIS
Pearson Correlation
The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a parametric measure.
This measure is also known as:
- Pearson’s correlation
- Pearson product-moment correlation (PPMC)
Common Uses
The bivariate Pearson Correlation is commonly used to measure the following:
- Correlations among pairs of variables
- Correlations within and between sets of variables
The bivariate Pearson correlation indicates the following:
- Whether a statistically significant linear relationship exists between two continuous variables
- The strength of a linear relationship (i.e., how close the relationship is to being a perfectly straight line)
- The direction of a linear relationship (increasing or decreasing)
Note: The bivariate Pearson Correlation cannot address non-linear relationships or relationships among categorical variables. If you wish to understand relationships that involve categorical variables and/or non-linear relationships, you will need to choose another measure of association.
Note: The bivariate Pearson Correlation only reveals associations among continuous variables. The bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the correlation coefficient is.
Data Requirements
To use Pearson correlation, your data must meet the following requirements:
- Two or more continuous variables (i.e., interval or ratio level)
- Cases must have non-missing values on both variables
- Linear relationship between the variables
- Independent cases (i.e., independence of observations)
  - There is no relationship between the values of variables between cases. This means that:
    - the values for all variables across cases are unrelated
    - for any case, the value for any variable cannot influence the value of any variable for other cases
    - no case can influence another case on any variable
  - The bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated.
- Bivariate normality
  - Each pair of variables is bivariately normally distributed
  - Each pair of variables is bivariately normally distributed at all levels of the other variable(s)
  - This assumption ensures that the variables are linearly related; violations of this assumption may indicate that non-linear relationships among variables exist. Linearity can be assessed visually using a scatterplot of the data.
- Random sample of data from the population
- No outliers
Hypotheses
The null hypothesis (H0) and alternative hypothesis (H1) of the significance test for correlation can be expressed in the following ways, depending on whether a one-tailed or two-tailed test is requested:
Two-tailed significance test:
H0: ρ = 0 (“the population correlation coefficient is 0; there is no association”)
H1: ρ ≠ 0 (“the population correlation coefficient is not 0; a nonzero correlation could exist”)
One-tailed significance test:
H0: ρ = 0 (“the population correlation coefficient is 0; there is no association”)
H1: ρ > 0 (“the population correlation coefficient is greater than 0; a positive correlation could exist”)
OR
H1: ρ < 0 (“the population correlation coefficient is less than 0; a negative correlation could exist”)
where ρ is the population correlation coefficient.
The sample correlation coefficient between two variables x and y is denoted r or rxy, and can be computed as:

$$ r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}} $$
Run a Bivariate Pearson Correlation
To run a bivariate Pearson Correlation in SPSS, click Analyze > Correlate > Bivariate.
The Bivariate Correlations window opens, where you will specify the variables to be used in the analysis. All of the variables in your dataset appear in the list on the left side. To select variables for the analysis, select the variables in the list on the left and click the blue arrow button to move them to the right, in the Variables field.
A. Variables: The variables to be used in the bivariate Pearson Correlation. You must select at least two continuous variables, but may select more than two. The test will produce correlation coefficients for each pair of variables in this list.
B. Correlation Coefficients: There are multiple types of correlation coefficients. By default, Pearson is selected. Selecting Pearson will produce the test statistics for a bivariate Pearson Correlation.
C. Test of Significance: Click Two-tailed or One-tailed, depending on your desired significance test. SPSS uses a two-tailed test by default.
D. Flag significant correlations: Checking this option will include asterisks (**) next to statistically significant correlations in the output. By default, SPSS marks statistical significance at the alpha = 0.05 and alpha = 0.01 levels, but not at the alpha = 0.001 level (which is treated as alpha = 0.01).
E. Options: Clicking Options will open a window where you can specify which Statistics to include (i.e., Means and standard deviations, Cross-product deviations and covariances) and how to address Missing Values (i.e., Exclude cases pairwise or Exclude cases listwise). Note that the pairwise/listwise setting does not affect your computations if you are only entering two variables, but can make a very large difference if you are entering three or more variables into the correlation procedure.
Example: Understanding the linear association between weight and height
PROBLEM STATEMENT
Perhaps you would like to test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population). You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between height and weight, and to determine the strength and direction of the association.
BEFORE THE TEST
In the sample data, we will use two variables: “Height” and “Weight.” The variable “Height” is a continuous measure of height in inches and exhibits a range of values from 55.00 to 84.41 (Analyze > Descriptive Statistics > Descriptives). The variable “Weight” is a continuous measure of weight in pounds and exhibits a range of values from 101.71 to 350.07.
Before we look at the Pearson correlations, we should look at the scatterplots of our variables to get an idea of what to expect. In particular, we need to determine if it’s reasonable to assume that our variables have linear relationships. Click Graphs > Legacy Dialogs > Scatter/Dot. In the Scatter/Dot window, click Simple Scatter, then click Define. Move variable Height to the X Axis box, and move variable Weight to the Y Axis box. When finished, click OK.
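The same linearity check and correlation test can be sketched in Python. The height and weight values below are invented (within the ranges described above) purely to show the mechanics:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical height (inches) and weight (pounds) values (assumptions).
height = [60.2, 63.5, 65.1, 66.8, 68.0, 69.4, 71.2, 73.5]
weight = [115.3, 128.7, 140.2, 152.9, 160.4, 172.8, 185.1, 210.6]

# Visual check for linearity, analogous to the SPSS Simple Scatter plot.
plt.scatter(height, weight)
plt.xlabel("Height (inches)")
plt.ylabel("Weight (pounds)")
plt.show()

# Bivariate Pearson correlation (two-tailed by default).
r, p = stats.pearsonr(height, weight)
print(f"r = {r:.3f}, p-value = {p:.4f}")
```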