REGRESSION ANALYSIS

 

HOW TO PERFORM REGRESSION IN SPSS

LINEAR REGRESSION ANALYSIS USING SPSS STATISTICS

Introduction

Linear regression is the next step up after correlation. It is used when we want to predict the value of a variable based on the value of another variable. The variable we want to predict is called the dependent variable (or sometimes, the outcome variable). The variable we are using to predict the other variable’s value is called the independent variable (or sometimes, the predictor variable). For example, you could use linear regression to understand whether exam performance can be predicted based on revision time; whether cigarette consumption can be predicted based on smoking duration; and so forth. If you have two or more independent variables, rather than just one, you need to use multiple regression analysis.

This “quick start” guide shows you how to carry out linear regression using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for linear regression to give you a valid result. We discuss these assumptions next.

SPSS Statistics

Assumptions

When you choose to analyse your data using linear regression, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using linear regression. You need to do this because it is only appropriate to use linear regression if your data “passes” seven assumptions that are required for linear regression to give you a valid result. In practice, checking for these seven assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.

Before we introduce you to these seven assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out linear regression when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let’s take a look at these seven assumptions:

Assumption #1: Your dependent variable should be measured at the continuous level (i.e., it is either an interval or ratio variable). Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.

Assumption #2: Your independent variable should also be measured at the continuous level (i.e., it is either an interval or ratio variable). See Assumption #1 for examples of continuous variables.

Assumption #3: There needs to be a linear relationship between the two variables. Whilst there are a number of ways to check whether a linear relationship exists between your two variables, we suggest creating a scatterplot using SPSS Statistics where you can plot the dependent variable against your independent variable and then visually inspect the scatterplot to check for linearity. Your scatterplot may look something like one of the following:

If the relationship displayed in your scatterplot is not linear, you will have to either run a non-linear regression analysis, perform a polynomial regression or “transform” your data, which you can do using SPSS Statistics. In our enhanced guides, we show you how to:

(a) Create a scatterplot to check for linearity when carrying out linear regression using SPSS Statistics (see the syntax sketch after this list);

(b) Interpret different scatterplot results; and

(c) Transform your data using SPSS Statistics if there is not a linear relationship between your two variables.
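To produce the scatterplot mentioned under Assumption #3 without clicking through the menus, you can also use pasted syntax along the lines of the sketch below. The variable names RevisionTime and ExamScore are purely illustrative (echoing the revision-time example in the Introduction); substitute your own variables:

* Scatterplot to check linearity (illustrative variable names).
GRAPH
  /SCATTERPLOT(BIVAR)=RevisionTime WITH ExamScore
  /MISSING=LISTWISE.

The first variable in the BIVAR pair is placed on the x-axis, so the independent variable is listed before WITH.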

Assumption #4: There should be no significant outliers. An outlier is an observed data point that has a dependent variable value that is very different to the value predicted by the regression equation. As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line indicating that it has a large residual, as highlighted below:

The problem with outliers is that they can have a negative effect on the regression analysis (e.g., reduce the fit of the regression equation) that is used to predict the value of the dependent (outcome) variable based on the independent (predictor) variable. This will change the output that SPSS Statistics produces and reduce the predictive accuracy of your results. Fortunately, when using SPSS Statistics to run a linear regression on your data, you can easily include criteria to help you detect possible outliers. In our enhanced linear regression guide, we: (a) show you how to detect outliers using “casewise diagnostics”, which is a simple process when using SPSS Statistics; and (b) discuss some of the options you have in order to deal with outliers.

Assumption #5: You should have independence of observations, which you can easily check using the Durbin-Watson statistic, a simple test to run in SPSS Statistics. We explain how to interpret the result of the Durbin-Watson statistic in our enhanced linear regression guide.

Assumption #6: Your data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line. Whilst we explain more about what this means and how to assess the homoscedasticity of your data in our enhanced linear regression guide, take a look at the three scatterplots below, which provide three simple examples: two of data that fail the assumption (called heteroscedasticity) and one of data that meets this assumption (called homoscedasticity):

Whilst these help to illustrate the differences in data that meets or violates the assumption of homoscedasticity, real-world data can be a lot messier and illustrate different patterns of heteroscedasticity. Therefore, in our enhanced linear regression guide, we explain: (a) some of the things you will need to consider when interpreting your data; and (b) possible ways to continue with your analysis if your data fails to meet this assumption.

Assumption #7: Finally, you need to check that the residuals (errors) of the regression line are approximately normally distributed (we explain these terms in our enhanced linear regression guide). Two common methods to check this assumption include using either a histogram (with a superimposed normal curve) or a Normal P-P Plot. Again, in our enhanced linear regression guide, we: (a) show you how to check this assumption using SPSS Statistics, whether you use a histogram (with superimposed normal curve) or Normal P-P Plot; (b) explain how to interpret these diagrams; and (c) provide a possible solution if your data fails to meet this assumption.

 

EXAMPLES

SPSS Statistics

Example

A salesperson for a large car brand wants to determine whether there is a relationship between an individual’s income and the price they pay for a car. As such, the individual’s “income” is the independent variable and the “price” they pay for a car is the dependent variable. The salesperson wants to use this information to determine which cars to offer potential customers in new areas where average income is known.

SPSS Statistics

Setup in SPSS Statistics

In SPSS Statistics, we created two variables so that we could enter our data: Income (the independent variable), and Price (the dependent variable). It can also be useful to create a third variable, caseno, to act as a chronological case number. This third variable is used to make it easy for you to eliminate cases (e.g., significant outliers) that you have identified when checking for assumptions. However, we do not include it in the SPSS Statistics procedure that follows because we assume that you have already checked these assumptions. In our enhanced linear regression guide, we show you how to correctly enter data in SPSS Statistics to run a linear regression when you are also checking for assumptions.

SPSS Statistics

Test Procedure in SPSS Statistics

The four steps below show you how to analyse your data using linear regression in SPSS Statistics when none of the seven assumptions in the previous section, Assumptions, have been violated. At the end of these four steps, we show you how to interpret the results from your linear regression. If you are looking for help to make sure your data meets assumptions #3, #4, #5, #6 and #7, which are required when using linear regression and can be tested using SPSS Statistics, you can learn more about our enhanced guides on our Features: Overview page.

Note: The procedure that follows is identical for SPSS Statistics versions 18 to 28, as well as the subscription version of SPSS Statistics, with version 28 and the subscription version being the latest versions of SPSS Statistics. However, in version 27 and the subscription version, SPSS Statistics introduced a new look to their interface called “SPSS Light”, replacing the previous look for versions 26 and earlier versions, which was called “SPSS Standard”. Therefore, if you have SPSS Statistics versions 27 or 28 (or the subscription version of SPSS Statistics), the images that follow will be light grey rather than blue. However, the procedure is identical.

  1. Click Analyze > Regression > Linear... on the top menu, as shown below:

Published with written permission from SPSS Statistics, IBM Corporation.

You will be presented with the Linear Regression dialogue box:

Published with written permission from SPSS Statistics, IBM Corporation.

  2. Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the Dependent: box. You can do this by either drag-and-dropping the variables or by using the appropriate arrow buttons. You will end up with the following screen:

Published with written permission from SPSS Statistics, IBM Corporation.

  3. You now need to check four of the assumptions discussed in the Assumptions section above: no significant outliers (assumption #4); independence of observations (assumption #5); homoscedasticity (assumption #6); and normal distribution of errors/residuals (assumption #7). You can do this by using the Statistics and Plots features, and then selecting the appropriate options within these two dialogue boxes. In our enhanced linear regression guide, we show you which options to select in order to test whether your data meets these four assumptions.
  4. Click on the OK button. This will generate the results (the equivalent syntax is sketched below).
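Alternatively, you can run the same analysis from a syntax window. The following is a minimal sketch of the kind of syntax SPSS Statistics pastes for this example, assuming the variables are named Income and Price as above. The last three subcommands request the residual scatterplot for homoscedasticity (assumption #6), the Durbin-Watson statistic and residual normality plots (assumptions #5 and #7), and casewise diagnostics for outliers (assumption #4):

* Linear regression of Price on Income, with assumption checks (variable names assumed).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT Price
  /METHOD=ENTER Income
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /CASEWISE PLOT(ZRESID) OUTLIERS(3).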

HOW TO INTERPRET REGRESSION RESULTS

 

THE OUTPUT OF SPSS ON REGRESSION ANALYSIS

Adjusted R-squared

Adjusted R-squared is a statistical measure that is closely related to the more commonly known R-squared (R²) value in the context of linear regression analysis. While R-squared measures the proportion of the variance in the dependent variable (the variable being predicted) that is explained by the independent variables (the predictors) in a regression model, Adjusted R-squared takes into account the number of independent variables used in the model, providing a more conservative and useful assessment of model fit.

R-squared (R²): R-squared is a measure of how well the independent variables in a regression model explain the variability in the dependent variable. It ranges from 0 to 1, with higher values indicating a better fit. Specifically, R-squared represents the proportion of the total variation in the dependent variable that is explained by the model. However, as you add more independent variables to the model, R-squared tends to increase, even if the additional variables do not significantly improve the model’s predictive power. This can lead to overfitting, where the model fits the training data extremely well but may not generalize well to new data.

Adjusted R-squared: Adjusted R-squared addresses the issue of overfitting by penalizing the inclusion of unnecessary independent variables in the model. It takes into account the number of predictors in the model and adjusts R-squared accordingly.

The formula for Adjusted R-squared is: Adjusted R² = 1 – [(1 – R²) × (n – 1) / (n – k – 1)], where:

R² is the regular R-squared.

n is the number of data points (observations).

k is the number of independent variables in the model.

Adjusted R-squared will always be lower than R-squared when you have multiple independent variables, and it tends to decrease as you add irrelevant or redundant variables to the model. It provides a more realistic assessment of the model’s fit by accounting for model complexity.
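As a quick worked illustration (with made-up numbers), suppose a model with k = 2 independent variables is fitted to n = 30 observations and yields R² = 0.762. Then:

Adjusted R² = 1 – [(1 – 0.762) × (30 – 1) / (30 – 2 – 1)] = 1 – (0.238 × 29 / 27) ≈ 1 – 0.256 = 0.744

Adding a third, irrelevant predictor would raise k to 3 and shrink the denominator (n – k – 1), so Adjusted R² would fall unless R² rose enough to compensate.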

 

In summary, while R-squared tells you how well your regression model fits the data, Adjusted R-squared helps you determine whether the improvement in model fit achieved by adding more independent variables is justified by the increased complexity. It is a useful tool for model selection and comparison, as it encourages the use of simpler models that explain the data adequately without unnecessary complexity.

SPSS Statistics will generate quite a few tables of output for a linear regression. In this section, we show you only the three main tables required to understand your results from the linear regression procedure, assuming that no assumptions have been violated. A complete explanation of the output you have to interpret when checking your data for the assumptions required to carry out linear regression is provided in our enhanced guide. This includes relevant scatterplots, histogram (with superimposed normal curve), Normal P-P Plot, casewise diagnostics and the Durbin-Watson statistic. Below, we focus on the results for the linear regression analysis only.

The first table of interest is the Model Summary table, as shown below:

Published with written permission from SPSS Statistics, IBM Corporation.

This table provides the R and R² values. The R value represents the simple correlation and is 0.873 (the “R” column), which indicates a high degree of correlation. The R² value (the “R Square” column) indicates how much of the total variation in the dependent variable, Price, can be explained by the independent variable, Income. In this case, 76.2% can be explained, which is very large.

The next table is the ANOVA table, which reports how well the regression equation fits the data (i.e., predicts the dependent variable) and is shown below:

Published with written permission from SPSS Statistics, IBM Corporation.

This table indicates that the regression model predicts the dependent variable significantly well. How do we know this? Look at the “Regression” row and go to the “Sig.” column. This indicates the statistical significance of the regression model that was run. Here, p < 0.0005, which is less than 0.05, and indicates that, overall, the regression model statistically significantly predicts the outcome variable (i.e., it is a good fit for the data).

The Coefficients table provides us with the necessary information to predict price from income, as well as determine whether income contributes statistically significantly to the model (by looking at the “Sig.” column). Furthermore, we can use the values in the “B” column under the “Unstandardized Coefficients” column, as shown below:

Published with written permission from SPSS Statistics, IBM Corporation.

to present the regression equation as:

Price = 8287 + 0.564(Income)
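For example, to predict the price paid by an individual with an income of 30,000 (an illustrative figure, in the same units as the original Income and Price variables), substitute into the equation:

Price = 8287 + 0.564 × 30,000 = 8287 + 16,920 = 25,207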

 

CORRELATION ANALYSIS

Pearson Correlation

The bivariate Pearson Correlation produces a sample correlation coefficient, r, which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a parametric measure.

This measure is also known as:

  • Pearson’s correlation
  • Pearson product-moment correlation (PPMC)

Common Uses

The bivariate Pearson Correlation is commonly used to measure the following:

  • Correlations among pairs of variables
  • Correlations within and between sets of variables

The bivariate Pearson correlation indicates the following:

  • Whether a statistically significant linear relationship exists between two continuous variables
  • The strength of a linear relationship (i.e., how close the relationship is to being a perfectly straight line)
  • The direction of a linear relationship (increasing or decreasing)

Note: The bivariate Pearson Correlation cannot address non-linear relationships or relationships among categorical variables. If you wish to understand relationships that involve categorical variables and/or non-linear relationships, you will need to choose another measure of association.

Note: The bivariate Pearson Correlation only reveals associations among continuous variables. The bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the correlation coefficient is.

Data Requirements

To use Pearson correlation, your data must meet the following requirements:

  1. Two or more continuous variables (i.e., interval or ratio level)
  2. Cases must have non-missing values on both variables
  3. Linear relationship between the variables
  4. Independent cases (i.e., independence of observations)
    • There is no relationship between the values of variables between cases. This means that:
      • the values for all variables across cases are unrelated
      • for any case, the value for any variable cannot influence the value of any variable for other cases
      • no case can influence another case on any variable
    • The bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated.
  5. Bivariate normality
    • Each pair of variables is bivariately normally distributed
    • Each pair of variables is bivariately normally distributed at all levels of the other variable(s)
    • This assumption ensures that the variables are linearly related; violations of this assumption may indicate that non-linear relationships among variables exist. Linearity can be assessed visually using a scatterplot of the data.
  6. Random sample of data from the population
  7. No outliers

Hypotheses

The null hypothesis (H0) and alternative hypothesis (H1) of the significance test for correlation can be expressed in the following ways, depending on whether a one-tailed or two-tailed test is requested:

Two-tailed significance test:

H0: ρ = 0 (“the population correlation coefficient is 0; there is no association”)
H1: ρ ≠ 0 (“the population correlation coefficient is not 0; a nonzero correlation could exist”)

One-tailed significance test:

H0: ρ = 0 (“the population correlation coefficient is 0; there is no association”)
H1: ρ > 0 (“the population correlation coefficient is greater than 0; a positive correlation could exist”)
OR
H1: ρ < 0 (“the population correlation coefficient is less than 0; a negative correlation could exist”)

where ρ is the population correlation coefficient.

Test Statistic

The sample correlation coefficient between two variables x and y is denoted r or rxy, and can be computed as:

r = Σ(xᵢ – x̄)(yᵢ – ȳ) / √[ Σ(xᵢ – x̄)² × Σ(yᵢ – ȳ)² ]

that is, the sum of the cross-products of the deviations of x and y from their means, divided by the square root of the product of their sums of squared deviations.

Run a Bivariate Pearson Correlation

To run a bivariate Pearson Correlation in SPSS, click Analyze > Correlate > Bivariate.

The Bivariate Correlations window opens, where you will specify the variables to be used in the analysis. All of the variables in your dataset appear in the list on the left side. To select variables for the analysis, select the variables in the list on the left and click the blue arrow button to move them to the right, in the Variables field.

A Variables: The variables to be used in the bivariate Pearson Correlation. You must select at least two continuous variables, but may select more than two. The test will produce correlation coefficients for each pair of variables in this list.

B Correlation Coefficients: There are multiple types of correlation coefficients. By default, Pearson is selected. Selecting Pearson will produce the test statistics for a bivariate Pearson Correlation.

C Test of Significance: Click Two-tailed or One-tailed, depending on your desired significance test. SPSS uses a two-tailed test by default.

D Flag significant correlations: Checking this option will include asterisks (**) next to statistically significant correlations in the output. By default, SPSS marks statistical significance at the alpha = 0.05 and alpha = 0.01 levels, but not at the alpha = 0.001 level (which is treated as alpha = 0.01).

E Options: Clicking Options will open a window where you can specify which Statistics to include (i.e., Means and standard deviations; Cross-product deviations and covariances) and how to address Missing Values (i.e., Exclude cases pairwise or Exclude cases listwise). Note that the pairwise/listwise setting does not affect your computations if you are only entering two variables, but can make a very large difference if you are entering three or more variables into the correlation procedure.

Example: Understanding the linear association between weight and height

PROBLEM STATEMENT

Perhaps you would like to test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population). You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between height and weight, and to determine the strength and direction of the association.

BEFORE THE TEST

In the sample data, we will use two variables: “Height” and “Weight.” The variable “Height” is a continuous measure of height in inches and exhibits a range of values from 55.00 to 84.41 (Analyze > Descriptive Statistics > Descriptives). The variable “Weight” is a continuous measure of weight in pounds and exhibits a range of values from 101.71 to 350.07.

Before we look at the Pearson correlations, we should look at the scatterplots of our variables to get an idea of what to expect. In particular, we need to determine if it’s reasonable to assume that our variables have linear relationships. Click Graphs > Legacy Dialogs > Scatter/Dot. In the Scatter/Dot window, click Simple Scatter, then click Define. Move variable Height to the X Axis box, and move variable Weight to the Y Axis box. When finished, click OK.
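If you prefer syntax, the pasted command for this simple scatterplot is along the following lines (Height and Weight being the variable names used in this example):

* Scatterplot of Weight against Height.
GRAPH
  /SCATTERPLOT(BIVAR)=Height WITH Weight
  /MISSING=LISTWISE.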

To add a linear fit like the one depicted, double-click on the plot in the Output Viewer to open the Chart Editor. Click Elements > Fit Line at Total. In the Properties window, make sure the Fit Method is set to Linear, then click Apply. (Notice that adding the linear regression trend line will also add the R-squared value in the margin of the plot. If we take the square root of this number, it should match the value of the Pearson correlation we obtain.)

From the scatterplot, we can see that as height increases, weight also tends to increase. There does appear to be some linear relationship.

RUNNING THE TEST

To run the bivariate Pearson Correlation, click Analyze > Correlate > Bivariate. Select the variables Height and Weight and move them to the Variables box. In the Correlation Coefficients area, select Pearson. In the Test of Significance area, select your desired significance test, two-tailed or one-tailed. We will select a two-tailed significance test in this example. Check the box next to Flag significant correlations.

Click OK to run the bivariate Pearson Correlation. Output for the analysis will display in the Output Viewer.

Syntax

CORRELATIONS
  /VARIABLES=Weight Height
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

OUTPUT

Tables

The results will display the correlations in a table, labeled Correlations.

A Correlation of Height with itself (r=1), and the number of non-missing observations for height (n=408).

B Correlation of height and weight (r=0.513), based on n=354 observations with pairwise nonmissing values.

C Correlation of height and weight (r=0.513), based on n=354 observations with pairwise nonmissing values.

D Correlation of weight with itself (r=1), and the number of nonmissing observations for weight (n=376).

The important cells we want to look at are either B or C. (Cells B and C are identical, because they include information about the same pair of variables.) Cells B and C contain the correlation coefficient for the correlation between height and weight, its p-value, and the number of complete pairwise observations that the calculation was based on.

The correlations in the main diagonal (cells A and D) are all equal to 1. This is because a variable is always perfectly correlated with itself. Notice, however, that the sample sizes are different in cell A (n=408) versus cell D (n=376). This is because of missing data — there are more missing observations for variable Weight than there are for variable Height.

If you have opted to flag significant correlations, SPSS will mark a 0.05 significance level with one asterisk (*) and a 0.01 significance level with two asterisks (**). In cell B (repeated in cell C), we can see that the Pearson correlation coefficient for height and weight is .513, which is significant (p < .001 for a two-tailed test), based on 354 complete observations (i.e., cases with nonmissing values for both height and weight).

DECISION AND CONCLUSIONS

Based on the results, we can state the following:

  • Weight and height have a statistically significant linear relationship (r=.513, p < .001).
  • The direction of the relationship is positive (i.e., height and weight are positively correlated), meaning that these variables tend to increase together (i.e., greater height is associated with greater weight).
  • The magnitude, or strength, of the association is approximately moderate (.3 < |r| < .5).

 

 

HOW TO CARRY OUT DESCRIPTIVE STATISTICS IN SPSS

Variable View:

Data View:

The data can be summarized via the ‘Descriptive Statistics’ part of the ‘Analyse’ menu. Let’s explore:

Analyze > Descriptive Statistics > Frequencies.

This will open up a new dialogue box:

We can use this to make frequency tables of the variables. Statistics can be calculated and graphs can be drawn. Let’s make frequency tables for the categorical data: Passenger Class, Passenger Gender and Passenger Survived.

Ensure that ‘Display frequency tables’ is checked. Click on one of the three categories required in the left-most box and then press the arrow button in the middle to move it to the right-hand box. Do the same for the other two categories.

Click on the ‘Charts’ button and choose the option to draw a bar chart (or a pie chart if you prefer). Press ‘Continue’ then ‘OK’. Output will now be written to the output window.
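The equivalent syntax is sketched below, assuming the three categorical variables are named Class, Gender and Survived in your data file (the actual names may differ):

* Frequency tables with bar charts (illustrative variable names).
FREQUENCIES VARIABLES=Class Gender Survived
  /BARCHART FREQ
  /ORDER=ANALYSIS.

Replace /BARCHART FREQ with /PIECHART FREQ if you chose pie charts instead.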

The frequency tables summarising the data are:

You should also find the charts that were selected.

The descriptive statistics feature of SPSS can also give summary statistics such as the mean, median and standard deviation. We have some scale data in the form of the passenger’s age. Go back to:

Analyse > Descriptive Statistics > Frequencies, and return the previously moved categories back to the left box. Move the ‘Passenger’s Age’ variable over to the right box.

Choose the measures that you would like to get by clicking the check boxes.

Click ‘Continue’. The frequency table is not needed for these data, so uncheck ‘Display frequency tables’ and uncheck the options inside the ‘Charts’ menu.

When ready, click ‘OK’.

Above, the chosen options were Mean, Median, Mode, Std. Deviation, Minimum and Maximum.
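A syntax sketch for this step (assuming the age variable is named Age): /FORMAT=NOTABLE suppresses the unneeded frequency table, while /STATISTICS requests the chosen measures.

* Summary statistics for age, no frequency table (variable name assumed).
FREQUENCIES VARIABLES=Age
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN MODE STDDEV MINIMUM MAXIMUM.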

 

Continuous variables can also be analysed using the ‘Descriptives’ menu in SPSS. Go to Analyse -> Descriptive Statistics -> Descriptives.

First move the variable ‘Passenger Age’ to the ‘Variable(s)’ section.

Click on the ‘Options’ button. We can choose the statistics we want computed. Select those shown in the picture below:

The mean is a measure of average (sum of the values divided by the number of values). Standard deviation measures the spread of the data and can be used to describe normal distributions. Skewness is a measure of how symmetrical the distribution is. Values of skewness close to 0 represent symmetry, positive values mean that there are some high valued outliers and a negative value means some low valued outliers.

(image from www.managedfuturesinvesting.com)

Kurtosis values refer to how peaked the distribution is. A normal distribution would have a value of 0. Negative values mean that the distribution is flat (i.e., many cases in the extremes), and positive values mean the distribution is clustered in the centre.

Choose ‘Continue’ and ‘OK’.

The descriptives provide the requested information about passenger age on the Titanic.
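A syntax sketch for the same Descriptives analysis (again assuming the variable is named Age):

* Descriptives with skewness and kurtosis (variable name assumed).
DESCRIPTIVES VARIABLES=Age
  /STATISTICS=MEAN STDDEV MIN MAX SKEWNESS KURTOSIS.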

 

SPSS BAR CHART

SPSS BAR CHARTS FOR A CATEGORICAL VARIABLE

Our frequency table provides us with the necessary information, but we need to look at it carefully to draw conclusions. Doing so is greatly facilitated by creating a simple bar chart with bars representing frequencies. The fastest way to do so is to include it in our FREQUENCIES command, but this doesn’t allow us to add a title. We’ll therefore do it differently, as shown below.
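If you do want a titled bar chart straight from syntax, the GRAPH command allows it. A minimal sketch, assuming a categorical variable named Class (both the variable name and the title are illustrative):

* Simple bar chart of counts with a title (variable name assumed).
GRAPH
  /BAR(SIMPLE)=COUNT BY Class
  /TITLE='Frequency of each category'.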


Categorical variables

Summary Measures for Categorical Data


For categorical data, the most typical summary measure is the number or percentage of cases in each category. The mode is the category with the greatest number of cases. For ordinal data, the median (the value at which half of the cases fall above and below) may also be a useful summary measure if there is a large number of categories.

The Frequencies procedure produces frequency tables that display both the number and percentage of cases for each observed value of a variable.

From the menus choose:

Analyze > Descriptive Statistics > Frequencies…

Note: This feature requires Statistics Base Edition.

Select Owns PDA [ownpda] and Owns TV [owntv] and move them into the Variable(s) list.

Figure 1. Categorical variables selected for analysis

Click OK to run the procedure.

Figure 2. Frequency tables

The frequency tables are displayed in the Viewer window. The frequency tables reveal that only 20.4% of the people own PDAs, but almost everybody owns a TV (99.0%). These might not be interesting revelations, although it might be interesting to find out more about the small group of people who do not own televisions.


HOW TO WRITE A GOOD RESEARCH BACKGROUND

  • Background of the study

 

Exporting is one of the most important channels through which developing countries can link with the world economy (World Bank, 2001). Exporting allows firms in developing countries to enlarge their markets and benefit from economies of scale. Additionally, several scholars have pointed out the importance of exporting as a channel of technology transfer (Pack, 1993). Thus, for better performance of a developing country, it is vital to identify the major determinants of its export supply. In order to formulate trade and industrial policies aimed at stimulating exports, it is important to understand which factors stimulate or deter firms from entering foreign markets.

 

Redding and Venables (2004) investigate the relative contributions of foreign market access and supply capacity towards export performance. They find that internal components related to supply capacity, such as internal geography and institutional quality, played a significant role in explaining the observed differentials in export performance. According to Redding and Venables (2004), the relative export performance of the African and Middle Eastern countries tended to deteriorate over the 1980s and 1990s, driven by relatively poor performance in supply capacity. However, since the late 1990s, East Asian and Pacific countries in particular have been among the main beneficiaries of foreign market access, which coincides with their successful diversification efforts. The real exchange rate, which reflects the underlying relative movement of prices at home and abroad, has been shown to have a significant effect on the export performance of the lowest performers.

One of the world’s most widely traded commodities is coffee. Coffee beans, when roasted, produce a flavorful, aromatic and caffeine-filled drink that is popular all over the world, with over 600 billion cups sold each year (ICO, 2014). Two botanically different trees can produce coffee. Arabica coffee trees produce beans that are more labor-intensive in their cultivation and are grown at higher altitudes. This coffee is milder, more aromatic and more complex than its Robusta counterpart (ICE, 2014). For many countries, over 50% of total export earnings can be accounted for by coffee exports. In fact, the top 10 Arabica coffee exporting countries in the world are considered developing countries. The UCDA estimates that approximately 77 million 60-kilogram bags were exported from the top 10 coffee exporting countries in 2011-2012.

 

Coffee is one of the most significant commodities exported by Brazil, Colombia, Guatemala and Honduras, and, as previously noted, these are among the top exporting countries of Arabica coffee. Naturally, coffee plays a significant role in the composition of the Gross Domestic Product (GDP) and Agricultural Gross Domestic Product (AGDP) of these countries. Coffee makes up the highest percentage of both GDP and AGDP for Honduras, with 7.37% and 48.17% respectively. Honduras is followed by Guatemala with 2.49% and 21.08%; Colombia with 0.86% and 12.53%; and finally Brazil with 0.35% and 6.41% (ICE, 2014).

 

For most least developed economies in Sub-Saharan Africa (SSA), agriculture has been the main source of livelihood, contributing 34% to Gross Domestic Product (GDP) and 64% to employment, either directly or indirectly. Dependence on agricultural commodities like coffee for exports has been accompanied by a high degree of price risk in terms of both volatile and declining prices, a phenomenon which has not only affected the way households allocate their resources but also affected their welfare in terms of consumption and export volume to the world market.

 

Uganda’s export sector is dominated by primary products (about 74.1%) (Roberta, 2004). These include agricultural products, mainly coffee, cotton, flowers, simsim and fish; unprocessed minerals such as gold; and live animals, hides and skins, among others. At independence (1962), Uganda’s traditional exports constituted agricultural commodities and unprocessed minerals. By the end of the 1970s, coffee was the largest foreign exchange earner, accounting for about 51 percent, leaving cotton, copper, tea and tobacco sharing the other portion of the earnings (Musinguzi, 2002).

 

Coffee continues to play a leading role in the economy of Uganda, contributing 18% of the export earnings between 2000 and 2010, despite vigorous efforts by Government to diversify the economy. Though large-scale coffee producers are gradually emerging, the coffee sub-sector is almost entirely dependent on about 500,000 smallholder farmers, 90 percent of whose average farm size ranges from less than 0.5 to 2.5 hectares (UCDA, 2012). The coffee industry employs over 3.5 million families through coffee-related activities. Domestic consumption of the commodity in Uganda is relatively small, ranging from 4-10% of production; as such, coffee is primarily an export crop (Sayer, 2002).

 

Coffee has continued to play a leading role in the economy of Uganda (UBOS, 2011), contributing between 20 and 30 percent of the foreign exchange earnings (Uganda Coffee Development Authority, 2009). In 1995, the National Union of Coffee Agribusinesses and Farm Enterprises (NUCAFE) was established, which has contributed to the emergence of some large-scale coffee farmers. From the 1920s, coffee was grown for export, and in the 1950s an extensive coffee production programme was launched. In 1972, coffee production reached 4.2 million bags of 60 kgs each. Thereafter, coffee production declined tremendously because of civil strife, poor marketing systems, and low producer prices arising from government monopoly and controls (Rudaherenwa et al., 2003).

 

Uganda ranks fourth after Burundi, Ethiopia and Honduras in terms of the contribution of coffee exports to total export earnings over the period 2000-2010, with an average share of 18% during this period (ICO, 2012). The post-1997 coffee price decline had a negative effect on production and exports (Baffes, 2006). However, production kept declining even when prices recovered, and it has continued to decline in recent years.

 

Although coffee contributed as much as $400 million annually to total merchandise exports during the mid-1990s, it currently (2010) contributes about $280 million (MAAIF, 2011). Understandably, the sector’s poor performance raised concerns among policy makers. However, despite the declining foreign earnings compared to the mid-1990s, coffee remained the main foreign exchange earner for the country. Its share in total export earnings declined marginally from 17.9 percent in 2009 to 17.5 percent in 2010.

 

Despite a significant decline in the quantity exported, coffee export earnings in 2010 increased by 13.1 percent as a result of higher global prices, although there was an overall 14.3 percent decline in the quantity of coffee produced in 2010. Coffee exports in 2010/11 were 156,000 MT, valued at US$ 338 million. The European Union is the main market for Ugandan coffee exports, accounting for over 70% of total exports, followed by Sudan, importing over 10% of Ugandan coffee, and the USA with 3% (Figure 3) (UCDA, 2011). However, Uganda’s export market is quite diverse, with a total of 16 importing countries. The export market is controlled by 29 national and multi-national companies, with ten companies controlling about 85% of the export market. The leading company, Ugacof (U) Ltd, controlled 15% of coffee exports in 2011 (UCDA, 2011). The top ten importing companies held a market share of 73.4% in 2011.

 


In Uganda, Robusta coffee is mainly grown in the low-altitude areas of Central, Eastern, Western and South-Eastern Uganda, up to 1,200 metres above sea level. Arabica coffee requires cool, moist conditions and higher altitude. It is mainly grown on Uganda’s mountain fringes: on Mount Elgon in the east (notably in Bugisu, on the western slopes of Mount Elgon in Mbale district), on the Ruwenzoris, and in West Nile (Nebbi and Okoro districts) on the border with Congo. Some Arabica is also grown in Mbarara district in Western Uganda (Sayer, 2002).

Figure 1: Changes in the quantity of coffee exported by Uganda, 1991-2010

 

Figure 1 above shows that Uganda’s coffee exports increased by about 1,000,000 kgs in 1994, and that Uganda exported its largest amount of coffee, above 4,000,000 kgs, in 1996. The figure also indicates that Uganda’s coffee exports have fluctuated over the years; by 2010 there was a general decline in coffee exports compared with years such as 1996, 1998, 2000 and 2002. This volatility in exports indicates that Uganda’s coffee faces various challenges in its export to world markets. From 1991 to 1998, coffee exports increased mainly due to fair prices on the international market. Thereafter, coffee exports declined almost every subsequent year, mainly due to adverse prices on the international market. In addition, there exists a huge value gap between the global revenues generated from coffee and what producing countries earn, due to a long supply chain with very many participants. For instance, in the year 2006/2007 global coffee revenues were US$90 billion, but farmers in producing countries all combined, including Brazil, earned only US$9 billion, which is 10 percent of the global value share (UCDA, 2009). Based on this background, this study therefore intends to investigate the determinants of coffee exports in Uganda from 1991 to 2010.

1.2 Problem statement

Uganda ranks fourth after Burundi, Ethiopia and Honduras in terms of the contribution of coffee exports to total export earnings in the period 2000-2010, with an average share of 18% during this period (ICO, 2012). As figure 1 above shows, Uganda’s coffee export volume was unstable from 1991 to 2010: exports to the world market rose from 159,983 tons in 2004 to 200,640 tons in 2008, but then generally declined from 200,640 tons in 2008 to 159,433 tons in 2010. Coffee exports have been declining since 1998 (refer to fig 1) despite the measures undertaken by the government to boost the sector.

 

Although coffee contributed as much as $400 million annually to total merchandise exports during the mid-1990s, it currently (2010) contributes about $280 million (MAAIF, 2011). Understandably, the sector’s poor performance raised concerns among policy makers. However, despite the declining foreign earnings compared to the mid-1990s, coffee remained the main foreign exchange earner for the country. Its share in total export earnings declined marginally from 17.9 percent in 2009 to 17.5 percent in 2010. This volatility in coffee exports has been a matter of concern to the government of Uganda. This study therefore intends to investigate the determinants of coffee exports in Uganda from 1991 to 2010.
