GmbH is a global corporation with over 300,000 employees and offices across the whole country. Management has noticed that employee job satisfaction has become an issue in the company. For this reason, the HR department has decided to conduct a survey to identify the factors that influence employees’ satisfaction levels. Satisfaction is measured in two periods: before and after training. The survey consists of the following variables:

  1. ID
  2. Gender (1 = male, 2 = female)
  3. Marital status (Married, Single)
  4. Age
  5. Years of experience
  6. City that they come from (areas coded from 1 to 5)
  7. Region they come from (East, West, South, North)
  8. Department (1 = IT, 2 = Marketing, 3 = Sales, 4 = HR, 5 = Finance, 6 = Innovation)
  9. Salary (in thousands)
  10. Job satisfaction score before training (1 = extremely dissatisfied; 5 = extremely satisfied)
  11. Job satisfaction score after training (1 = extremely dissatisfied; 5 = extremely satisfied)
  12. Life happiness score (1 = extremely unhappy; 10 = extremely happy)
  13. Promoted (Yes, No)
  14. Organization relation and employee satisfaction survey (OR) (1 to 5)
  15. Teamwork and employee satisfaction survey (TW) (1 to 5)
  16. Information and employee satisfaction survey (INF) (1 to 5)
  17. Job passion and self-evaluation employee satisfaction survey (JP) (1 to 5)
  18. Work/Life balance and employee satisfaction survey (WLB) (1 to 5)

The data set is available on the BB108 Moodle site: Employee Survey Data 2022T1 .xlsx


  1. Presentation of the employee job satisfaction problem (10 marks)
    1. Present the job satisfaction problem of the company through tables, graphs and numerical measures. (5 marks)
    2. Describe how you are going to address the job satisfaction issue. (5 marks)
  2. Do you think job satisfaction is a Region-related problem? (20 marks)
    1. Formulate statistical hypotheses to test whether job satisfaction is a region-related problem. (5 marks)
    2. Run a statistical test on the hypotheses. (12 marks)
    3. Conclude your test result. (3 marks)
  3. Test whether pay is gender biased. (10 marks)
    1. Compare the average salary of male employees with that of the female employees. What is your tentative hypothesis on the bias? (3 marks)
    2. Formulate a statistical hypothesis test to verify the tentative statement. (2 marks)
    3. What is your chosen significance level of the test? (1 mark)
    4. Show your test output. (2 marks)
    5. Conclude your test result. (2 marks)
  4. Quantify the relationship between Age and Salary (20 marks)
    1. Describe how regression can be used to quantify the impact of Age on Salary. (4 marks)
    2. Create a scatter plot of Salary against Age on a computer, calculate the correlation coefficient between Salary and Age, and interpret the graph. (4 marks)
    3. Run a regression of Salary on Age on a computer. (4 marks)
    4. Do you think Age has a significant influence on Salary? Formulate a statistical test to confirm your answer in d. (4 marks)
    5. Complete the test formulated in part d and interpret the results. (4 marks)

Average of Job satisfaction score before training:

  2. Do you think job satisfaction is a Region-related problem? (20 marks)
    1. Formulate statistical hypotheses to test if job satisfaction is a region-related problem (5 marks)
    2. Run a statistical test on the hypotheses (12 marks)
Anova: Single Factor

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| East | 71 | 3692 | 52 | 76.0571429 |
| North | 47 | 2471 | 52.5744681 | 62.0758557 |
| South | 59 | 3111 | 52.7288136 | 106.54588 |
| West | 123 | 6453 | 52.4634146 | 86.2506997 |

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 19.5009222 | 3 | 6.50030739 | 0.07732945 | 0.972216 | 2.635106 |
| Within Groups | 24881.7357 | 296 | 84.0599181 | | | |
| Total | 24901.2367 | 299 | | | | |
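
The F statistic in the ANOVA output can be reproduced from the group summary alone, which is a useful check before writing the conclusion. The sketch below is a minimal illustration, not the method used to produce the Excel output: the function name `one_way_anova` is hypothetical, the counts, sums and variances are copied from the summary table, and the 5% significance level is an assumption based on Excel's default.

```python
# Group summaries copied from the "Anova: Single Factor" output:
# region -> (count, sum, sample variance)
groups = {
    "East": (71, 3692, 76.0571429),
    "North": (47, 2471, 62.0758557),
    "South": (59, 3111, 106.54588),
    "West": (123, 6453, 86.2506997),
}


def one_way_anova(summary):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    n_total = sum(n for n, _, _ in summary.values())
    grand_mean = sum(total for _, total, _ in summary.values()) / n_total
    # Between-groups variation: spread of the group means around the grand mean.
    ss_between = sum(
        n * (total / n - grand_mean) ** 2 for n, total, _ in summary.values()
    )
    # Within-groups variation: pooled from the sample variances.
    ss_within = sum((n - 1) * var for n, _, var in summary.values())
    df_between = len(summary) - 1
    df_within = n_total - len(summary)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)

# Critical value as reported by Excel (assumed to be at alpha = 0.05).
F_CRIT = 2.635106
reject_h0 = f_stat > F_CRIT

print(f"SS between = {ss_b:.4f}, SS within = {ss_w:.4f}")
print(f"F({df_b}, {df_w}) = {f_stat:.4f}, reject H0: {reject_h0}")
```

Because the computed F is far below the critical value, the null hypothesis of equal group means is not rejected at the assumed 5% level, which agrees with the reported P-value of 0.972216.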


    3. Conclude your test result (3 marks)
  3. Test whether pay is gender biased. (10 marks)
    1. Compare the average salary of male employees with that of the female employees. What is your tentative hypothesis on the bias? (3 marks)
    2. Formulate a statistical hypothesis test to verify the tentative statement. (2 marks)
    3. What is your chosen significance level of the test? (1 mark)
    4. Show your test output (2 marks)
t-Test: Two-Sample Assuming Unequal Variances

| | Variable 1 | Variable 2 |
|---|---|---|
| Mean | 53.86842105 | 51.53763441 |
| Variance | 91.08872846 | 76.88776518 |
| Observations | 114 | 186 |
| Hypothesized Mean Difference | 0 | |
| df | 224 | |
| t Stat | 2.116799544 | |
| P(T<=t) one-tail | 0.017690046 | |
| t Critical one-tail | 1.65168456 | |
| P(T<=t) two-tail | 0.035380093 | |
| t Critical two-tail | 1.970610961 | |
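
The t statistic and degrees of freedom above follow from the two sample means, variances and sizes using Welch's formula for unequal variances. The following is a minimal sketch under stated assumptions: the helper name `welch_t` is hypothetical, the inputs are copied from the output table, and the two-tailed 5% critical value is the one Excel reports (the significance level itself is an assumption, since the brief leaves it to the analyst).

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = var1 / n1, var2 / n2  # squared standard errors of each mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Values copied from the Excel output (Variable 1 vs. Variable 2).
t_stat, df = welch_t(
    53.86842105, 91.08872846, 114,
    51.53763441, 76.88776518, 186,
)

# Two-tailed critical value reported by Excel (assumed alpha = 0.05).
T_CRIT_TWO_TAIL = 1.970610961
reject_h0 = abs(t_stat) > T_CRIT_TWO_TAIL

print(f"t = {t_stat:.4f}, df = {df:.1f} (Excel reports the rounded value)")
print(f"Reject H0 of equal mean salaries: {reject_h0}")
```

At the assumed 5% level the statistic exceeds the two-tailed critical value, consistent with the reported two-tail P-value of 0.035380093.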


    5. Conclude your test result. (2 marks)
  4. Quantify the relationship between Age and Salary (20 marks)
    1. Describe how regression can be used to quantify the impact of Age on Salary (4 marks)
    2. Create a scatter plot of Salary against Age on a computer, calculate the correlation coefficient between Salary and Age, and interpret the graph. (4 marks)
    3. Run a regression of Salary on Age on a computer. (4 marks)
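| Regression Statistics | |
|---|---|
| Multiple R | 0.76 |
| R Square | 0.57 |
| Adjusted R Square | 0.57 |
| Standard Error | 5.99 |
| Observations | 300.00 |

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1.00 | 14198.88 | 14198.88 | 395.36 | 0.00 |
| Residual | 298.00 | 10702.36 | 35.91 | | |
| Total | 299.00 | 24901.24 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 24.07 | 1.47 | 16.40 | 0.00 | 21.18 | 26.96 | 21.18 | 26.96 |
| X Variable 1 | 0.68 | 0.03 | 19.88 | 0.00 | 0.61 | 0.74 | 0.61 | 0.74 |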
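
The pieces of the regression output are linked by simple identities, so they can be cross-checked and then used for prediction. The sketch below is illustrative only: it assumes that `X Variable 1` is Age and that Salary is in thousands (as in the variable list), it works from the rounded figures shown in the output, and `predict_salary` is a hypothetical helper name.

```python
import math

# Figures copied from the regression output (rounded as displayed).
SS_REGRESSION, SS_RESIDUAL = 14198.88, 10702.36
DF_REGRESSION, DF_RESIDUAL = 1, 298
INTERCEPT, SLOPE = 24.07, 0.68
SLOPE_CI_95 = (0.61, 0.74)

ss_total = SS_REGRESSION + SS_RESIDUAL
r_squared = SS_REGRESSION / ss_total            # share of Salary variation explained
ms_residual = SS_RESIDUAL / DF_RESIDUAL
f_stat = (SS_REGRESSION / DF_REGRESSION) / ms_residual
std_error = math.sqrt(ms_residual)              # "Standard Error" of the regression

# In a one-predictor regression the correlation has the sign of the slope
# and magnitude sqrt(R^2).
correlation = math.copysign(math.sqrt(r_squared), SLOPE)

# The slope is significant at the 5% level if its 95% interval excludes zero.
slope_is_significant = not (SLOPE_CI_95[0] <= 0 <= SLOPE_CI_95[1])


def predict_salary(age):
    """Fitted Salary (in thousands) for a given Age, assuming X Variable 1 is Age."""
    return INTERCEPT + SLOPE * age


print(f"R^2 = {r_squared:.2f}, r = {correlation:.2f}, F = {f_stat:.2f}")
print(f"Standard error = {std_error:.2f}, slope significant: {slope_is_significant}")
print(f"Predicted salary at age 40: {predict_salary(40):.2f} thousand")
```

Under these assumptions, each additional year of age is associated with an increase of about 0.68 thousand in fitted salary, and about 57% of the variation in Salary is explained by Age.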


    4. Do you think Age has a significant influence on Salary? Formulate a statistical test to confirm your answer in d. (4 marks)
    5. Complete the test formulated in part d and interpret the results. (4 marks)

