I need to fix it. For question 2a, check the “id” variable to be sure you have the correct case numbers that need to be removed. There are three cases that should be removed that you did not list. Also, 18 and 1129 will automatically be removed since there is missing data for these cases (no MAH_1 value). Some of your values are different from what I have. Be sure you run the multiple regression with the profile-b data set and only with cases where MAH_1 ≤ 22.458. For question 2h, you should not include the values for the variables that are not statistically significant. The regression equation should only include those variables that are statistically significant. For 2i, also mention that these two variables are not statistically significant and provide the p (sig) value.
1. The following output was generated from conducting a forward multiple
regression to identify which IVs (urban, birthrat, lnphone, and lnradio) predict
lngdp. The data analyzed were from the SPSS country-a.sav data file.
a. Evaluate the tolerance statistics. Is multicollinearity a problem?
To evaluate the presence of multicollinearity, we can use the tolerance statistic,
calculated as 1 - R^2, where R^2 comes from regressing that independent variable on the
other independent variables. A small tolerance indicates that the variable is almost a
perfect linear combination of the other independent variables already in the equation.
Usually, a value of 0.1 serves as the cutoff point. Looking at the table, we can see that
multicollinearity is not a problem because the tolerance statistics are greater than .1
for all the independent variables in both specifications.
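The tolerance rule described above can be sketched in a few lines. This is only an illustration: the function names and the two R^2 values are hypothetical and are not taken from the SPSS output; only the 0.1 cutoff comes from the text.

```python
def tolerance(r_squared: float) -> float:
    """Tolerance of a predictor: 1 - R^2 from regressing it on the other IVs."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return 1.0 - r_squared


def is_collinear(r_squared: float, cutoff: float = 0.1) -> bool:
    """Flag a predictor whose tolerance falls below the cutoff (0.1 in the text)."""
    return tolerance(r_squared) < cutoff


# Hypothetical R^2 values, for illustration only (not from the country-a output).
examples = {"predictor_a": 0.35, "predictor_b": 0.95}
flags = {name: is_collinear(r2) for name, r2 in examples.items()}
print(flags)
```

A predictor that is 95% explained by the other predictors has a tolerance of about .05 and would be flagged; one that is 35% explained has a tolerance of .65 and would not.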
b. What variables create the model to predict lngdp? What statistics support your
response?
The model summary output indicates that the forward multiple regression entered lnphone
first (the simple regression) and then lnphone + birthrat (the multiple regression).
If we look at the p-values, we can see that both of the coefficients are statistically
significant in explaining the variation of lngdp. However, birthrat is significant at the
5% significance level, whereas lnphone is significant at the 1% significance level.
Moreover, despite its significance, the coefficient of birthrat is rather small in
magnitude, and the R^2 change between the regression including only lnphone and the
following one with birthrat added is only 0.004. This suggests that the explanatory power
of birthrat is not very high.
c. Is the model significant in predicting lngdp? Explain.
Regression results indicate an overall model of two predictors (lnphone and birthrat) that
significantly predicts lngdp: R squared = .890, adjusted R squared = .888.
d. What percentage of variance in lngdp is explained by the model?
The model accounted for 89% of the variance in lngdp, as indicated by the R^2.
e. Write the regression equation for lngdp.
lngdp = 6.878 + .663*(lnphone) – .013*(birthrat)
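As a quick illustration of how this equation is used, the sketch below plugs values into it. The coefficients are the ones in the equation above; the function name and the input values (lnphone = 2.0, birthrat = 20.0) are hypothetical and were chosen only to show the arithmetic.

```python
def predict_lngdp(lnphone: float, birthrat: float) -> float:
    """Predicted lngdp from the fitted equation in 1e."""
    return 6.878 + 0.663 * lnphone - 0.013 * birthrat


# Hypothetical country, not a case from country-a.sav.
example = predict_lngdp(lnphone=2.0, birthrat=20.0)
print(round(example, 3))
```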
2. This question utilizes the data sets profile-a.sav and profile-b.sav.
You are interested in examining whether the variables shown here in brackets
[years of age (age), hours worked per week (hrs1), years of education (educ), years
of education for mother (maeduc), and years of education for father (paeduc)] are
predictors of individual income (rincmdol). Complete the following steps to conduct
this analysis.
a. Using profile-a.sav, conduct a preliminary regression to calculate Mahalanobis
distance. Identify the critical value for chi-square. Conduct Explore to identify outliers.
Which cases should be removed from further analysis?
In order to calculate Mahalanobis distance, I conducted a preliminary regression.
Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .580 (a)   .336       .331                4.345

a. Predictors: (Constant), Highest Year of School Completed, Father, Number of Hours
Worked Last Week, Age of Respondent, Highest Year of School Completed, Highest Year of
School Completed, Mother
b. Dependent Variable: RESPONDENTS INCOME
The model summary gives the general statistics of the regression in which all the IVs
were included in the model.
ANOVA (a)

Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   6136.473         5     1227.295      64.994   .000 (b)
Residual     12123.027        642   18.883
Total        18259.500        647

a. Dependent Variable: RESPONDENTS INCOME
b. Predictors: (Constant), Highest Year of School Completed, Father, Number of Hours
Worked Last Week, Age of Respondent, Highest Year of School Completed, Highest Year of
School Completed, Mother
The ANOVA table indicates that the model significantly predicts the dependent variable
rincmdol, with the F-test for overall significance telling us that at least one of the
predictors is statistically significant, F(5, 642) = 64.994, p < .001.
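The F ratio and R square reported for this preliminary regression follow directly from the ANOVA sums of squares and degrees of freedom. A minimal sketch of that arithmetic, using the values from the ANOVA table (the function name is just for illustration):

```python
def f_statistic(ss_reg: float, df_reg: int, ss_res: float, df_res: int) -> float:
    """F = mean square regression / mean square residual."""
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return ms_reg / ms_res


# Sums of squares and df from the preliminary (profile-a) ANOVA table.
ss_regression, ss_residual = 6136.473, 12123.027
f_value = f_statistic(ss_regression, 5, ss_residual, 642)

# R square is the share of the total sum of squares explained by the regression.
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(f_value, 3), round(r_squared, 3))
```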
Coefficients (a)

                                            Unstandardized          Standardized
Model 1                                     B        Std. Error     Beta           t        Sig.
(Constant)                                  -5.487   1.302                         -4.215   .000
Age of Respondent                           .133     .016           .291           8.585    .000
Highest Year of School Completed            .507     .071           .256           7.145    .000
Number of Hours Worked Last Week            .142     .012           .385           11.788   .000
Highest Year of School Completed, Mother    .005     .074           .003           .066     .948
Highest Year of School Completed, Father    .041     .055           .030           .733     .464

a. Dependent Variable: RESPONDENTS INCOME
The coefficients table lists the coefficients used to build the regression equation.
Residuals Statistics (a)

                                     Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value                      4.16      24.16     13.64   3.080            648
Std. Predicted Value                 -3.077    3.415     .000    1.000            648
Standard Error of Predicted Value    .176      1.005     .398    .128             648
Adjusted Predicted Value             4.23      24.48     13.64   3.083            648
Residual                             -15.499   13.759    .000    4.329            648
Std. Residual                        -3.567    3.166     .000    .996             648
Stud. Residual                       -3.583    3.188     .000    1.001            648
Deleted Residual                     -15.637   13.945    -.003   4.375            648
Stud. Deleted Residual               -3.616    3.211     -.001   1.003            648
Mahal. Distance                      .059      33.575    4.992   4.223            648
Cook's Distance                      .000      .042      .002    .004             648
Centered Leverage Value              .000      .052      .008    .007             648

a. Dependent Variable: RESPONDENTS INCOME
Case Processing Summary

                        Valid            Missing          Total
                        N     Percent    N     Percent    N      Percent
Mahalanobis Distance    677   45.1%      823   54.9%      1500   100.0%
The sample consisted of 1,500 cases, of which 823 had missing values.
Descriptives: Mahalanobis Distance

                                                Statistic     Std. Error
Mean                                            4.9925522     .16121086
95% Confidence Interval for Mean, Lower Bound   4.6760180
95% Confidence Interval for Mean, Upper Bound   5.3090864
5% Trimmed Mean                                 4.5041201
Median                                          3.8432691
Variance                                        17.595
Std. Deviation                                  4.19458144
Minimum                                         .05899
Maximum                                         33.57526
Range                                           33.51627
Interquartile Range                             4.15036
Skewness                                        2.310         .094
Kurtosis                                        7.910         .188
The skewness statistic has a z-score of 2.310/.094 = 24.574. Based on this, we can
conclude that the skewness is substantial and the distribution is non-normal.
The kurtosis is in line with that: 7.910/.188 = 42.074, which also indicates a
substantial departure from normality.
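The two ratios above are simply each statistic divided by its standard error. A small sketch of that check, using the skewness and kurtosis values from the Descriptives output (the 1.96 comparison is an assumption added here as a conventional two-tailed .05 benchmark; the text does not name a threshold):

```python
def z_score(statistic: float, std_error: float) -> float:
    """Ratio of a statistic to its standard error."""
    return statistic / std_error


skew_z = z_score(2.310, 0.094)   # skewness and its standard error
kurt_z = z_score(7.910, 0.188)   # kurtosis and its standard error

# Assumed benchmark: |z| > 1.96 is commonly read as a significant departure.
THRESHOLD = 1.96
print(round(skew_z, 3), skew_z > THRESHOLD)
print(round(kurt_z, 3), kurt_z > THRESHOLD)
```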
The box plot shows a non-normal distribution with outliers at the high end of the
distribution.
Using a chi-square table, the critical value was found to be 22.458. Any case with a
Mahalanobis distance greater than 22.458 should be eliminated from the regression
analysis. Following this reasoning, cases 406, 508, 18, 1129, and 351 were eliminated.
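The critical value of 22.458 can be reproduced numerically. The sketch below assumes alpha = .001 and df = 6; neither is stated in the answer, but this is the combination that yields 22.458. It uses only the standard library: for an even number of degrees of freedom the chi-square upper-tail probability has a closed form, and the critical value is found by bisection. The function names are illustrative.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def chi2_critical(alpha: float, df: int, lo: float = 0.0, hi: float = 200.0) -> float:
    """Bisection for the x at which the upper-tail probability equals alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid              # tail still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Assumed alpha and df (not stated in the text).
critical = chi2_critical(alpha=0.001, df=6)
print(round(critical, 3))
```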
For all subsequent analyses, use profile-b.sav. Make sure that only cases where
MAH_1<22.458 are selected.
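The selection rule can be sketched as a simple filter. The case records below are hypothetical and are not taken from profile-b.sav. The sketch keeps cases at or below the critical value (the ≤ form used in the instructor's note) and drops cases with no MAH_1 value, which is why cases with missing data fall out automatically.

```python
CRITICAL = 22.458

# Hypothetical records for illustration only.
cases = [
    {"id": 1, "MAH_1": 3.84},
    {"id": 2, "MAH_1": 25.10},   # exceeds the critical value
    {"id": 3, "MAH_1": None},    # missing Mahalanobis distance
    {"id": 4, "MAH_1": 22.458},  # exactly at the critical value
]


def keep(case: dict, critical: float = CRITICAL) -> bool:
    """Retain a case only if MAH_1 is present and does not exceed the cutoff."""
    mah = case["MAH_1"]
    return mah is not None and mah <= critical


retained = [c["id"] for c in cases if keep(c)]
removed = [c["id"] for c in cases if not keep(c)]
print(retained, removed)
```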
b. Create a scatterplot matrix. Can you assume linearity and normality?
The scatterplot matrix with the transformed variables displays elliptical shapes,
suggesting that the variables are linearly related and normally distributed.
Tests of Normality

                                            Kolmogorov-Smirnov (a)      Shapiro-Wilk
                                            Statistic   df    Sig.      Statistic   df    Sig.
Age of Respondent                           .057        609   .000      .975        609   .000
Highest Year of School Completed            .151        609   .000      .944        609   .000
Number of Hours Worked Last Week            .184        609   .000      .960        609   .000
Highest Year of School Completed, Mother    .270        609   .000      .891        609   .000
Highest Year of School Completed, Father    .180        609   .000      .964        609   .000
RESPONDENTS INCOME                          .115        609   .000      .952        609   .000

a. Lilliefors Significance Correction
The Shapiro-Wilk test is particularly useful for testing the non-normality of the variables.
The null hypothesis of this test is that the variables are normally distributed. From the
results, we can reject the null hypothesis for all the variables, concluding that none of
them is normally distributed.
From the plot we can see that the residuals of the regression do not cluster on a horizontal
line; in fact, there is an even distribution above and below the reference. From this, it
seems that there is a moderate violation of linearity and homoscedasticity, which
however should not invalidate the analysis.
d. Conduct multiple regression using the Enter method. Evaluate the tolerance
statistics. Is multicollinearity a problem?
Multicollinearity is not a problem because all tolerance statistics are greater than .1.
Descriptive Statistics

                                            Mean    Std. Deviation   N
RESPONDENTS INCOME                          13.25   5.058            609
Age of Respondent                           39.45   11.547           609
Highest Year of School Completed            14.25   2.587            609
Number of Hours Worked Last Week            42.88   14.059           609
Highest Year of School Completed, Mother    11.81   2.802            609
Highest Year of School Completed, Father    11.65   3.862            609
Correlations

Pearson Correlation
                                                     rincmdol   age     educ    hrs1    maeduc   paeduc
RESPONDENTS INCOME (rincmdol)                        1.000      .270    .335    .522    .036     .050
Age of Respondent (age)                              .270       1.000   -.017   .053    -.305    -.275
Highest Year of School Completed (educ)              .335       -.017   1.000   .145    .321     .370
Number of Hours Worked Last Week (hrs1)              .522       .053    .145    1.000   .037     .049
Highest Year of School Completed, Mother (maeduc)    .036       -.305   .321    .037    1.000    .578
Highest Year of School Completed, Father (paeduc)    .050       -.275   .370    .049    .578     1.000

Sig. (1-tailed)
                                                     rincmdol   age     educ    hrs1    maeduc   paeduc
RESPONDENTS INCOME (rincmdol)                        .          .000    .000    .000    .185     .109
Age of Respondent (age)                              .000       .       .337    .097    .000     .000
Highest Year of School Completed (educ)              .000       .337    .       .000    .000     .000
Number of Hours Worked Last Week (hrs1)              .000       .097    .000    .       .180     .112
Highest Year of School Completed, Mother (maeduc)    .185       .000    .000    .180    .        .000
Highest Year of School Completed, Father (paeduc)    .109       .000    .000    .112    .000     .

N = 609 for every pair of variables.
The correlation table indicates that number of hours worked has the highest correlation
with income (.522) and highest year of school completed has the second highest (.335).
Mother's education (.036) and father's education (.050) have the lowest correlations
with income.
All variables were entered using the Enter method.
Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .635 (a)   .404       .399                3.922

a. Predictors: (Constant), Highest Year of School Completed, Father, Number of Hours
Worked Last Week, Age of Respondent, Highest Year of School Completed, Highest Year of
School Completed, Mother
b. Dependent Variable: RESPONDENTS INCOME
ANOVA (a)

Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   6280.935         5     1256.187      81.677   .000 (b)
Residual     9274.119         603   15.380
Total        15555.054        608

a. Dependent Variable: RESPONDENTS INCOME
b. Predictors: (Constant), Highest Year of School Completed, Father, Number of Hours
Worked Last Week, Age of Respondent, Highest Year of School Completed, Highest Year of
School Completed, Mother
The ANOVA table suggests that the model significantly predicts the dependent variable
of income, with the F test of the overall significance telling us that at least one variable is
useful in predicting the income, F(5, 603) = 81.677, p<.001.
The coefficients table lists the coefficients used to build the regression equation.
Collinearity Diagnostics (a)

                                              Variance Proportions
Dimension   Eigenvalue   Condition Index   (Constant)   age   educ   hrs1   maeduc   paeduc
1           5.716        1.000             .00          .00   .00    .00    .00      .00
2           .131         6.611             .00          .22   .00    .06    .03      .17
3           .082         8.342             .00          .19   .00    .85    .00      .00
4           .034         12.888            .03          .20   .07    .04    .26      .80
5           .024         15.366            .00          .13   .63    .02    .49      .00
6           .012         21.772            .96          .26   .29    .03    .22      .01

a. Dependent Variable: RESPONDENTS INCOME

Residuals Statistics (a)

                        Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value         3.26      23.15     13.25   3.214            609
Residual                -15.487   8.673     .000    3.906            609
Std. Predicted Value    -3.106    3.082     .000    1.000            609
Std. Residual           -3.949    2.211     .000    .996             609

a. Dependent Variable: RESPONDENTS INCOME
e. Does the model significantly predict rincmdol? Explain.
The results indicate that the model significantly predicts rincmdol,
F(5, 603) = 81.677, p < .001. Its explanatory power is moderate:
R squared = .404, adjusted R squared = .399.
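The R squared and adjusted R squared values can be recomputed from the ANOVA table for this model. A minimal sketch, using the sums of squares reported in the ANOVA output, n = 609 cases, and k = 5 predictors (the function name is illustrative):

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Regression and total sums of squares from the profile-b ANOVA table.
r2 = 6280.935 / 15555.054
adj_r2 = adjusted_r_squared(r2, n=609, k=5)

print(round(r2, 3), round(adj_r2, 3))
```

The adjustment is small here because the sample (609 cases) is large relative to the number of predictors (5).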
f. Which variables significantly predict rincmdol? Which variable is the best
predictor of the DV?
The variables of age (B=.110, Beta=.252, t=7.485, p<.001), educ (B=.531, Beta=.271,
t=7.818, p<.001), and hrs1 (B=.169, Beta=.469, t=14.741, p<.001) significantly predict
the DV. The variable hrs1 is the best predictor of rincmdol, as indicated by its beta
weight and the respective t and p-values.
g. What percentage of variance in rincmdol is explained by the model?
The model accounted for 40.4% of the variance in rincmdol.
h. Write the regression equation for the standardized variables.
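Because the variables are standardized, the equation has no constant (the unstandardized
constant is -6.052), and only the statistically significant predictors are included:
Z_rincmdol = .252*Z_age + .272*Z_educ + .469*Z_hrs1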
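To illustrate how a standardized equation is applied, the sketch below converts raw values to z-scores with the means and standard deviations from the Descriptive Statistics table and then weights them by the beta coefficients. It assumes the fully standardized form (zero intercept) with only the three significant predictors and the beta weights from the answer to 2h; the function name and the example respondent are hypothetical.

```python
# Means and standard deviations from the Descriptive Statistics table (N = 609).
STATS = {
    "age": (39.45, 11.547),
    "educ": (14.25, 2.587),
    "hrs1": (42.88, 14.059),
}

# Beta weights of the statistically significant predictors (from 2h).
BETAS = {"age": 0.252, "educ": 0.272, "hrs1": 0.469}


def predicted_z_income(respondent: dict) -> float:
    """Predicted standardized income: sum of beta * z-score over the predictors."""
    total = 0.0
    for name, beta in BETAS.items():
        mean, sd = STATS[name]
        total += beta * (respondent[name] - mean) / sd
    return total


# A respondent exactly at the sample means is predicted at z = 0.
at_means = {name: mean for name, (mean, _sd) in STATS.items()}
# A hypothetical respondent one standard deviation above the mean on each predictor.
one_sd_up = {name: mean + sd for name, (mean, sd) in STATS.items()}

print(predicted_z_income(at_means), round(predicted_z_income(one_sd_up), 3))
```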
i. Explain why the variables of mother’s and father’s education are not significant
predictors of rincmdol.
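Neither variable is a statistically significant predictor. Their bivariate correlations
with the DV are very low and not significant (maeduc: r = .036, p = .185; paeduc:
r = .050, p = .109; one-tailed), and their partial correlation coefficients are also
very low. Therefore, there is little evidence that these variables are important in
explaining the DV.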