R Studio Assessment : RData Workspace - Piazza Under Resources

December 07, 2017
Author : Alex

Solution Code: 1IIJ

Question: R Studio Assessment


R Studio Assessment

Case Scenario/ Task

Problems

1.  For all models in this problem set, report the constants, coefficients, standard errors, and p-values, the N and the R2 values in one table in your writeup. Label them Model 1, Model 2, etc., as per the example table provided below. When you are finished, you should have 5 models in your table. Use the format of Table 1 as a guide for creating this table in your Word document. In the left-hand column, your variable names should be included. Name the variables; DO NOT USE THE R CODE NAMES. In the columns to the right of the variable names, insert the different models you will make for this problem set. Each cell should contain the values of the coefficients and their standard errors if they are included in that model. Since the first model is bivariate, the only cells that will have numbers will be the primary independent variable (the coefficient with SEs in parentheses underneath), the constant (the coefficient with SEs in parentheses underneath), and the N and R2. Fill the subsequent columns as the problem set instructs.

Table 1: Title of Table: Name of Dependent Variable

                   (Model 1)    (Model 2)    (Model 3)
Main X variable    -0.846***    -1.314***    -0.130
                   (0.153)      (0.158)      (0.337)
Control X2                      0.139***     0.330***
                                (0.0128)     (0.0558)
Control X3                                   -0.250***
                                             (0.0710)
Constant           23.547       17.556       19.250***
                   (18.443)     (8.691)      (3.0710)
N                  416983       380425       380425
R2                 .03          .1355        .4452

Note: Standard errors are shown below the coefficients in parentheses

* p < 0.05, ** p < 0.01, *** p < 0.001

2. Load the data; it should show you two data frames. Use the data frame named c for now. View the data frame and the variables. Make sure you know what each variable is and what it is measuring.

3. Report the descriptive statistics for all the above variables. This includes the type and level of measurement; for categorical variables, provide a properly labeled frequency graph using the freq() command, and include the frequency and percentage of each category in your write-up; for continuous variables, provide a histogram using the hist() command and report the n, median, mean and standard deviation in your write-up. For the set of categorical variables that indicate regional differences, you will not need to make a graph, but you should consider them here as one category and report their number and frequency. Label each response as 3a-h. Note: Use the options you used in Problem Set #2 to properly label every graph: this includes the main label and the x label. Use whatever color you’d like, other than the default color.

4. Your primary interest is the relationship that attractiveness has on the percentage of margin of victory. Create a scatter-plot using the plot() command for these two variables. The syntax for that command is plot(x,y). Use the label options you used for your histograms and frequency graphs to label the Main Graph label (main=""), the x-axis label (xlab=""), and use ylab="" to label the y-axis. Put your scatterplot in your Word document and spend a few sentences discussing whether you can distinguish the direction of the relationship between the variables. Be sure to answer the following questions: Is a relationship apparent? What factors might explain why the scatterplot looks the way it does? Are there any concerns with outliers or leveraging observations?

5. Create a binary regression model using attractiveness and the percent margin of victory and the lm() command. Report the β coefficient for attractiveness and report the standard error and p-value for the coefficient in a table, and label the results Model 1. In your Word document, report the statistical and substantive significance. Explain the relationship as you might to a person who is not familiar with statistics, but in such a way that a statistician would recognize what you’ve done and would appreciate your work.

6. Write the R code necessary to find the R2 for the binary model you just made using the following equation. Report the value, and interpret what this particular R2 value means as you would to someone not familiar with statistics.

$$R^2 = \frac{\sum_i (Y_i - \bar{Y})^2 - \sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2}$$

7. We may be missing important confounding variables. Perceived candidate attractiveness can be correlated with a variety of other variables that could affect the percent margin of victory. Therefore, we need to include the necessary variables. Run a model that includes all of the included control variables, and report the values in your table. In the table, label this model Model 2. Also in your writeup, interpret the results of every variable in the model, both statistically (with p-values), and substantively (with size and directions of coefficients). Explain every relationship as you might to a person who is not familiar with statistics, but in such a way that a statistician would recognize what you’ve done and would appreciate your work. You should use at least 2-3 sentences to properly explain the results of each variable.

8. If you followed instructions on the last part, R refused to include one of your variables. In a few sentences, identify which one and explain why it got dropped.

9. There are two particular control variables that may not have a linear relationship with the margin of victory. Identify the most likely candidate and create a new variable that is that variable but squared. Run a new regression model including this variable, and include it as Model 3 in your table. Explain the relationship that this variable has with the dependent variable.

10. From number 3 above, you may have identified former Rep. Henry Waxman as an outlier in our sample. Unfortunately, for reasons completely beyond his control, Mr. Waxman may have affected our data. Mr. Waxman is observation 87. Report his attractiveness (for your own research, Google his image) and his percent margin of victory scores. In a few sentences, explain what impact he might have on our results.

11. We may need to remove Mr. Waxman from our sample. We will use a variant of the call function. For example, to remove the fifth observation from a data frame named x, you would write out: x<-x[-5,]. In a few sentences, provide a meaningful justification for removing Mr. Waxman.

12. Re-run the same code you used for Model 2, and include it in the table as Model 4. What impact did Mr. Waxman’s removal have on our primary variable of interest? Use examples and be specific.

13. Using the grant money we got from our above results, we have expanded our sample to include 1000 observations. Using the data frame named t, re-run the same variables we did in Model 2 and report the results in your table as Model 5. Reinterpret the results for your main independent variable both statistically (with p-values), and substantively (with size and directions of coefficients). Explain the relationship as you might to a person who is not familiar with statistics, but in such a way that a statistician would recognize what you’ve done and would appreciate your work. What are specific impacts of increasing the sample size? Why do they occur? Answer both questions with at least 3-4 sentences.


Solution:

1. Please note that the table for all models is contained in this numbered answer, but all answers elaborating on the model results will be located in their respective locations (based upon original question numbers).

Table 1: Attractiveness of US Political Candidates: Percentage Margin of Victory

                        (Model 1)    (Model 2)    (Model 3)    (Model 4)    (Model 5)
Attractiveness          0.016 ***    0.010 **     -0.024 *     0.026 ***    0.025 ***
                        (0.004)      (0.003)      (0.010)      (0.002)      (0.001)
Media Coverage          -            0.050 **     0.033        -0.026       0.002
                        -            (0.018)      (0.018)      (0.013)      (0.0004)
Millionaire Status      -            0.042 ***    0.044 ***    0.049 ***    0.037 ***
                        -            (0.006)      (0.005)      (0.003)      (0.004)
Candidate’s Age         -            -0.001 ***   -0.001 ***   -0.001 ***   -0.001 ***
                        -            (0.0002)     (0.0002)     (0.0001)     (0.00005)
Friendliness            -            0.005 *      0.006 *      0.003        0.005 ***
                        -            (0.002)      (0.002)      (0.001)      (0.0007)
Political Ideology      -            -0.002       -0.001       -0.010       0.004
                        -            (0.010)      (0.010)      (0.006)      (0.002)
North Region            -            -0.008       -0.003       -0.006       -0.007 **
                        -            (0.011)      (0.010)      (0.006)      (0.002)
South Region            -            -0.001       0.0002       -0.003       -0.004 *
                        -            (0.008)      (0.007)      (0.005)      (0.001)
West Region             -            -0.005       -0.003       -0.008       -0.007 ***
                        -            (0.008)      (0.008)      (0.005)      (0.001)
Attractiveness squared  -            -            0.006 ***    -            -
                        -            -            (0.001)      -            -
(Constant)              -0.046 ***   -0.011       0.038        -0.025       -0.038 ***
                        (0.012)      (0.020)      (0.023)      (0.013)      (0.004)
N                       100          100          100          99           1,000
R2                      0.140        0.588        0.641        0.814        0.711

Note: Standard errors are shown below the coefficients in parentheses

* p < 0.05, ** p < 0.01, *** p < 0.001

2. Load the data; it should show you two data frames. Use the data frame named c for now. View the data frame and the variables. Make sure you know what each variable is and what it is measuring.

Data loaded appropriately and variables are understood.
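As a rough illustration of this step, the sketch below loads an .RData workspace and inspects what it contains. The real file name is not given in the assignment, so the sketch first writes a small stand-in workspace to a temporary file; the column names attract and margin are hypothetical.

```r
# Stand-in workspace so the sketch runs on its own. The real file name and
# contents are not stated in the assignment; these values are made up.
tmp <- tempfile(fileext = ".RData")
demo_env <- new.env()
demo_env$c <- data.frame(attract = c(2, 3, 4), margin = c(0.01, 0.03, 0.05))
demo_env$t <- data.frame(attract = c(1, 5), margin = c(0.00, 0.08))
save(list = c("c", "t"), file = tmp, envir = demo_env)

# load() restores the saved objects and returns their names invisibly
loaded <- load(tmp)
print(loaded)

# Inspect the data frame named c: variable names, types, and summaries.
# A data frame called c does not break the c() function, because R skips
# non-function objects when it looks up a function call.
str(c)
summary(c)
```

With the actual workspace, only the load(), str(), and summary() calls would be needed, with the path to the course file in place of tmp.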

3. Report the descriptive statistics for all the above variables. This includes the type and level of measurement; for categorical variables, provide a properly labeled frequency graph using the freq() command, and include the frequency, and percentage of each category in your write-up; for continuous, provide a histogram using hist() command and report the n, median, mean and standard deviation in your write-up. For the set of categorical variables that indicate regional differences, you will not need to make a graph, but you should consider them here as one category and report their number and frequency. Label each response as 3a-h.

Region     Frequency   Percent
midwest    24          24
north      14          14
south      37          37
west       25          25
Total      100         100
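The descriptive statistics requested here can be sketched in base R as follows. This is an illustration on simulated data, not the original script: the column names region and age are hypothetical, the region counts are set to match the table above, and table()/prop.table() stand in for the freq() command, which comes from an add-on package.

```r
set.seed(1)

# Simulated stand-in data (hypothetical column names)
d <- data.frame(
  region = factor(rep(c("midwest", "north", "south", "west"),
                      times = c(24, 14, 37, 25))),
  age    = round(rnorm(100, mean = 55, sd = 10))
)

# Categorical variable: frequency and percentage of each category
freq_tab <- table(d$region)
pct_tab  <- round(100 * prop.table(freq_tab), 1)
print(data.frame(Frequency = as.vector(freq_tab),
                 Percent   = as.vector(pct_tab),
                 row.names = names(freq_tab)))

# Continuous variable: labeled histogram in a non-default color
if (!interactive()) pdf(NULL)  # avoid writing a plot file when run as a script
hist(d$age, main = "Candidate Age", xlab = "Age (years)", col = "steelblue")

# n, median, mean, and standard deviation
age_stats <- c(n = sum(!is.na(d$age)), median = median(d$age),
               mean = mean(d$age), sd = sd(d$age))
print(age_stats)
```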

4. Your primary interest is the relationship that attractiveness has on the percentage of margin of victory. Create a scatter-plot using the plot() command for these two variables.  The syntax for that command is plot(x,y). Use the label options you used for your histograms and frequency graphs to label the Main Graph label (main=""), the x-axis label (xlab=""), and use ylab="" to label the y-axis. Put your scatterplot in your word document and spend a few sentences discussing whether you can distinguish the direction of the relationship between the variables. Be sure to answer the following questions: Is a relationship apparent? What factors might explain why the scatterplot looks the way it does? Are there any concerns with outliers or leveraging observations?

There appears to be a slight linear relationship between the percent margin of victory and the candidate’s level of attractiveness. The relationship is direct: as a candidate’s attractiveness increases, the percentage margin of victory tends to increase as well. The main concern with this plot is that most observations have attractiveness values of 2, 3, or 4, and few fall at 0, 1, 5, or 6. There appears to be at least one outlier (an observation with an attractiveness of 0 but a large margin of victory). This outlier could act as a high-leverage point that weakens the estimated linear association between attractiveness and percent margin of victory.
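A labeled scatterplot of the kind described above might be produced as in the following sketch. The data are simulated and the variable names attract and margin are hypothetical; the lowess line is an optional addition that helps judge the direction of the relationship.

```r
set.seed(2)

# Simulated stand-in data: most ratings cluster at 2, 3, and 4
attract <- sample(0:6, 100, replace = TRUE, prob = c(1, 3, 25, 35, 25, 8, 3))
margin  <- -0.046 + 0.016 * attract + rnorm(100, sd = 0.03)

if (!interactive()) pdf(NULL)  # avoid writing a plot file when run as a script

# plot(x, y) with the main, x-axis, and y-axis labels set explicitly
plot(attract, margin,
     main = "Attractiveness and Margin of Victory",
     xlab = "Attractiveness rating (0-6)",
     ylab = "Percent margin of victory",
     col  = "darkgreen", pch = 19)

# Smoothed trend line to make the direction easier to see
lines(lowess(attract, margin), col = "red")
```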

5. Create a binary regression model using attractiveness and the percent margin of victory and the lm() command. Report the coefficient for attractiveness and report the standard error and p-value for the coefficient in a table, and label the results Model 1. In your word document, report the statistical and substantive significance. Explain the relationship as you might to a person who is not familiar with statistics, but in such a way that a statistician would recognize what you’ve done and would appreciate your work.

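Please see Table 1 in Question 1 above for model results. A candidate’s attractiveness is statistically significant, with a p-value less than 0.001, and is positively associated with the percent margin of victory. Each one-unit increase in attractiveness is associated with an increase of approximately 1.6 percentage points in the expected margin of victory (coefficient 0.016, standard error 0.004). For a candidate who is “very unattractive” (attractiveness of zero), the expected percent margin of victory is -4.6%.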
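The bivariate model and the numbers that go into the table can be pulled out as in this sketch. It uses simulated data with hypothetical column names, so the estimates will not match Table 1.

```r
set.seed(3)

# Simulated stand-in data (hypothetical column names)
d <- data.frame(attract = sample(0:6, 100, replace = TRUE))
d$margin <- -0.046 + 0.016 * d$attract + rnorm(100, sd = 0.03)

# Bivariate regression of margin of victory on attractiveness
m1 <- lm(margin ~ attract, data = d)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(m1))
print(coef_table)

# Expected margin at the lowest (0) and highest (6) attractiveness ratings
b <- coef(m1)
print(b[["(Intercept)"]] + b[["attract"]] * c(0, 6))
```

The intercept is the expected margin when attractiveness is zero, which is how the -4.6% figure above is read off the constant.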

6. Write the R code necessary to find the R2 for the binary model you just made using the following equation. Report the value, and interpret what this particular R2 value means as you would to someone not familiar with statistics.

Please reference the R script for the code as the instructions for this document stated not to include any R code.  The calculated R2 is 0.1404115.  The (variation in the) variable for a political candidate’s attractiveness explains approximately 14.04% of (the variation in) the candidate’s percentage margin of victory.
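For readers without the original script, a minimal sketch of the calculation is shown here on simulated data (the variables x and y are placeholders, not the assignment's data). It computes R2 as the total sum of squares minus the residual sum of squares, divided by the total sum of squares, and checks the result against the value lm() reports.

```r
set.seed(4)

# Simulated stand-in data
x <- sample(0:6, 100, replace = TRUE)
y <- -0.046 + 0.016 * x + rnorm(100, sd = 0.03)

m <- lm(y ~ x)

# Total sum of squares: variation of Y around its mean
tss <- sum((y - mean(y))^2)

# Residual sum of squares: variation of Y around the fitted values
rss <- sum((y - fitted(m))^2)

# R2 = (TSS - RSS) / TSS
r2_manual <- (tss - rss) / tss
print(r2_manual)

# Should agree with the value reported by summary()
print(all.equal(r2_manual, summary(m)$r.squared))
```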

7. We may be missing important confounding variables. Perceived candidate attractiveness can be correlated with a variety of other variables that could affect the percent margin of victory. Therefore, we need to include the necessary variables. Run a model that includes all of the included control variables, and report the values in your table. In the table, label this model Model 2.  Also in your writeup, interpret the results of every variable in the model, both statistically (with p-values), and substantively (with size and directions of coefficients). Explain every relationship as you might to a person who is not familiar with statistics, but in such a way that a statistician would recognize what you’ve done and would appreciate your work. You should use at least 2-3 sentences to properly explain the results of each variable.

The updated model relates a candidate’s attractiveness to the percentage margin of victory while adding control variables, so each estimate holds the other variables constant. The variable of main interest, the politician’s attractiveness, is positive and statistically significant (p < 0.01): every one-unit increase in attractiveness is associated with an increase in the percent margin of victory of approximately 1 percentage point. Media coverage is also positively associated (p < 0.01); as media coverage increases by one unit, the expected margin of victory increases by approximately 5 percentage points. If the candidate is considered a millionaire, the expected margin of victory is approximately 4.2 percentage points higher (p < 0.001). Age is negatively associated with the margin of victory: with every one-unit increase in age, the expected margin of victory decreases by 0.13 percentage points (p < 0.001). Friendliness is positively associated with the margin of victory: for every one-unit increase in likability (or friendliness), the expected margin of victory increases by 0.59 percentage points (p < 0.05). Of note, these findings are not causal in nature; they only provide evidence of associations between the dependent variable and the selected covariates. The remaining variables in the model, political ideology and the region of the country in which the candidate resided, were not statistically significant. That is, there is not enough evidence to conclude that these covariates are associated with the candidate’s margin of victory.

8. If you followed instructions on the last part, R refused to include one of your variables. In a few sentences, identify which one and explain why it got dropped.

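For this model, the dummy variable indicating that the candidate is from the Midwest was dropped. This is due to complete (or exact) collinearity: every candidate belongs to exactly one of the four regions, so the four regional dummy variables always sum to one, which duplicates the constant term. With Midwest omitted, it becomes the reference category. When all other covariates take on the value of zero, the y-intercept (constant) gives the expected value for a Midwest candidate, and each remaining regional coefficient is measured relative to the Midwest.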
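The dropped variable can be reproduced with a small sketch. The data are simulated and the dummy names are hypothetical; it also assumes Midwest is listed last in the formula, since lm() sets the coefficient of the column it finds redundant to NA.

```r
set.seed(5)

# Simulated regions with the same counts as the frequency table
region <- rep(c("midwest", "north", "south", "west"),
              times = c(24, 14, 37, 25))

# One 0/1 dummy per region (hypothetical names)
d <- data.frame(
  midwest = as.numeric(region == "midwest"),
  north   = as.numeric(region == "north"),
  south   = as.numeric(region == "south"),
  west    = as.numeric(region == "west")
)
d$margin <- rnorm(100, mean = 0.03, sd = 0.03)

# The four dummies always sum to 1, which duplicates the intercept column
print(all(rowSums(d[, c("midwest", "north", "south", "west")]) == 1))

# Including all four dummies plus an intercept: the last one cannot be estimated
m <- lm(margin ~ north + south + west + midwest, data = d)
print(coef(m))  # the midwest coefficient is reported as NA
```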

9. There are two particular control variables that may not have a linear relationship with the margin of victory. Identify the most likely candidate and create a new variable that is that variable but squared. Run a new regression model including this variable, and include it as Model 3 in your table. Explain the relationship that this variable has with the dependent variable.

Of note, due to exact collinearity in the models when all variables are included, the Midwest variable is excluded from future models. The plot below identifies attractiveness as one variable that may not have a linear relationship with margin of victory. As evidenced in the two plots below, the lowess curve for attractiveness and margin of victory appears nonlinear (a polynomial curve). The relationship appears linear when values of attractiveness fall between 2 and 4. This linear relationship disappears, however, outside this range of values. When the lowess curve is fitted with attractiveness squared, there appears to be a linear relationship across the full range of values. In Model 3, attractiveness and attractiveness squared are both included in the model.
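Adding a squared term can be sketched as follows, again on simulated data with hypothetical column names. The last lines use the Model 3 coefficients reported in Table 1 (-0.024 for attractiveness and 0.006 for attractiveness squared) to locate the turning point of the fitted curve, holding the other covariates fixed.

```r
set.seed(6)

# Simulated stand-in data with a curved relationship
d <- data.frame(attract = sample(0:6, 100, replace = TRUE))
d$margin <- 0.038 - 0.024 * d$attract + 0.006 * d$attract^2 +
  rnorm(100, sd = 0.02)

# New variable: attractiveness squared
d$attract_sq <- d$attract^2

# Model with both the linear and the squared term
m3 <- lm(margin ~ attract + attract_sq, data = d)
print(coef(summary(m3)))

# Turning point of b1 * x + b2 * x^2 is at x = -b1 / (2 * b2).
# Using the Table 1 values for Model 3:
b1 <- -0.024
b2 <- 0.006
turning_point <- -b1 / (2 * b2)
print(turning_point)
```

A negative linear coefficient with a positive squared coefficient describes a curve that falls slightly at the lowest ratings and then rises at an increasing rate.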

10. From number 3 above, you may have identified former Rep. Henry Waxman as an outlier in our sample. Unfortunately, for reasons completely beyond his control, Mr. Waxman may have affected our data. Mr. Waxman is observation 87. Report his attractiveness (for your own research, Google his image) and his percent margin of victory scores. In a few sentences, explain what impact he might have on our results.

Representative Waxman’s attractiveness rated a zero, which is considered “very unattractive.”  His margin of victory is 15%. Representative Waxman biases the association between attractiveness and margin of victory as compared to the rest of the observations.

11. We may need to remove Mr. Waxman from our sample. We will use a variant of the call function. For example, to remove the fifth observation from a data frame named x, you would write out: x<-x[-5,]. In a few sentences, provide a meaningful justification for removing Mr. Waxman.

The scatter plot below identifies Representative Waxman as an outlier compared to the rest of the observations (large red point). As the dataset currently lacks a variable that could better account for Representative Waxman’s margin of victory, one option is to remove him from the observations. An alternative is to obtain a variable that would better account for Representative Waxman. This one observation could act as negative leverage in describing the association between attractiveness and margin of victory.
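The removal step and its effect on the slope can be sketched on simulated data. Row 87 is given an attractiveness of 0 and a margin of 0.15 to mimic the outlier described above; everything else, including the column names, is hypothetical. Cook's distance is one common diagnostic for backing up the decision and is an addition here, not something the assignment requires.

```r
set.seed(7)

# Simulated stand-in data with an outlier planted at row 87
d <- data.frame(attract = sample(1:6, 100, replace = TRUE))
d$margin <- -0.046 + 0.016 * d$attract + rnorm(100, sd = 0.02)
d$attract[87] <- 0
d$margin[87]  <- 0.15

# Report the scores for observation 87
print(d[87, c("attract", "margin")])

# Influence diagnostic on the full-sample model
m_all <- lm(margin ~ attract, data = d)
print(which.max(cooks.distance(m_all)))

# Remove observation 87 using negative row indexing, as in x <- x[-5, ]
d_trim <- d[-87, ]
m_trim <- lm(margin ~ attract, data = d_trim)

# Compare coefficients with and without the outlier
print(rbind(all = coef(m_all), trimmed = coef(m_trim)))
```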

12. Re-run the same code you used for Model 2, and include it in the table as Model 4. What impact did Mr. Waxman’s removal have on our primary variable of interest? Use examples and be specific.

Differences are apparent between the results of Models 2 and 4. First, the R2 value increased from 0.588 in Model 2 to 0.8143 in Model 4. Second, the coefficient estimate for attractiveness increased from 0.010 to 0.026 when Representative Waxman was removed. The coefficient estimate for media coverage declined from 0.050 to -0.026. In Model 2, friendliness and media coverage had p-values less than 0.05 and 0.01, respectively. With Representative Waxman removed, both covariates have only marginal significance at p < 0.10. The other coefficient estimates remain largely unchanged.

13. Using the grant money we got from our above results, we have expanded our sample to include 1000 observations. Using the data frame named t, re-run the same variables we did in Model 2 and report the results in your table as Model 5. Reinterpret the results for your main independent variable both statistically (with p-values), and substantively (with size and directions of coefficients). Explain the relationship as you might to a person who is not familiar with statistics, but in such a way that a statistician would recognize what you’ve done and would appreciate your work. What are specific impacts of increasing the sample size? Why do they occur? Answer both questions with at least 3-4 sentences.

With attractiveness as the main covariate of interest, increasing the sample size to 1,000 yields a coefficient estimate of 0.025 for the association between attractiveness and margin of victory, compared with 0.010 in Model 2. The coefficient estimates in this model are in range with those of the updated model where Representative Waxman was removed, although his observation is still present in this larger dataset. For every one-unit increase in the candidate’s attractiveness, there is an expected increase in margin of victory of 2.5 percentage points (p < 0.001). There is a positive association between these two variables. Even with the larger dataset, Representative Waxman still appears as an outlier, but his leverage has less of an impact on the coefficient estimate of attractiveness. Increasing the sample size reduces the impact that any single outlier has on the estimates, as larger samples are typically more in line with the population of interest. It also shrinks the standard errors: the standard error for attractiveness falls from 0.003 in Model 2 to 0.001 in Model 5, which is why coefficients of similar size, such as the regional variables, now reach statistical significance.
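The effect of sample size on the standard error can be illustrated with a short simulation. This is a sketch under assumed values (a true slope of 0.025 and noise with standard deviation 0.03), not a reanalysis of the assignment's data; sim_se is a hypothetical helper defined only for this example.

```r
set.seed(8)

# Hypothetical helper: simulate n candidates, fit the bivariate model,
# and return the standard error of the attractiveness coefficient
sim_se <- function(n) {
  attract <- sample(0:6, n, replace = TRUE)
  margin  <- -0.04 + 0.025 * attract + rnorm(n, sd = 0.03)
  coef(summary(lm(margin ~ attract)))["attract", "Std. Error"]
}

se_100  <- sim_se(100)
se_1000 <- sim_se(1000)

# The standard error shrinks roughly with the square root of the sample size,
# so a tenfold increase in n cuts it by a factor of about sqrt(10)
print(c(n100 = se_100, n1000 = se_1000, ratio = se_100 / se_1000))
```

Smaller standard errors mean narrower confidence intervals and smaller p-values for the same coefficient size, which matches the pattern between Models 2 and 5.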
