R-Studio Project - Data Analysis Environment - Assessment Answers

December 04, 2017
Author : Charles Hill

Solution Code: 1IJA

Question: R-Studio Project


R-Studio Project Writing

Task

Setting 1: Yoplait Yogurt Advertising

For this problem, you will need to download and install the package Ecdat if you have not done so already. If you do not have Ecdat installed, run the following command and follow the instructions.

    install.packages("Ecdat", dependencies = TRUE)

Once you have it installed, run the following commands to load the data for this problem into R.

    library(Ecdat)
    data(Yogurt)

Once you run these commands, R will contain a data frame object called Yogurt. These data are from a study by Jain, Vilcassim and Chintagunta that uses more sophisticated methods than we will employ in this question. For us, the data set is an interesting setting in which to apply multiple linear regression.

1. After loading the data into R, produce summary statistics of the data (mean, standard deviation, minimum, maximum for each numerical variable; counts for each categorical variable). Produce a nice table to display these summary statistics. Comment briefly on any striking patterns.

2. Compute the summary statistics for each numerical variable for each level of the categorical variable choice. Comment briefly on any systematic differences in these variables across the levels of this categorical variable.

3. Consider a multiple regression model of the form

$$Y = X'\beta + P'\gamma + U \qquad (1)$$

where X is a vector that contains a constant and a full set of the featured advertisement dummy variables, P is a vector that contains the prices for each of the brands of yogurt and Y is a dummy variable for whether the individual bought Yoplait.

(a) Without looking at the data, do you expect there to be heteroskedasticity in this regression model? Explain precisely why or why not.

(b) Does including every featured advertisement dummy variable in the regression lead to multicollinearity? Explain why or why not and articulate the importance of avoiding perfect multicollinearity.

(c) Use R to produce multiple regression estimates for this regression model. Report the estimates (and standard errors in parentheses) in the first column of a nicely-formatted table of results.

(d) Suppose that we are willing to assume that the conditional expectation E(Y|X,P) is linear. What does this imply about E(U|X,P)? What does it imply about (?)?

(e) Examine the coefficient estimate on the dummy variable feat.yoplait, $\beta_{f.y}$.

i. Give an interpretation for this coefficient estimate. Be precise and give context.

ii. Is feat.yoplait statistically significant at the 5 percent level? Provide a formal hypothesis test.

iii. Use the summary output in R to construct a 95 percent confidence interval for $\beta_{f.y}$. Give a precise interpretation of this confidence interval that uses context.

(f) Estimate the regression model without the regressor feat.weight. Report the estimates from this reduced regression model in the second column of your table of results.

Relative to the estimates you produced in part (c), how do the coefficient estimates change? Explain why the coefficient estimates change in the context of multiple regression.

1. Dipak C. Jain, Naufel J. Vilcassim and Pradeep K. Chintagunta, "A Random-Coefficients Logit Brand-Choice Model Applied to Panel Data," Journal of Business & Economic Statistics, Vol. 12, No. 3 (Jul., 1994), pp. 317-328.

(g) Does the existence of a featured advertisement of another brand of yogurt relate to the probability that an individual purchases Yoplait brand yogurt?

i. Within the context of the regression model in (1), formally state the null and alternative hypotheses that are appropriate for answering this question.

ii. What is an appropriate test statistic to use to conduct this hypothesis test? Before computing this test statistic, state your decision rule.

iii. Compute the test statistic and the p-value for this test using both Stata and R. What can you conclude from this hypothesis test?

(h) The variable id is an individual identifier. There are 100 individuals in the data set and repeated observations for each individual. This may or may not be a problem.

i. Plot the residuals versus id. Do you notice any patterns?

ii. Compute the mean of the residuals for each id. Hint: In R, the command tapply(a, b, mean) computes the mean of a for each value of b. Why is the mean of these means not zero?

iii. Produce a histogram of these 100 mean residuals by id. Based on this histogram and the context of the problem, what can you conclude? Are any assumptions of the multiple regression model violated?

4. Now, consider a multiple regression model of the form

$$Y = \delta_i + X'\beta + P'\gamma + U \qquad (2)$$

where $\delta_i$ is an id-specific intercept, X is a vector that contains a full set of the featured advertisement dummy variables, P is a vector that contains the prices for each of the brands of yogurt and Y is a dummy variable for whether the individual bought Yoplait.

(a) Before estimating the regression specification, do you expect to obtain the same coefficient estimates as you did in part 3? Explain precisely why you expect what you expect.

(b) Estimate this regression model in R. To estimate a model with 100 id-specific intercepts, coerce id to be a factor using the as.factor() function and include the coerced variable as an explanatory variable (now a factor) in addition to the predictors you used from before.

i. Report the coefficient estimates and standard errors in the fourth column of your table of results. Do not report the 100 intercept coefficient estimates. How do these estimates compare with the ones you produced in part 3?

ii. Produce a histogram of the 100 id-specific intercept estimates. How do these estimates compare with the histogram of mean residuals you computed in part 3? Should they be similar to one another? Should they be correlated? Why or why not?

(c) Are there significant individual-specific factors that affect the probability of purchasing Yoplait yogurt?

i. In the context of the regression model in (2), state null and alternative hypotheses appropriate for answering this question.

ii. What is an appropriate test statistic?

iii. Carry out this test using R. What is the value of the test statistic? What is the p-value? What do you conclude?

(d) Imagine that after you explain the regression specification in (2) to a friend of yours who studies philosophy, your friend says, "Some people are more frugal than others when it comes to yogurt. Isn't this a problem for you?" Respond thoughtfully to this question in plain English.

5. Now, consider a multiple regression model of the form

$$Y = \delta_i + X'\beta + P_{-y}'\gamma + P_{yoplait}\,\alpha_i + U \qquad (3)$$

where $\delta_i$ is an id-specific intercept, $\alpha_i$ is an id-specific slope on the price of Yoplait, X is a vector that contains a full set of the featured advertisement dummy variables, $P_{-y}$ is a vector that contains the prices for each of the brands of yogurt except for Yoplait and Y is a dummy variable for whether the individual bought Yoplait.

(a) In words, explain how to use statistical software to estimate the regression model in (3).

(b) Use R to estimate this regression model. You should discover that R drops some id × price.yoplait interactions due to "singularities." What does this mean?

Summarize your findings in a typed report.

Setting 2: YouTube Partner Advertising

The data set, adsensedata.csv, contains daily observations on a YouTube channel's daily earnings (Earnings), number of clicks on advertisements (Clicks) and number of page impressions for those advertisements (Impressions). In the data, we also have daily observations broken down by ad payment type: pay per impression (PPI) or pay per click (PPC). The data are separately broken down by ad format: Video, Image, Flash and Text. For this reason, there are seven sets of variables.

  • Earnings, Impressions and Clicks: Daily earnings, page impressions and clicks across all ads.

  • EarningsPPI, ImpressionsPPI and ClicksPPI: Daily earnings, page impressions and clicks on pay-per-impression ads.

  • EarningsPPC, ImpressionsPPC and ClicksPPC: Daily earnings, page impressions and clicks on pay-per-click ads.

  • EarningsVideo, ImpressionsVideo and ClicksVideo: Daily earnings, page impressions and clicks on video ads.

  • EarningsImage, ImpressionsImage and ClicksImage: Daily earnings, page impressions and clicks on image ads.

  • EarningsFlash, ImpressionsFlash and ClicksFlash: Daily earnings, page impressions and clicks on flash ads.

  • EarningsText, ImpressionsText and ClicksText: Daily earnings, page impressions and clicks on text ads.

1. Summarize the data in a useful format. [~2 tables, ~2-3 figures] Using the tools we developed in the course, present means, standard deviations, five number summaries, and useful plots to get a sense of the variation contained in the data. Remember to keep the focus on earnings, and the determinants of earnings. The more your summary statistics are focused on this goal, the better.

2. Single Regression Analysis [1 to several tables] Use a series of single linear regressions to explain what determines EarnPPI and EarnPPC. Does the number of clicks relate to pay-per-impression earnings? Interpret what this means in the context of the setting.

3. Multiple Regression Analysis [several tables] Consider the following statistical model of pay-per-impression earnings:

$$\text{EarnPPI} = \beta_0 + \beta_1\,\text{ClicksPPI} + \beta_2\,\text{ImprPPI} + e \qquad (4)$$

Before estimating the multiple regression, ask what it means to introduce an additional explanatory variable:

(a) If the earnings are truly paid per impression and no other factors influenced pay-per-impression advertisement revenue, what do you expect to be the coefficient on ClicksPPI?

(b) If clicks have no effect on pay-per-impression advertisements, what would be the effect of including ClicksPPI in the regression?

(c) Now that you have given it some thought, estimate this regression specification using OLS, using the right standard errors.

Is ClicksPPI statistically significant? Explain the implications of this statistical result.

4. Repeat (3) for the following statistical model of pay-per-click earnings:

$$\text{EarnPPC} = \beta_0 + \beta_1\,\text{ClicksPPC} + \beta_2\,\text{ImprPPC} + e \qquad (5)$$

5. In reality, PPI advertisements and PPC advertisements compete in an auction to determine which ads are placed on the YouTube channel. The advertisement with the highest bid in terms of cost per thousand impressions wins the auction. Given this fact, how would you interpret the following regression models for PPI and PPC earnings?

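$$\text{EarnPPI} = \beta_0 + \beta_1\,\text{ClicksPPI} + \beta_2\,\text{ImprPPI} + \beta_3\,\text{ClicksPPC} + \beta_4\,\text{ImprPPC} + e \qquad (6)$$

$$\text{EarnPPC} = \beta_0 + \beta_1\,\text{ClicksPPI} + \beta_2\,\text{ImprPPI} + \beta_3\,\text{ClicksPPC} + \beta_4\,\text{ImprPPC} + e \qquad (7)$$

Estimate these two regression models and interpret the output in light of a reasonable economic model that clicks and impressions determine YouTube Partner earnings.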

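6. The data also contain information on advertising format. Estimate the following statistical model of total earnings:

$$\begin{aligned}
\text{Earnings} = \beta_0 &+ \beta_1\,\text{ImprVideo} + \beta_2\,\text{ImprImage} + \beta_3\,\text{ImprFlash} + \beta_4\,\text{ImprText} \\
&+ \beta_5\,\text{ClicksVideo} + \beta_6\,\text{ClicksImage} + \beta_7\,\text{ClicksFlash} + \beta_8\,\text{ClicksText} + e \qquad (8)
\end{aligned}$$

using OLS, where Earnings is total channel earnings.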

(a) What do you learn from these regression results? Which ad formats seem to pay the most and for what?

(b) For comparison, estimate the statistical relationship of Earnings to (Impressions, Clicks) within advertising format type. That is, estimate four different regressions of the form

$$\text{EarningsType} = \beta_0 + \beta_1\,\text{ImprType} + \beta_2\,\text{ClicksType} + e$$

where Type ∈ {Video, Image, Flash, Text}. Report OLS estimates in a table for easy comparison. Use your estimates to discuss how the relationship between earnings, impressions and clicks differs across advertising types.

7. Summarize the analysis. In particular, what conclusions would you offer to this YouTube partner? YouTube reports X = Earnings/Impressions and W = Earnings/Clicks as the payoff to the YouTube partner of impressions and clicks, respectively. How does the regression analysis of the YouTube data from this partner's earnings improve upon these simple statistics?


Solution:

1. The data set is loaded from the R package Ecdat.

2. The main objective of this study is to determine the factors that influence the choice of Yoplait yogurt. The variables used in this study are feat.yoplait, feat.dannon, feat.hiland, feat.weight, price.yoplait, price.dannon, price.hiland and price.weight. The dependent variable is Choice and it has four categories, namely Yoplait, Dannon, Hiland and Weight.

The box plot is given below

3. a) Heteroskedasticity refers to the condition that the errors do not have constant variance. To check this assumption, a residual plot is constructed; when the points in the plot follow a systematic pattern, the assumption of homoskedasticity is violated. In that situation, the standard estimation method is not efficient and the usual standard errors are not valid. In our study, the residuals of the dependent variable Choice do not satisfy the assumption of homoskedasticity.

b) Including all the advertisement variables will result in multicollinearity. It is a known fact that advertisement types and costs are strongly related to each other, so including all the variables in the model will result in multicollinearity.
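A short derivation may help here. It assumes, as part (d) of the task does, that the conditional expectation is linear, and that Y is the 0/1 Yoplait purchase dummy from model (1). The symbol p(X,P) for the purchase probability is notation introduced only for this sketch:

$$
p(X,P) = E(Y \mid X,P) = X'\beta + P'\gamma, \qquad
\operatorname{Var}(U \mid X,P) = \operatorname{Var}(Y \mid X,P) = p(X,P)\,\bigl(1 - p(X,P)\bigr).
$$

Because this variance changes with X and P unless every slope is zero, heteroskedasticity would be expected even before looking at the data.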

c) A multiple regression is performed to determine the factors that influence the choice of Yoplait yogurt. Table 3 reports the regression output.

The regression model, with standard errors given in parentheses, is given below

Choice = 2.932 – 0.139 * feat.yoplait (0.344) – 0.229 * feat.dannon (0.097) – 0.299 * feat.hiland (0.1235) + 0.453 * feat.weight (0.1243) + 0.14 * price.yoplait (0.019) – 0.122 * price.dannon (0.023) – 0.189 * price.hiland (0.0296) – 0.0328 * price.weight (0.314)

The coefficient of determination is 0.086. This means that the regression model explains 8.6% of the variation in the dependent variable, and the remaining 91.4% is left unexplained.
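As a rough illustration of how such estimates, standard errors and the coefficient of determination can be pulled out of a fitted model, the sketch below fits a linear probability model with lm(). It runs on synthetic data rather than the Yogurt data set, and the variable buy.yoplait and the simulated purchase probabilities are assumptions made only for this example.

```r
# Synthetic stand-in for the Yogurt data (NOT the real data set).
set.seed(1)
n <- 500
sim <- data.frame(
  feat.yoplait  = rbinom(n, 1, 0.1),   # 1 if Yoplait has a featured ad
  price.yoplait = runif(n, 5, 15)      # hypothetical price range
)

# Assumed purchase probability: higher with a featured ad, lower with price.
p <- 0.6 + 0.1 * sim$feat.yoplait - 0.03 * sim$price.yoplait
sim$buy.yoplait <- rbinom(n, 1, p)     # hypothetical 0/1 purchase dummy

# Linear probability model estimated by OLS.
fit <- lm(buy.yoplait ~ feat.yoplait + price.yoplait, data = sim)

# Estimates and standard errors, as they would appear in a results column.
est <- coef(summary(fit))[, c("Estimate", "Std. Error")]
print(round(est, 4))

# Coefficient of determination.
r2 <- summary(fit)$r.squared
print(round(r2, 3))
```

With the real data, the same calls would be applied to a Yoplait dummy built from the choice variable and to the full set of feat.* and price.* regressors.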

d) When the conditional expectation E(Y|X,P) is linear, the regression model is of linear form, indicating that there is a linear relationship between the dependent variable and the independent variables taken into consideration. In particular, since U = Y − E(Y|X,P), it follows that E(U|X,P) = 0.

e)

i. The coefficient of the dummy variable feat.yoplait is -0.139. This indicates that, when Yoplait has a featured advertisement, the choice of yogurt decreases by 0.139 units, holding the other independent variables constant.

ii. Since the p-value of the t test statistic is 0.755 > 0.05, we conclude that feat.yoplait is not statistically significant at the 5 percent level.

iii. The 95% confidence interval for the coefficient on feat.yoplait is (-0.329, 0.0516). This means that, if repeated samples were taken from the same population and an interval were constructed from each, about 95 out of 100 such intervals would contain the true coefficient on feat.yoplait.
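The interval can be checked by hand with the usual "estimate plus or minus critical value times standard error" formula. In the sketch below the standard error of 0.097 is an assumption: it is the value that approximately reproduces the reported interval, and the normal critical value is used in place of the exact t quantile.

```r
est  <- -0.139          # coefficient on feat.yoplait reported in the text
se   <- 0.097           # ASSUMED standard error (not confirmed by the source)
crit <- qnorm(0.975)    # normal approximation to the t critical value

# Two-sided 95 percent confidence interval.
ci <- c(lower = est - crit * se, upper = est + crit * se)
print(round(ci, 3))

# The interval contains zero, which matches the "not significant" conclusion.
contains_zero <- ci["lower"] < 0 && ci["upper"] > 0
print(contains_zero)
```

In practice, confint() applied to the fitted lm object would return the interval with the exact t quantile.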

f) i. The regression model predicting choice without the feat.weight variable, with standard errors given in parentheses, is given below

Choice = 3.233 – 0.168 * feat.yoplait (0.097) – 0.260 * feat.dannon (0.124) – 0.315 * feat.hiland (0.127) + 0.136 * price.yoplait (0.012) – 0.124 * price.dannon (0.023) – 0.178 * price.hiland (0.0296) – 0.0692 * price.weight (0.0298)

The coefficient of determination is 0.081. This means that the regression model explains 8.1% of the variation in the dependent variable, and the remaining 91.9% is left unexplained.

Comparing the two models, we see that the coefficient values change only slightly. The change arises because the remaining coefficients adjust for the variable removed from the model. Since all the independent variables are estimated jointly, adding or removing one variable affects the other coefficients in the model whenever that variable is correlated with the regressors that remain.
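The mechanism can be illustrated with the textbook two-regressor case. The symbols below ($X_1$, $X_2$, and $\tilde{\beta}_1$ for the slope in the regression that omits $X_2$) are notation assumed for this sketch and do not appear in the assignment:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + U, \quad E(U \mid X_1, X_2) = 0
\quad\Longrightarrow\quad
\tilde{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2\,\frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}.
$$

So the coefficient on $X_1$ is unaffected by dropping $X_2$ only if $\beta_2 = 0$ or the two regressors are uncorrelated.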

g)

i. To determine whether the existence of a featured advertisement of another brand of yogurt relates to the probability that an individual purchases Yoplait brand yogurt, the null and alternate hypotheses are given below

Null Hypothesis: H0: β = 0

That is, the existence of a featured advertisement of another brand of yogurt does not relate to the probability that an individual purchases Yoplait brand yogurt

Alternate Hypothesis: H1: β ≠ 0

That is, the existence of a featured advertisement of another brand of yogurt relates to the probability that an individual purchases Yoplait brand yogurt

ii. To test the claim, a t test for the significance of the slope is used

iii. The value of the t test statistic is -1.727 and its corresponding p-value is 0.0842 > 0.05. Since the p-value is greater than 0.05, we fail to reject the null hypothesis and conclude that the existence of a featured advertisement of another brand of yogurt does not relate to the probability that an individual purchases Yoplait brand yogurt
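The answer above relies on a single t test. Because the question concerns the featured advertisements of all three other brands at once, a joint F test on those three dummies is a common alternative. The sketch below shows that swapped-in technique, not the author's method, and it runs on synthetic data; buy.yoplait and the simulated probabilities are assumptions.

```r
# Synthetic stand-in for the Yogurt data (NOT the real data set).
set.seed(2)
n <- 600
sim <- data.frame(
  feat.yoplait = rbinom(n, 1, 0.1),
  feat.dannon  = rbinom(n, 1, 0.1),
  feat.hiland  = rbinom(n, 1, 0.1),
  feat.weight  = rbinom(n, 1, 0.1)
)
# Hypothetical 0/1 purchase dummy that depends only on Yoplait's own ad.
sim$buy.yoplait <- rbinom(n, 1, 0.3 + 0.2 * sim$feat.yoplait)

# Restricted model: other brands' ad dummies excluded (H0 imposed).
restricted <- lm(buy.yoplait ~ feat.yoplait, data = sim)

# Unrestricted model: other brands' ad dummies included.
unrestricted <- lm(buy.yoplait ~ feat.yoplait + feat.dannon +
                     feat.hiland + feat.weight, data = sim)

# F test of H0: the three added coefficients are jointly zero.
ftest  <- anova(restricted, unrestricted)
f_stat <- ftest[2, "F"]
p_val  <- ftest[2, "Pr(>F)"]
print(ftest)
```

The decision rule would be to reject H0 at the 5 percent level when p_val falls below 0.05. Note that anova() uses the classical homoskedastic formula, so a heteroskedasticity-robust version of the test may be preferable for a 0/1 outcome.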

h)

i. The residual plot is given below

4.

a) Here, we are trying to find whether advertisements and prices are significant predictors of the choice of yogurt brand. We expect that all the independent variables are significant predictors of the dependent variable Choice

b) Here, we see that the 100 coefficient estimates, one for each id, turn out to be significant predictors of the choice of yogurt brand

c) To determine whether individual-specific factors affect the probability of purchasing Yoplait yogurt, a t test for the significance of the slope is used

Null Hypothesis: H0: δi = 0

That is, the individual factors do not affect the probability of purchasing Yoplait yogurt

Alternate Hypothesis: H1: δi ≠ 0

That is, the individual factors affect the probability that an individual purchases Yoplait brand yogurt

d) The value of the t test statistic is -1.727 and its corresponding p-value is 0.0842 > 0.05. Since the p-value is greater than 0.05, we conclude that the existence of a featured advertisement of another brand of yogurt does not relate to the probability that an individual purchases Yoplait brand yogurt
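For the id-specific intercepts in model (2), one way to test all of them together is to compare the pooled regression with the one that adds as.factor(id), using an F test. The sketch below illustrates that approach on synthetic data with 20 hypothetical individuals; it is an alternative to the t test described above, and every variable and parameter value in it is an assumption.

```r
# Synthetic panel (NOT the real Yogurt data): 20 ids, 30 purchases each.
set.seed(3)
n_id  <- 20
n_obs <- 30
id    <- rep(seq_len(n_id), each = n_obs)
taste <- rnorm(n_id, 0, 0.1)                 # hypothetical id-specific taste
price.yoplait <- runif(n_id * n_obs, 5, 15)  # hypothetical prices

# Assumed purchase probability, kept strictly inside (0, 1).
p <- pmin(pmax(0.5 + taste[id] - 0.03 * price.yoplait, 0.01), 0.99)
buy.yoplait <- rbinom(n_id * n_obs, 1, p)
sim <- data.frame(id, price.yoplait, buy.yoplait)

# Pooled model versus model with id-specific intercepts.
pooled <- lm(buy.yoplait ~ price.yoplait, data = sim)
fe     <- lm(buy.yoplait ~ price.yoplait + as.factor(id), data = sim)

# F test of H0: all id-specific intercepts are equal.
fe_test <- anova(pooled, fe)
print(fe_test)

# Mean pooled residual per id, as in the tapply() hint from the task.
mean_resid <- tapply(resid(pooled), sim$id, mean)
hist(mean_resid, main = "Mean pooled residual by id")
```

With 20 ids the test has 19 numerator degrees of freedom; with the 100 individuals in the real data it would have 99.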

5)

a)

To determine whether the interaction of id and the price of Yoplait in model 3 has a significant influence on the model, an F test for overall significance is used.

b) The regression model is

Choice = 3.53 – 0.0141 * id – 0.145 * feat.yoplait (0.004) – 0.219 * feat.dannon (0.097) – 0.295 * feat.hiland (0.1235) + 0.465 * feat.weight (0.1243) + 0.065 * price.yoplait (0.019) – 0.119 * price.dannon (0.023) – 0.176 * price.hiland (0.0296) – 0.031 * price.weight (0.314) + 0.0015 * id:price.yoplait (0.0004)

6)

The value of the F test statistic is 24.64 and its corresponding p-value falls well below 0.05, indicating that the estimated regression model with interactions is a good fit for predicting the choice of Yoplait yogurt.
