Case Scenario/ Task
Bike sharing systems are a new generation of traditional bike rentals in which the whole process,
from membership to rental and return, has become automatic. Through these systems,
a user can easily rent a bike at one location and return it at another.
Currently, there are over 500 bike-sharing programs around the world,
comprising over 500 thousand bicycles. Today, there is great interest in
these systems due to their important role in traffic, environmental and health issues.
Unlike other transport services such as bus or subway, the duration of travel and the
departure and arrival positions are explicitly recorded in these systems. This feature turns
a bike sharing system into a virtual sensor network that can be used for sensing mobility in
the city. It is expected that most important events in the city could be detected by
monitoring these data.
In the final project, you will be analyzing the two-year historical log corresponding to
years 2011 and 2012 from the Capital Bikeshare system, Washington D.C., USA. The data
set contains records of 17379 hourly counts of rentals. It was originally compiled by
Fanaee and Gama in “Event labeling combining ensemble detectors and background
knowledge” (2013).
You will download the data set bikeshares.csv. The data set contains
- instant: record index
- dteday: date
- season: season (1: spring, 2: summer, 3: fall, 4: winter)
- yr: year (0: 2011, 1: 2012)
- mnth: month (1 to 12)
- hr: hour (0 to 23)
- holiday: whether the day is a holiday or not (extracted from https://dchr.dc.gov/page/holidayschedule)
- weekday: day of the week
- workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit:
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp: normalized temperature in Celsius. The values are divided by 41 (max)
- atemp: normalized feeling temperature in Celsius. The values are divided by 50 (max)
- hum: normalized humidity. The values are divided by 100 (max)
- windspeed: normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes, including both casual and registered
This analysis is intentionally open ended. While you explore the data, recall the tools you
have learned in class.
Bike-sharing systems allow people to rent a bicycle at one of many automatic rental stations scattered
across the city, use it for a short journey and return it at any other station in the city. The aim
of the study was to investigate factors that affect bike rentals in Capital Bikeshare. Exploratory Data
Analysis and Poisson/Quasi-Poisson models were used to analyze the data. The data were explored using
frequency tables and graphical representations. To handle overdispersion in the dataset, Quasi-Poisson
and Negative Binomial models were fitted. Bike rentals are affected by weather conditions, time of
day, month of the year and season, among other factors.
Key words: Poisson model, Quasi-Poisson model, Overdispersion
1 INTRODUCTION
Public bike sharing (PBS) systems are spreading across the globe and have been gaining
popularity in transportation plans as a strategy to multiply travel choices, promote the use
of active modes of transport, decrease dependence on automobiles and, especially, reduce greenhouse gas
emissions (Contardo et al., 2012). Bike-sharing systems allow people to rent a bicycle at one of many
automatic rental stations scattered across the city, use it for a short journey and return it at any
other station in the city (Raviv et al., 2011).
The Institute for Transportation & Development Policy (ITDP) has stated that more than 600 cities around
the globe have their own bike-share systems, and more programs are starting every year. The largest
systems are in China, in cities such as Hangzhou and Shanghai. In Paris, London, and Washington, D.C.,
highly successful systems have helped to promote cycling as a viable and valued transport option.
In this study, we analyzed the two-year historical log corresponding to years 2011 and 2012 from the Capital
Bikeshare system, Washington D.C., USA. Capital Bikeshare is the largest bike sharing program in
the United States. The aim of the study was to investigate factors that affect bike rentals in Capital
Bikeshare. An Exploratory Data Analysis (EDA) was done to gain insight into and a better understanding of
the dataset. This was done using frequency tables and graphical representations.
1.1 Data description
The data set contains records of 17379 hourly counts of rentals. It was originally compiled by Fanaee
and Gama in “Event labeling combining ensemble detectors and background knowledge” (2013). The
variables used are described as follows:
- instant: record index
- dteday: date
- season: season (1: spring, 2: summer, 3: fall, 4: winter)
- yr: year (0: 2011, 1: 2012)
- mnth: month (1 to 12)
- hr: hour (0 to 23)
- holiday: whether the day is a holiday or not (extracted from https://dchr.dc.gov/page/holidayschedule)
- weekday: day of the week
- workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit:
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp: normalized temperature in Celsius. The values are divided by 41 (max)
- atemp: normalized feeling temperature in Celsius. The values are divided by 50 (max)
- hum: normalized humidity. The values are divided by 100 (max)
- windspeed: normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes, including both casual and registered
The response variable was the count of total rental bikes, including both casual and registered users (cnt).
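The divisors listed for the normalized weather variables can be used to recover approximate original units. The following is a minimal Python sketch (the study itself used R), assuming each variable was normalized by plain division by the stated maximum, as the description says. The function name `denormalize` and the example record are hypothetical and not taken from bikeshares.csv.

```python
# Divisors as stated in the data description (assumption: normalization
# was a plain division by the stated maximum, nothing more).
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalize(record):
    """Return a copy of `record` with the normalized weather fields rescaled."""
    out = dict(record)
    for name, factor in SCALE.items():
        if name in out:
            out[name] = float(out[name]) * factor
    return out


# Hypothetical record for illustration only.
example = {"hr": 8, "temp": 0.5, "atemp": 0.48, "hum": 0.8, "windspeed": 0.0, "cnt": 120}
restored = denormalize(example)
print(restored["temp"], restored["atemp"], restored["hum"])
```

Fields that are not in `SCALE`, such as hr and cnt, are passed through unchanged.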
1.2 Methodology
A Poisson distribution was assumed in this study since the response variable is a count. The
Poisson distribution is used for counts of events that occur randomly over time or space, when outcomes
in disjoint periods or regions are independent. The mean and variance of the Poisson distribution are
equal (Agresti, 2012). The Poisson distribution has a positive mean µ, so it is more common to model the
log mean, which can take any real value. A Poisson loglinear model with explanatory variables is given
by
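$$\mathrm{cnt}_i \sim \mathrm{Poisson}(\mu_i)$$
$$\log(\mu_i) = \mathrm{mnth}_{ik} + \mathrm{season}_{ij} + \mathrm{holiday}_i + \mathrm{hr}_{ir} + \mathrm{workingday}_i + \mathrm{hum}_i + \mathrm{windspeed}_i + \mathrm{weathersit}_{im} + \mathrm{temp}_i + \mathrm{atemp}_i$$
where k = 1, ..., 12; j = 1, ..., 4; r = 0, ..., 23; m = 1, ..., 4; i = 1, ..., 17379.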
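To make the log link concrete, here is a minimal Python sketch (the study itself used R) that fits a Poisson loglinear model with an intercept and a single 0/1 dummy by Newton-Raphson. The single dummy stands in for the full set of covariates above, the function name `fit_poisson_one_dummy` is hypothetical, and the counts are made up. With one dummy, the fitted means equal the two group means, which gives an easy check.

```python
import math


def fit_poisson_one_dummy(y, x, iterations=25):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson.

    y: list of non-negative counts; x: list of 0/1 dummy values.
    """
    # Start from the log of the overall mean so exp() stays well behaved.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[a, b], [b, d]].
        a = sum(mu)
        b = sum(xi * mi for xi, mi in zip(x, mu))
        d = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = a * d - b * b
        # Newton step using the explicit inverse of the 2x2 matrix.
        b0 += (d * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


# Made-up counts: group x = 0 has mean 2, group x = 1 has mean 6.
y = [1, 3, 2, 2, 5, 7, 6, 6]
x = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_poisson_one_dummy(y, x)
print(math.exp(b0), math.exp(b1))
```

Here exp(b0) is the fitted mean of the reference group and exp(b1) is the multiplicative change for the other group, which is how the coefficients of the full model are read as well.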
For count data, the Poisson assumption is often unrealistic because of over-dispersion, i.e. the variance exceeds the
mean (Agresti, 2012). One can use Quasi-Poisson or Negative Binomial models to handle over-dispersion.
The Quasi-Poisson model estimates a scale parameter as well and uses it to correct the estimated standard errors.
It uses quasi-likelihood estimation, which assumes only a mean-variance relationship rather than a specific
distribution for the response variable. The Negative Binomial model contains an extra dispersion parameter, which is the
parameter of a multiplicative random effect. It permits µ to depend on explanatory variables (Agresti,
2012). All the models were fitted and compared using AIC.
R software version 3.1.1 was used to analyze the data.
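One common way to estimate the Quasi-Poisson scale parameter is the Pearson chi-square statistic divided by the residual degrees of freedom. The standard errors are then the Poisson ones multiplied by the square root of that estimate. The Python sketch below illustrates this under stated assumptions (the study itself used R): the function names are hypothetical, and the counts and fitted means are made up.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def quasi_poisson_se(poisson_se, dispersion):
    """Scale a Poisson standard error by the square root of the dispersion."""
    return poisson_se * math.sqrt(dispersion)


# Made-up counts with an intercept-only fit, so every fitted mean is the sample mean.
y = [0, 2, 30, 4, 64, 20]
mu = [sum(y) / len(y)] * len(y)
phi = pearson_dispersion(y, mu, n_params=1)
print(phi, quasi_poisson_se(0.01, phi))
```

A value of `phi` near 1 is what a Poisson model expects. Values well above 1 indicate over-dispersion and lead to wider standard errors.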
2 RESULTS AND DISCUSSION
2.1 Exploratory Data Analysis
The mean and variance of the response variable (cnt) were not the same. The mean and variance were
189.4631 and 32901.46, respectively. This implies over-dispersion, i.e. there is greater variability in
the count data than a Poisson distribution allows. To observe how the response variable is distributed, a
histogram of cnt was plotted. We observe a strongly positively skewed distribution, with the largest
observed value equal to 977.
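The size of the gap between variance and mean can be summarized by their ratio, which is 1 under a Poisson distribution. The Python sketch below (the study itself used R) computes the ratio from the two values reported above and shows the same calculation on a small made-up sample. The helper name `variance_to_mean_ratio` is hypothetical.

```python
import statistics


def variance_to_mean_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)


# Values reported in the text for cnt.
reported_mean = 189.4631
reported_variance = 32901.46
ratio = reported_variance / reported_mean
print(round(ratio, 1))

# The same calculation on a made-up sample.
toy_ratio = variance_to_mean_ratio([1, 2, 3, 4, 10])
print(toy_ratio)
```

For the reported values the variance is more than 170 times the mean, far from the ratio of 1 that a Poisson model assumes.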
From Table 1, it can be observed that there were more bike rentals on working days than on
non-working days. There were many bike rentals during summer (season 2) and fall (season 3) and fewer
during winter. The table also shows that weather condition 4 (Heavy Rain + Ice Pellets + Thunderstorm +
Mist, Snow + Fog) was recorded in only 3 of the 17379 hourly records over the two years. This implies
that very few bike rentals were made during this weather condition. There is also a decreasing trend in
bike rentals as weather conditions grow worse.
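When reading a frequency table for hourly data, it helps to separate the number of hourly records in a category from the number of rentals made in it. The Python sketch below (the study itself used R) builds both tables with `collections.Counter`. The records are made up and not taken from bikeshares.csv.

```python
from collections import Counter

# Made-up hourly records as (weathersit, cnt) pairs; for illustration only.
records = [(1, 210), (1, 180), (2, 95), (1, 250), (3, 40), (2, 120), (4, 6)]

# Frequency table: number of hourly records per weather situation.
n_records = Counter(w for w, _ in records)

# Total rentals per weather situation, which is a different quantity.
total_rentals = Counter()
for w, cnt in records:
    total_rentals[w] += cnt

for w in sorted(n_records):
    print(w, n_records[w], total_rentals[w], total_rentals[w] / n_records[w])
```

The last column, mean rentals per hour, is the quantity to compare across categories of very different sizes.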
The data were also explored using scatter plots and box plots. Figure 2 shows scatter plots of the
number of bike rentals against atemp, temp and wind speed, respectively. A random sample of observations was
selected in order to make these plots. It can be observed that more people tend to rent bikes when it is
warmer and the wind is calm.
The plots also suggest that, on average, there were more bike rentals in 2012 than in 2011. The box plots in
Figure 3 show the times at which bikes were rented. It can be seen clearly that, on average, each
day had a peak at around 8 a.m. and another in the evening at around 5 p.m. to 6 p.m. On average in each
year, bike rentals decrease from December to February (winter season) and increase from
March.
2.2 Poisson regression
Dummy variables were created for all categorical explanatory variables. In the Poisson regression model,
all variables included in the model were significant. When all the other variables are held constant, the
average number of rented bikes in February (mnth 2) was about 1.089 (exp(0.0855)) times that in January
(mnth 1, the reference category), an increase of about 9%. In addition, the average number of rented bikes at
8 a.m. was about 6.8093 (exp(1.9183)) times that at 12 midnight.
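The conversion from a log-scale coefficient to a multiplicative effect is a single exponentiation. The Python sketch below (the study itself used R) applies it to the two coefficients quoted above. The helper names `rate_ratio` and `percent_change` are hypothetical.

```python
import math


def rate_ratio(coefficient):
    """Multiplicative effect on the mean count for a one-unit change."""
    return math.exp(coefficient)


def percent_change(coefficient):
    """The same effect expressed as a percentage change in the mean count."""
    return (math.exp(coefficient) - 1.0) * 100.0


feb = 0.0855  # coefficient for mnth 2 relative to January, as quoted above
hr8 = 1.9183  # coefficient for 8 a.m. relative to midnight, as quoted above

print(rate_ratio(feb), percent_change(feb))
print(rate_ratio(hr8), percent_change(hr8))
```

A coefficient near zero gives a ratio near 1 (little change), while a coefficient close to 2 corresponds to a several-fold increase.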
The deviance (744445.1) and degrees of freedom (17378) of the Poisson regression model suggested that the
model did not fit the data well. This was in line with the over-dispersion observed in the EDA.
To handle over-dispersion, Quasi-Poisson and Negative Binomial models were fitted. Table
2 gives parameter estimates of the Poisson/Quasi-Poisson regression and Negative Binomial models. The
standard errors for the Quasi-Poisson regression model are larger than those of the Poisson regression model.
The dispersion parameter of the Quasi-Poisson regression model was 43.4224, which implies that there is
indeed over-dispersion in the data set and that the Poisson model underestimated the standard errors. The AIC
of the Negative Binomial model (191 283) was smaller than that of the Poisson regression model
(855 440). The deviance of the Negative Binomial model was also much smaller than that of the Poisson regression model.
This implies that the Negative Binomial model improves on the Poisson model and deals with over-dispersion quite well.
The estimated dispersion parameter for the Negative Binomial model was 3.1038, indicating that the Negative Binomial model
is more appropriate than the Poisson regression model for this data set.
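The figures reported in this section can be turned into three quick diagnostics: deviance per degree of freedom, the factor by which the Quasi-Poisson standard errors exceed the Poisson ones, and the AIC gap between the two likelihood-based models. The Python sketch below (the study itself used R) uses only the numbers quoted above. It assumes the Quasi-Poisson standard errors are the Poisson ones scaled by the square root of the dispersion.

```python
import math

# Values reported in the text.
poisson_deviance = 744445.1
degrees_of_freedom = 17378
quasi_dispersion = 43.4224
aic_poisson = 855440
aic_negbin = 191283

# Deviance per degree of freedom; values far above 1 suggest a poor Poisson fit.
deviance_per_df = poisson_deviance / degrees_of_freedom

# Factor by which Poisson standard errors are inflated under Quasi-Poisson
# (assumption: standard errors scale with the square root of the dispersion).
se_inflation = math.sqrt(quasi_dispersion)

# AIC difference; a positive gap favours the Negative Binomial model.
aic_gap = aic_poisson - aic_negbin

print(round(deviance_per_df, 2), round(se_inflation, 2), aic_gap)
```

All three quantities point the same way: the Poisson model is too narrow for these data.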
3 CONCLUSION AND RECOMMENDATION
The study aimed to find whether there are factors that affect bike rental patterns in the Capital Bikeshare
system. From the EDA it was found that season, year, month, hour, holiday, working day and weather
conditions affect bike rental patterns.
There was over-dispersion in the dataset. This was observed both in the EDA and when fitting the Poisson regression
model. To handle over-dispersion, Quasi-Poisson and Negative Binomial models were fitted. It was
found that the Negative Binomial model improved on the Poisson regression model. The standard errors in the
Quasi-Poisson model were larger than in the Poisson regression model. This implies that there was
over-dispersion in the data set and that the Poisson model underestimated the standard errors. All variables were
statistically significant, implying that the variables can be used to predict bike rental patterns.
In this study, it was found that there was heterogeneity among the variables used. Other methods that
account for heterogeneity can be explored. This was not done due to limited time. For further studies,
interactions between explanatory variables can be included in the analyses.