21 Datasets
This section describes the datasets used throughout the course. Some ship with R packages and others are provided as course files; all are commonly used in statistical analysis and data science education.
21.1 Built-in R Datasets
21.1.1 mtcars - Motor Trend Car Road Tests
Package: datasets
Description: The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Variables:
- mpg: Miles/(US) gallon
- cyl: Number of cylinders
- disp: Displacement (cu.in.)
- hp: Gross horsepower
- drat: Rear axle ratio
- wt: Weight (1000 lbs)
- qsec: 1/4 mile time (seconds)
- vs: Engine (0 = V-shaped, 1 = straight)
- am: Transmission (0 = automatic, 1 = manual)
- gear: Number of forward gears
- carb: Number of carburetors
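A minimal sketch of loading and inspecting mtcars follows. Because vs and am are stored as numeric 0/1 codes, it also builds a labelled copy using the codings listed above. The name `mtcars_labelled` is introduced here for illustration only.

```r
# mtcars ships with the 'datasets' package, which R attaches by default
data(mtcars)

# 32 rows (cars) and 11 columns (mpg plus 10 design/performance variables)
dim(mtcars)

# Hypothetical helper copy: turn the 0/1 codes into labelled factors
mtcars_labelled <- transform(
  mtcars,
  vs = factor(vs, levels = c(0, 1), labels = c("V-shaped", "straight")),
  am = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Count cars by transmission type
table(mtcars_labelled$am)
```

Keeping the original mtcars untouched and working on a labelled copy avoids surprises in later examples that expect the numeric codes.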
21.1.2 iris - Edgar Anderson’s Iris Data
Package: datasets
Description: Measurements, in centimeters, of sepal length, sepal width, petal length, and petal width for 50 flowers from each of 3 species of iris.
Variables:
- Sepal.Length: Sepal length in cm
- Sepal.Width: Sepal width in cm
- Petal.Length: Petal length in cm
- Petal.Width: Petal width in cm
- Species: Species of iris (setosa, versicolor, virginica)
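As a quick check of the structure described above, the following sketch loads iris, confirms the group sizes, and computes per-species means. The name `species_means` is an illustrative choice, not something defined elsewhere in the course.

```r
# iris also ships with the 'datasets' package
data(iris)

# Four numeric measurement columns plus the Species factor
str(iris)

# Number of flowers recorded for each species
table(iris$Species)

# Mean of every measurement within each species (illustrative object name)
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means
```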
21.2 Course-Specific Datasets
21.2.2 stroke_data.csv - Stroke Prediction Data
File: data/stroke_data.csv
Description: Dataset for predicting stroke occurrence based on various health indicators.
Variables:
- id: Unique identifier
- gender: Gender
- age: Age
- hypertension: Hypertension (0: No, 1: Yes)
- heart_disease: Heart disease (0: No, 1: Yes)
- ever_married: Ever married (No, Yes)
- work_type: Type of work
- Residence_type: Residence type (Rural, Urban)
- avg_glucose_level: Average glucose level
- bmi: Body mass index
- smoking_status: Smoking status
- stroke: Stroke (0: No, 1: Yes)
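One possible way to read this file and label the 0/1 indicator columns is sketched below. It assumes the working directory is the course project root, so that the path data/stroke_data.csv resolves, and that the column names match the list above. The helper `label_binary` is hypothetical and not part of any package.

```r
# Hypothetical helper: convert 0/1 indicator columns to No/Yes factors
label_binary <- function(df, cols) {
  df[cols] <- lapply(df[cols], factor, levels = c(0, 1), labels = c("No", "Yes"))
  df
}

# Path as listed above; assumed to be relative to the project root
stroke_path <- "data/stroke_data.csv"

if (file.exists(stroke_path)) {
  stroke <- read.csv(stroke_path, stringsAsFactors = FALSE)
  stroke <- label_binary(stroke, c("hypertension", "heart_disease", "stroke"))
  str(stroke)
} else {
  message("File not found: ", stroke_path, " (check the working directory)")
}
```

Guarding the read with `file.exists()` gives a clearer message than the default `read.csv()` error when the script is run from a different directory.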
21.4 Dataset Usage in Course
- mtcars: Used for linear regression examples and correlation analysis
- iris: Used for classification, clustering, and ANOVA examples
- cars: Used for simple linear regression demonstrations
- bikeshare: Used for time series analysis and multiple regression
- stroke_data: Used for logistic regression and classification examples
- coronary: Used for survival analysis examples
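To make the first two uses concrete, here is a short sketch restricted to the two built-in datasets documented in this section. The specific models (mpg on wt, Petal.Length by Species) and the object names are illustrative assumptions, not necessarily the ones used in the course chapters.

```r
# Correlation between fuel economy and weight in mtcars
cor(mtcars$mpg, mtcars$wt)

# Simple linear regression of mpg on weight
mpg_fit <- lm(mpg ~ wt, data = mtcars)
coef(mpg_fit)

# One-way ANOVA: does mean petal length differ across iris species?
petal_aov <- aov(Petal.Length ~ Species, data = iris)
summary(petal_aov)
```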