This is a course project for the Getting and Cleaning Data course on Coursera.
The goal of this project is to prepare tidy [2] data that can be used for later analysis.
The source data for this project is the UCI HAR Dataset [1].
Create one R script called run_analysis.R that does the following.
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
- run_analysis.R: R script that does the data processing and generates the data/summary.txt file
- CodeBook.md: Markdown file describing the generated data file
- data/: directory where the R script downloads, unzips and stores the original dataset
- data/summary.txt: the tidy output data file generated by the run_analysis.R script
- Download the original dataset and unzip it into the data/ directory
- Prepare variable names
  - Load feature names from the features.txt file
  - Process the feature names so they can be used as variable names
  - Create a vector that selects only the variables for measurements on the mean and standard deviation
- Load the sensor measurements data
  - Read the X_test.txt and X_train.txt files
  - Read only the variables for measurements on the mean and standard deviation
  - Apply the descriptive column names
- Load the activity type data
  - Read the y_test.txt and y_train.txt files
- Load the subject data
  - Read the subject_test.txt and subject_train.txt files
- Create tables with complete test and train datasets
  - Combine the subject, activity and sensor measurements data by columns for train data
  - Combine the subject, activity and sensor measurements data by columns for test data
- Combine the test and train data by rows to get the complete dataset
- Set descriptive activity names in the combined data set
  - Load the activity_labels.txt file
  - Apply the activity names as activity variable levels
- Generate data summary
  - Group the dataset by subject and activity
  - Summarise all sensor measurement variables by their average value
- Save the generated data summary in the data/summary.txt file
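
The processing steps above can be sketched in base R roughly as follows. This is a minimal illustration on a tiny synthetic stand-in for the dataset, not the actual run_analysis.R. The regular expression, the name-cleaning rules, the helper read_selected, the use of aggregate and the temporary output path are all assumptions made for the example.

```r
# Synthetic stand-in for features.txt (the real file has many more rows).
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X"),
  stringsAsFactors = FALSE
)

# Assumption: only names containing mean() or std() are kept;
# meanFreq() is excluded under this reading of the assignment.
keep <- grepl("-(mean|std)\\(\\)", features$name)

# Assumed cleaning rule: drop parentheses, replace dashes with underscores.
clean_names <- gsub("-", "_", gsub("[()]", "", features$name))

# Hypothetical helper: colClasses = "NULL" makes read.table skip a column,
# so only the selected measurements are read.
read_selected <- function(txt) {
  read.table(text = txt,
             col.names  = clean_names,
             colClasses = ifelse(keep, "numeric", "NULL"))
}

# Synthetic stand-ins for X_train.txt and X_test.txt.
x_train <- read_selected("1 2 3 4\n3 4 5 6")
x_test  <- read_selected("5 6 7 8")

# Synthetic stand-ins for the y_*.txt and subject_*.txt files.
y_train       <- data.frame(activity = c(1, 1))
y_test        <- data.frame(activity = 2)
subject_train <- data.frame(subject = c(1, 1))
subject_test  <- data.frame(subject = 2)

# Combine by columns per set, then by rows into one dataset.
train <- cbind(subject_train, y_train, x_train)
test  <- cbind(subject_test, y_test, x_test)
full  <- rbind(train, test)

# Synthetic stand-in for activity_labels.txt, applied as factor levels.
activity_labels <- data.frame(id = 1:2,
                              label = c("WALKING", "WALKING_UPSTAIRS"),
                              stringsAsFactors = FALSE)
full$activity <- factor(full$activity,
                        levels = activity_labels$id,
                        labels = activity_labels$label)

# Average every measurement for each subject and activity.
summary_data <- aggregate(. ~ subject + activity, data = full, FUN = mean)

# Written to a temporary file here; the real script targets data/summary.txt.
out_file <- tempfile(fileext = ".txt")
write.table(summary_data, file = out_file, row.names = FALSE)
```

Skipping unwanted columns at read time keeps memory use low on the full dataset, although reading everything and subsetting afterwards would give the same result.
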
[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24-26 April 2013.
[2] Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.