Getting and Cleaning Data Course Project

Introduction

This is a course project for the Getting and Cleaning Data course on Coursera.

The goal of this project is to prepare tidy [2] data that can be used for later analysis.

The source data for this project is the UCI HAR Dataset [1].

Project requirements

Create one R script called run_analysis.R that does the following.

  1. Merges the training and the test sets to create one data set.
  2. Extracts only the measurements on the mean and standard deviation for each measurement.
  3. Uses descriptive activity names to name the activities in the data set.
  4. Appropriately labels the data set with descriptive variable names.
  5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
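
Step 5 is easier to picture with a small example. The following is a minimal sketch in base R on an invented toy data frame; the column names `subject`, `activity`, and `tBodyAccMeanX` and all values are assumptions made for illustration, and run_analysis.R may compute the averages differently (for example with a grouping package).

```r
# Toy data: two subjects, two activities, one measurement (values invented).
toy <- data.frame(
  subject       = c(1, 1, 1, 2, 2, 2),
  activity      = c("WALKING", "WALKING", "SITTING",
                    "WALKING", "SITTING", "SITTING"),
  tBodyAccMeanX = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7)
)

# Average every measurement column for each (subject, activity) pair.
summary_data <- aggregate(. ~ subject + activity, data = toy, FUN = mean)

# Sort for readability: one row per subject/activity combination.
summary_data <- summary_data[order(summary_data$subject,
                                   summary_data$activity), ]
print(summary_data)
```

Each row of `summary_data` holds one subject/activity combination and each column holds one averaged variable, which is the shape that step 5 asks for.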

Project structure

  • run_analysis.R - R script that processes the data and generates the data/summary.txt file
  • CodeBook.md - Markdown file describing the generated data file
  • data/ - directory where the R script downloads, unzips and stores the original dataset
  • data/summary.txt - the tidy output data file generated by the run_analysis.R script

Requirements

Processing steps

  1. Download the original dataset and unzip it into the data/ directory
  2. Prepare variable names
    • Load the feature names from the features.txt file
    • Process the feature names so they can be used as variable names
    • Create a vector that selects only the variables for measurements on the mean and standard deviation
  3. Load the sensor measurement data
    • Read the X_test.txt and X_train.txt files
    • Read only the variables for measurements on the mean and standard deviation
    • Apply the descriptive column names
  4. Load the activity type data
    • Read the y_test.txt and y_train.txt files
  5. Load the subject data
    • Read the subject_test.txt and subject_train.txt files
  6. Create tables with the complete test and train datasets
    • Combine the subject, activity and sensor measurement data by columns for the train data
    • Combine the subject, activity and sensor measurement data by columns for the test data
  7. Combine the test and train data by rows to get the complete dataset
  8. Set descriptive activity names in the combined dataset
    • Load the activity_labels.txt file
    • Apply the activity names as the levels of the activity variable
  9. Generate the data summary
    • Group the dataset by subject and activity
    • Summarise all sensor measurement variables using the average value
  10. Save the generated data summary in the data/summary.txt file
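
Steps 2 to 8 above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code from run_analysis.R: the file contents are replaced by tiny invented in-memory excerpts, the selection pattern and the name-cleaning rules are guesses at one reasonable approach, and the helper `read_x` is hypothetical.

```r
# Stand-ins for the dataset files (tiny invented excerpts, not real data).
features_txt <- c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X", "3 tBodyAcc-mad()-X")
labels_txt   <- c("1 WALKING", "2 SITTING")
x_train_txt  <- c("0.1 0.2 0.3", "0.4 0.5 0.6")
x_test_txt   <- c("0.7 0.8 0.9")

# Step 2: load the feature names, flag mean/std features, clean the names.
features <- read.table(text = features_txt, col.names = c("id", "name"),
                       stringsAsFactors = FALSE)
keep  <- grepl("-(mean|std)\\(\\)", features$name)  # assumed selection rule
clean <- gsub("[()]", "", features$name)            # drop parentheses
clean <- gsub("-", "_", clean)                      # dashes are invalid in R names

# Step 3: read only the selected columns; "NULL" in colClasses skips a column.
col_classes <- ifelse(keep, "numeric", "NULL")
read_x <- function(lines) {                         # hypothetical helper
  read.table(text = lines, colClasses = col_classes, col.names = clean)
}

# Steps 4-6: attach subject and activity ids by columns (ids invented here;
# in the real dataset they come from the subject_*.txt and y_*.txt files).
train <- cbind(subject = c(1, 1), activity = c(1, 2), read_x(x_train_txt))
test  <- cbind(subject = 2,       activity = 1,       read_x(x_test_txt))

# Step 7: stack the train and test tables by rows.
combined <- rbind(train, test)

# Step 8: turn the numeric activity ids into descriptive factor levels.
labels <- read.table(text = labels_txt, col.names = c("id", "name"),
                     stringsAsFactors = FALSE)
combined$activity <- factor(combined$activity,
                            levels = labels$id, labels = labels$name)
print(combined)
```

Skipping unwanted columns at read time through `colClasses` avoids loading every feature into memory before subsetting. Reading everything and subsetting afterwards would give the same result.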

[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24-26 April 2013.

[2] Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.
