Getting and Cleaning Data Course Project

Introduction

This is a course project for the Getting and Cleaning Data course on Coursera.

The goal of this project is to prepare tidy data [2] that can be used for later analysis.

The source data for this project is the UCI HAR Dataset [1].

Project requirements

Create one R script called run_analysis.R that does the following:

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Project structure

run_analysis.R - R script that does the data processing and generates the data/summary.txt file
CodeBook.md - Markdown file describing the generated data file
data/ - directory where the R script downloads, unzips and stores the original dataset
data/summary.txt - the tidy output data file generated by the run_analysis.R script

Requirements

R
plyr package

Processing steps

Download the original dataset and unzip it into the data/ directory
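The download-and-unzip step can be sketched as follows. This is a Python illustration, not the project's R code. The function name `fetch_dataset`, the local archive name `dataset.zip` and the `url` parameter are assumptions, because the README does not give the source URL. The archive is downloaded only once, and later runs reuse it.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_dataset(url: str, data_dir: str = "data") -> Path:
    """Download the zip archive at `url` into `data_dir` once, then unzip it there.

    Hypothetical helper: `url` and the local name "dataset.zip" are assumptions.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive = target / "dataset.zip"  # hypothetical local name for the download
    if not archive.exists():  # re-runs reuse the archive instead of downloading again
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
    with zipfile.ZipFile(archive) as zf:
        # For the UCI HAR archive this yields data/UCI HAR Dataset/...
        zf.extractall(target)
    return target
```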
Prepare variable names
- Load the feature names from the features.txt file
- Process the feature names so they can be used as variable names
- Create a vector that selects only the variables for measurements on the mean and standard deviation
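The variable-name preparation can be sketched in Python. The function works on the text of features.txt, where each line holds an index followed by a feature name. Two rules here are assumptions, and the R script may choose differently:

- The selection rule keeps only names containing `-mean()` or `-std()`, so `meanFreq()` is left out.
- The cleaning rule drops `()` and turns other punctuation into `_`.

`prepare_variable_names` is a hypothetical name.

```python
import re


def prepare_variable_names(features_text: str) -> tuple[list[str], list[bool]]:
    """Parse features.txt content; return cleaned names and a mean/std selection mask."""
    # Each non-empty line looks like "1 tBodyAcc-mean()-X": drop the leading index.
    raw = [line.split(maxsplit=1)[1] for line in features_text.splitlines() if line.strip()]
    # Assumed selection rule: only mean() and std() features, not meanFreq() or angle(...).
    mask = [bool(re.search(r"-(mean|std)\(\)", name)) for name in raw]
    # Hypothetical cleaning rule: drop "()", replace other punctuation runs with "_".
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "_", name.replace("()", "")).strip("_") for name in raw]
    return cleaned, mask
```

For example, `tBodyAcc-mean()-X` becomes `tBodyAcc_mean_X` and is selected, while `tBodyAcc-mad()-X` is not selected.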
Load the sensor measurement data
- Read the X_test.txt and X_train.txt files
- Read only the variables for measurements on the mean and standard deviation
- Apply the descriptive column names
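The measurement-loading step can be sketched in Python for one X_*.txt file. The function assumes whitespace-separated numeric columns, a list of column names and a boolean selection mask. It keeps only the selected columns and names them. `load_measurements` is a hypothetical helper, and each row is represented as a dict purely for illustration.

```python
def load_measurements(x_text: str, names: list[str], mask: list[bool]) -> list[dict[str, float]]:
    """Parse X_*.txt content, keeping only the columns where `mask` is True."""
    keep = [i for i, selected in enumerate(mask) if selected]
    rows = []
    for line in x_text.splitlines():
        values = line.split()  # columns are separated by runs of spaces
        if not values:
            continue  # skip blank lines
        rows.append({names[i]: float(values[i]) for i in keep})
    return rows
```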
Load the activity type data
- Read the y_test.txt and y_train.txt files
Load the subject data
- Read the subject_test.txt and subject_train.txt files
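The activity and subject files each hold one integer per line, one line per measurement row. The Python sketch below reads both for a split and checks that their lengths agree. It assumes the UCI HAR layout, with `y_<split>.txt` and `subject_<split>.txt` inside the `test/` or `train/` folder. The helper names are hypothetical.

```python
from pathlib import Path


def read_int_file(path: Path) -> list[int]:
    """Read a file holding one integer per line (y_*.txt or subject_*.txt)."""
    return [int(token) for token in path.read_text().split()]


def load_labels(split_dir: str, split: str) -> tuple[list[int], list[int]]:
    """Return (activity ids, subject ids) for split "test" or "train".

    Assumes <split_dir>/y_<split>.txt and <split_dir>/subject_<split>.txt exist.
    """
    base = Path(split_dir)
    activities = read_int_file(base / f"y_{split}.txt")
    subjects = read_int_file(base / f"subject_{split}.txt")
    # Each measurement row has exactly one activity id and one subject id.
    if len(activities) != len(subjects):
        raise ValueError(f"{split}: {len(activities)} activity rows vs {len(subjects)} subject rows")
    return activities, subjects
```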
Create tables with complete test and train datasets
- Combine the subject, activity and sensor measurements data by columns for train data
- Combine the subject, activity and sensor measurements data by columns for test data
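Combining by columns amounts to zipping the three parallel sequences row by row, which corresponds to R's `cbind`. The Python sketch below puts `subject` and `activity` first, then the measurement columns. It raises an error if the row counts differ. `combine_columns` is a hypothetical name.

```python
def combine_columns(
    subjects: list[int], activities: list[int], measurements: list[dict[str, float]]
) -> list[dict]:
    """Column-bind subject ids, activity ids and measurement rows."""
    if not (len(subjects) == len(activities) == len(measurements)):
        raise ValueError("subject, activity and measurement row counts differ")
    return [
        {"subject": s, "activity": a, **m}  # key order: subject, activity, then measurements
        for s, a, m in zip(subjects, activities, measurements)
    ]
```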
Combine the test and train data by rows to get the complete dataset
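Row-binding (R's `rbind`) only makes sense when both tables have the same columns in the same order. The minimal Python sketch below checks that before stacking. `combine_rows` is a hypothetical name.

```python
def combine_rows(test_rows: list[dict], train_rows: list[dict]) -> list[dict]:
    """Stack the test and train tables after checking that their columns agree."""
    if test_rows and train_rows and list(test_rows[0]) != list(train_rows[0]):
        raise ValueError("test and train tables have different columns")
    return test_rows + train_rows
```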
Set descriptive activity names in the combined data set
- Load the activity_labels.txt file
- Apply the activity names as the levels of the activity variable
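activity_labels.txt maps each numeric activity id to a name, one `<id> <NAME>` pair per line. In R this mapping becomes the factor levels. The Python sketch below simply substitutes each id with its name. `label_activities` is a hypothetical helper, and an unknown id raises `KeyError`.

```python
def label_activities(rows: list[dict], labels_text: str) -> list[dict]:
    """Replace numeric activity ids with names taken from activity_labels.txt content."""
    names: dict[int, str] = {}
    for line in labels_text.splitlines():
        if line.strip():
            code, name = line.split(maxsplit=1)  # e.g. "1 WALKING"
            names[int(code)] = name.strip()
    return [{**row, "activity": names[row["activity"]]} for row in rows]
```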
Generate data summary
- Group the dataset by subject and activity
- Summarise all sensor measurement variables by their average value
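The summary step is a group-by followed by a mean of every measurement column. In R with plyr, a common way to write this is `ddply(data, .(subject, activity), numcolwise(mean))`. The README does not say whether run_analysis.R uses exactly that call. The Python sketch below does the same computation and sorts the output by subject and then activity. The sort order and the name `summarise` are assumptions.

```python
from collections import defaultdict
from statistics import fmean


def summarise(rows: list[dict]) -> list[dict]:
    """Average every measurement column for each (subject, activity) pair."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        groups[(row["subject"], row["activity"])].append(row)
    summary = []
    for (subject, activity), members in sorted(groups.items()):
        measures = [k for k in members[0] if k not in ("subject", "activity")]
        averages = {k: fmean(m[k] for m in members) for k in measures}
        summary.append({"subject": subject, "activity": activity, **averages})
    return summary
```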
Save the generated data summary to the data/summary.txt file
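The last step writes the summary table to disk. The README does not describe the exact layout of data/summary.txt. The Python sketch below assumes space-separated columns with a header row and no row names. `save_summary` is a hypothetical name.

```python
import csv
from pathlib import Path


def save_summary(summary: list[dict], path: str = "data/summary.txt") -> None:
    """Write the summary as space-separated text with a header row (assumed layout)."""
    if not summary:
        raise ValueError("nothing to save")
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(summary[0]), delimiter=" ", lineterminator="\n")
        writer.writeheader()
        writer.writerows(summary)
```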

[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2013), Bruges, Belgium, 24-26 April 2013.

[2] Hadley Wickham. Tidy Data. Journal of Statistical Software, 59(10), 1-23, 2014.
