From db4659dae693fae416a6a5ef0594ca1e84267831 Mon Sep 17 00:00:00 2001
From: moonlanderr
Date: Mon, 13 Jan 2025 16:20:55 +0530
Subject: [PATCH 1/6] Classification of human activity using TabPFN

---
 ...an_activity_using_tabPFN_classifier.ipynb | 2168 +++++++++++++++++
 1 file changed, 2168 insertions(+)
 create mode 100644 samples/04_gis_analysts_data_scientists/classifying_human_activity_using_tabPFN_classifier.ipynb

diff --git a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using_tabPFN_classifier.ipynb b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using_tabPFN_classifier.ipynb
new file mode 100644
index 0000000000..868f3879da
--- /dev/null
+++ b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using_tabPFN_classifier.ipynb
@@ -0,0 +1,2168 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Leveraging TabPFN for Human Activity Recognition Using a Mobile Dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Table of Contents \n",
+    "* [Introduction](#1) \n",
+    "* [Necessary imports](#2)\n",
+    "* [Connecting to ArcGIS](#3)\n",
+    "* [Accessing the datasets](#4) \n",
+    "* [Prepare training data for TabPFN](#5)\n",
+    "    * [Data Preprocessing for TabPFN Classifier Model](#6) \n",
+    "    * [Visualize training data](#9)\n",
+    "* [Model Training](#10) \n",
+    "    * [Define the TabPFN classifier model](#11)\n",
+    "    * [Fit the model](#12)\n",
+    "    * [Visualize results on the validation set](#13)\n",
+    "* [Predicting using the TabPFN classifier model](#14)\n",
+    "    * [Predict using the trained model](#15)\n",
+    "* [Accuracy assessment: Compute Model Metrics](#16)\n",
+    "* [Conclusion](#17)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Introduction "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Human Activity Recognition (HAR) using mobile data has become an important area of research and application 
due to the increasing ubiquity of smartphones, wearables, and other mobile devices that can collect a wealth of sensor data. HAR is a crucial task in fields such as healthcare, fitness, workplace safety, and smart cities, where the goal is to classify human activities (e.g., walking, running, sitting) from sensor data. Traditional methods for HAR often require substantial computational resources and complex hyperparameter tuning, making them difficult to deploy in real-time applications. TabPFN (Tabular Prior-Data Fitted Network), a Transformer-based model designed for fast and efficient classification of small tabular datasets, offers a promising way to overcome these challenges.\n",
+    "\n",
+    "TabPFN’s advantages suit a wide range of HAR use cases. In healthcare, it aids fall detection for the elderly and chronic disease monitoring, enabling timely interventions. For fitness and wellness, it can classify activities such as walking or running in real time, enhancing the user experience in mobile apps and on wearable devices. It improves workplace safety by identifying risky worker activities in hazardous industrial environments, such as mines and oil rigs, reducing accidents. In smart cities and urban mobility, HAR data from pedestrians and commuters can be classified efficiently to optimize traffic flow, public transport systems, and urban planning initiatives. HAR also supports emergency response efforts during disasters by locating people in need of help. Thus, TabPFN's speed, simplicity, and effectiveness make it an ideal choice for these real-time HAR applications."
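+    ,
+    "\n",
+    "\n",
+    "As an illustrative aside (separate from the ArcGIS `arcgis.learn` workflow used in this notebook), TabPFN can also be called directly through the open-source `tabpfn` package, which exposes a scikit-learn-style interface. The minimal sketch below assumes that package is installed and, for brevity, uses scikit-learn's small iris demo dataset rather than the HAR data; constructor arguments may vary between `tabpfn` versions:\n",
+    "\n",
+    "```python\n",
+    "from sklearn.datasets import load_iris\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from tabpfn import TabPFNClassifier\n",
+    "\n",
+    "X, y = load_iris(return_X_y=True)\n",
+    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)\n",
+    "\n",
+    "# TabPFN is pre-trained, so fit() only stores the training data -- no gradient training\n",
+    "clf = TabPFNClassifier(device=\"cpu\")\n",
+    "clf.fit(X_train, y_train)\n",
+    "print(clf.predict(X_test)[:5])\n",
+    "```"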
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Necessary imports " + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: total: 0 ns\n", + "Wall time: 1.01 ms\n" + ] + } + ], + "source": [ + "%%time\n", + "\n", + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import pandas as pd\n", + "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report\n", + "\n", + "import arcgis\n", + "from arcgis.gis import GIS\n", + "from arcgis.learn import MLModel, prepare_tabulardata" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connecting to ArcGIS " + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "gis = GIS(\"/home\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Accessing the dataset \n", + "\n", + "The HAR training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks." 
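+    ,
+    "\n",
+    "\n",
+    "To make the feature families concrete, the short schematic snippet below (an illustration only, not the dataset's actual preprocessing code; the 50 Hz sampling rate is an assumption) shows how a jerk signal and a simple time-domain summary feature can be derived from raw body-acceleration samples:\n",
+    "\n",
+    "```python\n",
+    "dt = 1.0 / 50.0  # assumed sampling interval (50 Hz)\n",
+    "acc_x = [0.271, 0.278, 0.276, 0.273, 0.276]  # toy body-acceleration samples, X axis\n",
+    "\n",
+    "# BodyAccJerk-style signal: first difference of acceleration over time\n",
+    "jerk_x = [(b - a) / dt for a, b in zip(acc_x, acc_x[1:])]\n",
+    "\n",
+    "# simple time-domain summary feature: mean absolute jerk\n",
+    "mean_abs_jerk = sum(abs(j) for j in jerk_x) / len(jerk_x)\n",
+    "```"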
+ ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + " train_har_dataset\n", + " \n", + "
HAR dataset
CSV by api_data_owner\n", + "
Last Modified: January 10, 2025\n", + "
0 comments, 3 views\n", + "
\n", + "
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# access the training data\n", + "data_table = gis.content.get('1fafacc88bc3491696f981758a72de50')\n", + "data_table" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# Download the train datas and save ing it in local folder\n", + "data_path = data_table.get_data()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
tBodyAcc-mean()-XtBodyAcc-mean()-YtBodyAcc-mean()-ZtBodyAcc-std()-XtBodyAcc-std()-YtBodyAcc-std()-ZtBodyAcc-mad()-XtBodyAcc-mad()-YtBodyAcc-mad()-ZtBodyAcc-max()-X...fBodyBodyGyroJerkMag-kurtosis()angle(tBodyAccMean,gravity)angle(tBodyAccJerkMean),gravityMean)angle(tBodyGyroMean,gravityMean)angle(tBodyGyroJerkMean,gravityMean)angle(X,gravityMean)angle(Y,gravityMean)angle(Z,gravityMean)subjectActivity
00.271144-0.033031-0.121829-0.987884-0.867081-0.945087-0.991858-0.906651-0.943951-0.909793...-0.6859320.0076290.068842-0.762768-0.751408-0.7866470.234417-0.04034522STANDING
10.278211-0.020855-0.103400-0.996593-0.980402-0.988998-0.997065-0.977596-0.988861-0.941245...-0.7719010.0017270.295551-0.035877-0.360496-0.6614640.2212400.22332317STANDING
20.276012-0.015713-0.103117-0.982340-0.834824-0.973649-0.986465-0.862017-0.976193-0.914101...-0.2064140.1273910.0285810.0533580.637500-0.8267210.212775-0.01828025STANDING
30.272753-0.016910-0.101737-0.997409-0.996203-0.983416-0.997425-0.996439-0.984400-0.944557...-0.8947350.0964690.3193190.2293980.267721-0.6720920.195500-0.19452716SITTING
40.275565-0.014967-0.107715-0.995365-0.988601-0.988218-0.995880-0.989519-0.986747-0.937645...-0.8882580.1523360.217308-0.3776480.733588-0.7495990.048129-0.15665411SITTING
\n", + "

5 rows × 563 columns

\n", + "
" + ], + "text/plain": [ + " tBodyAcc-mean()-X tBodyAcc-mean()-Y tBodyAcc-mean()-Z tBodyAcc-std()-X \\\n", + "0 0.271144 -0.033031 -0.121829 -0.987884 \n", + "1 0.278211 -0.020855 -0.103400 -0.996593 \n", + "2 0.276012 -0.015713 -0.103117 -0.982340 \n", + "3 0.272753 -0.016910 -0.101737 -0.997409 \n", + "4 0.275565 -0.014967 -0.107715 -0.995365 \n", + "\n", + " tBodyAcc-std()-Y tBodyAcc-std()-Z tBodyAcc-mad()-X tBodyAcc-mad()-Y \\\n", + "0 -0.867081 -0.945087 -0.991858 -0.906651 \n", + "1 -0.980402 -0.988998 -0.997065 -0.977596 \n", + "2 -0.834824 -0.973649 -0.986465 -0.862017 \n", + "3 -0.996203 -0.983416 -0.997425 -0.996439 \n", + "4 -0.988601 -0.988218 -0.995880 -0.989519 \n", + "\n", + " tBodyAcc-mad()-Z tBodyAcc-max()-X ... fBodyBodyGyroJerkMag-kurtosis() \\\n", + "0 -0.943951 -0.909793 ... -0.685932 \n", + "1 -0.988861 -0.941245 ... -0.771901 \n", + "2 -0.976193 -0.914101 ... -0.206414 \n", + "3 -0.984400 -0.944557 ... -0.894735 \n", + "4 -0.986747 -0.937645 ... -0.888258 \n", + "\n", + " angle(tBodyAccMean,gravity) angle(tBodyAccJerkMean),gravityMean) \\\n", + "0 0.007629 0.068842 \n", + "1 0.001727 0.295551 \n", + "2 0.127391 0.028581 \n", + "3 0.096469 0.319319 \n", + "4 0.152336 0.217308 \n", + "\n", + " angle(tBodyGyroMean,gravityMean) angle(tBodyGyroJerkMean,gravityMean) \\\n", + "0 -0.762768 -0.751408 \n", + "1 -0.035877 -0.360496 \n", + "2 0.053358 0.637500 \n", + "3 0.229398 0.267721 \n", + "4 -0.377648 0.733588 \n", + "\n", + " angle(X,gravityMean) angle(Y,gravityMean) angle(Z,gravityMean) subject \\\n", + "0 -0.786647 0.234417 -0.040345 22 \n", + "1 -0.661464 0.221240 0.223323 17 \n", + "2 -0.826721 0.212775 -0.018280 25 \n", + "3 -0.672092 0.195500 -0.194527 16 \n", + "4 -0.749599 0.048129 -0.156654 11 \n", + "\n", + " Activity \n", + "0 STANDING \n", + "1 STANDING \n", + "2 STANDING \n", + "3 SITTING \n", + "4 SITTING \n", + "\n", + "[5 rows x 563 columns]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } 
+   ],
+   "source": [
+    "# Read the downloaded data\n",
+    "train_har_data = pd.read_csv(data_path)\n",
+    "train_har_data.head(5)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(1020, 563)"
+      ]
+     },
+     "execution_count": 26,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "train_har_data.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we will access the test dataset, a larger set containing 6,332 samples. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + " test_har_dataset\n", + " \n", + "
HAR dataset
CSV by api_data_owner\n", + "
Last Modified: January 10, 2025\n", + "
0 comments, 0 views\n", + "
\n", + "
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# access the test data\n", + "test_data_table = gis.content.get('e65312babe5b4efbaa2842235b79f653')\n", + "test_data_table" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "# Download the test data and save it in local folder\n", + "test_data_path = test_data_table.get_data()" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
tBodyAcc-mean()-XtBodyAcc-mean()-YtBodyAcc-mean()-ZtBodyAcc-std()-XtBodyAcc-std()-YtBodyAcc-std()-ZtBodyAcc-mad()-XtBodyAcc-mad()-YtBodyAcc-mad()-ZtBodyAcc-max()-X...fBodyBodyGyroJerkMag-kurtosis()angle(tBodyAccMean,gravity)angle(tBodyAccJerkMean),gravityMean)angle(tBodyGyroMean,gravityMean)angle(tBodyGyroJerkMean,gravityMean)angle(X,gravityMean)angle(Y,gravityMean)angle(Z,gravityMean)subjectActivity
00.288585-0.020294-0.132905-0.995279-0.983111-0.913526-0.995112-0.983185-0.923527-0.934724...-0.710304-0.1127540.030400-0.464761-0.018446-0.8412470.179941-0.0586271STANDING
10.278419-0.016411-0.123520-0.998245-0.975300-0.960322-0.998807-0.974914-0.957686-0.943068...-0.8614990.053477-0.007435-0.7326260.703511-0.8447880.180289-0.0543171STANDING
20.276629-0.016570-0.115362-0.998139-0.980817-0.990482-0.998321-0.979672-0.990441-0.942469...-0.6992050.1233200.1225420.693578-0.615971-0.8478650.185151-0.0438921STANDING
30.277293-0.021751-0.120751-0.997328-0.961245-0.983672-0.997596-0.957236-0.984379-0.940598...-0.5729950.0129540.080936-0.2343130.117797-0.8479710.188982-0.0373641STANDING
40.277175-0.014713-0.106756-0.999188-0.990526-0.993365-0.999211-0.990687-0.992168-0.943323...-0.7659010.105620-0.090278-0.1324030.498814-0.8497730.188812-0.0350631STANDING
\n", + "

5 rows × 563 columns

\n", + "
" + ], + "text/plain": [ + " tBodyAcc-mean()-X tBodyAcc-mean()-Y tBodyAcc-mean()-Z tBodyAcc-std()-X \\\n", + "0 0.288585 -0.020294 -0.132905 -0.995279 \n", + "1 0.278419 -0.016411 -0.123520 -0.998245 \n", + "2 0.276629 -0.016570 -0.115362 -0.998139 \n", + "3 0.277293 -0.021751 -0.120751 -0.997328 \n", + "4 0.277175 -0.014713 -0.106756 -0.999188 \n", + "\n", + " tBodyAcc-std()-Y tBodyAcc-std()-Z tBodyAcc-mad()-X tBodyAcc-mad()-Y \\\n", + "0 -0.983111 -0.913526 -0.995112 -0.983185 \n", + "1 -0.975300 -0.960322 -0.998807 -0.974914 \n", + "2 -0.980817 -0.990482 -0.998321 -0.979672 \n", + "3 -0.961245 -0.983672 -0.997596 -0.957236 \n", + "4 -0.990526 -0.993365 -0.999211 -0.990687 \n", + "\n", + " tBodyAcc-mad()-Z tBodyAcc-max()-X ... fBodyBodyGyroJerkMag-kurtosis() \\\n", + "0 -0.923527 -0.934724 ... -0.710304 \n", + "1 -0.957686 -0.943068 ... -0.861499 \n", + "2 -0.990441 -0.942469 ... -0.699205 \n", + "3 -0.984379 -0.940598 ... -0.572995 \n", + "4 -0.992168 -0.943323 ... -0.765901 \n", + "\n", + " angle(tBodyAccMean,gravity) angle(tBodyAccJerkMean),gravityMean) \\\n", + "0 -0.112754 0.030400 \n", + "1 0.053477 -0.007435 \n", + "2 0.123320 0.122542 \n", + "3 0.012954 0.080936 \n", + "4 0.105620 -0.090278 \n", + "\n", + " angle(tBodyGyroMean,gravityMean) angle(tBodyGyroJerkMean,gravityMean) \\\n", + "0 -0.464761 -0.018446 \n", + "1 -0.732626 0.703511 \n", + "2 0.693578 -0.615971 \n", + "3 -0.234313 0.117797 \n", + "4 -0.132403 0.498814 \n", + "\n", + " angle(X,gravityMean) angle(Y,gravityMean) angle(Z,gravityMean) subject \\\n", + "0 -0.841247 0.179941 -0.058627 1 \n", + "1 -0.844788 0.180289 -0.054317 1 \n", + "2 -0.847865 0.185151 -0.043892 1 \n", + "3 -0.847971 0.188982 -0.037364 1 \n", + "4 -0.849773 0.188812 -0.035063 1 \n", + "\n", + " Activity \n", + "0 STANDING \n", + "1 STANDING \n", + "2 STANDING \n", + "3 STANDING \n", + "4 STANDING \n", + "\n", + "[5 rows x 563 columns]" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + 
}
+   ],
+   "source": [
+    "# Read the test data\n",
+    "test_har_data = pd.read_csv(test_data_path).drop([\"Unnamed: 0\"], axis=1)\n",
+    "test_har_data.head(5)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(6332, 563)"
+      ]
+     },
+     "execution_count": 30,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "test_har_data.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prepare training data for TabPFN "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Select the feature columns, excluding 'subject' and 'Activity'\n",
+    "ls = list(train_har_data.columns)\n",
+    "X = [item for item in ls if item not in ['subject','Activity']]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['tBodyAcc-mean()-X',\n",
+       " 'tBodyAcc-mean()-Y',\n",
+       " 'tBodyAcc-mean()-Z',\n",
+       " 'tBodyAcc-std()-X',\n",
+       " 'tBodyAcc-std()-Y',\n",
+       " 'tBodyAcc-std()-Z',\n",
+       " 'tBodyAcc-mad()-X',\n",
+       " 'tBodyAcc-mad()-Y',\n",
+       " 'tBodyAcc-mad()-Z',\n",
+       " 'tBodyAcc-max()-X',\n",
+       " 'tBodyAcc-max()-Y',\n",
+       " 'tBodyAcc-max()-Z',\n",
+       " 'tBodyAcc-min()-X',\n",
+       " 'tBodyAcc-min()-Y',\n",
+       " 'tBodyAcc-min()-Z',\n",
+       " 'tBodyAcc-sma()',\n",
+       " 'tBodyAcc-energy()-X',\n",
+       " 'tBodyAcc-energy()-Y',\n",
+       " 'tBodyAcc-energy()-Z',\n",
+       " 'tBodyAcc-iqr()-X',\n",
+       " 'tBodyAcc-iqr()-Y',\n",
+       " 'tBodyAcc-iqr()-Z',\n",
+       " 'tBodyAcc-entropy()-X',\n",
+       " 'tBodyAcc-entropy()-Y',\n",
+       " 'tBodyAcc-entropy()-Z',\n",
+       " 'tBodyAcc-arCoeff()-X,1',\n",
+       " 'tBodyAcc-arCoeff()-X,2',\n",
+       " 'tBodyAcc-arCoeff()-X,3',\n",
+       " 'tBodyAcc-arCoeff()-X,4',\n",
+       " 'tBodyAcc-arCoeff()-Y,1',\n",
+       " 'tBodyAcc-arCoeff()-Y,2',\n",
+       " 'tBodyAcc-arCoeff()-Y,3',\n",
+       " 'tBodyAcc-arCoeff()-Y,4',\n",
+       " 'tBodyAcc-arCoeff()-Z,1',\n",
+       " 
'tBodyAcc-arCoeff()-Z,2',\n", + " 'tBodyAcc-arCoeff()-Z,3',\n", + " 'tBodyAcc-arCoeff()-Z,4',\n", + " 'tBodyAcc-correlation()-X,Y',\n", + " 'tBodyAcc-correlation()-X,Z',\n", + " 'tBodyAcc-correlation()-Y,Z',\n", + " 'tGravityAcc-mean()-X',\n", + " 'tGravityAcc-mean()-Y',\n", + " 'tGravityAcc-mean()-Z',\n", + " 'tGravityAcc-std()-X',\n", + " 'tGravityAcc-std()-Y',\n", + " 'tGravityAcc-std()-Z',\n", + " 'tGravityAcc-mad()-X',\n", + " 'tGravityAcc-mad()-Y',\n", + " 'tGravityAcc-mad()-Z',\n", + " 'tGravityAcc-max()-X',\n", + " 'tGravityAcc-max()-Y',\n", + " 'tGravityAcc-max()-Z',\n", + " 'tGravityAcc-min()-X',\n", + " 'tGravityAcc-min()-Y',\n", + " 'tGravityAcc-min()-Z',\n", + " 'tGravityAcc-sma()',\n", + " 'tGravityAcc-energy()-X',\n", + " 'tGravityAcc-energy()-Y',\n", + " 'tGravityAcc-energy()-Z',\n", + " 'tGravityAcc-iqr()-X',\n", + " 'tGravityAcc-iqr()-Y',\n", + " 'tGravityAcc-iqr()-Z',\n", + " 'tGravityAcc-entropy()-X',\n", + " 'tGravityAcc-entropy()-Y',\n", + " 'tGravityAcc-entropy()-Z',\n", + " 'tGravityAcc-arCoeff()-X,1',\n", + " 'tGravityAcc-arCoeff()-X,2',\n", + " 'tGravityAcc-arCoeff()-X,3',\n", + " 'tGravityAcc-arCoeff()-X,4',\n", + " 'tGravityAcc-arCoeff()-Y,1',\n", + " 'tGravityAcc-arCoeff()-Y,2',\n", + " 'tGravityAcc-arCoeff()-Y,3',\n", + " 'tGravityAcc-arCoeff()-Y,4',\n", + " 'tGravityAcc-arCoeff()-Z,1',\n", + " 'tGravityAcc-arCoeff()-Z,2',\n", + " 'tGravityAcc-arCoeff()-Z,3',\n", + " 'tGravityAcc-arCoeff()-Z,4',\n", + " 'tGravityAcc-correlation()-X,Y',\n", + " 'tGravityAcc-correlation()-X,Z',\n", + " 'tGravityAcc-correlation()-Y,Z',\n", + " 'tBodyAccJerk-mean()-X',\n", + " 'tBodyAccJerk-mean()-Y',\n", + " 'tBodyAccJerk-mean()-Z',\n", + " 'tBodyAccJerk-std()-X',\n", + " 'tBodyAccJerk-std()-Y',\n", + " 'tBodyAccJerk-std()-Z',\n", + " 'tBodyAccJerk-mad()-X',\n", + " 'tBodyAccJerk-mad()-Y',\n", + " 'tBodyAccJerk-mad()-Z',\n", + " 'tBodyAccJerk-max()-X',\n", + " 'tBodyAccJerk-max()-Y',\n", + " 'tBodyAccJerk-max()-Z',\n", + " 'tBodyAccJerk-min()-X',\n", + " 
'tBodyAccJerk-min()-Y',\n", + " 'tBodyAccJerk-min()-Z',\n", + " 'tBodyAccJerk-sma()',\n", + " 'tBodyAccJerk-energy()-X',\n", + " 'tBodyAccJerk-energy()-Y',\n", + " 'tBodyAccJerk-energy()-Z',\n", + " 'tBodyAccJerk-iqr()-X',\n", + " 'tBodyAccJerk-iqr()-Y',\n", + " 'tBodyAccJerk-iqr()-Z',\n", + " 'tBodyAccJerk-entropy()-X',\n", + " 'tBodyAccJerk-entropy()-Y',\n", + " 'tBodyAccJerk-entropy()-Z',\n", + " 'tBodyAccJerk-arCoeff()-X,1',\n", + " 'tBodyAccJerk-arCoeff()-X,2',\n", + " 'tBodyAccJerk-arCoeff()-X,3',\n", + " 'tBodyAccJerk-arCoeff()-X,4',\n", + " 'tBodyAccJerk-arCoeff()-Y,1',\n", + " 'tBodyAccJerk-arCoeff()-Y,2',\n", + " 'tBodyAccJerk-arCoeff()-Y,3',\n", + " 'tBodyAccJerk-arCoeff()-Y,4',\n", + " 'tBodyAccJerk-arCoeff()-Z,1',\n", + " 'tBodyAccJerk-arCoeff()-Z,2',\n", + " 'tBodyAccJerk-arCoeff()-Z,3',\n", + " 'tBodyAccJerk-arCoeff()-Z,4',\n", + " 'tBodyAccJerk-correlation()-X,Y',\n", + " 'tBodyAccJerk-correlation()-X,Z',\n", + " 'tBodyAccJerk-correlation()-Y,Z',\n", + " 'tBodyGyro-mean()-X',\n", + " 'tBodyGyro-mean()-Y',\n", + " 'tBodyGyro-mean()-Z',\n", + " 'tBodyGyro-std()-X',\n", + " 'tBodyGyro-std()-Y',\n", + " 'tBodyGyro-std()-Z',\n", + " 'tBodyGyro-mad()-X',\n", + " 'tBodyGyro-mad()-Y',\n", + " 'tBodyGyro-mad()-Z',\n", + " 'tBodyGyro-max()-X',\n", + " 'tBodyGyro-max()-Y',\n", + " 'tBodyGyro-max()-Z',\n", + " 'tBodyGyro-min()-X',\n", + " 'tBodyGyro-min()-Y',\n", + " 'tBodyGyro-min()-Z',\n", + " 'tBodyGyro-sma()',\n", + " 'tBodyGyro-energy()-X',\n", + " 'tBodyGyro-energy()-Y',\n", + " 'tBodyGyro-energy()-Z',\n", + " 'tBodyGyro-iqr()-X',\n", + " 'tBodyGyro-iqr()-Y',\n", + " 'tBodyGyro-iqr()-Z',\n", + " 'tBodyGyro-entropy()-X',\n", + " 'tBodyGyro-entropy()-Y',\n", + " 'tBodyGyro-entropy()-Z',\n", + " 'tBodyGyro-arCoeff()-X,1',\n", + " 'tBodyGyro-arCoeff()-X,2',\n", + " 'tBodyGyro-arCoeff()-X,3',\n", + " 'tBodyGyro-arCoeff()-X,4',\n", + " 'tBodyGyro-arCoeff()-Y,1',\n", + " 'tBodyGyro-arCoeff()-Y,2',\n", + " 'tBodyGyro-arCoeff()-Y,3',\n", + " 
'tBodyGyro-arCoeff()-Y,4',\n", + " 'tBodyGyro-arCoeff()-Z,1',\n", + " 'tBodyGyro-arCoeff()-Z,2',\n", + " 'tBodyGyro-arCoeff()-Z,3',\n", + " 'tBodyGyro-arCoeff()-Z,4',\n", + " 'tBodyGyro-correlation()-X,Y',\n", + " 'tBodyGyro-correlation()-X,Z',\n", + " 'tBodyGyro-correlation()-Y,Z',\n", + " 'tBodyGyroJerk-mean()-X',\n", + " 'tBodyGyroJerk-mean()-Y',\n", + " 'tBodyGyroJerk-mean()-Z',\n", + " 'tBodyGyroJerk-std()-X',\n", + " 'tBodyGyroJerk-std()-Y',\n", + " 'tBodyGyroJerk-std()-Z',\n", + " 'tBodyGyroJerk-mad()-X',\n", + " 'tBodyGyroJerk-mad()-Y',\n", + " 'tBodyGyroJerk-mad()-Z',\n", + " 'tBodyGyroJerk-max()-X',\n", + " 'tBodyGyroJerk-max()-Y',\n", + " 'tBodyGyroJerk-max()-Z',\n", + " 'tBodyGyroJerk-min()-X',\n", + " 'tBodyGyroJerk-min()-Y',\n", + " 'tBodyGyroJerk-min()-Z',\n", + " 'tBodyGyroJerk-sma()',\n", + " 'tBodyGyroJerk-energy()-X',\n", + " 'tBodyGyroJerk-energy()-Y',\n", + " 'tBodyGyroJerk-energy()-Z',\n", + " 'tBodyGyroJerk-iqr()-X',\n", + " 'tBodyGyroJerk-iqr()-Y',\n", + " 'tBodyGyroJerk-iqr()-Z',\n", + " 'tBodyGyroJerk-entropy()-X',\n", + " 'tBodyGyroJerk-entropy()-Y',\n", + " 'tBodyGyroJerk-entropy()-Z',\n", + " 'tBodyGyroJerk-arCoeff()-X,1',\n", + " 'tBodyGyroJerk-arCoeff()-X,2',\n", + " 'tBodyGyroJerk-arCoeff()-X,3',\n", + " 'tBodyGyroJerk-arCoeff()-X,4',\n", + " 'tBodyGyroJerk-arCoeff()-Y,1',\n", + " 'tBodyGyroJerk-arCoeff()-Y,2',\n", + " 'tBodyGyroJerk-arCoeff()-Y,3',\n", + " 'tBodyGyroJerk-arCoeff()-Y,4',\n", + " 'tBodyGyroJerk-arCoeff()-Z,1',\n", + " 'tBodyGyroJerk-arCoeff()-Z,2',\n", + " 'tBodyGyroJerk-arCoeff()-Z,3',\n", + " 'tBodyGyroJerk-arCoeff()-Z,4',\n", + " 'tBodyGyroJerk-correlation()-X,Y',\n", + " 'tBodyGyroJerk-correlation()-X,Z',\n", + " 'tBodyGyroJerk-correlation()-Y,Z',\n", + " 'tBodyAccMag-mean()',\n", + " 'tBodyAccMag-std()',\n", + " 'tBodyAccMag-mad()',\n", + " 'tBodyAccMag-max()',\n", + " 'tBodyAccMag-min()',\n", + " 'tBodyAccMag-sma()',\n", + " 'tBodyAccMag-energy()',\n", + " 'tBodyAccMag-iqr()',\n", + " 
'tBodyAccMag-entropy()',\n", + " 'tBodyAccMag-arCoeff()1',\n", + " 'tBodyAccMag-arCoeff()2',\n", + " 'tBodyAccMag-arCoeff()3',\n", + " 'tBodyAccMag-arCoeff()4',\n", + " 'tGravityAccMag-mean()',\n", + " 'tGravityAccMag-std()',\n", + " 'tGravityAccMag-mad()',\n", + " 'tGravityAccMag-max()',\n", + " 'tGravityAccMag-min()',\n", + " 'tGravityAccMag-sma()',\n", + " 'tGravityAccMag-energy()',\n", + " 'tGravityAccMag-iqr()',\n", + " 'tGravityAccMag-entropy()',\n", + " 'tGravityAccMag-arCoeff()1',\n", + " 'tGravityAccMag-arCoeff()2',\n", + " 'tGravityAccMag-arCoeff()3',\n", + " 'tGravityAccMag-arCoeff()4',\n", + " 'tBodyAccJerkMag-mean()',\n", + " 'tBodyAccJerkMag-std()',\n", + " 'tBodyAccJerkMag-mad()',\n", + " 'tBodyAccJerkMag-max()',\n", + " 'tBodyAccJerkMag-min()',\n", + " 'tBodyAccJerkMag-sma()',\n", + " 'tBodyAccJerkMag-energy()',\n", + " 'tBodyAccJerkMag-iqr()',\n", + " 'tBodyAccJerkMag-entropy()',\n", + " 'tBodyAccJerkMag-arCoeff()1',\n", + " 'tBodyAccJerkMag-arCoeff()2',\n", + " 'tBodyAccJerkMag-arCoeff()3',\n", + " 'tBodyAccJerkMag-arCoeff()4',\n", + " 'tBodyGyroMag-mean()',\n", + " 'tBodyGyroMag-std()',\n", + " 'tBodyGyroMag-mad()',\n", + " 'tBodyGyroMag-max()',\n", + " 'tBodyGyroMag-min()',\n", + " 'tBodyGyroMag-sma()',\n", + " 'tBodyGyroMag-energy()',\n", + " 'tBodyGyroMag-iqr()',\n", + " 'tBodyGyroMag-entropy()',\n", + " 'tBodyGyroMag-arCoeff()1',\n", + " 'tBodyGyroMag-arCoeff()2',\n", + " 'tBodyGyroMag-arCoeff()3',\n", + " 'tBodyGyroMag-arCoeff()4',\n", + " 'tBodyGyroJerkMag-mean()',\n", + " 'tBodyGyroJerkMag-std()',\n", + " 'tBodyGyroJerkMag-mad()',\n", + " 'tBodyGyroJerkMag-max()',\n", + " 'tBodyGyroJerkMag-min()',\n", + " 'tBodyGyroJerkMag-sma()',\n", + " 'tBodyGyroJerkMag-energy()',\n", + " 'tBodyGyroJerkMag-iqr()',\n", + " 'tBodyGyroJerkMag-entropy()',\n", + " 'tBodyGyroJerkMag-arCoeff()1',\n", + " 'tBodyGyroJerkMag-arCoeff()2',\n", + " 'tBodyGyroJerkMag-arCoeff()3',\n", + " 'tBodyGyroJerkMag-arCoeff()4',\n", + " 'fBodyAcc-mean()-X',\n", + " 
'fBodyAcc-mean()-Y',\n", + " 'fBodyAcc-mean()-Z',\n", + " 'fBodyAcc-std()-X',\n", + " 'fBodyAcc-std()-Y',\n", + " 'fBodyAcc-std()-Z',\n", + " 'fBodyAcc-mad()-X',\n", + " 'fBodyAcc-mad()-Y',\n", + " 'fBodyAcc-mad()-Z',\n", + " 'fBodyAcc-max()-X',\n", + " 'fBodyAcc-max()-Y',\n", + " 'fBodyAcc-max()-Z',\n", + " 'fBodyAcc-min()-X',\n", + " 'fBodyAcc-min()-Y',\n", + " 'fBodyAcc-min()-Z',\n", + " 'fBodyAcc-sma()',\n", + " 'fBodyAcc-energy()-X',\n", + " 'fBodyAcc-energy()-Y',\n", + " 'fBodyAcc-energy()-Z',\n", + " 'fBodyAcc-iqr()-X',\n", + " 'fBodyAcc-iqr()-Y',\n", + " 'fBodyAcc-iqr()-Z',\n", + " 'fBodyAcc-entropy()-X',\n", + " 'fBodyAcc-entropy()-Y',\n", + " 'fBodyAcc-entropy()-Z',\n", + " 'fBodyAcc-maxInds-X',\n", + " 'fBodyAcc-maxInds-Y',\n", + " 'fBodyAcc-maxInds-Z',\n", + " 'fBodyAcc-meanFreq()-X',\n", + " 'fBodyAcc-meanFreq()-Y',\n", + " 'fBodyAcc-meanFreq()-Z',\n", + " 'fBodyAcc-skewness()-X',\n", + " 'fBodyAcc-kurtosis()-X',\n", + " 'fBodyAcc-skewness()-Y',\n", + " 'fBodyAcc-kurtosis()-Y',\n", + " 'fBodyAcc-skewness()-Z',\n", + " 'fBodyAcc-kurtosis()-Z',\n", + " 'fBodyAcc-bandsEnergy()-1,8',\n", + " 'fBodyAcc-bandsEnergy()-9,16',\n", + " 'fBodyAcc-bandsEnergy()-17,24',\n", + " 'fBodyAcc-bandsEnergy()-25,32',\n", + " 'fBodyAcc-bandsEnergy()-33,40',\n", + " 'fBodyAcc-bandsEnergy()-41,48',\n", + " 'fBodyAcc-bandsEnergy()-49,56',\n", + " 'fBodyAcc-bandsEnergy()-57,64',\n", + " 'fBodyAcc-bandsEnergy()-1,16',\n", + " 'fBodyAcc-bandsEnergy()-17,32',\n", + " 'fBodyAcc-bandsEnergy()-33,48',\n", + " 'fBodyAcc-bandsEnergy()-49,64',\n", + " 'fBodyAcc-bandsEnergy()-1,24',\n", + " 'fBodyAcc-bandsEnergy()-25,48',\n", + " 'fBodyAcc-bandsEnergy()-1,8.1',\n", + " 'fBodyAcc-bandsEnergy()-9,16.1',\n", + " 'fBodyAcc-bandsEnergy()-17,24.1',\n", + " 'fBodyAcc-bandsEnergy()-25,32.1',\n", + " 'fBodyAcc-bandsEnergy()-33,40.1',\n", + " 'fBodyAcc-bandsEnergy()-41,48.1',\n", + " 'fBodyAcc-bandsEnergy()-49,56.1',\n", + " 'fBodyAcc-bandsEnergy()-57,64.1',\n", + " 
'fBodyAcc-bandsEnergy()-1,16.1',\n", + " 'fBodyAcc-bandsEnergy()-17,32.1',\n", + " 'fBodyAcc-bandsEnergy()-33,48.1',\n", + " 'fBodyAcc-bandsEnergy()-49,64.1',\n", + " 'fBodyAcc-bandsEnergy()-1,24.1',\n", + " 'fBodyAcc-bandsEnergy()-25,48.1',\n", + " 'fBodyAcc-bandsEnergy()-1,8.2',\n", + " 'fBodyAcc-bandsEnergy()-9,16.2',\n", + " 'fBodyAcc-bandsEnergy()-17,24.2',\n", + " 'fBodyAcc-bandsEnergy()-25,32.2',\n", + " 'fBodyAcc-bandsEnergy()-33,40.2',\n", + " 'fBodyAcc-bandsEnergy()-41,48.2',\n", + " 'fBodyAcc-bandsEnergy()-49,56.2',\n", + " 'fBodyAcc-bandsEnergy()-57,64.2',\n", + " 'fBodyAcc-bandsEnergy()-1,16.2',\n", + " 'fBodyAcc-bandsEnergy()-17,32.2',\n", + " 'fBodyAcc-bandsEnergy()-33,48.2',\n", + " 'fBodyAcc-bandsEnergy()-49,64.2',\n", + " 'fBodyAcc-bandsEnergy()-1,24.2',\n", + " 'fBodyAcc-bandsEnergy()-25,48.2',\n", + " 'fBodyAccJerk-mean()-X',\n", + " 'fBodyAccJerk-mean()-Y',\n", + " 'fBodyAccJerk-mean()-Z',\n", + " 'fBodyAccJerk-std()-X',\n", + " 'fBodyAccJerk-std()-Y',\n", + " 'fBodyAccJerk-std()-Z',\n", + " 'fBodyAccJerk-mad()-X',\n", + " 'fBodyAccJerk-mad()-Y',\n", + " 'fBodyAccJerk-mad()-Z',\n", + " 'fBodyAccJerk-max()-X',\n", + " 'fBodyAccJerk-max()-Y',\n", + " 'fBodyAccJerk-max()-Z',\n", + " 'fBodyAccJerk-min()-X',\n", + " 'fBodyAccJerk-min()-Y',\n", + " 'fBodyAccJerk-min()-Z',\n", + " 'fBodyAccJerk-sma()',\n", + " 'fBodyAccJerk-energy()-X',\n", + " 'fBodyAccJerk-energy()-Y',\n", + " 'fBodyAccJerk-energy()-Z',\n", + " 'fBodyAccJerk-iqr()-X',\n", + " 'fBodyAccJerk-iqr()-Y',\n", + " 'fBodyAccJerk-iqr()-Z',\n", + " 'fBodyAccJerk-entropy()-X',\n", + " 'fBodyAccJerk-entropy()-Y',\n", + " 'fBodyAccJerk-entropy()-Z',\n", + " 'fBodyAccJerk-maxInds-X',\n", + " 'fBodyAccJerk-maxInds-Y',\n", + " 'fBodyAccJerk-maxInds-Z',\n", + " 'fBodyAccJerk-meanFreq()-X',\n", + " 'fBodyAccJerk-meanFreq()-Y',\n", + " 'fBodyAccJerk-meanFreq()-Z',\n", + " 'fBodyAccJerk-skewness()-X',\n", + " 'fBodyAccJerk-kurtosis()-X',\n", + " 'fBodyAccJerk-skewness()-Y',\n", + " 
'fBodyAccJerk-kurtosis()-Y',\n", + " 'fBodyAccJerk-skewness()-Z',\n", + " 'fBodyAccJerk-kurtosis()-Z',\n", + " 'fBodyAccJerk-bandsEnergy()-1,8',\n", + " 'fBodyAccJerk-bandsEnergy()-9,16',\n", + " 'fBodyAccJerk-bandsEnergy()-17,24',\n", + " 'fBodyAccJerk-bandsEnergy()-25,32',\n", + " 'fBodyAccJerk-bandsEnergy()-33,40',\n", + " 'fBodyAccJerk-bandsEnergy()-41,48',\n", + " 'fBodyAccJerk-bandsEnergy()-49,56',\n", + " 'fBodyAccJerk-bandsEnergy()-57,64',\n", + " 'fBodyAccJerk-bandsEnergy()-1,16',\n", + " 'fBodyAccJerk-bandsEnergy()-17,32',\n", + " 'fBodyAccJerk-bandsEnergy()-33,48',\n", + " 'fBodyAccJerk-bandsEnergy()-49,64',\n", + " 'fBodyAccJerk-bandsEnergy()-1,24',\n", + " 'fBodyAccJerk-bandsEnergy()-25,48',\n", + " 'fBodyAccJerk-bandsEnergy()-1,8.1',\n", + " 'fBodyAccJerk-bandsEnergy()-9,16.1',\n", + " 'fBodyAccJerk-bandsEnergy()-17,24.1',\n", + " 'fBodyAccJerk-bandsEnergy()-25,32.1',\n", + " 'fBodyAccJerk-bandsEnergy()-33,40.1',\n", + " 'fBodyAccJerk-bandsEnergy()-41,48.1',\n", + " 'fBodyAccJerk-bandsEnergy()-49,56.1',\n", + " 'fBodyAccJerk-bandsEnergy()-57,64.1',\n", + " 'fBodyAccJerk-bandsEnergy()-1,16.1',\n", + " 'fBodyAccJerk-bandsEnergy()-17,32.1',\n", + " 'fBodyAccJerk-bandsEnergy()-33,48.1',\n", + " 'fBodyAccJerk-bandsEnergy()-49,64.1',\n", + " 'fBodyAccJerk-bandsEnergy()-1,24.1',\n", + " 'fBodyAccJerk-bandsEnergy()-25,48.1',\n", + " 'fBodyAccJerk-bandsEnergy()-1,8.2',\n", + " 'fBodyAccJerk-bandsEnergy()-9,16.2',\n", + " 'fBodyAccJerk-bandsEnergy()-17,24.2',\n", + " 'fBodyAccJerk-bandsEnergy()-25,32.2',\n", + " 'fBodyAccJerk-bandsEnergy()-33,40.2',\n", + " 'fBodyAccJerk-bandsEnergy()-41,48.2',\n", + " 'fBodyAccJerk-bandsEnergy()-49,56.2',\n", + " 'fBodyAccJerk-bandsEnergy()-57,64.2',\n", + " 'fBodyAccJerk-bandsEnergy()-1,16.2',\n", + " 'fBodyAccJerk-bandsEnergy()-17,32.2',\n", + " 'fBodyAccJerk-bandsEnergy()-33,48.2',\n", + " 'fBodyAccJerk-bandsEnergy()-49,64.2',\n", + " 'fBodyAccJerk-bandsEnergy()-1,24.2',\n", + " 'fBodyAccJerk-bandsEnergy()-25,48.2',\n", + " 
'fBodyGyro-mean()-X',\n", + " 'fBodyGyro-mean()-Y',\n", + " 'fBodyGyro-mean()-Z',\n", + " 'fBodyGyro-std()-X',\n", + " 'fBodyGyro-std()-Y',\n", + " 'fBodyGyro-std()-Z',\n", + " 'fBodyGyro-mad()-X',\n", + " 'fBodyGyro-mad()-Y',\n", + " 'fBodyGyro-mad()-Z',\n", + " 'fBodyGyro-max()-X',\n", + " 'fBodyGyro-max()-Y',\n", + " 'fBodyGyro-max()-Z',\n", + " 'fBodyGyro-min()-X',\n", + " 'fBodyGyro-min()-Y',\n", + " 'fBodyGyro-min()-Z',\n", + " 'fBodyGyro-sma()',\n", + " 'fBodyGyro-energy()-X',\n", + " 'fBodyGyro-energy()-Y',\n", + " 'fBodyGyro-energy()-Z',\n", + " 'fBodyGyro-iqr()-X',\n", + " 'fBodyGyro-iqr()-Y',\n", + " 'fBodyGyro-iqr()-Z',\n", + " 'fBodyGyro-entropy()-X',\n", + " 'fBodyGyro-entropy()-Y',\n", + " 'fBodyGyro-entropy()-Z',\n", + " 'fBodyGyro-maxInds-X',\n", + " 'fBodyGyro-maxInds-Y',\n", + " 'fBodyGyro-maxInds-Z',\n", + " 'fBodyGyro-meanFreq()-X',\n", + " 'fBodyGyro-meanFreq()-Y',\n", + " 'fBodyGyro-meanFreq()-Z',\n", + " 'fBodyGyro-skewness()-X',\n", + " 'fBodyGyro-kurtosis()-X',\n", + " 'fBodyGyro-skewness()-Y',\n", + " 'fBodyGyro-kurtosis()-Y',\n", + " 'fBodyGyro-skewness()-Z',\n", + " 'fBodyGyro-kurtosis()-Z',\n", + " 'fBodyGyro-bandsEnergy()-1,8',\n", + " 'fBodyGyro-bandsEnergy()-9,16',\n", + " 'fBodyGyro-bandsEnergy()-17,24',\n", + " 'fBodyGyro-bandsEnergy()-25,32',\n", + " 'fBodyGyro-bandsEnergy()-33,40',\n", + " 'fBodyGyro-bandsEnergy()-41,48',\n", + " 'fBodyGyro-bandsEnergy()-49,56',\n", + " 'fBodyGyro-bandsEnergy()-57,64',\n", + " 'fBodyGyro-bandsEnergy()-1,16',\n", + " 'fBodyGyro-bandsEnergy()-17,32',\n", + " 'fBodyGyro-bandsEnergy()-33,48',\n", + " 'fBodyGyro-bandsEnergy()-49,64',\n", + " 'fBodyGyro-bandsEnergy()-1,24',\n", + " 'fBodyGyro-bandsEnergy()-25,48',\n", + " 'fBodyGyro-bandsEnergy()-1,8.1',\n", + " 'fBodyGyro-bandsEnergy()-9,16.1',\n", + " 'fBodyGyro-bandsEnergy()-17,24.1',\n", + " 'fBodyGyro-bandsEnergy()-25,32.1',\n", + " 'fBodyGyro-bandsEnergy()-33,40.1',\n", + " 'fBodyGyro-bandsEnergy()-41,48.1',\n", + " 
'fBodyGyro-bandsEnergy()-49,56.1',\n", + " 'fBodyGyro-bandsEnergy()-57,64.1',\n", + " 'fBodyGyro-bandsEnergy()-1,16.1',\n", + " 'fBodyGyro-bandsEnergy()-17,32.1',\n", + " 'fBodyGyro-bandsEnergy()-33,48.1',\n", + " 'fBodyGyro-bandsEnergy()-49,64.1',\n", + " 'fBodyGyro-bandsEnergy()-1,24.1',\n", + " 'fBodyGyro-bandsEnergy()-25,48.1',\n", + " 'fBodyGyro-bandsEnergy()-1,8.2',\n", + " 'fBodyGyro-bandsEnergy()-9,16.2',\n", + " 'fBodyGyro-bandsEnergy()-17,24.2',\n", + " 'fBodyGyro-bandsEnergy()-25,32.2',\n", + " 'fBodyGyro-bandsEnergy()-33,40.2',\n", + " 'fBodyGyro-bandsEnergy()-41,48.2',\n", + " 'fBodyGyro-bandsEnergy()-49,56.2',\n", + " 'fBodyGyro-bandsEnergy()-57,64.2',\n", + " 'fBodyGyro-bandsEnergy()-1,16.2',\n", + " 'fBodyGyro-bandsEnergy()-17,32.2',\n", + " 'fBodyGyro-bandsEnergy()-33,48.2',\n", + " 'fBodyGyro-bandsEnergy()-49,64.2',\n", + " 'fBodyGyro-bandsEnergy()-1,24.2',\n", + " 'fBodyGyro-bandsEnergy()-25,48.2',\n", + " 'fBodyAccMag-mean()',\n", + " 'fBodyAccMag-std()',\n", + " 'fBodyAccMag-mad()',\n", + " 'fBodyAccMag-max()',\n", + " 'fBodyAccMag-min()',\n", + " 'fBodyAccMag-sma()',\n", + " 'fBodyAccMag-energy()',\n", + " 'fBodyAccMag-iqr()',\n", + " 'fBodyAccMag-entropy()',\n", + " 'fBodyAccMag-maxInds',\n", + " 'fBodyAccMag-meanFreq()',\n", + " 'fBodyAccMag-skewness()',\n", + " 'fBodyAccMag-kurtosis()',\n", + " 'fBodyBodyAccJerkMag-mean()',\n", + " 'fBodyBodyAccJerkMag-std()',\n", + " 'fBodyBodyAccJerkMag-mad()',\n", + " 'fBodyBodyAccJerkMag-max()',\n", + " 'fBodyBodyAccJerkMag-min()',\n", + " 'fBodyBodyAccJerkMag-sma()',\n", + " 'fBodyBodyAccJerkMag-energy()',\n", + " 'fBodyBodyAccJerkMag-iqr()',\n", + " 'fBodyBodyAccJerkMag-entropy()',\n", + " 'fBodyBodyAccJerkMag-maxInds',\n", + " 'fBodyBodyAccJerkMag-meanFreq()',\n", + " 'fBodyBodyAccJerkMag-skewness()',\n", + " 'fBodyBodyAccJerkMag-kurtosis()',\n", + " 'fBodyBodyGyroMag-mean()',\n", + " 'fBodyBodyGyroMag-std()',\n", + " 'fBodyBodyGyroMag-mad()',\n", + " 'fBodyBodyGyroMag-max()',\n", + " 
'fBodyBodyGyroMag-min()',\n", + " 'fBodyBodyGyroMag-sma()',\n", + " 'fBodyBodyGyroMag-energy()',\n", + " 'fBodyBodyGyroMag-iqr()',\n", + " 'fBodyBodyGyroMag-entropy()',\n", + " 'fBodyBodyGyroMag-maxInds',\n", + " 'fBodyBodyGyroMag-meanFreq()',\n", + " 'fBodyBodyGyroMag-skewness()',\n", + " 'fBodyBodyGyroMag-kurtosis()',\n", + " 'fBodyBodyGyroJerkMag-mean()',\n", + " 'fBodyBodyGyroJerkMag-std()',\n", + " 'fBodyBodyGyroJerkMag-mad()',\n", + " 'fBodyBodyGyroJerkMag-max()',\n", + " 'fBodyBodyGyroJerkMag-min()',\n", + " 'fBodyBodyGyroJerkMag-sma()',\n", + " 'fBodyBodyGyroJerkMag-energy()',\n", + " 'fBodyBodyGyroJerkMag-iqr()',\n", + " 'fBodyBodyGyroJerkMag-entropy()',\n", + " 'fBodyBodyGyroJerkMag-maxInds',\n", + " 'fBodyBodyGyroJerkMag-meanFreq()',\n", + " 'fBodyBodyGyroJerkMag-skewness()',\n", + " 'fBodyBodyGyroJerkMag-kurtosis()',\n", + " 'angle(tBodyAccMean,gravity)',\n", + " 'angle(tBodyAccJerkMean),gravityMean)',\n", + " 'angle(tBodyGyroMean,gravityMean)',\n", + " 'angle(tBodyGyroJerkMean,gravityMean)',\n", + " 'angle(X,gravityMean)',\n", + " 'angle(Y,gravityMean)',\n", + " 'angle(Z,gravityMean)']" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "561" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(X)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data Preprocessing for TabPFN Classifier Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 561 to below the TabPFN model's maximum limit of 100.
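The reduction step described here can be sketched in isolation. This is a hypothetical, self-contained example on synthetic data (not the HAR tables loaded above); the shapes mirror the notebook's, and the class count of six matches the six activity labels.

```python
# Sketch of the LDA feature-reduction step on synthetic data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

n_classes = 6                                          # six activity labels, as in HAR
X = np.random.default_rng(0).normal(size=(120, 561))   # 561 raw features
y = np.repeat(np.arange(n_classes), 20)                # 20 samples per class

X_scaled = StandardScaler().fit_transform(X)
# LDA yields at most n_classes - 1 components, so min(100, 6 - 1) = 5
lda = LinearDiscriminantAnalysis(n_components=min(100, n_classes - 1))
X_reduced = lda.fit_transform(X_scaled, y)
print(X_reduced.shape)  # (120, 5): well under the 100-feature ceiling
```

Because the number of LDA components is capped at one less than the number of classes, the six activity labels guarantee the output stays far below TabPFN's limit regardless of how many raw features go in.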
By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(1020, 6)" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Data processing to reduce the features to 100 or less as required for TabPFN models\n", + "X = train_har_data.drop(columns=['Activity'])\n", + "y = train_har_data['Activity']\n", + "scaler = StandardScaler()\n", + "X_scaled = scaler.fit_transform(X)\n", + "lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1))\n", + "X_reduced_lda = lda.fit_transform(X_scaled, y)\n", + "X_train_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])\n", + "X_train_lda_df['Activity'] = y.reset_index(drop=True)\n", + "X_train_lda_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Define the explanatory variables\n", + "X = list(X_train_lda_df.columns)\n", + "X = X[:-1]\n", + "len(X)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_train_lda_df.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once the explanatory variables X are defined, they are used as input for the *prepare_tabulardata* method from the tabular
learner in arcgis.learn. The method takes a feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model. \n", + "\n", + "The input parameters required for the tool are similar to the ones mentioned previously:" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "data = prepare_tabulardata(X_train_lda_df, 'Activity', explanatory_variables=X)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Visualize training data " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get a sense of what the training data looks like, the show_batch() method will randomly pick a few training samples and visualize them. The samples show the explanatory variables alongside the target column to predict." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ActivityLDA1LDA2LDA3LDA4LDA5
28STANDING-19.431367-11.1744150.816325-0.6871532.809134
130SITTING-18.377005-9.4293100.342490-0.757008-4.074501
311WALKING17.852727-1.390446-8.1646044.176480-0.160302
734WALKING_UPSTAIRS23.6336402.3011212.475455-10.447339-0.302447
847LAYING-26.62091314.783496-0.7058470.511383-0.568820
\n", + "
" + ], + "text/plain": [ + " Activity LDA1 LDA2 LDA3 LDA4 LDA5\n", + "28 STANDING -19.431367 -11.174415 0.816325 -0.687153 2.809134\n", + "130 SITTING -18.377005 -9.429310 0.342490 -0.757008 -4.074501\n", + "311 WALKING 17.852727 -1.390446 -8.164604 4.176480 -0.160302\n", + "734 WALKING_UPSTAIRS 23.633640 2.301121 2.475455 -10.447339 -0.302447\n", + "847 LAYING -26.620913 14.783496 -0.705847 0.511383 -0.568820" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.show_batch(rows=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Training \n", + "First we initialize the model as follows:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the tabFPN classifier model \n", + "\n", + "The default, initialization of the tabFPN classifier model object is shown below:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "from arcgis.learn import MLModel\n", + "tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier',device='cpu', N_ensemble_configurations=32)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Fit the model \n", + "\n", + "Next, we will train the model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [], + "source": [ + "tabpfn_classifier.fit()" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9901960784313726" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tabpfn_classifier.score()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see the model score is showing excellent result." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize results in validation set \n", + "\n", + "It is a good practice to see the results of the model vis-à-vis the ground truth. The code below picks random samples and shows the `Activity` column, which is the ground truth or target state, and the model-predicted `Activity_results` side by side. This enables us to preview the results of the model we trained." + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ActivityLDA1LDA2LDA3LDA4LDA5Activity_results
101LAYING-27.92574617.698939-0.2119930.137931-0.872844LAYING
299WALKING22.2306590.572533-8.3181691.2750160.139479WALKING
693SITTING-18.977440-10.1230490.551960-0.028800-3.010251SITTING
884WALKING23.685543-0.261172-11.3384813.464041-0.628183WALKING
967SITTING-18.285170-11.0177981.873937-0.702550-1.983438SITTING
\n", + "
" + ], + "text/plain": [ + " Activity LDA1 LDA2 LDA3 LDA4 LDA5 \\\n", + "101 LAYING -27.925746 17.698939 -0.211993 0.137931 -0.872844 \n", + "299 WALKING 22.230659 0.572533 -8.318169 1.275016 0.139479 \n", + "693 SITTING -18.977440 -10.123049 0.551960 -0.028800 -3.010251 \n", + "884 WALKING 23.685543 -0.261172 -11.338481 3.464041 -0.628183 \n", + "967 SITTING -18.285170 -11.017798 1.873937 -0.702550 -1.983438 \n", + "\n", + " Activity_results \n", + "101 LAYING \n", + "299 WALKING \n", + "693 SITTING \n", + "884 WALKING \n", + "967 SITTING " + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tabpfn_classifier.show_results()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Predicting using the tabFPN classifier model \n", + "\n", + "Once the TabPFN classifier is trained on the smaller dataset of 1,020 samples, we can use it to predict the classes of a larger dataset containing 6,332 samples. Given TabPFN’s ability to process data efficiently with a single forward pass, it can handle this larger dataset quickly, classifying each sample based on the patterns learned during training. Since the model is optimized for fast and scalable predictions, it will generate class predictions for all samples. \n", + "\n", + "Before using the trained TabPFN model to predict the classes of the test dataset, we will first apply Linear Discriminant Analysis (LDA) to reduce the test data to the same feature space as the training data. This ensures consistency between the training and test datasets, enabling the trained TabPFN model to effectively classify the larger test sample." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(6332, 6)\n" + ] + } + ], + "source": [ + "# Align test data with the train data format \n", + "X = test_har_data.drop(columns=['Activity'])\n", + "y = test_har_data['Activity']\n", + "scaler = StandardScaler()\n", + "X_scaled = scaler.fit_transform(X)\n", + "lda = LinearDiscriminantAnalysis(n_components=min(100, len(set(y)) - 1)) \n", + "X_reduced_lda = lda.fit_transform(X_scaled, y)\n", + "X_test_lda_df = pd.DataFrame(X_reduced_lda, columns=[f'LDA{i+1}' for i in range(X_reduced_lda.shape[1])])\n", + "X_test_lda_df['Activity'] = y.reset_index(drop=True)\n", + "print(X_test_lda_df.shape) " + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
LDA1LDA2LDA3LDA4LDA5Activity
0-10.188443-8.6413770.6066691.1209833.836137STANDING
1-9.735631-6.7166750.537841-0.5433852.295157STANDING
2-8.954351-7.3762960.798942-0.5074652.508069STANDING
3-10.400401-7.2673211.0351340.2727382.034312STANDING
4-9.596161-6.9800610.480017-0.2845371.103180STANDING
\n", + "
" + ], + "text/plain": [ + " LDA1 LDA2 LDA3 LDA4 LDA5 Activity\n", + "0 -10.188443 -8.641377 0.606669 1.120983 3.836137 STANDING\n", + "1 -9.735631 -6.716675 0.537841 -0.543385 2.295157 STANDING\n", + "2 -8.954351 -7.376296 0.798942 -0.507465 2.508069 STANDING\n", + "3 -10.400401 -7.267321 1.035134 0.272738 2.034312 STANDING\n", + "4 -9.596161 -6.980061 0.480017 -0.284537 1.103180 STANDING" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_test_lda_df.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Predicting using the trained model " + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "activity_predicted_tabfpn = tabpfn_classifier.predict(X_test_lda_df, prediction_type='dataframe')" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
LDA1LDA2LDA3LDA4LDA5Activityprediction_results
632718.0957832.5937758.7043375.4242570.443448WALKING_DOWNSTAIRSWALKING_DOWNSTAIRS
632817.0942861.9972845.2707522.8398470.550822WALKING_DOWNSTAIRSWALKING_DOWNSTAIRS
632915.9095941.5378034.8872374.771153-0.157321WALKING_DOWNSTAIRSWALKING_DOWNSTAIRS
633011.9449850.8346600.116338-6.2852360.045984WALKING_UPSTAIRSWALKING_UPSTAIRS
633114.5755701.737412-0.866397-5.458011-1.150021WALKING_UPSTAIRSWALKING_UPSTAIRS
\n", + "
" + ], + "text/plain": [ + " LDA1 LDA2 LDA3 LDA4 LDA5 Activity \\\n", + "6327 18.095783 2.593775 8.704337 5.424257 0.443448 WALKING_DOWNSTAIRS \n", + "6328 17.094286 1.997284 5.270752 2.839847 0.550822 WALKING_DOWNSTAIRS \n", + "6329 15.909594 1.537803 4.887237 4.771153 -0.157321 WALKING_DOWNSTAIRS \n", + "6330 11.944985 0.834660 0.116338 -6.285236 0.045984 WALKING_UPSTAIRS \n", + "6331 14.575570 1.737412 -0.866397 -5.458011 -1.150021 WALKING_UPSTAIRS \n", + "\n", + " prediction_results \n", + "6327 WALKING_DOWNSTAIRS \n", + "6328 WALKING_DOWNSTAIRS \n", + "6329 WALKING_DOWNSTAIRS \n", + "6330 WALKING_UPSTAIRS \n", + "6331 WALKING_UPSTAIRS " + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "activity_predicted_tabfpn.tail(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Accuracy assessment \n", + "\n", + "Here we weill evaluate the model's performance. This will print out multiple model metrics. we can assess the model quality using its corresponding metrics. These metrics include a combination of multiple evaluation criteria, such as `accuracy`, `precision`, `recall` and `F1-Score`, which collectively measure the model's performance on the validation set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy: 96.83%\n", + "Precision: 0.97\n", + "Recall: 0.97\n", + "F1 Score: 0.97\n", + "\n", + "Classification Report:\n", + " precision recall f1-score support\n", + "\n", + " LAYING 1.00 1.00 1.00 1219\n", + " SITTING 1.00 0.83 0.91 1119\n", + " STANDING 0.86 0.99 0.93 1197\n", + " WALKING 1.00 1.00 1.00 1031\n", + "WALKING_DOWNSTAIRS 1.00 0.99 1.00 835\n", + " WALKING_UPSTAIRS 0.99 1.00 1.00 931\n", + "\n", + " accuracy 0.97 6332\n", + " macro avg 0.97 0.97 0.97 6332\n", + " weighted avg 0.97 0.97 0.97 6332\n", + "\n" + ] + } + ], + "source": [ + "# Extract ground truth and predictions\n", + "y_true = activity_predicted_tabfpn['Activity']\n", + "y_pred = activity_predicted_tabfpn['prediction_results']\n", + "\n", + "# Calculate Accuracy\n", + "accuracy = accuracy_score(y_true, y_pred)\n", + "print(f'Accuracy: {accuracy * 100:.2f}%')\n", + "\n", + "# Calculate Precision \n", + "precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)\n", + "print(f'Precision: {precision:.2f}')\n", + "\n", + "# Calculate Recall \n", + "recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)\n", + "print(f'Recall: {recall:.2f}')\n", + "\n", + "# Calculate F1-Score \n", + "f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)\n", + "print(f'F1 Score: {f1:.2f}')\n", + "\n", + "# classification_report \n", + "print(\"\\nClassification Report:\")\n", + "print(classification_report(y_true, y_pred))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The performance metrics obtained from the trained TabPFN 
model on the test dataset of 6,332 samples indicate excellent classification quality.\n", + "\n", + "`Accuracy (96.83%)` : The model correctly classified approximately 97% of the samples, which is a strong indication of its ability to generalize well to unseen data, despite being trained on a smaller dataset of just 1,020 samples.\n", + "\n", + "`Precision (0.97)` : Precision measures the proportion of true positive predictions among all positive predictions made by the model. A precision of 0.97 means that 97% of the predicted positive activity classes are correct, indicating that the model rarely makes false positive errors.\n", + "\n", + "`Recall (0.97)` : Recall represents the model's ability to correctly identify all relevant instances of a class. A recall of 0.97 means that the model correctly identifies 97% of all actual positive instances, with minimal false negatives.\n", + "\n", + "`F1 Score (0.97)` : The F1 Score is the harmonic mean of precision and recall, and a value of 0.97 shows that the model balances precision and recall very well. This indicates that the model is both highly accurate and sensitive in detecting the correct activity classes.\n", + "\n", + "Overall, these metrics demonstrate that the TabPFN model performs exceptionally well, achieving near-perfect classification with minimal errors. This performance is particularly impressive given that it was trained on a relatively small sample size of 1,020 data points, highlighting its efficiency and effectiveness in handling human activity recognition tasks." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This project highlights the powerful capabilities of the TabPFN classifier for Human Activity Recognition (HAR) tasks.
Even with a training dataset of just 1,020 samples, the model achieved impressive results on a larger test dataset of 6,332 samples, with an accuracy of 96.83%, and precision, recall, and F1 score all reaching 0.97. The TabPFN model's speed, simplicity, and strong performance in classifying human activities highlight its potential for applications in healthcare, fitness, smart cities, and disaster relief operations, offering an efficient and scalable solution for HAR systems." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pro3.4_climax_27October2024", + "language": "python", + "name": "pro3.4_climax_27october2024" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 1fd6c49303fab2a2936b93ea97b742a60c01dbac Mon Sep 17 00:00:00 2001 From: moonlanderr Date: Tue, 14 Jan 2025 13:05:04 +0530 Subject: [PATCH 2/6] suggested corrections added --- ...an_activity_using _tabPFN_classifier.ipynb | 96 +++++++++---------- 1 file changed, 45 insertions(+), 51 deletions(-) diff --git a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb index 868f3879da..8e9c35f2a6 100644 --- a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb +++ b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Leveraging TabPFN for Human Activity Recognition Using Mobile dataset" + "## Leveraging TabPFN for Human Activity Recognition Using Mobile Dataset" ] }, { @@ -16,14 +16,14 @@ "* [Necessary imports](#2)\n", "* [Connecting to
ArcGIS](#3)\n", "* [Accessing the datasets](#4) \n", - "* [Prepare training data for TabFPN](#5)\n", - " * [Data Preprocessing for tabFPN Classifier Model](#6) \n", + "* [Prepare training data for TabPFN](#5)\n", + " * [Data Preprocessing for TabPFN Classifier Model](#6) \n", " * [Visualize training data](#9)\n", "* [Model Training](#10) \n", - " * [Define the tabFPN classifier model ](#11)\n", + " * [Define the TabPFN classifier model ](#11)\n", " * [Fit the model](#12)\n", " * [Visualize results in validation set](#13)\n", - "* [Predicting using tabFPN classifier model](#14)\n", + "* [Predicting using TabPFN classifier model](#14)\n", " * [Predict using the trained model](#15)\n", "* [Accuracy assessment: Compute Model Metric](#16)\n", "* [Conclusion](#17)" @@ -67,8 +67,6 @@ } ], "source": [ - "%%time\n", - "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", @@ -77,7 +75,6 @@ "from sklearn.preprocessing import StandardScaler\n", "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report\n", "\n", - "import arcgis\n", "from arcgis.gis import GIS\n", "from arcgis.learn import MLModel, prepare_tabulardata" ] @@ -102,9 +99,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Accessing the dataset \n", + "## Accessing the datasets \n", "\n", - "The HAR training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. 
Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks." + "Here we will access the train and test datasets. The Human Activity Recognition (HAR) training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks." ] }, { @@ -153,7 +150,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Download the train datas and save ing it in local folder\n", + "# Download the train data and saving it in local folder\n", "data_path = data_table.get_data()" ] }, @@ -391,7 +388,7 @@ } ], "source": [ - "# Read the donwloaded data\n", + "# Read the downloaded data\n", "train_har_data = pd.read_csv(data_path)\n", "train_har_data.head(5)" ] @@ -420,7 +417,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next we will access the test dataset, which is a larger dataset containing 6,332 samples. " + "Next, we will access the test dataset, which is significantly larger, containing 6,332 samples." 
] }, { @@ -469,7 +466,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Download the test data and save it in local folder\n", + "# Download the test data and save it to a local folder\n", "test_data_path = test_data_table.get_data()" ] }, @@ -736,7 +733,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Prepare training data for TabFPN " + "## Prepare training data for TabPFN " ] }, { @@ -1354,14 +1351,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Data Preprocessing for tabFPN Classifier Model" + "### Data Preprocessing for TabPFN Classifier Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 560 to below the tabFPN model's maximum limit of 100. By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions." + "To process the training data for the TabPFN model, we will use Linear Discriminant Analysis (LDA) to reduce the number of features from the original 561 to below the TabPFN model's maximum limit of 100. By applying LDA, we can preserve the most relevant information for classification while reducing the complexity of the input data, making it suitable for the TabPFN model, which requires a compact input format for efficient processing and predictions." 
] }, { @@ -1381,7 +1378,7 @@ } ], "source": [ - "# Data processing to reduce the features to 100 or less as required for tabFPN models\n", + "# Data processing to reduce the features to 100 or less as required for TabPFN models\n", "X = train_har_data.drop(columns=['Activity'])\n", "y = train_har_data['Activity']\n", "scaler = StandardScaler()\n", @@ -1395,54 +1392,62 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "5" + "Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')" ] }, - "execution_count": 35, + "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# define the explanatory vairables\n", - "X = list(X_train_lda_df.columns)\n", - "X =X[:-1]\n", - "len(X)" + "# Visualize the final processed training data columns\n", + "X_train_lda_df.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above training dataframe we will use the `Activity` as the target label to be predicted using rest of the features as explanatory variables `X`. We define the explanatory variables as follows: " ] }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Index(['LDA1', 'LDA2', 'LDA3', 'LDA4', 'LDA5', 'Activity'], dtype='object')" + "5" ] }, - "execution_count": 36, + "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "X_train_lda_df.columns" + "# define the explanatory vairables\n", + "X = list(X_train_lda_df.columns)\n", + "X =X[:-1]\n", + "len(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Once the explanatory variables X are preprocessed this is now used as input for the *prepare_tabulardata* method from the tabular learner in the arcgis.learn. 
The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model. \n", + "Once the explanatory variables `X` is defined, this is now used as input in the `prepare_tabulardata` method from the tabular learner in the `arcgis.learn`. The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model. \n", "\n", - "The input parameters required for the tool are similar to the ones mentioned previously :" + "The input parameters required for the tool are used as shown here :" ] }, { @@ -1465,7 +1470,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To get a sense of what the training data looks like, the show_batch() method will randomly pick a few training sample and visualize them. The sample are showing the explanaotyr vairables and the variblss to predict column." + "To get a sense of what the training data looks like, the `show_batch()` method will randomly pick a few training sample and visualize them. The sample are showing the explanatory variables and the `Activity` target label to predict." ] }, { @@ -1582,9 +1587,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Define the tabFPN classifier model \n", + "### Define the TabPFN classifier model \n", "\n", - "The default, initialization of the tabFPN classifier model object is shown below:" + "The default initialization of the TabPFN classifier model object is shown below:" ] }, { @@ -1593,7 +1598,6 @@ "metadata": {}, "outputs": [], "source": [ - "from arcgis.learn import MLModel\n", "tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier',device='cpu', N_ensemble_configurations=32)" ] }, @@ -1639,7 +1643,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can see the model score is showing excellent result." + "We can see the model score is showing excellent results."
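The `MLModel` call in the notebook wraps a scikit-learn-style estimator. The sketch below illustrates the same fit-and-score workflow on synthetic stand-in features; it substitutes scikit-learn's `LogisticRegression` for `TabPFNClassifier` so it runs without the `tabpfn` package installed, and the commented lines show the construction the notebook relies on (v1 `tabpfn` API assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1020, 5))     # stand-in for the 5 LDA features
y = rng.integers(0, 6, size=1020)  # stand-in for the 6 activity labels

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# With the v1 tabpfn package installed, the estimator would instead be:
#   from tabpfn import TabPFNClassifier
#   clf = TabPFNClassifier(device='cpu', N_ensemble_configurations=32)
clf = LogisticRegression(max_iter=1000)  # stand-in with the same fit/predict API
clf.fit(X_train, y_train)

val_accuracy = clf.score(X_val, y_val)  # mean accuracy on the validation split
print(0.0 <= val_accuracy <= 1.0)
```

Because both estimators share the `fit`/`predict`/`score` interface, swapping the stand-in for the real TabPFN classifier changes only the constructor line.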
] }, { @@ -1770,7 +1774,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Predicting using the tabFPN classifier model \n", + "## Predicting using the TabPFN classifier model \n", "\n", "Once the TabPFN classifier is trained on the smaller dataset of 1,020 samples, we can use it to predict the classes of a larger dataset containing 6,332 samples. Given TabPFN’s ability to process data efficiently with a single forward pass, it can handle this larger dataset quickly, classifying each sample based on the patterns learned during training. Since the model is optimized for fast and scalable predictions, it will generate class predictions for all samples. \n", "\n", @@ -2042,17 +2046,7 @@ "source": [ "### Accuracy assessment \n", "\n", - "Here we weill evaluate the model's performance. This will print out multiple model metrics. we can assess the model quality using its corresponding metrics. These metrics include a combination of multiple evaluation criteria, such as `accuracy`, `precision`, `recall` and `F1-Score`, which collectively measure the model's performance on the validation set." - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report" + "Here we will evaluate the model's performance. This will print out multiple model metrics. we can assess the model quality using its corresponding metrics. These metrics include a combination of multiple evaluation criteria, such as `accuracy`, `precision`, `recall` and `F1-Score`, which collectively measure the model's performance on the validation set." 
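The metrics named above can be computed directly with `sklearn.metrics`, as the notebook does. A small illustrative sketch on toy labels (the label strings are placeholders, not drawn from the dataset):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy ground-truth and predicted activity labels (illustrative only)
y_true = ["WALKING", "SITTING", "WALKING", "STANDING", "SITTING"]
y_pred = ["WALKING", "SITTING", "WALKING", "SITTING", "SITTING"]

accuracy = accuracy_score(y_true, y_pred)  # 4 of 5 correct
# Weighted averaging accounts for class imbalance; zero_division=0 silences
# the warning for classes that are never predicted (STANDING here)
precision = precision_score(y_true, y_pred, average="weighted", zero_division=0)
recall = recall_score(y_true, y_pred, average="weighted")
f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(round(accuracy, 2))  # 0.8
```

With multi-class labels like these, the `average` choice (`"weighted"`, `"macro"`, or `"micro"`) determines how per-class scores are combined into the single reported number.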
] }, { @@ -2146,9 +2140,9 @@ ], "metadata": { "kernelspec": { - "display_name": "pro3.4_climax_27October2024", + "display_name": "pro3.4_climaxAug2024", "language": "python", - "name": "pro3.4_climax_27october2024" + "name": "pro3.4_climaxaug2024" }, "language_info": { "codemirror_mode": { @@ -2160,7 +2154,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.9" + "version": "3.11.8" } }, "nbformat": 4, From 8f928248ae124f0ffe21e904982c8364a4a4a271 Mon Sep 17 00:00:00 2001 From: moonlanderr Date: Mon, 17 Feb 2025 12:31:00 +0530 Subject: [PATCH 3/6] tabpfn license info added --- ...an_activity_using _tabPFN_classifier.ipynb | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb index 8e9c35f2a6..b33700ed1a 100644 --- a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb +++ b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb @@ -26,7 +26,8 @@ "* [Predicting using TabPFN classifier model](#14)\n", " * [Predict using the trained model](#15)\n", "* [Accuracy assessment: Compute Model Metric](#16)\n", - "* [Conclusion](#17)" + "* [Conclusion](#17)\n", + "* [TabPFN License Information](#18) " ] }, { @@ -2136,6 +2137,22 @@ "source": [ "This project highlights the powerful capabilities of the TabPFN classifier for Human Activity Recognition (HAR) tasks. Even with a training dataset of just 1,020 samples, the model achieved impressive results on a larger test dataset of 6,332 samples, with an accuracy of 96.81%, precision, recall, and F1 score all reaching 0.97. 
The TabPFN model's speed, simplicity, and strong performance in classifying human activities, highlight its potential for applications in healthcare, fitness, smart cities and disaster relief operations, offering an efficient and scalable solution for HAR systems." ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### TabPFN License Information " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "| License Description |\n", + "|:------------------- |\n", + "| Built with TabPFN - tabpfn.TabPFNClassifier |" + ] } ], "metadata": { From b818ab153e0798ff4f0be232005ae38a3c27cde3 Mon Sep 17 00:00:00 2001 From: moonlanderr Date: Mon, 17 Feb 2025 13:38:13 +0530 Subject: [PATCH 4/6] tabPFN name updated --- ...ssifying_human_activity_using _tabPFN_classifier.ipynb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb index b33700ed1a..5516d9c201 100644 --- a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb +++ b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb @@ -1923,7 +1923,7 @@ "metadata": {}, "outputs": [], "source": [ - "activity_predicted_tabfpn = tabpfn_classifier.predict(X_test_lda_df, prediction_type='dataframe')" + "activity_predicted_tabpfn = tabpfn_classifier.predict(X_test_lda_df, prediction_type='dataframe')" ] }, { @@ -2038,7 +2038,7 @@ } ], "source": [ - "activity_predicted_tabfpn.tail(5)" + "activity_predicted_tabpfn.tail(5)" ] }, { @@ -2083,8 +2083,8 @@ ], "source": [ "# Extract ground truth and predictions\n", - "y_true = activity_predicted_tabfpn['Activity']\n", - "y_pred = activity_predicted_tabfpn['prediction_results']\n", + "y_true = activity_predicted_tabpfn['Activity']\n", + "y_pred = 
activity_predicted_tabpfn['prediction_results']\n", "\n", "# Calculate Accuracy\n", "accuracy = accuracy_score(y_true, y_pred)\n", From 665c67e5a568fac053a44551146f8ac9ed99fd95 Mon Sep 17 00:00:00 2001 From: moonlanderr Date: Thu, 20 Feb 2025 10:25:12 +0530 Subject: [PATCH 5/6] all suggestions added --- ...an_activity_using _tabPFN_classifier.ipynb | 24 ++++++++++--------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb index 5516d9c201..1377231472 100644 --- a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb +++ b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb @@ -43,7 +43,7 @@ "source": [ "Human Activity Recognition (HAR) using mobile data has become an important area of research and application due to the increasing ubiquity of smartphones, wearables, and other mobile devices that can collect a wealth of sensor data. HAR is a crucial task in various fields, including healthcare, fitness, workplace safety, and smart cities, where the goal is to classify human activities (e.g., walking, running, sitting) based on sensor data. Traditional methods for HAR often require substantial computational resources and complex hyperparameter tuning, making them difficult to deploy in real-time applications. TabPFN (Tabular Prior-Data Fitted Network), a Transformer-based model designed for fast and efficient classification of small tabular datasets, offers a promising solution to overcome these challenges.\n", "\n", - "TabPFN’s advantages are particularly well-suited for various HAR use cases. In healthcare, it aids in fall detection for the elderly, chronic disease monitoring, providing timely interventions. 
For fitness and wellness, it can classify activities such as walking or running in real-time, enhancing user experience in mobile apps and wearable devices. It enhances workplace safety by identifying risky workers activities in hazardous industrial environments such as mining, oil rigs ensuring safety and reducing accidents. Furthermore, in case of smart cities and urban mobility, HAR data from pedestrians and commuters can be efficiently classified to optimize traffic flow, public transport systems, and urban planning initiatives. Additionally, HAR supports emergency response efforts during disasters by locating people in need of help. Thus TabPFN's speed, simplicity, and effectiveness make it an ideal choice for these real-time HAR applications." + "TabPFN’s advantages are particularly well-suited for various HAR use cases. In healthcare, it aids in fall detection for the elderly and in chronic disease monitoring, providing timely interventions. For fitness and wellness, it can classify activities such as walking or running in real-time, enhancing user experience in mobile apps and wearable devices. It enhances workplace safety by identifying risky workers' activities in hazardous industrial environments, such as in mining and on oil rigs, ensuring safety and reducing accidents. Furthermore, in the case of smart cities and urban mobility, HAR data from pedestrians and commuters can be efficiently classified to optimize traffic flow, public transport systems, and urban planning initiatives. Additionally, HAR supports emergency response efforts during disasters by locating people in need of help. TabPFN's speed, simplicity, and effectiveness make it an ideal choice for these real-time HAR applications."
] }, { @@ -751,7 +751,9 @@ { "cell_type": "code", "execution_count": 32, - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [ { "data": { @@ -1416,7 +1418,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In the above training dataframe we will use the `Activity` as the target label to be predicted using rest of the features as explanatory variables `X`. We define the explanatory variables as follows: " + "In the training dataframe above, we use `Activity` as the target label to be predicted, using the rest of the features as explanatory variables `X`. We define the explanatory variables as follows: " ] }, { @@ -1446,9 +1448,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Once the explanatory variables `X` is defined, this is now used as input in the `prepare_tabulardata` method from the tabular learner in the `arcgis.learn`. The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model. \n", + "Once the explanatory variables `X` are defined, they are used as input in the `prepare_tabulardata` method from the tabular learner in `arcgis.learn`. The method takes the feature layer or a spatial dataframe containing the dataset and prepares it for fitting the model.\n", "\n", - "The input parameters required for the tool are used as shown here :" + "The input parameters required for the tool are used as follows:" ] }, { @@ -1471,7 +1473,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To get a sense of what the training data looks like, the `show_batch()` method will randomly pick a few training sample and visualize them. The sample are showing the explanatory variables and the `Activity` target label to predict." + "To get a sense of what the training data looks like, the `show_batch()` method will randomly pick a few training samples and visualize them. 
The samples show the explanatory variables and the `Activity` target label to predict." ] }, { @@ -2047,7 +2049,7 @@ "source": [ "### Accuracy assessment \n", "\n", - "Here we will evaluate the model's performance. This will print out multiple model metrics. we can assess the model quality using its corresponding metrics. These metrics include a combination of multiple evaluation criteria, such as `accuracy`, `precision`, `recall` and `F1-Score`, which collectively measure the model's performance on the validation set." + "Next, we will evaluate the model's performance. This will print out multiple model metrics that we can use to assess the model quality. These metrics include a combination of multiple evaluation criteria, such as `accuracy`, `precision`, `recall` and `F1-Score`, which collectively measure the model's performance on the validation set." ] }, { @@ -2135,7 +2137,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This project highlights the powerful capabilities of the TabPFN classifier for Human Activity Recognition (HAR) tasks. Even with a training dataset of just 1,020 samples, the model achieved impressive results on a larger test dataset of 6,332 samples, with an accuracy of 96.81%, precision, recall, and F1 score all reaching 0.97. The TabPFN model's speed, simplicity, and strong performance in classifying human activities, highlight its potential for applications in healthcare, fitness, smart cities and disaster relief operations, offering an efficient and scalable solution for HAR systems." + "This project highlights the powerful capabilities of the TabPFN classifier for Human Activity Recognition (HAR) tasks. Even with a training dataset of just 1,020 samples, the model achieved impressive results on a larger test dataset of 6,332 samples, with an accuracy of 96.81%, and precision, recall, and F1 scores all reaching 0.97. 
The TabPFN model's speed, simplicity, and strong performance in classifying human activities highlight its potential for applications in healthcare, fitness, smart cities and disaster relief operations, offering an efficient and scalable solution for HAR systems." ] }, { @@ -2157,9 +2159,9 @@ ], "metadata": { "kernelspec": { - "display_name": "pro3.4_climaxAug2024", + "display_name": "pro3.5_LearnLesson2025", "language": "python", - "name": "pro3.4_climaxaug2024" + "name": "pro3.5_learnlesson2025" }, "language_info": { "codemirror_mode": { @@ -2171,7 +2173,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.8" + "version": "3.11.11" } }, "nbformat": 4, From b91d729c160149970b65f96a9b8b4111683002df Mon Sep 17 00:00:00 2001 From: moonlanderr Date: Fri, 28 Feb 2025 10:03:48 +0530 Subject: [PATCH 6/6] all suggestions added --- ...an_activity_using _tabPFN_classifier.ipynb | 39 +++++++++---------- 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb index 1377231472..e9a0716900 100644 --- a/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb +++ b/samples/04_gis_analysts_data_scientists/classifying_human_activity_using _tabPFN_classifier.ipynb @@ -14,20 +14,19 @@ "## Table of Contents \n", "* [Introduction](#1) \n", "* [Necessary imports](#2)\n", - "* [Connecting to ArcGIS](#3)\n", - "* [Accessing the datasets](#4) \n", + "* [Connect to ArcGIS](#3)\n", + "* [Access the datasets](#4) \n", "* [Prepare training data for TabPFN](#5)\n", - " * [Data Preprocessing for TabPFN Classifier Model](#6) \n", + " * [Data preprocessing for TabPFN classifier model](#6) \n", " * [Visualize training data](#9)\n", - "* [Model Training](#10) \n", + "* [Model training](#10) \n", " * 
[Model initialization](#11)\n", " * [Fit the model](#12)\n", " * [Visualize results in validation set](#13)\n", - "* [Predicting using TabPFN classifier model](#14)\n", - " * [Predict using the trained model](#15)\n", + "* [Predict using TabPFN classifier model](#14)\n", - "* [Accuracy assessment: Compute Model Metric](#16)\n", + "* [Accuracy assessment: Compute model metric](#16)\n", "* [Conclusion](#17)\n", - "* [TabPFN License Information](#18) " + "* [TabPFN license information](#18) " ] }, { @@ -84,7 +83,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Connecting to ArcGIS " + "## Connect to ArcGIS " ] }, { @@ -93,14 +92,14 @@ "metadata": {}, "outputs": [], "source": [ - "gis = GIS(\"/home\")" + "gis = GIS(\"home\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Accessing the datasets \n", + "## Access the datasets \n", "\n", "Here we will access the train and test datasets. The Human Activity Recognition (HAR) training dataset consists of 1,020 rows and 561 features, capturing sensor data from mobile devices to classify human activities like walking, running, and sitting. The data includes measurements from accelerometers, gyroscopes, and GPS, providing insights into movement patterns while ensuring that location data remains anonymized for privacy protection. Features such as BodyAcc (body accelerometer), GravityAcc (gravity accelerometer), BodyAccJerk, BodyGyro (body gyroscope), and BodyGyroJerk are used to capture dynamic and rotational movements. Time-domain and frequency-domain features are extracted from these raw signals, helping to distinguish between various activities based on patterns in acceleration, rotation, and speed, making the dataset ideal for activity classification tasks."
] @@ -1354,7 +1353,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Data Preprocessing for TabPFN Classifier Model" + "### Data preprocessing for TabPFN classifier model" ] }, { @@ -1459,7 +1458,7 @@ "metadata": {}, "outputs": [], "source": [ - "data = prepare_tabulardata(X_train_lda_df, 'Activity',explanatory_variables=X)" + "data = prepare_tabulardata(X_train_lda_df, 'Activity', explanatory_variables=X)" ] }, { @@ -1582,15 +1581,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Model Training \n", - "First we initialize the model as follows:" + "### Model training \n", + "First, we initialize the model as follows:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Define the TabPFN classifier model \n", + "### Model initialization \n", "\n", "The default initialization of the TabPFN classifier model object is shown below:" ] }, { @@ -1601,7 +1600,7 @@ "metadata": {}, "outputs": [], "source": [ - "tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier',device='cpu', N_ensemble_configurations=32)" + "tabpfn_classifier = MLModel(data, 'tabpfn.TabPFNClassifier', device='cpu', N_ensemble_configurations=32)" ] }, { @@ -1777,7 +1776,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Predicting using the TabPFN classifier model \n", + "## Predict using the TabPFN classifier model \n", "\n", "Once the TabPFN classifier is trained on the smaller dataset of 1,020 samples, we can use it to predict the classes of a larger dataset containing 6,332 samples. Given TabPFN’s ability to process data efficiently with a single forward pass, it can handle this larger dataset quickly, classifying each sample based on the patterns learned during training. Since the model is optimized for fast and scalable predictions, it will generate class predictions for all samples. 
\n", "\n", @@ -1916,7 +1915,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Predicting using the trained model " + "### Predict " ] }, { @@ -2144,7 +2143,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### TabPFN License Information " + "### TabPFN license information " ] }, {