diff --git a/documentation/tutorials/kaggle_beginner_example_classification.ipynb b/documentation/tutorials/kaggle_beginner_example_classification.ipynb
new file mode 100644
index 00000000..bd796f04
--- /dev/null
+++ b/documentation/tutorials/kaggle_beginner_example_classification.ipynb
@@ -0,0 +1,1804 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "##### Copyright 2022 The TensorFlow Authors."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MDBzBKC_pnXl"
+ },
+ "source": [
+ "# Structured Data Classification using TFDF\n",
+ "\n",
+ "
\n",
+ " "
+ ],
+ "text/plain": [
+ " Survived Pclass Sex Age SibSp Parch Fare Cabin Embarked\n",
+ "0 0 3 male 22.0 1 0 7.2500 NaN S\n",
+ "1 1 1 female 38.0 1 0 71.2833 C85 C\n",
+ "2 1 3 female 26.0 0 0 7.9250 NaN S"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "train_full_data.head(3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Qs070SbkMJix"
+ },
+ "source": [
+ "Refer to [Kaggle](https://www.kaggle.com/competitions/titanic/data) for a comprehensive guide to the data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cwdbYZeTJP89"
+ },
+ "source": [
+ "## Exploratory Data Analysis (EDA)\n",
+ "Data scientists use exploratory analysis techniques to analyze and visualize large datasets. This process helps them identify the main characteristics of their datasets and develop effective strategies to get the answers they need. It can also help them spot anomalies and test hypotheses.\n",
+ "\n",
+ "For this dataset, there are some amazing notebooks already available on Kaggle. One of them is [EDA is fun](https://www.kaggle.com/code/prashant111/eda-is-fun#EDA-is-fun) by Prashant Banerjee."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2EpAa_q55Ke8"
+ },
+ "source": [
+ "## Prepare the dataset\n",
+ "This dataset contains a mix of numeric and categorical features, some of which have missing values. TF-DF supports all of these natively, so no preprocessing is required. This is one advantage of tree-based models, and it makes them a great entry point to TensorFlow and ML."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uQMX8Md3ISq0"
+ },
+ "source": [
+ "Collect the distinct values stored in the `Survived` column into a list. `Survived` takes one of two values, 0 or 1."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "YmrDp4SL7hTw"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Label classes: [0, 1]\n"
+ ]
+ }
+ ],
+ "source": [
+ "label=\"Survived\"\n",
+ "classes = train_full_data[label].unique().tolist()\n",
+ "print(f\"Label classes: {classes}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0NGJhK0R58Oa"
+ },
+ "source": [
+ "Split the dataset into training and validation sets:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "CW3ofmmI5xIr"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "611 examples in training, 280 examples in validation.\n"
+ ]
+ }
+ ],
+ "source": [
+ "def split_dataset(dataset, test_ratio=0.30):\n",
+ " # Hold out each row with probability test_ratio, so split sizes vary between runs.\n",
+ " test_indices = np.random.rand(len(dataset)) < test_ratio\n",
+ " return dataset[~test_indices], dataset[test_indices]\n",
+ "\n",
+ "train_ds_pd, val_ds_pd = split_dataset(train_full_data)\n",
+ "print(\"{} examples in training, {} examples in validation.\".format(\n",
+ " len(train_ds_pd), len(val_ds_pd)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "I0ZrYmer6tMp"
+ },
+ "source": [
+ "There's one more step required before you can train your model: converting the data from Pandas format (`pd.DataFrame`) into TensorFlow format (`tf.data.Dataset`). A single-line helper function does this for you:\n",
+ "\n",
+ "```\n",
+ "tfdf.keras.pd_dataframe_to_tf_dataset(your_df, label='your_label', task=tfdf.keras.Task.CLASSIFICATION)\n",
+ "```\n",
+ "\n",
+ "`tf.data` is a high-[performance](https://www.tensorflow.org/guide/data_performance) data loading library that is helpful when training neural networks with accelerators like [GPUs](https://cloud.google.com/gpu) and [TPUs](https://cloud.google.com/tpu). It is not necessary for tree-based models until you begin to do distributed training.\n",
+ "\n",
+ "Note that `tf.data` has a learning curve and can be tricky to use. There are guides on [tensorflow.org/guide](https://www.tensorflow.org/guide) to help."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "DyAHpZ0R6B5R"
+ },
+ "outputs": [],
+ "source": [
+ "train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(\n",
+ " train_ds_pd, \n",
+ " label = label, \n",
+ " task = tfdf.keras.Task.CLASSIFICATION)\n",
+ "\n",
+ "val_ds = tfdf.keras.pd_dataframe_to_tf_dataset(\n",
+ " val_ds_pd, \n",
+ " label = label, \n",
+ " task = tfdf.keras.Task.CLASSIFICATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3m46QYDz8IB4"
+ },
+ "source": [
+ "## Create and train a Random Forest model "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "11yxinBK78qU"
+ },
+ "outputs": [],
+ "source": [
+ "model = tfdf.keras.RandomForestModel(task = tfdf.keras.Task.CLASSIFICATION)\n",
+ "model.compile(metrics=[\"accuracy\"]) # Optional: list the evaluation metrics to report."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "_tQfGooA8OI2"
+ },
+ "outputs": [],
+ "source": [
+ "model.fit(x=train_ds)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YoqPROtT9A33"
+ },
+ "source": [
+ "## Visualize your model\n",
+ "One benefit of tree-based models is that you can easily visualize them. The Random Forest uses 300 trees by default; the `tree_idx` argument below selects which tree to display."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Cwv7-NXc8WUq"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "tfdf.model_plotter.plot_model_in_colab(model, tree_idx=0, max_depth=3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RtGEzEGU9FsI"
+ },
+ "source": [
+ "## Evaluate the model on OOB data and the validation dataset\n",
+ "\n",
+ "Let's plot accuracy on the out-of-bag (OOB) evaluation dataset as a function of the number of trees in the forest. A nice property of this hyperparameter is that larger values are usually better and carry little risk aside from slowing down training."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "4nOZy6lX9CwJ"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ "