|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "#### Connect to Abacus\n", |
| 8 | + "You can find your API key here: [API KEY](https://abacus.ai/app/profile/apikey)" |
| 9 | + ] |
| 10 | + }, |
| 11 | + { |
| 12 | + "cell_type": "code", |
| 13 | + "execution_count": null, |
| 14 | + "metadata": {}, |
| 15 | + "outputs": [], |
| 16 | + "source": [ |
| 17 | + "import abacusai\n", |
| 18 | + "client = abacusai.ApiClient(\"YOUR API KEY\")" |
| 19 | + ] |
| 20 | + }, |
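| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "As a quick sanity check that the client is authenticated, you can list the projects in your account. This is a minimal sketch; it assumes the `Project` objects expose `project_id` and `name` attributes, as in the SDK docs."
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "# Sanity check: this call fails fast if the API key is invalid\n",
| | + "for project in client.list_projects():\n",
| | + " print(project.project_id, project.name)"
| | + ]
| | + },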
| 21 | + { |
| 22 | + "cell_type": "markdown", |
| 23 | + "metadata": {}, |
| 24 | + "source": [ |
| 25 | + "#### Finding API's easily.\n", |
| 26 | + "There are two ways to find API's easily in the Abacus platform:\n", |
| 27 | + "1. Try auto-completion by using tab. Most API's follow expressive language so you can search them using the autocomplete feature.\n", |
| 28 | + "2. Use the `suggest_abacus_apis` method. This method calls a large language model that has access to our full documentation. It can suggest you what API works for what you are trying to do.\n", |
| 29 | + "3. Use the official [Python SDK documentation](https://abacusai.github.io/api-python/autoapi/abacusai/index.html) page which will have all the available methods and attributes of classes." |
| 30 | + ] |
| 31 | + }, |
| 32 | + { |
| 33 | + "cell_type": "code", |
| 34 | + "execution_count": null, |
| 35 | + "metadata": {}, |
| 36 | + "outputs": [], |
| 37 | + "source": [ |
| 38 | + "apis = client.suggest_abacus_apis(\"list feature groups in a project\", verbosity=2, limit=3)\n", |
| 39 | + "for api in apis:\n", |
| 40 | + " print(f\"Method: {api.method}\")\n", |
| 41 | + " print(f\"Docstring: {api.docstring}\")\n", |
| 42 | + " print(\"---\")" |
| 43 | + ] |
| 44 | + }, |
| 45 | + { |
| 46 | + "cell_type": "markdown", |
| 47 | + "metadata": {}, |
| 48 | + "source": [ |
| 49 | + "#### Project Level API's\n", |
| 50 | + "You can find the ID easily by looking at the URL in your browser. For example, if your URL looks like this: `https://abacus.ai/app/projects/fsdfasg33?doUpload=true`, the project id is \"fsdfasg33\"" |
| 51 | + ] |
| 52 | + }, |
| 53 | + { |
| 54 | + "cell_type": "code", |
| 55 | + "execution_count": null, |
| 56 | + "metadata": {}, |
| 57 | + "outputs": [], |
| 58 | + "source": [ |
| 59 | + "# Gets information about the project based on the ID.\n", |
| 60 | + "project = client.describe_project(project_id=\"YOUR_PROJECT_ID\")\n", |
| 61 | + "\n", |
| 62 | + "# A list of all models trained under the project\n", |
| 63 | + "models = client.list_models(project_id=\"YOUR_PROJECT_ID\")" |
| 64 | + ] |
| 65 | + }, |
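| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "`list_models` returns `Model` objects. As a small illustration (assuming the `model_id` and `name` attributes of the SDK's `Model` class), you can summarize them like this:"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "# Print a one-line summary for each model in the project\n",
| | + "for model in models:\n",
| | + " print(f\"{model.model_id}: {model.name}\")"
| | + ]
| | + },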
| 66 | + { |
| 67 | + "cell_type": "markdown", |
| 68 | + "metadata": {}, |
| 69 | + "source": [ |
| 70 | + "#### Load a Feature Group" |
| 71 | + ] |
| 72 | + }, |
| 73 | + { |
| 74 | + "cell_type": "code", |
| 75 | + "execution_count": null, |
| 76 | + "metadata": {}, |
| 77 | + "outputs": [], |
| 78 | + "source": [ |
| 79 | + "# Loads the specific version of a FeatureGroup class object \n", |
| 80 | + "fg = client.describe_feature_group_version(\"FEATURE_GROUP_VERSION\")\n", |
| 81 | + "\n", |
| 82 | + "# Loads the latest version of a FeatureGroup class object based on a name\n", |
| 83 | + "fg = client.describe_feature_group_by_table_name(\"FEATURE_GROUP_NAME\")\n", |
| 84 | + "\n", |
| 85 | + "# Loads the FeatureGroup as a pandas dataframe\n", |
| 86 | + "df = fg.load_as_pandas()" |
| 87 | + ] |
| 88 | + }, |
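| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "Once loaded, `df` is a regular pandas DataFrame, so the usual inspection tools apply:"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "# Quick inspection of the materialized data\n",
| | + "print(df.shape)\n",
| | + "df.head()"
| | + ]
| | + },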
| 89 | + { |
| 90 | + "cell_type": "markdown", |
| 91 | + "metadata": {}, |
| 92 | + "source": [ |
| 93 | + "#### Add a Feature Group to the Project" |
| 94 | + ] |
| 95 | + }, |
| 96 | + { |
| 97 | + "cell_type": "code", |
| 98 | + "execution_count": null, |
| 99 | + "metadata": {}, |
| 100 | + "outputs": [], |
| 101 | + "source": [ |
| 102 | + "# First we connect our docstore to our project\n", |
| 103 | + "client.add_feature_group_to_project(\n", |
| 104 | + " feature_group_id='FEATURE_GROUP_ID',\n", |
| 105 | + " project_id='PROJECT_ID',\n", |
| 106 | + " feature_group_type='CUSTOM_TABLE' # You can set to DOCUMENTS if this is a document set\n", |
| 107 | + ")" |
| 108 | + ] |
| 109 | + }, |
| 110 | + { |
| 111 | + "cell_type": "markdown", |
| 112 | + "metadata": {}, |
| 113 | + "source": [ |
| 114 | + "#### Update the feature group SQL definition" |
| 115 | + ] |
| 116 | + }, |
| 117 | + { |
| 118 | + "cell_type": "code", |
| 119 | + "execution_count": null, |
| 120 | + "metadata": {}, |
| 121 | + "outputs": [], |
| 122 | + "source": [ |
| 123 | + "client.update_feature_group_sql_definition('YOUR_FG_ID', 'SQL')" |
| 124 | + ] |
| 125 | + }, |
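| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "After updating the SQL definition, you may want to materialize a new version so the change takes effect for downstream consumers. A short sketch using `describe_feature_group` and the same `materialize()` call used later in this notebook:"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "fg = client.describe_feature_group('YOUR_FG_ID')\n",
| | + "fg.materialize() # Creates a new version using the updated SQL definition"
| | + ]
| | + },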
| 126 | + { |
| 127 | + "cell_type": "markdown", |
| 128 | + "metadata": {}, |
| 129 | + "source": [ |
| 130 | + "#### Creating a Dataset from local\n", |
| 131 | + "For every dataset created, a feature group with the same name will also be generated. When you need to update the source data, just update the dataset directly and the feature group will also reflect those changes." |
| 132 | + ] |
| 133 | + }, |
| 134 | + { |
| 135 | + "cell_type": "code", |
| 136 | + "execution_count": null, |
| 137 | + "metadata": {}, |
| 138 | + "outputs": [], |
| 139 | + "source": [ |
| 140 | + "import io\n", |
| 141 | + "zip_filename= 'sample_data_folder.zip'\n", |
| 142 | + "\n", |
| 143 | + "with open(zip_filename, 'rb') as f:\n", |
| 144 | + " zip_file_content = f.read()\n", |
| 145 | + "\n", |
| 146 | + "zip_file_io = io.BytesIO(zip_file_content)\n", |
| 147 | + "\n", |
| 148 | + "# If the ZIP folder contains unstructured text documents (PDF, Word, etc.), then set `is_documentset` == True\n", |
| 149 | + "upload = client.create_dataset_from_upload(table_name='MY_SAMPLE_DATA', file_format='ZIP', is_documentset=False)\n", |
| 150 | + "upload.upload_file(zip_file_io)" |
| 151 | + ] |
| 152 | + }, |
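| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "Uploads are processed asynchronously. The sketch below waits for the upload to finish and then fetches the resulting dataset; it assumes the `Upload` object's `wait_for_join()` method and `dataset_id` attribute from the SDK."
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "upload.wait_for_join() # Block until the upload has been processed\n",
| | + "dataset = client.describe_dataset(upload.dataset_id)\n",
| | + "print(dataset.dataset_id)"
| | + ]
| | + },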
| 153 | + { |
| 154 | + "cell_type": "markdown", |
| 155 | + "metadata": {}, |
| 156 | + "source": [ |
| 157 | + "#### Updating a Dataset from local" |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "code", |
| 162 | + "execution_count": null, |
| 163 | + "metadata": {}, |
| 164 | + "outputs": [], |
| 165 | + "source": [ |
| 166 | + "upload = client.create_dataset_version_from_upload(dataset_id='YOUR_DATASET_ID', file_format='ZIP')\n", |
| 167 | + "upload.upload_file(zip_file_io)" |
| 168 | + ] |
| 169 | + }, |
| 170 | + { |
| 171 | + "cell_type": "markdown", |
| 172 | + "metadata": {}, |
| 173 | + "source": [ |
| 174 | + "#### Executing SQL using a connector" |
| 175 | + ] |
| 176 | + }, |
| 177 | + { |
| 178 | + "cell_type": "code", |
| 179 | + "execution_count": null, |
| 180 | + "metadata": {}, |
| 181 | + "outputs": [], |
| 182 | + "source": [ |
| 183 | + "connector_id = \"YOUR_CONNECTOR_ID\"\n", |
| 184 | + "sql_query = \"SELECT * FROM TABLE LIMIT 5\"\n", |
| 185 | + "\n", |
| 186 | + "result = client.query_database_connector(connector_id, sql_query)" |
| 187 | + ] |
| 188 | + }, |
| 189 | + { |
| 190 | + "cell_type": "markdown", |
| 191 | + "metadata": {}, |
| 192 | + "source": [ |
| 193 | + "#### Uploading a Dataset using a connector\n", |
| 194 | + "\n", |
| 195 | + "`doc_processing_config` is optional depending on if you want to load a document set or no. use the code below and change based on your application. \n", |
| 196 | + "\n", |
| 197 | + "Similar to `create_dataset_from_file_connector` you can use `create_dataset_from_database_connector`." |
| 198 | + ] |
| 199 | + }, |
| 200 | + { |
| 201 | + "cell_type": "code", |
| 202 | + "execution_count": null, |
| 203 | + "metadata": {}, |
| 204 | + "outputs": [], |
| 205 | + "source": [ |
| 206 | + "# doc_processing_config = abacusai.DatasetDocumentProcessingConfig(\n", |
| 207 | + "# extract_bounding_boxes=True,\n", |
| 208 | + "# use_full_ocr=False,\n", |
| 209 | + "# remove_header_footer=False,\n", |
| 210 | + "# remove_watermarks=True,\n", |
| 211 | + "# convert_to_markdown=False,\n", |
| 212 | + "# )\n", |
| 213 | + "\n", |
| 214 | + "dataset = client.create_dataset_from_file_connector(\n", |
| 215 | + " table_name=\"MY_TABLE_NAME\",\n", |
| 216 | + " location=\"azure://my-location:share/whatever/*\",\n", |
| 217 | + " # refresh_schedule=\"0 0 * * *\", # Daily refresh at midnight UTC\n", |
| 218 | + " # is_documentset=True, #Only if this is an actual documentset (Meaning word documents, PDF files, etc)\n", |
| 219 | + " # extract_bounding_boxes=True,\n", |
| 220 | + " # document_processing_config=doc_processing_config,\n", |
| 221 | + " # reference_only_documentset=False,\n", |
| 222 | + ")" |
| 223 | + ] |
| 224 | + }, |
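| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "As noted above, the database-connector variant is analogous. A minimal sketch; `object_name` and the other values are placeholders, so check the SDK docs or `suggest_abacus_apis` for the full argument list:"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "dataset = client.create_dataset_from_database_connector(\n",
| | + " table_name='MY_DB_TABLE',\n",
| | + " database_connector_id='YOUR_CONNECTOR_ID',\n",
| | + " object_name='SCHEMA.TABLE_NAME', # The table/object to read from the connected database\n",
| | + " # refresh_schedule='0 0 * * *', # Daily refresh at midnight UTC\n",
| | + ")"
| | + ]
| | + },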
| 225 | + { |
| 226 | + "cell_type": "markdown", |
| 227 | + "metadata": {}, |
| 228 | + "source": [ |
| 229 | + "#### Updating a Dataset using a connector" |
| 230 | + ] |
| 231 | + }, |
| 232 | + { |
| 233 | + "cell_type": "code", |
| 234 | + "execution_count": null, |
| 235 | + "metadata": {}, |
| 236 | + "outputs": [], |
| 237 | + "source": [ |
| 238 | + "client.create_dataset_version_from_file_connector('DATASET_ID') # For file connector\n", |
| 239 | + "client.create_dataset_version_from_database_connector('DATASET_ID')" |
| 240 | + ] |
| 241 | + }, |
| 242 | + { |
| 243 | + "cell_type": "markdown", |
| 244 | + "metadata": {}, |
| 245 | + "source": [ |
| 246 | + "#### Export A feature group to a connector\n", |
| 247 | + "Below code will also work for non-SQL connectors like blob storages. The `database_feature_mapping` would be optional in those cases.\n", |
| 248 | + "\n", |
| 249 | + "You can find the `connector_id` [here](https://abacus.ai/app/profile/connected_services)" |
| 250 | + ] |
| 251 | + }, |
| 252 | + { |
| 253 | + "cell_type": "code", |
| 254 | + "execution_count": null, |
| 255 | + "metadata": {}, |
| 256 | + "outputs": [], |
| 257 | + "source": [ |
| 258 | + "WRITEBACK = 'Anonymized_Store_Week_Result'\n", |
| 259 | + "MAPPING = {\n", |
| 260 | + " 'COLUMN_1': 'COLUMN_1', \n", |
| 261 | + " 'COLUMN_2': 'COLUMN_2', \n", |
| 262 | + "}\n", |
| 263 | + "\n", |
| 264 | + "feature_group = client.describe_feature_group_by_table_name(f\"FEATURE_GROUP_NAME\")\n", |
| 265 | + "feature_group.materialize() # To make sure we have latest version\n", |
| 266 | + "feature_group_version = feature_group.latest_feature_group_version.feature_group_version\n", |
| 267 | + "client.export_feature_group_version_to_database_connector(\n", |
| 268 | + " feature_group_version, \n", |
| 269 | + " database_connector_id='connector_id',\n", |
| 270 | + " object_name=WRITEBACK,\n", |
| 271 | + " database_feature_mapping=MAPPING, \n", |
| 272 | + " write_mode='insert'\n", |
| 273 | + ")" |
| 274 | + ] |
| 275 | + }, |
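| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "For file-based connectors such as blob storage, the SDK also provides `export_feature_group_version_to_file_connector`. A minimal sketch with placeholder location and format values:"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {},
| | + "outputs": [],
| | + "source": [
| | + "client.export_feature_group_version_to_file_connector(\n",
| | + " feature_group_version,\n",
| | + " location='azure://my-location:share/exports/result.csv', # Placeholder export location\n",
| | + " export_file_format='CSV'\n",
| | + ")"
| | + ]
| | + }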
| 283 | + ], |
| 284 | + "metadata": { |
| 285 | + "language_info": { |
| 286 | + "name": "python" |
| 287 | + } |
| 288 | + }, |
| 289 | + "nbformat": 4, |
| 290 | + "nbformat_minor": 2 |
| 291 | +} |