Commit 5150255

Merge pull request #17 from forcedotcom/onboarding-readme-updates-3
Add Dev Containers and JupyterLab docs
2 parents 8f46ff6 + 15e3e7b

File tree: 1 file changed (+43, −3)


README.md

Lines changed: 43 additions & 3 deletions
@@ -1,7 +1,5 @@
# Data Cloud Custom Code SDK

-<img src="https://img.shields.io/badge/version-0.1.0-blue" alt="license">

This package provides a development kit for creating custom data transformations in [Data Cloud](https://www.salesforce.com/data/). It allows you to write your own data processing logic in Python while leveraging Data Cloud's infrastructure for data access and running data transformations, mapping execution into Data Cloud data structures like [Data Model Objects](https://help.salesforce.com/s/articleView?id=data.c360_a_data_model_objects.htm&type=5) and [Data Lake Objects](https://help.salesforce.com/s/articleView?id=sf.c360_a_data_lake_objects.htm&language=en_US&type=5).

More specifically, this codebase gives you the ability to test code locally before pushing to Data Cloud's remote execution engine, greatly reducing development time.
@@ -84,7 +82,7 @@ Once the Data Transform run is successful, check the DLO your script is writing

## API

-You entry point script will define logic using the `Client` object which wraps data access layers.
+Your entry point script will define logic using the `Client` object which wraps data access layers.

You should only need the following methods:
* `read_dlo(name)` – Read from a Data Lake Object by name
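The diff truncates the rest of the method list, but the entrypoint pattern can be sketched roughly as follows. This is an illustration only: `StubClient` and the `write_to_dlo` name are stand-ins invented here so the example runs standalone, not the SDK's real API (of the methods shown, only `read_dlo(name)` appears in the source).

```python
# Hedged sketch of an entrypoint script's shape. StubClient mimics the
# SDK's Client just enough to run; `write_to_dlo` is a hypothetical name.

class StubClient:
    """Stand-in for the SDK's Client, for illustration only."""

    def __init__(self):
        self._dlos = {"Account_DLO": [{"Id": "001", "Name": "acme"}]}
        self.written = {}

    def read_dlo(self, name):
        # The real client returns Data Lake Object data by name.
        return self._dlos[name]

    def write_to_dlo(self, name, records):  # hypothetical method name
        self.written[name] = records


def transform(client):
    # Read a DLO, apply custom logic, write the result back out.
    records = client.read_dlo("Account_DLO")
    cleaned = [{**r, "Name": r["Name"].upper()} for r in records]
    client.write_to_dlo("Account_Clean_DLO", cleaned)


client = StubClient()
transform(client)
print(client.written["Account_Clean_DLO"][0]["Name"])  # prints ACME
```

Structuring the logic as a function of the client, as above, also makes it easy to test locally before deploying.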
@@ -174,6 +172,48 @@ Zip a transformation job in preparation to upload to Data Cloud.

Options:
- `--path TEXT`: Path to the code directory (default: ".")

## Docker usage

After initializing a project with `datacustomcode init my_package`, you might notice a Dockerfile. The [Quick Start](#quick-start) approach above uses virtual environments, so this file is not used until you run the `zip` or `deploy` commands. When your dependencies include [native features](https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#using-pyspark-native-features) such as C or C++ interop, the platform and architecture of your machine may differ from those of Data Cloud compute. The `zip` and `deploy` commands take care of this by using the Dockerfile, which starts `FROM` an image compatible with Data Cloud. However, you may want to build, run, and test your script on your own machine using the same platform and architecture as Data Cloud. The sections below show how to test your script in this manner.
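To see whether the platform mismatch above applies to you, a quick check like the following can help. The `x86_64` target here is an assumption, not something the source states; confirm the actual architecture of Data Cloud's images for your org.

```shell
# Compare this machine's CPU architecture with an assumed build target
# for Data Cloud-compatible images (x86_64 is an assumption here).
local_arch="$(uname -m)"
target_arch="x86_64"

if [ "$local_arch" = "$target_arch" ]; then
  msg="architectures match: native wheels built here should also load in the container"
else
  msg="architecture mismatch ($local_arch vs $target_arch): build native deps inside the Dockerfile image"
fi
echo "$msg"
```

On an Apple Silicon Mac, for example, `uname -m` reports `arm64`, which is exactly the case where building inside the Docker image matters.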
184+
185+
### VS Code Dev Containers
186+
187+
Within your `init`ed package, you will find a `.devcontainer` folder which allows you to run a docker container while developing inside of it.
188+
189+
Read more about Dev Containers here: https://code.visualstudio.com/docs/devcontainers/containers.
190+
191+
1. Install the VS Code extension "Dev Containers" by microsoft.com.
192+
1. Open your package folder in VS Code, ensuring that the `.devcontainer` folder is at the root of the File Explorer
193+
1. Bring up the Command Palette (on mac: Cmd + Shift + P), and select "Dev Containers: Rebuild and Reopen in Container"
194+
1. Allow the docker image to be built, then you're ready to develop
195+
1. Now if you open a terminal (within the Dev Container window) and `datacustomcode run ./payload/entrypoint.py`, it will run inside a docker container that more closely resembles Data Cloud compute than your machine
> [!IMPORTANT]
> Dev Containers get their own tmp file storage, so you'll need to re-run `datacustomcode configure` every time you "Rebuild and Reopen in Container".
### JupyterLab

Within your `init`ed package, you will find a `jupyterlab.sh` file that can open a Jupyter notebook for you. Jupyter notebooks, in combination with Data Cloud's [Query Editor](https://help.salesforce.com/s/articleView?id=data.c360_a_add_queries_to_a_query_workspace.htm&type=5) and [Data Explorer](https://help.salesforce.com/s/articleView?id=data.c360_a_data_explorer.htm&type=5), can be extremely helpful for data exploration. Instead of running an entire script, you can run one code cell at a time as you discover and experiment with the DLO or DMO data.

You can read more about Jupyter notebooks here: https://jupyter.org/

1. Within the root of your package folder, run `./jupyterlab.sh start`.
1. Double-click the "account.ipynb" file, which provides a starting point for a notebook.
1. Use Shift + Enter to execute each cell within the notebook. Add, edit, or delete cells of code as needed for your data exploration.
1. Don't forget to run `./jupyterlab.sh stop` to stop the Docker container.
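The cell-at-a-time workflow might look like the sketch below. The records are invented stand-in data; in a real notebook you would pull rows from a DLO or DMO via the SDK client instead.

```python
# Hedged sketch of notebook-style exploration with stand-in records.

# Cell 1: load a small sample to look at (a real notebook would read
# this from a DLO or DMO via the SDK client).
records = [
    {"Id": "001", "AnnualRevenue": 120000},
    {"Id": "002", "AnnualRevenue": None},
]

# Cell 2: inspect the available fields before deciding on a transform.
fields = sorted(records[0].keys())
print(fields)

# Cell 3: check data quality, e.g. how many rows are missing revenue.
missing = sum(1 for r in records if r["AnnualRevenue"] is None)
print(f"{missing} of {len(records)} rows missing AnnualRevenue")
```

Each cell builds on the last, so you can revise a single step and re-run it without repeating the whole script.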
> [!IMPORTANT]
> JupyterLab uses its own tmp file storage, so you'll need to re-run `datacustomcode configure` each time you run `./jupyterlab.sh start`.

## Prerequisite details

### Creating a connected app