Commit 446454f

AlexejPenner and bcdurak committed
Added docs section (#3640)
* Added docs section
* Update docs/book/user-guide/best-practices/choose-orchestration-environment.md

  Co-authored-by: Barış Can Durak <[email protected]>
* Fixed wrong link
* Added icon

---------

Co-authored-by: Barış Can Durak <[email protected]>
1 parent b883aa5 commit 446454f

2 files changed, +169 -0 lines changed

docs/book/user-guide/best-practices/choose-orchestration-environment.md
@@ -0,0 +1,168 @@
---
description: How to choose the right orchestration environment
icon: server
---

# Choosing the right Orchestration Environment

When embarking on a machine learning project, one of the most critical early decisions is where to run your pipelines. This choice affects development speed, costs, and the eventual path to production. This guide explores the most common environments for running initial ML experiments to help you make an informed decision based on your specific needs.

### Local Environment

The local environment, your laptop or desktop computer, is where most ML projects begin their journey.

<table>
<tr>
<td>

### Pros:

- **Zero setup time**: Start coding immediately without provisioning remote resources
- **No costs**: Uses hardware you already own
- **Low latency**: No network delays when working with data
- **Works offline**: Develop on planes, in cafes, or anywhere without internet
- **Complete control**: Easy access to logs, files, and debugging capabilities
- **Simplicity**: No need to interact with cloud configurations or container orchestration

</td>
<td>

### Cons:

- **Environment inconsistency**: "Works on my machine" problems
- **Limited resources**: RAM, CPU, and GPU constraints
- **Poor scalability**: Difficult to process large datasets
- **Limited parallelization**: Running multiple experiments simultaneously is challenging

</td>
</tr>
</table>

### Ideal for:

- Quick proof-of-concepts with small datasets
- Early-stage algorithm development and debugging
- Small teams with standardized development environments
- Projects with small datasets and minimal computational requirements
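
As a rough illustration of the "limited resources" constraint, the sketch below checks whether a dataset is likely to fit in local RAM. The function name `fits_in_local_memory` and the default `overhead_factor` of 3.0 (headroom for copies made during preprocessing and training) are assumptions chosen for illustration, not values from ZenML or any library; tune them for your own workload.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def fits_in_local_memory(
    dataset_bytes: int,
    available_ram_bytes: int,
    overhead_factor: float = 3.0,  # assumed headroom for in-memory copies
) -> bool:
    """Return True if the dataset, inflated by overhead_factor, fits in RAM."""
    if dataset_bytes < 0 or available_ram_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if overhead_factor < 1.0:
        raise ValueError("overhead_factor must be at least 1.0")
    return dataset_bytes * overhead_factor <= available_ram_bytes


if __name__ == "__main__":
    # Hypothetical laptop with 16 GiB of RAM.
    for size_gib in (2, 8):
        ok = fits_in_local_memory(size_gib * GIB, 16 * GIB)
        print(f"{size_gib} GiB dataset fits locally: {ok}")
```

If the check fails for the datasets you expect to handle, that is a hint that one of the remote environments described in this guide may be a better fit.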

### Cloud VMs/Serverless Functions

When local resources become insufficient, cloud virtual machines (VMs) or serverless functions offer the next step up.

<table>
<tr>
<td>

### Pros:

- **Scalable resources**: Access to powerful CPUs/GPUs as needed
- **Pay-per-use**: Only pay for what you consume
- **Flexibility**: Choose the right instance type for your workload
- **No hardware management**: Leave infrastructure concerns to the provider
- **Easy snapshots**: Create machine images to replicate environments
- **Global accessibility**: Access your work from anywhere

</td>
<td>

### Cons:

- **Costs can accumulate**: Easy to forget running instances
- **Setup complexity**: Requires cloud provider knowledge (if not using ZenML)
- **Security considerations**: Data must leave your local network
- **Dependency management**: Need to configure environments properly
- **Network dependency**: Requires internet connection for access

</td>
</tr>
</table>

### Ideal for:

- Larger datasets that won't fit in local memory
- Projects requiring specific hardware (like GPUs)
- Teams working remotely across different locations
- Experiments that run for hours or days
- Projects transitioning from development to small-scale production
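
To make the "costs can accumulate" point concrete, here is a minimal arithmetic sketch of pay-per-use billing. The hourly rate of 3.00 USD is a made-up figure for illustration, not a real provider price, and `estimate_vm_cost` is a hypothetical helper; real bills also include storage, network, and other charges that this ignores.

```python
def estimate_vm_cost(hourly_rate_usd: float, hours_running: float) -> float:
    """Estimate compute cost as hourly rate times the hours the VM was running."""
    if hourly_rate_usd < 0 or hours_running < 0:
        raise ValueError("rate and hours must be non-negative")
    return hourly_rate_usd * hours_running


if __name__ == "__main__":
    rate = 3.00  # hypothetical GPU instance price in USD per hour

    planned = estimate_vm_cost(rate, 4)      # a 4-hour experiment
    forgotten = estimate_vm_cost(rate, 60)   # same VM left on over a weekend

    print(f"Planned experiment: ${planned:.2f}")
    print(f"Forgotten instance: ${forgotten:.2f}")
```

The gap between the two figures is why shutting down idle instances, or letting an orchestrator provision and tear down resources per run, matters in this environment.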

### Kubernetes

Kubernetes provides a platform for automating the deployment, scaling, and operations of application containers.

<table>
<tr>
<td>

### Pros:

- **Containerization**: Ensures consistency across environments
- **Resource optimization**: Efficient allocation of compute resources
- **Horizontal scaling**: Easily scale out experiments across nodes
- **Orchestration**: Automated management of your workloads
- **Reproducibility**: Consistent environments for all team members
- **Production readiness**: Similar environment for both experiments and production

</td>
<td>

### Cons:

- **Steep learning curve**: Requires Kubernetes expertise
- **Complex setup**: Significant initial configuration
- **Overhead**: May be overkill for simple experiments
- **Resource consumption**: Kubernetes itself consumes resources
- **Maintenance burden**: Requires ongoing cluster management

</td>
</tr>
</table>

### Ideal for:

- Teams already using Kubernetes for production
- Experiments that need to be distributed across machines
- Projects requiring strict environment isolation
- ML workflows that benefit from a microservices architecture
- Organizations with dedicated DevOps support

### Databricks

Databricks provides a unified analytics platform designed specifically for big data processing and machine learning.

<table>
<tr>
<td>

### Pros:

- **Optimized for Spark**: Excellent for large-scale data processing
- **Collaborative notebooks**: Built-in collaboration features
- **Managed infrastructure**: Minimal setup required
- **Integrated MLflow**: Built-in experiment tracking
- **Auto-scaling**: Dynamically adjusts cluster size
- **Delta Lake integration**: Reliable data lake operations
- **Enterprise security**: Compliance and governance features

</td>
<td>

### Cons:

- **Cost**: Typically more expensive than raw cloud resources
- **Vendor lock-in**: Some features are Databricks-specific
- **Learning curve**: New interface and workflows to learn
- **Less flexibility**: Some customizations are more difficult
- **Not ideal for small data**: Overhead for tiny datasets

</td>
</tr>
</table>

### Ideal for:

- Data science teams in large enterprises
- Projects involving both big data processing and ML
- Teams that need collaboration features built-in
- Organizations already using Spark
- Projects requiring end-to-end governance and security
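
The four "Ideal for" lists can be read as a rough decision procedure. The sketch below is one possible encoding of that reading, not an official ZenML recommendation: the `ProjectProfile` fields, the `suggest_environment` function, and especially the order in which the checks are applied are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProjectProfile:
    """Hypothetical summary of a project's needs (field names are illustrative)."""
    fits_in_local_memory: bool = True
    needs_gpu_or_long_runs: bool = False
    needs_distributed_runs: bool = False
    runs_kubernetes_in_production: bool = False
    uses_spark_or_big_data: bool = False


def suggest_environment(profile: ProjectProfile) -> str:
    """Map a project profile to one of the four environments in this guide.

    The check order (most specialized platform first) is an assumption.
    """
    if profile.uses_spark_or_big_data:
        return "Databricks"
    if profile.runs_kubernetes_in_production or profile.needs_distributed_runs:
        return "Kubernetes"
    if profile.needs_gpu_or_long_runs or not profile.fits_in_local_memory:
        return "Cloud VMs/Serverless Functions"
    return "Local Environment"


if __name__ == "__main__":
    prototype = ProjectProfile()
    gpu_training = ProjectProfile(needs_gpu_or_long_runs=True)
    print(suggest_environment(prototype))
    print(suggest_environment(gpu_training))
```

Treat the output as a starting point for discussion. Factors such as team expertise, budget, and governance requirements from the pros and cons tables are not captured by these five flags.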

docs/book/user-guide/toc.md

+1
@@ -70,6 +70,7 @@
 * [Using VS Code extension](best-practices/vscode-extension.md)
 * [Leveraging MCP](best-practices/mcp-chat-with-server.md)
 * [Debugging and Solving Issues](best-practices/debug-and-solve-issues.md)
+* [Choosing an Orchestrator](best-practices/choose-orchestration-environment.md)
 
 ## Examples