Spline - data lineage tracking solution for data pipelines like Apache Spark and others
The project consists of three main parts:
-
Spark Agent that sits on a driver capturing the data lineage from Spark jobs by analyzing the execution plans
-
Rest Gateway that receives the lineage data from agent and stores it in the database
-
Web UI application that visualizes the stored data lineages
Spin up a Spline server in a Docker
wget https://raw.githubusercontent.com/AbsaOSS/spline-getting-started/main/docker/compose.yaml
wget https://raw.githubusercontent.com/AbsaOSS/spline-getting-started/main/docker/.env
SEED=1 docker-compose up
# SEED=1 means to also run sample jobs to populate the database.
You can access Spline services on the following URLs:
- Spline Web UI: http://localhost:9090
- Spline Server: http://localhost:8080
To access Spline UI from another host set DOCKER_HOST_EXTERNAL
variable pointing to the current host before running docker-compose
.
Spline UI will propagate it to the user browser so that one will be able to connect to the Spline REST endpoint from the outside of this machine.
DOCKER_HOST_EXTERNAL=192.168.1.222 docker-compose up
There are three ways how to customize default Spline Spark Agent behavior. Choose the one that fits you needs better.
- A lot of things can be customized declaratively, without any coding needed, by just tweaking the Agent configuration.
- Spline agent is designed for extension, so the chances are it's enough to override some method or implement some trait to achieve desired behavior, and attach it as an extension module to your Spark application. See the example extension project.
- If the extension API isn't enough then fork the project, replace the Maven coordinates with your custom ones, and build the agent as your own JAR.
For more information about Spline see - https://absaoss.github.io/spline/
Enjoy.
Copyright 2019 ABSA Group Limited
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.