1. One file example

Justin edited this page Jul 27, 2023 · 1 revision

To get started, you can use this library in a single file:

Import and init

# Python module names cannot contain hyphens; graph_etl is assumed to be the importable name
import graph_etl as getl
getl.init(options)

See the init options page for details on the options parameter that can be passed to getl.init().

Parse your file in Python

Using getl.Parser in a with clause executes the code as soon as the script starts. You can also use it as a decorator in combination with the getl.parse function; details can be found in the multiple file example.

import polars as pl

with getl.Parser(
    sources_path="./example/data/IMDB-Movie-Data.csv",
    source="kaggle"
) as context:
    # Build one row per (movie, actor) pair: split the comma-separated
    # "Actors" string into a list, then explode the list into rows.
    df = (
        pl.read_csv("./example/data/IMDB-Movie-Data.csv")
            .select(["Rank", "Title", "Director", "Actors"])
            .with_columns([
                pl.col("Actors").str.split(", ")
            ])
            .explode("Actors")
            .rename({
                "Rank": "id",
                "Title": "title"
            })
    )

    # Movie nodes
    context.save_nodes(
        df.select(["id", "title"]),
        "Movie"
    )

    # Person nodes: actors and directors share one label, keyed by name
    context.save_nodes(
        pl.concat((
            df.select("Actors").rename({"Actors": "name"}),
            df.select("Director").rename({"Director": "name"})
        )),
        "Person",
        primary_key="name"
    )

    # (Person)-[ACTED_IN]->(Movie) edges
    context.save_edges(
        df.select(["id", "Actors"]).rename({"id": "end", "Actors": "start"}),
        "ACTED_IN",
        start_id="Person:name",
        end_id="Movie:id"
    )

    # (Person)-[DIRECTED]->(Movie) edges
    context.save_edges(
        df.select(["id", "Director"]).rename({"id": "end", "Director": "start"}),
        "DIRECTED",
        start_id="Person:name",
        end_id="Movie:id"
    )
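
To make the reshaping concrete, here is a plain-Python sketch (no polars, no graph-etl) of what the split/explode step produces for one row and how the node and edge tables are derived from it. The sample row and the helper names explode_actors and build_tables are invented for illustration. Collecting movies and people into sets is an assumption made here for readability; this page does not say how the library treats repeated keys.

```python
# Hypothetical sample row shaped like the columns selected from the CSV.
rows = [
    {"Rank": 1, "Title": "Example Movie", "Director": "Dana Doe",
     "Actors": "Alex Roe, Sam Poe"},
]


def explode_actors(rows):
    """Return one record per (movie, actor) pair, mirroring split + explode."""
    exploded = []
    for row in rows:
        for actor in row["Actors"].split(", "):
            exploded.append({
                "id": row["Rank"],
                "title": row["Title"],
                "Director": row["Director"],
                "Actors": actor,
            })
    return exploded


def build_tables(exploded):
    """Derive node and edge tables in the shape handed to save_nodes/save_edges."""
    movies = {(r["id"], r["title"]) for r in exploded}
    people = {r["Actors"] for r in exploded} | {r["Director"] for r in exploded}
    acted_in = [{"start": r["Actors"], "end": r["id"]} for r in exploded]
    directed = [{"start": r["Director"], "end": r["id"]} for r in exploded]
    return movies, people, acted_in, directed


exploded = explode_actors(rows)
movies, people, acted_in, directed = build_tables(exploded)
print(len(exploded), sorted(people))
```

Note that directed holds one record per exploded row, so the same director and movie pair appears once per actor, just as it does in the polars snippet above.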

Details on the getl.Parser object can be found here.
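
The difference between running code inside a with clause and deferring it behind a decorator, mentioned at the start of this section, can be illustrated with generic Python. The sketch below uses contextlib.ContextDecorator from the standard library. DemoParser is a hypothetical stand-in and says nothing about how the library's Parser is actually implemented.

```python
from contextlib import ContextDecorator


class DemoParser(ContextDecorator):
    """Hypothetical stand-in used only to show the two calling styles."""

    def __init__(self, source, log):
        self.source = source
        self.log = log

    def __enter__(self):
        self.log.append(f"enter {self.source}")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append(f"exit {self.source}")
        return False  # do not swallow exceptions


log = []

# Context-manager form: the body runs immediately, at this point in the script.
with DemoParser("inline", log) as ctx:
    log.append("inline body")


# Decorator form: the body is only defined here; nothing runs yet.
@DemoParser("deferred", log)
def parse_later():
    log.append("deferred body")


snapshot_before_call = list(log)
parse_later()  # the wrapped body runs only when something calls it
print(log)
```

In the decorator form, some other piece of code has to trigger the function, which is why the with form is the simpler choice for a single-file script.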

Create a loader object and load

neo_connection = getl.Neo4JLoader()
tiger_connection = getl.TigerGraphLoader()
getl.load(neo_connection)

Parameters passed to getl.Neo4JLoader() are used to create the Neo4j driver. Likewise, parameters passed to getl.TigerGraphLoader() are used to create the TigerGraph driver.

Clear intermediate files

After parsing your files, the ETL stores the results as intermediate CSV files. To clear them, call:

getl.clear()