1. One file example
Justin edited this page Jul 27, 2023 · 1 revision
To get started, you can use this library in a single file:
# Hyphens are not valid in Python import names; the module name graph_etl is assumed here.
import graph_etl as getl
getl.init(options)
See init options for details on the options parameter that can be passed to getl.init().
Using getl.Parser in a with clause executes the enclosed code as soon as you start the script.
You can also use it as a decorator in combination with the getl.parse function; details can be found in the multiple file example.
import polars as pl  # pl is Polars; place this import at the top of the file

with getl.Parser(
    sources_path="./example/data/IMDB-Movie-Data.csv",
    source="kaggle"
) as context:
    # Read the CSV, keep the needed columns, and produce one row per (movie, actor) pair.
    df = (
        pl.read_csv("./example/data/IMDB-Movie-Data.csv")
        .select(["Rank", "Title", "Director", "Actors"])
        .with_columns([
            pl.col("Actors").str.split(", ")
        ])
        .explode("Actors")
        .rename({
            "Rank": "id",
            "Title": "title"
        })
    )
    # Movie nodes.
    context.save_nodes(
        df.select(["id", "title"]),
        "Movie"
    )
    # Person nodes: actors and directors share one label, keyed by name.
    context.save_nodes(
        pl.concat((
            df.select("Actors").rename({"Actors": "name"}),
            df.select("Director").rename({"Director": "name"})
        )),
        "Person",
        primary_key="name"
    )
    # Edges from Person (start) to Movie (end).
    context.save_edges(
        df.select(["id", "Actors"]).rename({"id": "end", "Actors": "start"}),
        "ACTED_IN",
        start_id="Person:name",
        end_id="Movie:id"
    )
    context.save_edges(
        df.select(["id", "Director"]).rename({"id": "end", "Director": "start"}),
        "DIRECTED",
        start_id="Person:name",
        end_id="Movie:id"
    )
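To make the shape of the data concrete, here is a minimal sketch of the same split, explode, and node/edge extraction in plain Python, without Polars or the library. The sample rows are made up for illustration and are not taken from the Kaggle file. Duplicates are removed here only to keep the output readable; the page does not say whether the library deduplicates nodes itself.

```python
# Hypothetical sample rows standing in for IMDB-Movie-Data.csv (not real data).
rows = [
    {"Rank": 1, "Title": "Movie A", "Director": "Dana", "Actors": "Alice, Bob"},
    {"Rank": 2, "Title": "Movie B", "Director": "Alice", "Actors": "Bob, Carol"},
]

# Equivalent of str.split(", ") followed by explode("Actors"):
# one output row per (movie, actor) pair.
exploded = [
    {"id": r["Rank"], "title": r["Title"], "Director": r["Director"], "Actors": actor}
    for r in rows
    for actor in r["Actors"].split(", ")
]

# Movie nodes: distinct (id, title) pairs.
movies = sorted({(r["id"], r["title"]) for r in exploded})

# Person nodes: actors and directors share one label, keyed by name.
people = sorted({r["Actors"] for r in exploded} | {r["Director"] for r in exploded})

# Edges point from a Person (start) to a Movie (end).
acted_in = sorted({(r["Actors"], r["id"]) for r in exploded})
directed = sorted({(r["Director"], r["id"]) for r in exploded})

print(movies)
print(people)
print(acted_in)
print(directed)
```

Note that a person who both acts and directs ("Alice" in the sample) ends up as a single Person node with edges of both types, which is why name is used as the key for that label.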
Details on the getl.Parser object can be found here.
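The difference between the with form and the decorator form can be illustrated with Python's standard contextlib.ContextDecorator. This is only a sketch of the general pattern: TracingParser is a hypothetical stand-in, and nothing here is claimed to match how getl.Parser is implemented.

```python
from contextlib import ContextDecorator


class TracingParser(ContextDecorator):
    """Hypothetical stand-in usable in a with clause or as a decorator."""

    def __init__(self, source):
        self.source = source
        self.events = []

    def __enter__(self):
        self.events.append(f"start {self.source}")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append(f"finish {self.source}")
        return False  # do not swallow exceptions raised in the body


# Form 1: in a with clause, the body runs immediately.
with TracingParser("kaggle") as context:
    context.events.append("parse")

# Form 2: as a decorator, nothing runs until the function is called.
deferred = TracingParser("kaggle")


@deferred
def parse_movies():
    deferred.events.append("parse")


events_before_call = list(deferred.events)  # still empty at this point
parse_movies()
events_after_call = list(deferred.events)

print(context.events)
print(events_before_call)
print(events_after_call)
```

The with form suits a single script that should run top to bottom, while the decorator form lets a separate entry point decide when each parsing function runs.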
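# Create a loader for each target database.
neo_connection = getl.Neo4JLoader()
tiger_connection = getl.TigerGraphLoader()
# Load the parsed data through the chosen connection (the module alias is getl, as imported above).
getl.load(neo_connection)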
Parameters passed to getl.Neo4JLoader() are used to create a Neo4j driver. Likewise, parameters passed to getl.TigerGraphLoader() are used to create a TigerGraph driver.
After parsing your files, the ETL stores the results as CSV files. To clear everything, don't forget to call:
getl.clear()
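As a rough picture of what staging parsed data as CSV and then clearing it involves, the sketch below writes one node file to a temporary directory and removes it again. The functions stage_nodes and clear_staging, the directory location, and the file naming are all hypothetical; the page does not describe where the library writes its files or how it names them.

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Hypothetical staging directory; the library's real location is not documented here.
staging_dir = Path(tempfile.mkdtemp(prefix="getl_sketch_"))


def stage_nodes(rows, label):
    """Write a list of dict rows to a CSV file named after the node label."""
    path = staging_dir / f"nodes_{label}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def clear_staging():
    """Remove the staging directory and every file inside it."""
    shutil.rmtree(staging_dir, ignore_errors=True)


stage_nodes([{"id": 1, "title": "Movie A"}], "Movie")
staged_before = sorted(p.name for p in staging_dir.iterdir())

clear_staging()
exists_after = staging_dir.exists()

print(staged_before)
print(exists_after)
```

Leaving staged files in place between runs could mix old and new data in a later load, which is one reason to clear them once loading has finished.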