
A real-world example of an ETL pipeline on JSON data in Go.


ojasww/generic-etl-pipeline


Generic ETL Pipeline

This repository contains a solution to the following problem:

Given a set of input files that contain JSON data following the same template but with different key-value pairs, and a schema JSON that represents the final format, convert each input file into an output file. The key mappings are provided in another JSON file.

This is a typical example of an ETL (Extract, Transform, Load) pipeline. Data coming from different sources, be it a data warehouse or a data lake, needs to be refined and transformed before it is stored in a standardized format. This problem is a simpler version of exactly that.

Problem Statement

Repository structure:

pkg/
├── buildjson/
│   ├── schema/
│   │   └── user.json
│   └── json.go
sample_data/
go.mod
main.go
mapping.json
problem_statement
README.md

json.go contains the implementation of the schema transformer. mapping.json contains a single JSON object that holds the key mapping for each input file. user.json is the schema for the final output. Note that the nesting of objects is flattened out in user.json.
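As a rough illustration of the idea, and not the code in json.go, the transformation can be sketched as follows. The sketch assumes that nested input keys are flattened into dot-separated paths and that a mapping goes from schema key to flattened input key. The function names flatten and transform, the sample keys, and the mapping layout are all hypothetical; the real mapping.json and user.json may be organized differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten copies every leaf value of a nested JSON object into out,
// joining nested keys with dots (for example "contact.mail").
// Hypothetical helper, not taken from json.go.
func flatten(prefix string, in map[string]any, out map[string]any) {
	for k, v := range in {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]any); ok {
			flatten(key, nested, out)
			continue
		}
		out[key] = v
	}
}

// transform decodes one input document and renames its flattened keys
// to the schema keys. mapping is ASSUMED to be schemaKey -> input key path.
// Schema keys without a mapped value are emitted as null.
func transform(raw []byte, mapping map[string]string, schemaKeys []string) (map[string]any, error) {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("decode input: %w", err)
	}
	flat := make(map[string]any)
	flatten("", doc, flat)

	out := make(map[string]any, len(schemaKeys))
	for _, key := range schemaKeys {
		src, ok := mapping[key]
		if !ok {
			out[key] = nil // no mapping for this schema key
			continue
		}
		out[key] = flat[src] // nil when the input lacks the key
	}
	return out, nil
}

func main() {
	// Made-up sample input and mapping, for illustration only.
	input := []byte(`{"fullName": "Ada", "contact": {"mail": "ada@example.com"}}`)
	mapping := map[string]string{"name": "fullName", "email": "contact.mail"}

	out, err := transform(input, mapping, []string{"name", "email", "age"})
	if err != nil {
		panic(err)
	}
	b, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Flattening the input first keeps the mapping a plain key-to-key lookup, which seems to match the flattened schema described above. A schema that preserved nesting would need a different approach.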

Run

To run the pipeline, open a terminal in the repository root and execute go run main.go
