Skip to content

This is a PHP project which combines ETL with different strategies to extract data from multiple databases, files, and services, transform it and load it into multiple destinations.

License

Notifications You must be signed in to change notification settings

TheCocoTeam/source-watcher-core

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Source Watcher Core codecov travis-ci scrutinizer-ci

All Contributors

This is a PHP project that allows extracting, transforming, and loading data from and to different sources including databases, files, and services, while at the same time facilitating the transformation of the data in multiple ways.

What is ETL?

ETL is an abbreviation that stands: for extract, transform, and load. It's a software process used to fill data warehouses with information in three steps:

  • Extract: The process extracts or pulls data from multiple sources.

  • Transform: The incoming data passes through a transformation step.

  • Load: The ETL process will send the data to its final destination.

The foundations of ETL come from data warehousing methodologies dating back to the 1960s. ETL is the process of gathering raw data, like the one from production systems. Once collected, the data is transformed into a more readable, understandable format. The transformed and cleaned data is then loaded into a data repository, usually a relational database but not limited to other types of databases, files, and even REST services among others.

ETL example

Assume that you have some information in a CSV file, and you want to import the content of the file to a MySQL database.

The CSV file is a standard comma-separated value file with some headers: id, name and email.

You have a table in your MySQL database with some fields: id, name and email_address.

First you would need to extract the information from your CSV file, only the fields that you want to insert into your MySQL database.

After you have extracted the information, you want to rename the CSV header email to match the database field email_address.

Finally, you want to save the information in your database table.

<?php
use Coco\SourceWatcher\Core\Database\Connections\MySqlConnector;
use Coco\SourceWatcher\Core\IO\Inputs\FileInput;
use Coco\SourceWatcher\Core\IO\Outputs\DatabaseOutput;
use Coco\SourceWatcher\Core\SourceWatcher;
use Coco\SourceWatcher\Core\SourceWatcherException;

$mysqlConnector = new MySqlConnector();
$mysqlConnector->setUser( "user" );
$mysqlConnector->setPassword( "password" );
$mysqlConnector->setHost( "host" );
$mysqlConnector->setPort( 3306 );
$mysqlConnector->setDbName( "tests" );
$mysqlConnector->setTableName( "people" );

$sourceWatcher = new SourceWatcher();

try {
    $sourceWatcher
        ->extract( "Csv", new FileInput( __DIR__ . "/../data/csv/csv1.csv" ), [ "columns" => array( "name", "email" ) ] )
        ->transform( "RenameColumns", [ "columns" => array( "email" => "email_address" ) ] )
        ->load( "Database", new DatabaseOutput( $mysqlConnector ) )
        ->run();
} catch ( SourceWatcherException $exception ) {
    echo sprintf( "Something unexpected went wrong: %s", $exception->getMessage() );
}

A functional version of the code above can be found here

Feedback

Please submit issues, and send your feedback and suggestions as often as you have them.

Contributors


Jean Paul Ruiz

💻

This project follows the all-contributors specification. Contributions of any kind welcome!

About

This is a PHP project which combines ETL with different strategies to extract data from multiple databases, files, and services, transform it and load it into multiple destinations.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages