Refactor: 4.0 #10

Merged
merged 1 commit into from
Oct 26, 2023


@bpolaszek (Owner) commented Oct 25, 2023

Hey folks! 👋

It's been more than 4 years since version 3 of bentools/etl was drafted, but it never got out of alpha, mostly for lack of time but also, I have to admit, because of uncertainties about the design directions taken.

Introducing bentools/etl v4

PHP 8 and a lot of projects on my side came in between, and I recently needed this library again. I wanted to keep the good ideas of v3 and drop the bad ones.

So I decided that a stable v3 will never see the light of day. Because lots of classes have been renamed and most of them have become immutable, here's a brand new v4.

What's new?

  • This version requires PHP 8.2 as a minimum, is 100% covered by tests (this wasn't the case before), and uses PHPStan at its highest level to ensure type consistency. A GitHub Actions CI has also been set up.

  • It introduces a new EtlState object, which is instantiated at the beginning of the ETL process and passed through the different steps and event listeners. The EtlExecutor (previously the Etl class) is no longer mutable: it simply holds the Extractor, the Transformer and the Loader objects, fires events, and gives you the state you need through the EtlState.

  • The EtlState is mostly readonly, but you can still call $state->skip() to skip items, $state->stop() to stop the process, $state->flush() to request an early flush, and you can use the $state->context array to pass arbitrary data between the different steps and events during the whole workflow.
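
The skip, stop, flush and context behaviour described above can be pictured with a toy model. This is a self-contained sketch and not the library's code: `ToyState` and `runToyEtl()` are hypothetical names invented for illustration, and only the method names `skip()`, `stop()`, `flush()` and the `context` array mirror the description. How the real EtlExecutor treats pending items after a stop is not stated here, so the final flush below is an assumption.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the state object; NOT the real Bentools\ETL\EtlState.
final class ToyState
{
    /** @var array<string, mixed> arbitrary data shared between steps */
    public array $context = [];
    public bool $skipCurrent = false;
    public bool $stopped = false;
    public bool $flushRequested = false;

    public function skip(): void { $this->skipCurrent = true; }
    public function stop(): void { $this->stopped = true; }
    public function flush(): void { $this->flushRequested = true; }
}

/**
 * Hypothetical driver loop: one state object is created up front and handed
 * to the transformer for every item.
 *
 * @return array{loaded: list<mixed>, flushes: int, state: ToyState}
 */
function runToyEtl(iterable $items, callable $transform, int $flushEvery): array
{
    $state = new ToyState();
    $buffer = [];
    $loaded = [];
    $flushes = 0;

    foreach ($items as $item) {
        // Per-item flags are reset; the context array survives the whole run.
        $state->skipCurrent = false;
        $state->flushRequested = false;
        $result = $transform($item, $state);

        if ($state->stopped) {
            break; // stop(): drop the current item and end the loop
        }
        if (!$state->skipCurrent) {
            $buffer[] = $result;
        }
        // Flush when the buffer is full or an early flush was requested.
        if ($buffer !== [] && (count($buffer) >= $flushEvery || $state->flushRequested)) {
            array_push($loaded, ...$buffer);
            $buffer = [];
            $flushes++;
        }
    }

    if ($buffer !== []) { // assumption: leftovers are still flushed at the end
        array_push($loaded, ...$buffer);
        $flushes++;
    }

    return ['loaded' => $loaded, 'flushes' => $flushes, 'state' => $state];
}

$run = runToyEtl(
    [1, 2, 3, 4, 5, 6],
    function (int $n, ToyState $state): int {
        $state->context['seen'] = ($state->context['seen'] ?? 0) + 1;
        if ($n === 2) {
            $state->skip(); // item 2 is never loaded
        }
        if ($n === 5) {
            $state->stop(); // items 5 and 6 are never loaded
        }
        return $n * 10;
    },
    flushEvery: 2,
);

var_dump($run['loaded']); // [10, 30, 40]
```

In this toy run the transformer is called five times, so `$run['state']->context['seen']` ends at 5, and the three loaded items are written in two flushes.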

How does it work?

Here's an example of the new API:

city_english_name,city_local_name,country_iso_code,continent,population
"New York","New York",US,"North America",8537673
"Los Angeles","Los Angeles",US,"North America",39776830
Tokyo,東京,JP,Asia,13929286
...

use Bentools\ETL\EtlConfiguration;
use Bentools\ETL\EtlExecutor;
use Bentools\ETL\EventDispatcher\Event\LoadEvent;
use Bentools\ETL\Extractor\CSVExtractor;
use Bentools\ETL\Loader\JSONLoader;
use Bentools\ETL\Recipe\LoggerRecipe;
use Monolog\Logger;

$etl = (new EtlExecutor(options: new EtlConfiguration(flushEvery: 100)))
    ->extractFrom(new CSVExtractor(options: ['columns' => 'auto']))
    ->transformWith(function (array $city) {
        $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);
        yield $city;
    })
    ->loadInto(new JSONLoader())
    ->onLoad(fn (LoadEvent $event) => print("Loading city `{$event->item['slug']}`".PHP_EOL))
    ->withRecipe(new LoggerRecipe(new Logger('etl-logs')));

$report = $etl->process(
    source: 'file:///tmp/cities.csv',
    destination: 'file:///tmp/cities.json',
);

var_dump($report->output); // file:///tmp/cities.json

[
    {
        "city_english_name": "New York",
        "city_local_name": "New York",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 8537673,
        "slug": "new-york"
    },
    {
        "city_english_name": "Los Angeles",
        "city_local_name": "Los Angeles",
        "country_iso_code": "US",
        "continent": "North America",
        "population": 39776830,
        "slug": "los-angeles"
    },
    {
        "city_english_name": "Tokyo",
        "city_local_name": "東京",
        "country_iso_code": "JP",
        "continent": "Asia",
        "population": 13929286,
        "slug": "tokyo"
    }
]
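
To see what each step contributes, the same CSV-to-JSON conversion can be approximated in plain PHP without the library. This is only a rough sketch: `extractRows()`, `castNumeric()` and `addSlug()` are hypothetical helpers, not part of bentools/etl, and the numeric cast is an assumption inferred from the unquoted numbers in the JSON output above. It leaves out events, flushing and reporting.

```php
<?php

declare(strict_types=1);

$csv = <<<'CSV'
city_english_name,city_local_name,country_iso_code,continent,population
"New York","New York",US,"North America",8537673
Tokyo,東京,JP,Asia,13929286
CSV;

/** Extract: yield one associative array per row, keyed by the header line. */
function extractRows(string $csv): Generator
{
    $lines = preg_split('/\R/', trim($csv));
    $columns = str_getcsv(array_shift($lines), ',', '"', '');
    foreach ($lines as $line) {
        yield array_combine($columns, str_getcsv($line, ',', '"', ''));
    }
}

/** Assumption: numeric strings become numbers, as the JSON output suggests. */
function castNumeric(array $row): array
{
    return array_map(
        static fn (string $value): int|float|string => is_numeric($value) ? $value + 0 : $value,
        $row,
    );
}

/** Transform: same slug logic as the closure passed to transformWith(). */
function addSlug(array $city): array
{
    $city['slug'] = strtr(strtolower($city['city_english_name']), [' ' => '-']);

    return $city;
}

// Load: collect everything in memory, then encode once.
$cities = [];
foreach (extractRows($csv) as $row) {
    $cities[] = addSlug(castNumeric($row));
}

$json = json_encode($cities, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
echo $json, PHP_EOL;
```

The generator keeps extraction lazy, which is the same reason the transformer in the example above uses `yield`: rows are handed over one at a time instead of being materialised up front.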

I hope you'll enjoy this release as much as I enjoyed coding it! 😃


codecov bot commented Oct 25, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (master@ab170a6).

Additional details and impacted files
@@            Coverage Diff             @@
##             master       #10   +/-   ##
==========================================
  Coverage          ?   100.00%           
  Complexity        ?       188           
==========================================
  Files             ?        40           
  Lines             ?       479           
  Branches          ?         0           
==========================================
  Hits              ?       479           
  Misses            ?         0           
  Partials          ?         0           


@bpolaszek bpolaszek marked this pull request as ready for review October 26, 2023 07:28
@bpolaszek bpolaszek merged commit 9608d06 into master Oct 26, 2023
6 checks passed
@bpolaszek bpolaszek deleted the 4.0 branch October 26, 2023 08:24