matheusvazdata/README.md

                           

👋 Hi, I'm Matheus Vaz

I am a data professional focused on building efficient data pipelines, orchestrating workflows, and enabling data-driven decisions. My career goal is to establish myself as an Analytics Engineer or Data Engineer, applying best practices in data modeling, transformation, and automation.

I am currently deepening my skills in both data engineering and backend development, and I am actively seeking new challenges and positions as a Data Engineer or Backend Developer.

I enjoy solving real problems with clean code, reproducible pipelines, and data solutions that bring value to the business.

⚙️ Technical Skills

🏗 Data Engineering

  • SQL for modeling, querying, and performance optimization  
  • Python with Pandas, NumPy, PySpark, and automation scripts  
  • Apache Airflow for workflow orchestration  
  • ETL/ELT with Meltano, Embulk, and dbt  
  • Containerization with Docker and version control with Git  
  • Databricks Delta Lake for analytical workflows  
  • Cloud: AWS (S3, Glue, Lambda) and GCP (BigQuery, Cloud Functions)  

📊 Analytics & Visualization

  • Dashboards with Power BI, Metabase, and Looker Studio  
  • Advanced Excel for reporting and analysis  
  • Clear and concise data storytelling for stakeholders  

🚀 Highlighted Projects

Modular pipeline with multiple sources (PostgreSQL & CSV), fully containerized:

  • Data extraction with Embulk (13 tables)  
  • Custom Meltano tap for CSV ingestion  
  • Load into PostgreSQL using JSONL and CSV  
  • Orchestration with Airflow DAGs  
  • Automation with Shell Scripts and Makefile  
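As a rough illustration of the CSV-to-JSONL step mentioned above, the sketch below converts CSV text into JSON Lines using only the Python standard library. It is not the project's actual code: the function name `csv_to_jsonl` and the sample columns are hypothetical, and the real pipeline delegates this work to Embulk and Meltano.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text (with a header row) into JSON Lines, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object on its own line.
    # Values stay as strings; type casting is left to the load step.
    return "\n".join(json.dumps(row) for row in reader)


# Hypothetical sample data; the column names are illustrative only.
sample = "order_id,product_id,quantity\n1,10,2\n1,11,5\n"
jsonl = csv_to_jsonl(sample)
print(jsonl)
```

Keeping every value as a string at this stage is a deliberate simplification: it lets the target database or a later transformation decide the column types.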

RESTful API in Flask with SQLite and Docker. Includes CRUD operations, Postman tests, and modular route separation.  
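The four CRUD operations behind an API like this can be sketched against SQLite with the standard library alone. This is a minimal sketch, not the repository's code: the `tasks` table, its columns, and the helper names are assumptions, and the Flask routes that would call these helpers are omitted.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Assumed schema: a single hypothetical "tasks" table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "title TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )


def create_task(conn: sqlite3.Connection, title: str) -> int:
    cur = conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid  # id of the newly inserted row


def get_task(conn: sqlite3.Connection, task_id: int):
    row = conn.execute(
        "SELECT id, title, done FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "title": row[1], "done": bool(row[2])}


def update_task(conn: sqlite3.Connection, task_id: int, title: str, done: bool) -> bool:
    cur = conn.execute(
        "UPDATE tasks SET title = ?, done = ? WHERE id = ?",
        (title, int(done), task_id),
    )
    conn.commit()
    return cur.rowcount == 1  # True only if the row existed


def delete_task(conn: sqlite3.Connection, task_id: int) -> bool:
    cur = conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
    conn.commit()
    return cur.rowcount == 1


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
create_schema(conn)
task_id = create_task(conn, "write README")
```

Parameterized queries (the `?` placeholders) keep user input out of the SQL text, which matters once these helpers sit behind HTTP routes.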

End-to-end pipeline for HR data: collection with Python, loading into BigQuery, and visualization with Power BI.  

Machine Learning project using XGBoost, SMOTE balancing, and model explainability. Covers EDA, training, evaluation, and delivery.  

📚 Education & Portfolio

📬 Contact

This space showcases part of my journey in data engineering and analytics. Always open to collaborations, learning, and new challenges.

Pinned

  1. data-pipeline-airflow-meltano-embulk Public

    Forked from techindicium/code-challenge

    Complete data ingestion pipeline developed for the Indicium Code Challenge, with a modular Docker architecture, extraction from multiple sources, and loading into PostgreSQL, using tools…

    Shell 1

  2. pipeline-do-zero-ao-estrelato-com-gcp Public

    Forked from Pipeline-de-Dados/project

    ETL data pipeline project using GCP, with data visualization in Power BI

    Jupyter Notebook

  3. ml-classificacao-analise-de-inadimplencia Public

    Predictive classification model focused on identifying the main factors associated with customer default. The project covers everything from exploratory analysis to modeling with XGBoost and…

    Jupyter Notebook

  4. tasks-flask-crud Public

    API project built with Python + Flask, with CRUD (Create, Read, Update, Delete) functionality

    Python