Repository containing the code developed for the Federated Learning course at UNICAMP (MO839) — Institute of Computing.
This project gathers implementations and experiments related to Federated Learning, designed to study and practice the concepts presented in the course. Users can configure various parameters (e.g., dataset, number of clients, communication rounds) through a configuration file, and the framework includes techniques for dealing with non-IID data. In this project, the framework was applied to federated classification of kidney disease data.
- Training machine learning models using a federated approach (e.g., the FedAvg and FedProx algorithms).
- Flexible configuration through a JSON file, making it easy to adapt the dataset, number of clients, communication rounds, and aggregation strategy (see the example configuration sketch after this list).
- Clear organization of code and results, enabling easy replication and comparison of experiments.
- Easily extendable: you can add new datasets, models, or aggregation protocols by modifying only specific parts of the code (or configuration), without restructuring the entire project.
- An algorithm that creates additional data via data augmentation to help in non-IID scenarios.
- An optional client selector that can be applied during training.
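The exact contents of `config.json` depend on the repository version; the snippet below is only an illustrative sketch of the kind of parameters described above (dataset, clients, rounds, aggregation strategy, client selection, and a data-transform section). All key names here are hypothetical, not the repository's actual schema:

```json
{
  "dataset": "kidney_disease",
  "num_clients": 10,
  "num_rounds": 20,
  "strategy": "FedAvg",
  "client_selection": { "enabled": true, "fraction_fit": 0.5 },
  "data_transform": {
    "augmentations": ["horizontal_flip", "rotation"],
    "heterogeneous_data_handler": { "enabled": true }
  }
}
```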
The modular structure makes it easy to run and reproduce experiments:
- `results/`: Directory where the results generated by the experiments are stored.
- `src/`: Contains all modules and scripts required for running Federated Learning.
- `src/config/config.json`: JSON file with the configuration for federated training.
- `src/data_processing`: Files related to dataset handling.
- `src/flower`: Files related to client/server/simulation/aggregation logic.
- `src/models`: Files related to model definition and training metrics.
- `src/notebooks`: Notebooks used during development on Google Colab.
- `src/validations`: Files related to benchmark evaluations.
- `src/main.py`: Main entry point for running the FL server.
- `.gitignore`: Git ignore file.
- `README.md`: Project documentation.
1. Clone this repository:

   ```bash
   git clone https://github.com/LuisLVieira/FederatedLearning
   cd FederatedLearning
   ```

2. Install the environment: install Anaconda or Miniconda, then run:

   ```bash
   cd src
   conda env create -f environment.yml
   ```

3. Configure the experiment: navigate to the `src/config/` folder and edit the `config.json` file. Set the desired parameters (number of clients, dataset, number of rounds, etc.). Under the Data Transform key, you can set data augmentation techniques and configure the heterogeneous data handler algorithm.

4. Run federated training: start the execution by specifying the configuration file path (a minimal sketch of such an entry point is shown after these steps):

   ```bash
   python main.py --config config/config.json
   ```

5. Analysis: results, logs, and performance metrics are automatically saved in the `results/` directory for later analysis, and experiments are logged in MLflow.

6. MLflow: to open the MLflow UI, open a terminal and type (a small metric-logging sketch is also shown after these steps):

   ```bash
   mlflow ui
   ```
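For orientation only, here is a minimal sketch of what a configuration-driven entry point can look like with Flower's simulation API. It is not the repository's actual `main.py`: the client is a toy stand-in for the real model client, the config keys (`num_clients`, `num_rounds`) are assumptions, and the simulation API differs between Flower versions.

```python
import argparse
import json

import flwr as fl
import numpy as np


class ToyClient(fl.client.NumPyClient):
    """Stand-in client holding a single weight vector instead of a real model."""

    def __init__(self, cid: str):
        self.cid = cid
        self.weights = [np.zeros(10, dtype=np.float32)]

    def get_parameters(self, config):
        return self.weights

    def fit(self, parameters, config):
        # Pretend local training: nudge the received global weights.
        self.weights = [w + 0.1 for w in parameters]
        return self.weights, 10, {}

    def evaluate(self, parameters, config):
        # Return (loss, num_examples, metrics) for the received global weights.
        return 0.0, 10, {"accuracy": 0.0}


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="Path to the JSON config file")
    args = parser.parse_args()

    with open(args.config) as f:
        cfg = json.load(f)  # hypothetical keys: num_clients, num_rounds

    fl.simulation.start_simulation(
        client_fn=lambda cid: ToyClient(cid),
        num_clients=cfg.get("num_clients", 4),
        config=fl.server.ServerConfig(num_rounds=cfg.get("num_rounds", 3)),
        strategy=fl.server.strategy.FedAvg(),
    )


if __name__ == "__main__":
    main()
```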
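Likewise, a hedged sketch of how per-round metrics could be logged to MLflow; the experiment name, parameter names, and metric values below are placeholders, and the repository's actual logging hooks may differ:

```python
import mlflow

# Hypothetical experiment name and dummy per-round accuracy values.
mlflow.set_experiment("federated-kidney-disease")
with mlflow.start_run():
    mlflow.log_param("num_clients", 10)
    mlflow.log_param("strategy", "FedAvg")
    for rnd, acc in enumerate([0.61, 0.70, 0.74], start=1):
        mlflow.log_metric("global_accuracy", acc, step=rnd)
```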
You can use this repository to:
- Study how federated algorithms behave under different scenarios (e.g., non-IID data, different numbers of clients, data heterogeneity); a common way to simulate non-IID splits is sketched after this list.
- Test various ML model architectures in a federated environment.
- Compare results between executions (with different seeds, configurations, datasets) in a reproducible way.
- Adapt the project to new datasets or domains by simply modifying the configuration or preprocessing steps.
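For example, a standard way to simulate label heterogeneity across clients (not necessarily the method used by this repository's heterogeneous data handler) is a Dirichlet partition over class labels. The sketch below assumes plain NumPy label arrays:

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet-distributed label proportions.

    Smaller alpha -> more heterogeneous (non-IID) clients.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        # Proportion of this class assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client, part in zip(client_indices, np.split(cls_idx, cut_points)):
            client.extend(part.tolist())
    return [np.array(idx) for idx in client_indices]


# Usage: 1000 synthetic samples with 4 classes, split across 10 clients.
labels = np.random.randint(0, 4, size=1000)
parts = dirichlet_partition(labels, num_clients=10, alpha=0.3)
print([len(p) for p in parts])
```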
This repository serves as a foundation for hands-on exploration of Federated Learning — suitable for study, experimentation, and prototyping within the MO839 course context. Its simple and modular structure makes it easy to adapt for other scenarios or future projects.