
Refactor db set up #13

Open: wants to merge 73 commits into main

milojevicdupontnikola (Member)

Handover version from Niko Jan 2025

Main changes:

  • change the boundary system in the DB setup from GADM to LAU
  • add MSFT and OSM for all countries
  • miscellaneous fixes and housekeeping

New setup tested with:

  • e7fab99 : monitor losses during db set up
  • bb728b7 : Check LAUs and NUTS3 with no buildings

milojevicdupontnikola and others added 30 commits November 21, 2024 19:15
- rename gadm_level to nuts_level, indicating which NUTS or LAU level corresponds to the dataset
- rename gadm_name to nuts_name, indicating which NUTS or LAU name or code corresponds to the dataset
- remove rows for cities and regions that were not used (the larger country file is used instead)
- remove Sicily, where government data was found to be significantly incomplete compared to MSFT
- add is_has_lau_nuts3 column
Notebook that matches country info with NUTS3, with either NUTS1 or NUTS2 as the parent region, and with LAU
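A minimal sketch of this matching, assuming standard NUTS codes, where each level appends one character to the two-letter country code (e.g. DE, DE2, DE21, DE212). The functions `parent_nuts` and `match_lau` and the LAU ids are hypothetical, not the notebook's actual code; `parent_level` would be 1 for most countries and 2 where inputs are organized at NUTS2, as with France in OSM.

```python
# Hypothetical sketch: derive parent NUTS regions from NUTS3 codes.
# NUTS codes are hierarchical: country (2 chars), NUTS1 (3), NUTS2 (4), NUTS3 (5).
NUTS_CODE_LENGTH = {0: 2, 1: 3, 2: 4, 3: 5}


def parent_nuts(nuts3_code: str, level: int) -> str:
    """Return the code of the region at `level` (0, 1 or 2) containing `nuts3_code`."""
    if len(nuts3_code) != NUTS_CODE_LENGTH[3]:
        raise ValueError(f"not a NUTS3 code: {nuts3_code!r}")
    if level not in (0, 1, 2):
        raise ValueError(f"level must be 0, 1 or 2, got {level}")
    return nuts3_code[: NUTS_CODE_LENGTH[level]]


def match_lau(lau_to_nuts3: dict[str, str], parent_level: int) -> dict[str, tuple[str, str]]:
    """Map each LAU id to (parent region, NUTS3); parent_level is 1 or 2."""
    return {
        lau: (parent_nuts(nuts3, parent_level), nuts3)
        for lau, nuts3 in lau_to_nuts3.items()
    }


# Example with hypothetical LAU ids
print(match_lau({"LAU_1": "DE212", "LAU_2": "DE212"}, parent_level=1))
# {'LAU_1': ('DE2', 'DE212'), 'LAU_2': ('DE2', 'DE212')}
```

Because the parent is derived from the code itself, no separate lookup table is needed for the NUTS hierarchy; only the LAU-to-NUTS3 correspondence has to come from the input data.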
Uses LAU to create directories.

Directories are created for a whole country the first time the code is run, if they don't exist yet.

Structure:

country
|_ region (NUTS1/2)
   |_ NUTS3
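A minimal sketch of creating this layout with `pathlib`, assuming region and NUTS3 codes are used directly as folder names. `create_country_dirs` is a hypothetical helper; `mkdir(parents=True, exist_ok=True)` creates missing intermediate folders and does nothing for folders that already exist, so rerunning it for a country is safe.

```python
from pathlib import Path


def create_country_dirs(root: Path, country: str, regions: dict[str, list[str]]) -> list[Path]:
    """Create root/country/region/nuts3 folders and return the NUTS3 leaf paths.

    `regions` maps a NUTS1/2 region code to the NUTS3 codes it contains (assumed input shape).
    """
    leaf_dirs = []
    for region, nuts3_codes in regions.items():
        for nuts3 in nuts3_codes:
            path = root / country / region / nuts3
            # parents=True creates country/ and region/ as needed;
            # exist_ok=True leaves existing folders untouched on later runs
            path.mkdir(parents=True, exist_ok=True)
            leaf_dirs.append(path)
    return leaf_dirs
```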

Data is first saved at LAU level and then re-aggregated.
Masking is done at country, region, NUTS3 or LAU level so that LAU files are created only for LAUs that should be present in the dataset.

Uses the new columns from the inputs parsing file.

The existing 'rest' mode, which takes every LAU of a country not previously used, is kept.
Ensures that all masks together reproduce every LAU exactly once; otherwise, the missing or duplicated LAUs are returned.
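A minimal sketch of the coverage check and the 'rest' mode, assuming each mask is reduced to a list of LAU ids per input dataset. `check_masks`, `rest_mask` and the dataset keys are hypothetical; the real code works on the parsed input columns.

```python
from collections import Counter


def check_masks(all_laus: set[str], masks: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Return (missing, duplicated) LAU ids over all dataset masks of one country."""
    counts = Counter(lau for laus in masks.values() for lau in laus)
    missing = all_laus - set(counts)
    duplicated = {lau for lau, n in counts.items() if n > 1}
    return missing, duplicated


def rest_mask(all_laus: set[str], masks: dict[str, list[str]]) -> list[str]:
    """'rest' mode: every LAU of the country not already used by another mask."""
    used = {lau for laus in masks.values() for lau in laus}
    return sorted(all_laus - used)


# Hypothetical example: four LAUs, two datasets overlapping on "B"
all_laus = {"A", "B", "C", "D"}
masks = {"gov": ["A", "B"], "osm": ["B", "C"]}
masks["msft"] = rest_mask(all_laus, masks)  # takes the remaining LAU "D"
missing, duplicated = check_masks(all_laus, masks)
print(missing, duplicated)  # set() {'B'}
```

Since 'rest' only fills gaps, it can never introduce duplicates; any duplicates reported afterwards come from overlapping explicit masks.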
Update db_set_up.py
- only regions (mask does it already?)
- buffer (aggregation at nuts3 level will take care of this)
Modifies the main function and subfunctions to use relevant variables/columns

+ minor indent fix to city_paths_to_txt
All geometry and attribute files are squashed into one gpkg at the NUTS3 level; the folder structure within a country is deleted.
Remove OSM data for countries covered by government data and change the path to the new OSM data
Adapted overview to work with LAUs. First minimal version with only footprint area as metric.
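A minimal sketch of such an overview, computed at LAU level and then re-aggregated to NUTS3, assuming each building already carries a LAU id and a footprint area in m². `footprint_area_overview` is hypothetical. LAUs without buildings are kept with an area of 0.0 (useful for spotting empty LAUs), and buildings whose LAU has no known NUTS3 parent are reported instead of raising a KeyError.

```python
from collections import defaultdict
from collections.abc import Iterable


def footprint_area_overview(
    buildings: Iterable[tuple[str, float]],
    lau_to_nuts3: dict[str, str],
) -> tuple[dict[str, float], dict[str, float], set[str]]:
    """Sum footprint area per LAU, then re-aggregate the LAU totals to NUTS3.

    Returns (area per LAU, area per NUTS3, LAU ids with no known NUTS3 parent).
    """
    per_lau: dict[str, float] = defaultdict(float)
    for lau, area_m2 in buildings:
        per_lau[lau] += area_m2
    # Keep LAUs without buildings at 0.0 so they remain visible in the overview
    for lau in lau_to_nuts3:
        per_lau.setdefault(lau, 0.0)

    per_nuts3: dict[str, float] = defaultdict(float)
    unmatched: set[str] = set()
    for lau, area_m2 in per_lau.items():
        nuts3 = lau_to_nuts3.get(lau)
        if nuts3 is None:
            unmatched.add(lau)  # LAU missing from the boundary table
            continue
        per_nuts3[nuts3] += area_m2
    return dict(per_lau), dict(per_nuts3), unmatched
```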
Make the load_lau function load only LAU boundaries and nothing else.
Also, store streets as GeoPackage instead of as a WKT-encoded CSV.
Handle cases where an LAU is missing
Old OSM-specific code is now handled by the SLURM pipeline
Clean up the file to re-download all countries for v1 (feature engineering, etc.)
Create three files that correspond to the complete set of input datasets to be used for v1, plus minor housekeeping changes to the main parsing files
Handle folder creation directly in the function
Streamline changes of folder structure
The lau-nuts dict was catching NUTS names within LAU names in the path, creating duplicates and breaking the function. It is now forced to scan only NUTS.
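A sketch of this failure mode with hypothetical names: a NUTS3 region called `Alpha`, and a LAU file `Alphaville.gpkg` stored under NUTS3 `Beta`. Searching the whole path string for NUTS names also matches inside the LAU file name and yields two NUTS3 regions for one file; matching whole directory components only (assumed here to be country/region/NUTS3) does not. Neither function is the repository's actual code.

```python
from pathlib import PurePosixPath
from typing import Optional


def nuts3_from_path_substring(path: str, nuts3_names: list[str]) -> list[str]:
    """Buggy variant: a substring search also hits NUTS names inside LAU file names."""
    return [name for name in nuts3_names if name in path]


def nuts3_from_path(path: str, nuts3_names: set[str]) -> Optional[str]:
    """Fixed variant: match whole directory names only, ignoring the LAU file name."""
    hits = [part for part in PurePosixPath(path).parent.parts if part in nuts3_names]
    return hits[-1] if hits else None


path = "CC/R1/Beta/Alphaville.gpkg"  # hypothetical country/region/NUTS3/LAU file
print(nuts3_from_path_substring(path, ["Alpha", "Beta"]))  # ['Alpha', 'Beta']
print(nuts3_from_path(path, {"Alpha", "Beta"}))  # Beta
```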
Create a French version of the LAU file, as France's OSM files are provided at NUTS2 instead of the NUTS1 level used otherwise
Support two levels of parts
Fix true and false duplicates
Notebook to check for losses during db set up
Check LAUs and NUTS3 with no buildings
Some initial analysis to determine the cut-off point between MSFT and OSM

FlorianNachtigall force-pushed the refactor-db-set-up branch 2 times, most recently from 5162c96 to e064be6 on January 2, 2025 18:53
FlorianNachtigall force-pushed the refactor-db-set-up branch 2 times, most recently from 93b3169 to 48c82f9 on January 3, 2025 13:12
Currently, object columns contain mixed data types (e.g. strings and
np.nan for missing values), which causes issues when writing the
parquet file.
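One common way to avoid such mixed types, sketched here with pandas and not necessarily what this commit does: cast every object column to pandas' nullable `string` dtype, which turns `np.nan` into `pd.NA` and converts non-string values to their string form. `normalize_object_columns` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def normalize_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every object column uses pandas' nullable "string" dtype.

    Missing values such as np.nan become pd.NA and non-string values are converted
    to strings, so each column has a single type when written to parquet.
    """
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].astype("string")
    return out


df = pd.DataFrame(
    {
        "name": pd.Series(["a", np.nan, "c"], dtype=object),  # strings + np.nan
        "mixed": pd.Series(["x", 5, np.nan], dtype=object),  # strings + int + np.nan
        "area": [1.0, 2.0, 3.0],
    }
)
clean = normalize_object_columns(df)
print(clean.dtypes)
```

The cleaned frame can then be passed to `DataFrame.to_parquet`, which requires pyarrow or fastparquet to be installed. Note that stringifying is lossy for columns that are meant to hold numbers; those are better cast to a numeric dtype instead.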