Refactor db set up #13
Open
milojevicdupontnikola wants to merge 73 commits into main from refactor-db-set-up
- Rename gadm_level to nuts_level, indicating which NUTS or LAU level corresponds to the dataset
- Rename gadm_name to nuts_name, indicating which NUTS or LAU name or code corresponds to the dataset
- Remove rows of cities and regions that were not used -> large country file used instead
- Remove Sicily, where gov data was found to be significantly incomplete compared to MSFT
- Add is_has_lau_nuts3 column
Notebook that matches country info with NUTS3, either NUTS1 or NUTS2 as the parent region, and LAU
Uses LAUs to create directories. Directories are created for a whole country the first time the code is run, if they don't exist yet. Structure:
country
|_ region (nuts1/2)
|___ nuts3
Data is first saved at LAU level and then reaggregated.
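The country / region / NUTS3 layout described in this commit could be created roughly as follows. This is a minimal sketch, not the code in this PR: `create_country_dirs`, its parameters and the example codes are hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


def create_country_dirs(root, country, nuts3_to_region):
    """Create root/country/region/nuts3 for every NUTS3 code and return the paths.

    `nuts3_to_region` maps a NUTS3 code to its parent region (NUTS1 or NUTS2).
    All names here are hypothetical, not taken from the repository.
    """
    created = []
    for nuts3, region in sorted(nuts3_to_region.items()):
        path = Path(root) / country / region / nuts3
        # exist_ok=True makes a second run a no-op for existing directories.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Illustrative codes only, run inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = create_country_dirs(tmp, "FR", {"FR101": "FR1", "FRB01": "FRB"})
    demo_rel = [p.relative_to(tmp).as_posix() for p in demo_paths]
    demo_all_exist = all(p.is_dir() for p in demo_paths)
```

Using `mkdir(parents=True, exist_ok=True)` matches the "created the first time, if they don't exist yet" behaviour without a separate existence check.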
Masking is done at country, region, NUTS3 or LAU level, so that LAU files are created only for LAUs that should be present in the dataset. Uses the new columns from the inputs parsing file. The existing mode 'rest', which takes every LAU of a country that was not previously used, is preserved.
Ensures that the masks together strictly reproduce all LAUs; otherwise returns the missing or duplicated LAUs.
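A check of this kind can be sketched with a counter over the LAU ids selected by each mask. The function name, the mask labels and the LAU ids below are assumptions for illustration; the actual implementation in db_set_up.py may differ.

```python
from collections import Counter


def check_lau_coverage(expected_laus, masks):
    """Return (missing, duplicated) LAU ids for a set of masks.

    `expected_laus` is every LAU id of the country; `masks` maps a mask
    label to the LAU ids it selects. Both structures are hypothetical.
    """
    counts = Counter(lau for laus in masks.values() for lau in laus)
    # LAUs that no mask selects.
    missing = sorted(set(expected_laus) - set(counts))
    # LAUs selected more than once across all masks.
    duplicated = sorted(lau for lau, n in counts.items() if n > 1)
    return missing, duplicated


missing, duplicated = check_lau_coverage(
    ["A", "B", "C", "D"],
    {"region": ["A", "B"], "nuts3": ["B"], "rest": ["C"]},
)
```

Here `missing` is `["D"]` and `duplicated` is `["B"]`; both lists are empty when the masks partition the LAUs exactly.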
Update db_set_up.py
- only regions (mask does it already?)
- buffer (aggregation at nuts3 level will take care of this)
Modifies the main function and subfunctions to use relevant variables/columns + minor indent fix to city_paths_to_txt
All geom and attribute files are squashed into one gpkg at the NUTS3 level. The folder structure within a country is deleted.
Remove OSM country covered by government data and change path to new OSM data
Adapted overview to work with LAUs. First minimal version with only footprint area as metric.
Make load_lau func only load LAU boundaries and nothing else.
Further, store streets as geopackage and not as wkt-encoded csv.
handle cases where a LAU is missing
Old OSM specific stuff now handled by SLURM pipeline
Cleaned file to re-download all countries for v1 (feature engineering, etc.)
Create three files that correspond to the complete set of input datasets to be used for v1 + minor changes to main-parsing files for housekeeping
Handle folder creation directly in the function
Housekeeping
Streamline changes of folder structure
The lau-nuts dict was matching NUTS codes inside LAU names in the path, creating duplicates and breaking the function. It is now forced to scan only the NUTS part.
Create a French version of the LAUs file, as France is provided in the OSM files at NUTS2 level instead of the NUTS1 level used otherwise
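The kind of bug described here can be illustrated by comparing a substring search over the whole path with a match restricted to the directory components. This is a sketch under assumptions: the path layout, the file name and both function names are hypothetical and not taken from the repository.

```python
from pathlib import PurePosixPath


def nuts_codes_naive(path, nuts_codes):
    """Substring search over the whole path; also hits codes inside LAU file names."""
    return sorted(code for code in nuts_codes if code in path)


def nuts_code_for_path(path, nuts_codes):
    """Match NUTS codes only against directory components, never the file name."""
    dirs = PurePosixPath(path).parts[:-1]
    matches = [d for d in dirs if d in nuts_codes]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one NUTS code in {path!r}, got {matches}")
    return matches[0]


# Hypothetical layout: the LAU file name happens to contain another NUTS code.
codes = {"DE111", "DE112"}
example = "DE/DE111/lau_DE112_buildings.gpkg"
naive_hits = nuts_codes_naive(example, codes)
strict_hit = nuts_code_for_path(example, codes)
```

The naive search returns two codes for one path, which is how duplicates can arise; the restricted match returns only the directory's code.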
Support two levels of parts
Fix true and false duplicates
Notebook to check for losses during db set up
Check LAUs and NUTS3 with no buildings
Some initial analysis to determine the cut off point MSFT / OSM
FlorianNachtigall force-pushed the refactor-db-set-up branch 2 times, most recently from 5162c96 to e064be6 (January 2, 2025 18:53)
FlorianNachtigall force-pushed the refactor-db-set-up branch from e064be6 to 236fe48 (January 3, 2025 10:52)
Also add missing type hints.
FlorianNachtigall force-pushed the refactor-db-set-up branch 2 times, most recently from 93b3169 to 48c82f9 (January 3, 2025 13:12)
Currently, object columns contain mixed datatypes (e.g. strings and np.nan for missing values), which results in issues when writing the parquet file.
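One common way to avoid this is to cast object columns to pandas' string dtype before writing. The sketch below is an assumption about a possible fix, not necessarily the one applied in this commit; the function and column names are made up.

```python
import numpy as np
import pandas as pd


def normalize_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Cast every object column to the pandas string dtype.

    Missing values stay missing; non-string values are converted to text.
    Hypothetical helper, not taken from the repository.
    """
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].astype("string")
    return out


# Example: an object column mixing strings, NaN and an integer.
raw = pd.DataFrame({"type": ["residential", np.nan, 3], "height": [4.5, 6.0, 3.2]})
clean = normalize_object_columns(raw)
# clean.to_parquet(...) would now see a single, consistent type per column.
```

Numeric columns such as `height` are left untouched, so only the problematic columns change type.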
FlorianNachtigall force-pushed the refactor-db-set-up branch from 48c82f9 to 5a9898f (January 3, 2025 13:30)
For other countries the age column is numeric, so using the .str accessor raises an error.
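A guard on the column dtype avoids the error. The following is a minimal sketch, assuming the string case holds dates or free text from which a four-digit year is extracted; `parse_age` and that parsing rule are hypothetical and not taken from the PR.

```python
import pandas as pd


def parse_age(age: pd.Series) -> pd.Series:
    """Return the age column as float64, whatever its input dtype.

    Numeric columns are passed through; other columns are treated as text
    and the first four-digit number is used (an assumed rule).
    """
    if pd.api.types.is_numeric_dtype(age):
        # The .str accessor would raise AttributeError on a numeric column.
        return age.astype("float64")
    years = age.astype(str).str.extract(r"(\d{4})", expand=False)
    return pd.to_numeric(years, errors="coerce").astype("float64")


numeric_result = parse_age(pd.Series([1990, 2005]))
text_result = parse_age(pd.Series(["1990-01-01", "unknown", None]))
```

Values without a four-digit number come back as NaN rather than raising.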
FlorianNachtigall force-pushed the refactor-db-set-up branch from fed19b3 to d891ced (January 15, 2025 22:08)
FlorianNachtigall force-pushed the refactor-db-set-up branch from b872dd0 to 402c1dd (January 20, 2025 20:10)
Handover version from Niko Jan 2025
Main changes:
New set up tested: