Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback on notebook 'Demographics on Lebanon' #99

Open
abrilr1604 opened this issue Jan 16, 2025 · 0 comments
Open

Feedback on notebook 'Demographics on Lebanon' #99

abrilr1604 opened this issue Jan 16, 2025 · 0 comments

Comments

@abrilr1604
Copy link

Hi! I've reviewing and running the first notebook, 'Demographics of Lebanon', here's some feedback:

  1. I get an error after running this code:
hrsl_lebanon0 = LEBANON_ADM0.sjoin(hrsl)
hrsl_lebanon0 = hrsl_lebanon0.groupby(["admin0Name"]).sum()
hrsl_lebanon3 = LEBANON_ADM3.sjoin(hrsl)

Image

The issue happens because hrsl_lebanon0 has two datetime columns:

date datetime64[ms]
validOn datetime64[ms]

But it seems that hrsl_lebanon0 isn’t used after that. Is the notebook going to be updated? If not, it might make sense to either remove or comment out that line to avoid errors for other people running the notebook.

  1. I also get the same error after running this:
hrsl_lebanon3 = (
    hrsl_lebanon3.groupby(["admin0Name", "admin1Name", "admin2Name", "admin3Name"])
    .sum()
    .reset_index()[["admin0Name", "admin1Name", "admin3Name", "population"]]
)
hrsl_lebanon3 = hrsl_lebanon3.merge(
    LEBANON_ADM3[["admin0Name", "admin1Name", "admin3Name", "geometry"]],
    on=["admin0Name", "admin1Name", "admin3Name"],
)

hrsl_lebanon3 = gpd.GeoDataFrame(hrsl_lebanon3, crs="EPSG:4326", geometry="geometry")

Image

This happens because there are non-numeric columns (like geometry and datetime) that cannot be summed.

I fixed it by excluding those data types:

hrsl_lebanon3 = (
    hrsl_lebanon3.select_dtypes(exclude=["datetime", "geometry"])  # Exclude datetime and geometry columns
    .groupby(["admin0Name", "admin1Name", "admin2Name", "admin3Name"])
    .sum()
    .reset_index()[["admin0Name", "admin1Name", "admin3Name", "population"]]
)
  1. The lebanon_bbox variable is declared but not used. We could either changed the code to:
hrsl = hrsl[
    (hrsl["longitude"] > lebanon_bbox[0])  
    & (hrsl["longitude"] < lebanon_bbox[2])  
    & (hrsl["latitude"] > lebanon_bbox[1]) 
    & (hrsl["latitude"] < lebanon_bbox[3])  
]

or just delete the variable if it’s not required.

  1. I recommend adding a semicolon ; at the end of the cells that create maps to suppress unnecessary messages from appearing above them:
    Image

  2. It would be helpful to simplify this part by hiding the urls and adding hyperlinks.

Population data: Population data was used by the [estimates released by UNOCHA in 2002](https://cod.humanitarianresponse.info/fr/search/field_country_region/138, Accessed 05/01/2014). The lowest admin level at which data was available is admin 3.

Sex variables: Sex variables were derived from UN Urban and Rural Population by Age and Sex (URPAS) 2010 estimates, generated using URPAS, http://esa.un.org/unpd/popdev/urpas/urpas2014.aspx, Accessed 03/27/2015 at a national level.

Age variables: Age variables were obtained from UN Urban and Rural Population by Age and Sex (URPAS) 2010 estimates, generated using URPAS, http://esa.un.org/unpd/popdev/urpas/urpas2014.aspx, Accessed 03/27/2015 at a national level.

Boundaries:Boundaries were obtained from UN Office for the Coordination of Humanitarian Affairs (UN OCHA), http://www.humanitarianresponse.info/applications/data, Accessed 01/05/2011.

For example change this: Population data: Population data was used by the estimates released by UNOCHA in 2002 (https://cod.humanitarianresponse.info/fr/search/field_country_region/138, Accessed 05/01/2014).

to this: Population data: Population data was used by the estimates, released by UNOCHA in 2002 and accessed 05/01/2014.

I also noticed here that the links for Population data and Boundaries both lead to the same page (https://data.humdata.org/), even though the links are different.

  1. the references could be improved by organizing them in the bibliography.bib file, which is already stored in the repository. This would change the current citation format, for example from:

The analysis was validated in 18 countries (Tiecke, et.al., 2017).

to: (where the link in [1] would lead to the end of the page where the references are listed).

The analysis was validated in 18 countries [1].

  1. I noticed a few typos such as missing periods, I could correct them.

If you agree with any of these suggestions, please let me know so I can create a PR with the changes 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant