RDRP- 1143_function_to_sum_emp_sample #415

woodac · 2025-01-28T09:05:51Z

This function calculates the sum of column "employment" for the filtered dataframe to create lower_e.

Code

Documentation

Any new code includes all the following forms of documentation:

Function Documentation: Docstrings within the function(s)/methods have been created
- Includes Args and returns for all major functions
- The docstring details data types
Updated Documentation: User and/or developer working doc has been updated

Data

All data needed to run this script is available in Dev/Test
All data is excluded from this pull request
Secrets checker pre-commit passes

Testing

Unit tests: Unit tests have been created and are passing, or a new ticket to create tests has been created

Peer Review Section

All requirements install from (updated) requirements.txt
Documentation has been created and is clear - check the working document
Docstrings (Google format) have been created and accurately describe the function's functionality
Unit tests pass, or if not present a new ticket to create tests has been created
Code runs: The code runs on the reviewer's machine and/or CDSW

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction.

I recommend merging this request.

Review comments

Insert detailed comments here!

These might include, but not exclusively:

bugs that need fixing (does it work as expected? and does it work with other code
that it is likely to interact with?)
alternative methods (could it be written more efficiently or with more clarity?)
documentation improvements (does the documentation reflect how the code actually works?)
additional tests that should be implemented (do the tests effectively assure that it
works correctly?)
code style improvements (could the code be written more clearly?)
Do the changes represent a change in functionality so the version number should increase? Start a discussion if so.
As a reviewer, can you generate the same outputs by running the code?

Your suggestions should be tailored to the code that you are reviewing.
Be critical and clear, but not mean. Ask questions and set actions.

github-actions · 2025-01-28T09:06:24Z

Detailed Coverage Report

File	Stmts	Miss	Cover	Missing
src
__init__.py	0	0	100%
src/construction
__init__.py	0	0	100%
all_data_construction.py	47	47	0%	1, 3–4, 6, 8, 16, 22, 45–46, 49, 54–55, 59–61, 65–66, 69, 76–78, 83, 86–90, 92, 97–98, 103–104, 107–109, 113, 117–120, 125–126, 128, 133–134, 138, 140
construction_helpers.py	104	14	86%	32–43, 240, 307
construction_main.py	27	27	0%	3–4, 6, 8, 12–13, 16, 19, 55–58, 65, 69–72, 75, 79–82, 85, 90–92, 94
construction_read_validate.py	34	34	0%	3–4, 6, 9, 12, 17, 19, 22, 47–50, 52–53, 56–57, 60, 67–69, 73–74, 80, 83, 102–104, 106–107, 110, 117–118, 123, 130
construction_validation.py	102	9	91%	84, 87, 114, 122–123, 128, 139, 169, 219
postcode_construction.py	21	21	0%	1–2, 4–6, 9, 26, 29, 32, 37, 40–41, 44, 47–49, 51, 55, 59, 62–63
src/estimation
__init__.py	0	0	100%
apply_weights.py	18	2	88%	47–48
calculate_weights.py	58	0	100%
estimation_main.py	22	22	0%	3–5, 7–9, 11, 14, 30, 36, 39, 42–43, 45–52, 54
src/freezing
__init__.py	0	0	100%
freezing_apply_changes.py	70	21	70%	31–34, 37–38, 41–43, 46–49, 53, 61–64, 68–69, 71
freezing_compare.py	68	22	67%	107–108, 156–157, 181, 184, 187–189, 193–195, 197–199, 201–202, 234, 257, 260, 263, 266
freezing_main.py	46	46	0%	1–3, 5, 7–12, 15, 18, 42–43, 46–47, 49–53, 55, 63, 66–68, 71–72, 76–78, 83–84, 86–87, 90–91, 95, 98, 109–112, 114–116
freezing_utils.py	11	0	100%
src/imputation
MoR.py	147	5	96%	66, 286, 438, 484–485
__init__.py	0	0	100%
apportionment.py	36	10	72%	125, 127, 142, 144, 146, 158–160, 163, 165
expansion_imputation.py	37	30	18%	22–23, 25–26, 29, 32–34, 38, 41, 43, 46–47, 49, 52, 55, 75–77, 81, 84, 87, 90, 96, 101, 104, 106, 112, 115, 119
imputation_helpers.py	168	73	56%	64, 66, 82, 95–98, 101–102, 104–105, 107–108, 110–111, 113–114, 116–117, 119–120, 122, 131, 133–134, 136–137, 306, 308–309, 313–314, 316, 320, 341, 344, 348, 351–352, 354, 373, 375, 382–383, 385, 470, 472, 475–476, 479, 491–492, 495–496, 498, 500, 516–517, 520–521, 523, 589–590, 593, 596–599, 602, 605, 608–609, 611
imputation_main.py	72	72	0%	3–6, 8–18, 21, 24, 55, 58, 61–62, 65, 67, 72–73, 76, 79, 81–82, 84–85, 88, 92–94, 97, 101–102, 105–106, 109–111, 113, 117, 120–122, 124–128, 130, 132–134, 136–138, 140–142, 145, 148, 151–156, 158
impute_civ_def.py	98	47	52%	165, 168, 198, 200–201, 204–205, 208–210, 215–216, 219, 222, 224–227, 230, 233, 238–239, 242, 245, 247–249, 252, 255, 258, 261, 263, 265, 267, 280, 283–287, 289–290, 293–294, 296, 298, 300
manual_imputation.py	19	19	0%	1–2, 4, 6, 9–10, 28–29, 33, 35, 37, 44, 59, 61–62, 64–65, 67–68
sf_expansion.py	71	58	18%	25, 28, 31–32, 35–36, 39, 42, 45–46, 49, 51, 58–59, 62, 67–68, 71, 73, 76–77, 79, 81, 84, 88, 90, 101, 104, 108, 110–111, 113, 115, 118, 121, 130, 133, 136, 145, 148, 150, 156, 162, 174, 177, 182–183, 185, 193–194, 197, 201, 205, 208, 216, 218, 222, 224
short_to_long.py	21	0	100%
tmi_imputation.py	165	117	29%	26, 31, 36, 158, 161, 163, 166, 170, 174, 196–197, 200, 203, 205, 208, 211–212, 215, 217–218, 220, 222, 225–226, 228, 230, 233, 236–237, 239, 242–244, 246–249, 251, 267, 269, 274, 276–277, 279–280, 282, 284, 286, 288, 291, 294–295, 298, 300, 302, 320–321, 323, 325–327, 329–330, 333, 335–336, 339, 341–342, 360, 362, 365–367, 369, 373, 375–376, 379, 381–382, 385, 387, 389–390, 411, 413, 416, 419, 422, 424, 427, 430, 434–435, 437, 441, 444, 448, 451–452, 454, 458, 465–466, 471, 474–475, 477, 480, 482–483, 485, 487, 492, 494–495
src/mapping
__init__.py	0	0	100%
cellno_mapping.py	17	4	76%	50–52, 54
itl_mapping.py	10	0	100%
mapping_helpers.py	59	1	98%	85
mapping_main.py	50	50	0%	3–5, 7–15, 17, 20, 45, 50, 53, 63, 72, 79, 82, 89, 92–94, 101, 103, 106–108, 111, 113–115, 118, 121, 126–127, 130, 132–136, 138–140, 143, 146, 149
pg_conversion.py	40	8	80%	52, 55, 58, 122, 125, 128, 162–163
pnp_mapping.py	6	6	0%	1, 4, 7, 15, 31, 33
ultfoc_mapping.py	34	4	88%	49–51, 95
src/northern_ireland
__init__.py	0	0	100%
ni_headcount_fte.py	29	29	0%	3–5, 7, 9, 12, 26, 28, 30, 32, 34, 36, 38, 41, 56, 58–60, 62, 64–65, 67–68, 70, 73, 82, 84–85, 87
ni_main.py	21	21	0%	3–9, 11, 14, 34, 37–39, 45, 52–54, 62, 64–65, 67
ni_staging.py	33	33	0%	3–6, 8–9, 11, 14, 22, 25, 27–29, 31, 34, 40, 44, 46, 49, 78–79, 82, 87, 91, 94, 97–102, 104, 106
src/outlier_detection
__init__.py	0	0	100%
auto_outliers.py	90	0	100%
manual_outliers.py	16	0	100%
outlier_main.py	39	39	0%	3–5, 7–9, 11, 14, 43, 45–47, 49–50, 53, 56, 59–64, 66, 71–72, 75, 80–83, 86–90, 92, 95, 97, 99
src/outputs
__init__.py	0	0	100%
export_files.py	92	92	0%	4–10, 12–13, 17, 23, 41, 43, 50–53, 60, 67, 72, 75, 102, 108, 111, 118, 120, 123, 129, 131–132, 135–139, 142–143, 146, 157–159, 161, 164, 177, 179–180, 182, 185, 205, 208, 211–212, 219, 223, 226–228, 231, 233, 235, 237–238, 240, 243–247, 249–250, 252, 255–257, 260, 263, 266, 281, 284–285, 293, 296, 298, 300, 310, 312–314, 323, 325, 328–329
form_output_prep.py	18	13	27%	25–27, 29–30, 34, 36–37, 39, 41, 43, 47, 49
frozen_group.py	43	43	0%	3–4, 6, 8–11, 13, 16, 45, 47–50, 53, 71, 123, 134, 161, 164–168, 170, 172, 175, 178–179, 181, 184, 187, 192–193, 196–197, 200, 203–205, 208–209, 211
gb_sas.py	23	23	0%	3–5, 7–10, 12, 15, 31, 34, 39, 42, 45, 48, 53, 56, 59–61, 64–65, 67
intram_by_civil_defence.py	21	0	100%
intram_by_itl.py	54	15	72%	44–47, 51, 141, 144, 147–149, 152–153, 157–158, 167
intram_by_pg.py	35	0	100%
intram_by_sic.py	33	33	0%	3–6, 8, 11, 31–32, 35–36, 39–40, 42, 45–47, 52, 68–72, 74, 77, 80, 83, 86, 89–90, 93–94, 97, 99
intram_totals.py	14	14	0%	3–4, 6–7, 9, 12, 29–30, 33, 36, 39–40, 42–43
long_form.py	17	17	0%	3–5, 7–10, 12, 15, 30, 33, 36, 39–41, 43–44
manifest_output.py	82	82	0%	1–5, 9, 12–13, 16, 34, 50–53, 56–57, 61–62, 67–68, 76, 79–83, 86–92, 94, 112–113, 118, 120–121, 126, 129, 131, 133, 135, 139, 149, 154–155, 161, 164–165, 167–168, 174–177, 183–185, 187, 194, 196, 201, 203–205, 207–208, 210–211, 213–216, 218, 221, 223, 229–230, 233–234
map_output_cols.py	38	7	81%	151–152, 154, 156, 159, 162, 164
ni_sas.py	19	19	0%	3–9, 11, 14, 27, 30, 33, 36, 41, 44–46, 49–50
outputs_helpers.py	17	0	100%
outputs_main.py	83	83	0%	4–5, 8, 11–23, 25, 28, 50, 55–57, 62, 65, 68–70, 75, 78–79, 82–84, 90, 93–95, 101, 104–106, 108–109, 114, 117–119, 128, 131–133, 137–138, 147, 150–152, 159, 162–164, 168–169, 177, 180–182, 189, 192–194, 199, 202–204, 211, 214–216, 218–220, 222
short_form.py	34	13	61%	79, 86, 88, 105, 108, 111, 114, 117, 120–122, 124–125
tau.py	25	25	0%	3–9, 11, 14, 31, 35, 38, 41, 44, 47, 52, 55, 57–58, 61–63, 66–67, 69
total_fte.py	12	12	0%	3–6, 9, 12, 27, 29–30, 36, 41–42
src/site_apportionment
__init__.py	0	0	100%
output_status_filtered.py	26	15	42%	26, 28, 32–33, 35–36, 38, 45–46, 73, 76, 78, 80–81, 83
site_apportionment.py	126	3	97%	467, 469–470
site_apportionment_main.py	25	25	0%	3–5, 7–8, 12, 14, 17, 39–40, 43, 45, 48–49, 57–58, 60–61, 64, 69–72, 74–75
src/staging
__init__.py	0	0	100%
postcode_validation.py	56	3	94%	131–132, 220
spp_parser.py	14	0	100%
spp_snapshot_processing.py	34	0	100%
staging_helpers.py	88	15	82%	181, 184, 186, 189–190, 193–194, 196–197, 200, 204–205, 207, 211, 217
staging_main.py	100	100	0%	4–6, 8, 10–12, 16, 19, 65–67, 70, 72, 76–78, 80–81, 85, 87–88, 91, 94, 97–98, 100–101, 106–108, 112–115, 117–118, 124, 129–131, 133–134, 137, 149–151, 156, 159, 163, 165, 167–169, 172, 176, 178, 180–184, 187, 190, 192–193, 196, 198, 201–205, 208, 212–213, 215, 217–219, 221–223, 225, 228, 236, 245–246, 248–253, 255, 257–258, 260, 263, 274
validation.py	144	50	65%	34–36, 39, 42, 89–90, 101, 123, 136–137, 142, 146, 154–155, 161–163, 204–205, 211, 213, 216, 218, 225–226, 305, 307–308, 311–312, 315–316, 319, 322–323, 326, 328, 330–331, 345, 347–353, 356, 359
src/utils
__init__.py	0	0	100%
breakdown_validation.py	130	45	65%	30, 187–189, 191–195, 197, 199, 202–203, 209–210, 213, 216, 218, 221, 223, 235–238, 241–244, 260–261, 266, 309–310, 312, 318–319, 361–364, 366–369, 372
config.py	171	7	95%	152–153, 432, 493–496
defence.py	20	0	100%
helpers.py	73	18	75%	20–21, 25–26, 28, 135–138, 140–141, 144–146, 155, 157–158, 216
local_file_mods.py	120	47	60%	47–53, 98, 148–150, 199–203, 214–215, 226, 237, 248–249, 251, 262–263, 274–275, 284, 292, 303, 305–306, 308–309, 313–315, 319, 333–334, 336–337, 340–341, 343, 360, 365
path_helpers.py	142	8	94%	97, 312, 314, 316–317, 319–320, 322
postcode_reduction_helper.py	38	38	0%	12, 14–15, 17, 20, 22, 25, 27–28, 30–31, 34–35, 38, 40–41, 43–44, 47, 60, 62, 65, 68, 71–72, 74–75, 77, 80, 95, 97, 100–101, 104–105, 107, 109–110
s3_mods.py	127	127	0%	20–21, 23–24, 26, 34, 39–41, 45, 59, 61–62, 65–68, 70–71, 74, 85, 88, 93, 96–97, 100, 111–112, 114, 117, 131, 133–134, 136, 139, 149, 156, 159, 161, 164, 166, 169, 180–181, 183, 186, 194–195, 198, 207–212, 215, 226–227, 230–231, 234, 237, 240, 251–252, 254–257, 259–260, 263, 269, 272, 285, 288, 291, 293, 296, 302, 305, 308–309, 312, 320–322, 324, 327, 331–332, 334, 337, 366, 370, 372, 379, 382, 387–389, 396, 399, 415, 417–418, 420–421, 425, 427, 429, 432, 434–436, 438, 441, 451, 454–455, 458, 461, 463–464, 466–468
singleton_boto.py	20	20	0%	5–6, 9–11, 13–14, 16–23, 25–29
TOTAL	4020	1903	52%

Summary of tests

Tests	Skipped	Failures	Errors	Time
274	2 💤	0 ❌	0 🔥	3.766s ⏱️

AnneONS · 2025-01-28T10:04:22Z

src/estimation/calculate_weights.py

+    # Get unique references
+    unique_refs = df["reference"].unique()
+
+    # Group by cellnumber


I don't think the groupby would work here, as df["reference"].unique() returns an array of values rather than a dataframe.
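
A quick check of this point, using made-up data (only the column names come from the diff): `Series.unique()` hands back a NumPy array, which has no `groupby` method, while de-duplicating on the dataframe itself keeps the other columns available for grouping. This is a sketch of the behaviour, not the code in the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names are taken from the diff.
df = pd.DataFrame(
    {
        "reference": [1, 1, 2, 3],
        "cellnumber": [10, 10, 10, 20],
    }
)

unique_refs = df["reference"].unique()

# unique() returns a plain NumPy array for this column, not a DataFrame.
is_array = isinstance(unique_refs, np.ndarray)
has_groupby = hasattr(unique_refs, "groupby")

# De-duplicating on the DataFrame keeps "cellnumber", so grouping still works.
deduped = df.drop_duplicates(subset="reference")
group_sizes = deduped.groupby("cellnumber").size().to_dict()
```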

AnneONS · 2025-01-28T10:05:32Z

src/estimation/calculate_weights.py

+    unique_refs = unique_refs.groupby("cellnumber")
+
+    # Sum employment for each cell
+    e = unique_refs["cellnumber"]["employment"].sum()


You can assume that we've already done the groupby. Treat the input as the filtered dataframe for just one group, and simply return a number, which is the sum.
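
One way to read this suggestion, as a sketch rather than the code in the PR: the caller does the grouping, and the function only sums the employment column of whatever frame it receives. The name `sum_employment_sketch` and the sample values are made up; the `"711"` default is taken from the signature in the diff.

```python
import pandas as pd


def sum_employment_sketch(df: pd.DataFrame, emp_col: str = "711") -> float:
    """Return the sum of emp_col for an already-filtered dataframe."""
    if emp_col not in df.columns:
        raise ValueError(f"Column {emp_col} is missing from the dataframe.")
    # pandas skips NaN values by default when summing.
    return float(df[emp_col].sum())


# Hypothetical sample: the caller groups by cell, the function just sums.
sample = pd.DataFrame({"cellnumber": [1, 1, 2], "711": [5, 9, 3]})
totals = {
    cell: sum_employment_sketch(group)
    for cell, group in sample.groupby("cellnumber")
}
```

Keeping the grouping outside the function means it can be tested with a single small frame and one expected number.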

AnneONS

good start, I've put in a couple of pointers.

adriano-lopresti

@woodac, overall the function makes sense to me, but I suggest a couple of small improvements.

adriano-lopresti · 2025-01-31T14:16:11Z

src/estimation/calculate_weights.py

+    """
+    # Check if any of the key cols are missing
+    cols = set(df.columns)
+    if not ("employment" in cols) & (emp_col in cols):


Just for clarity, can I ask you to add some parentheses around this logic? Is it to be read as "(not 1) & 2" or "not (1 & 2)", i.e.
(not ("employment" in cols)) & (emp_col in cols)
or
not (("employment" in cols) & (emp_col in cols))
Python parses it as the second, because & binds more tightly than not, but it is better to be unnecessarily explicit to avoid confusion.
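
On the parsing question itself: in Python's grammar `&` binds more tightly than `not`, so the line as written evaluates as the second form, which also matches the "any of the key cols are missing" comment in the diff. A quick check, with a hypothetical `cols` set chosen so that the two readings give different answers:

```python
# Hypothetical columns: "employment" is present, emp_col ("711") is missing.
cols = {"employment"}
emp_col = "711"

# The condition exactly as written in the diff.
as_written = not ("employment" in cols) & (emp_col in cols)

# The two readings spelled out with explicit parentheses.
first_reading = (not ("employment" in cols)) & (emp_col in cols)
second_reading = not (("employment" in cols) & (emp_col in cols))

# An equivalent spelling with `and`, which avoids the bitwise operator entirely.
clearer = not ("employment" in cols and emp_col in cols)
```

Here `as_written` agrees with `second_reading` and `clearer`, while `first_reading` gives the opposite result, so the explicit parentheses or the `and` form remove any doubt.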

adriano-lopresti · 2025-01-31T14:18:51Z

src/estimation/calculate_weights.py

@@ -40,6 +40,28 @@ def calc_lower_n(df: pd.DataFrame, exp_col: str = "709") -> dict:
    return n


+def calc_lower_e(df: pd.DataFrame, emp_col: str = "711") -> dict:


Why does this definition say that the function returns a dictionary when it actually returns an integer?
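
A mismatch like this is easy to miss because Python does not enforce return annotations at runtime. The toy functions below are hypothetical and unrelated to the PR code; they only illustrate that a function annotated `-> dict` can return an int without any error, so the fix has to come from review or a static type checker.

```python
from typing import get_type_hints


def annotated_as_dict(values: list) -> dict:
    """Deliberately mis-annotated: the body returns an int, not a dict."""
    return sum(values)


def annotated_as_int(values: list) -> int:
    """Annotation matches what the body returns."""
    return sum(values)


# The call succeeds even though the annotation is wrong.
result = annotated_as_dict([5, 9])
declared = get_type_hints(annotated_as_dict)["return"]
mismatch = not isinstance(result, declared)

# With the corrected annotation, declared and actual types agree.
matches = isinstance(
    annotated_as_int([5, 9]), get_type_hints(annotated_as_int)["return"]
)
```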

adriano-lopresti · 2025-01-31T14:20:53Z

tests/test_estimation/test_calculate_weights.py

+            "711",
+        ]
+        data = [
+            [1, 14],


Is there a reason why you chose to put 14 in column 711, which is also the sum of the column, i.e. the result of the function? It is not wrong, but it's a bit confusing, since one might think the function returns the "unique non-null entry of column 711", whereas it actually returns the sum of employment.
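
To illustrate the concern with made-up frames (the values are hypothetical, and the assumption that the other rows are null is mine, not something shown in the diff): when one entry already equals the total, a sum and a "pick the only non-null value" function are indistinguishable, whereas spreading the total across rows separates them.

```python
import pandas as pd

# Hypothetical test frames; column names follow the test in the diff.
ambiguous = pd.DataFrame({"reference": [1, 2], "711": [14, None]})
clearer = pd.DataFrame({"reference": [1, 2, 3], "711": [5, 9, None]})

# First frame: the sum equals the only non-null entry, so a function that
# returned that single entry would pass the same assertion.
ambiguous_sum = ambiguous["711"].sum()
only_value = ambiguous["711"].dropna().iloc[0]

# Second frame: no single entry equals the total, so only a real sum passes.
clearer_sum = clearer["711"].sum()
entries = clearer["711"].dropna().tolist()
```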

calculate lower e

82e53e1

woodac requested review from adriano-lopresti, AnneONS and Ryan2Y79 as code owners January 28, 2025 09:05

AnneONS reviewed Jan 28, 2025

woodac added 6 commits January 28, 2025 15:00

calculate lower_e

09c9ff7

refector required cols

88c7bb4

removing grouping

4a95c15

Tests for lower_e calc

75cb23e

Tidying up script

6c1a3db

updating tests

0cc3205

adriano-lopresti reviewed Jan 31, 2025

woodac added 3 commits January 31, 2025 15:49

Reviewing comment suggestions

958173e

Reviewing comment suggestions

c25936b

Adjusting test

993b9b2
