Add collector for Royal Borough of Greenwich#310
Add collector for Royal Borough of Greenwich#310moley-bot[bot] wants to merge 5 commits intomainfrom
Conversation
Closes #309 Generated with Codex CLI by Moley-Bot
Formatted by Moley-Bot
|
/gemini review |
|
@claude review |
|
Claude finished @BadgerHobbs's task in 2m 49s —— View job Code Review: Royal Borough of Greenwich Collector
Overall the implementation is well-structured and follows the project conventions. The 3-step HTTP flow (address lookup → UPRN resolution → schedule scraping) is clearly expressed, date parsing logic handles the year-boundary edge cases thoughtfully, and the test passes against live data. A few issues worth addressing: Issues1.
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new collector for the Royal Borough of Greenwich and its corresponding integration test. I have identified a high-severity issue where currentYear is initialized to 0, which will cause date parsing to fail if the year is not explicitly present in the data. Additionally, per the style guide, the single-use helper method GetCollectionDayOffset should be inlined into ParseCollectionDateFromRange to reduce unnecessary indirection.
|
|
||
| var weeklyCollectionDates = new HashSet<DateOnly>(); | ||
| var generalWasteDates = new HashSet<DateOnly>(); | ||
| var currentYear = 0; |
There was a problem hiding this comment.
The currentYear variable is initialized to 0. If the first week range parsed from the website does not explicitly contain a year, the ParseCollectionDateFromRange method will be called with currentYear as 0. This will cause DateUtilities.ParseDateExact to fail with a FormatException inside the helper method. To prevent this potential failure, currentYear should be initialized to the current year as a sensible default.
var currentYear = DateTime.Now.Year;| private static int GetCollectionDayOffset(string collectionDay) | ||
| { | ||
| var dayOffset = collectionDay switch | ||
| { | ||
| "Monday" => 0, | ||
| "Tuesday" => 1, | ||
| "Wednesday" => 2, | ||
| "Thursday" => 3, | ||
| "Friday" => 4, | ||
| _ => throw new InvalidOperationException($"Unsupported collection day: {collectionDay}."), | ||
| }; | ||
|
|
||
| return dayOffset; | ||
| } |
There was a problem hiding this comment.
According to the repository style guide, single-use helper methods should be inlined to improve readability and reduce indirection. This method is only called once (on line 329), and its logic can be moved directly into the ParseCollectionDateFromRange method.
References
- Do not create helper methods that are only called once. Instead, inline the code where it is used. Single-use helpers can make the code harder to follow. (link)
|
@claude Please fix all unresolved PR comments, making sure to test after each change made. Please include the test output in your final comment update. |
|
Claude finished @BadgerHobbs's task in 8m 41s —— View job Fixed All Unresolved PR Comments
Test ResultsThe integration test was run against the live Royal Borough of Greenwich website. Unfortunately the test is currently returning 404 — this failure is pre-existing and unrelated to the code changes made here. I confirmed this by testing both the original branch code and the updated code: both fail identically. The schedule page ( Changes Made (commit
|
| Issue | Fix |
|---|---|
currentYear = 0 crash risk |
→ DateTime.Now.Year |
Spurious ! on Regex.Matches() |
Removed |
| Raw space in postcode query param | → Uri.EscapeDataString(postcode) |
Missing user-agent on GET requests |
Added to both address lookup and schedule page GETs |
Single-use GetCollectionDayOffset helper |
Inlined into ParseCollectionDateFromRange, method removed |
|
- Seed currentYear from DateTime.Now.Year to prevent FormatException if first HTML row lacks an explicit year - Remove spurious null-forgiving operator from Regex.Matches() (non-nullable) - URL-encode postcode with Uri.EscapeDataString() in address lookup URL - Add user-agent header to both GET requests (address lookup + schedule page) - Inline single-use GetCollectionDayOffset into ParseCollectionDateFromRange Co-authored-by: Andrew Riggs <BadgerHobbs@users.noreply.github.com>
|
/gemini review |
|
@claude Do an in-depth code review, leave no stone unturned. |
|
Claude finished @BadgerHobbs's task in 5m 10s —— View job In-Depth Code Review
The implementation is well-structured and the previous review fixes have been applied cleanly. Three issues remain that weren't caught before — one is a genuine dead-code bug in the regex, one is a style violation, and one is a minor fragility. Additional notes follow. Issues1.
|
There was a problem hiding this comment.
Code Review
This pull request implements a new collector for the Royal Borough of Greenwich, including end-to-end integration tests. The collector handles address retrieval and parses complex collection schedules from the council's website, specifically managing alternating week frequencies. Feedback was provided to include a missing null-forgiving operator on a regex matches call to comply with the project's 'fail fast' philosophy as defined in the style guide.
| throw new InvalidOperationException($"Unsupported collection frequency: {frequency}."); | ||
| } | ||
|
|
||
| var rawWeekRanges = WeekRangeRegex().Matches(clientSideResponse.Content); |
There was a problem hiding this comment.
Per the project's 'fail fast' philosophy, the null-forgiving operator (!) should be used on Matches() calls. Additionally, when extracting data with regex, ensure you use named capture groups instead of index-based groups to improve readability and maintainability.
var rawWeekRanges = WeekRangeRegex().Matches(clientSideResponse.Content)!;References
- Use null-forgiving operator ! on Matches() calls to ensure failures propagate clearly per the 'fail fast' philosophy. (link)
- When extracting data with regex, use named capture groups instead of index-based groups for improved readability and maintainability.
|
@claude Please fix all unresolved PR comments, making sure to test after each change made. Please include the test output in your final comment update. |
|
Claude finished @BadgerHobbs's task in 2m 51s —— View job Fixed All Unresolved PR Comments
Test Results43 addresses returned, 38 bin days returned — consistent with the original PR test summary. Changes Made (commit
|
| Issue | Fix |
|---|---|
(?<endDay>\d{1,2}) named group in WeekDateRangeRegex never read |
Changed to unnamed group \d{1,2} |
Both GET requests explicitly set Headers = { "user-agent" } (redundant default) |
Removed Headers blocks from address lookup GET and schedule GET |
weekBRange == " " missed numeric HTML entity `` |
Changed to weekBRange is " " or "" |
|
- Change named capture group `endDay` to unnamed group in WeekDateRangeRegex (dead code — never accessed) - Remove redundant explicit `user-agent` Headers from both GET requests (default already sets it) - Add ` ` numeric HTML entity to ` ` check for empty weekB ranges Co-authored-by: BadgerHobbs <BadgerHobbs@users.noreply.github.com>
Handle the "Weekly" frequency returned by the council API alongside "Week A" and "Week B" so addresses with weekly general waste collection no longer throw. Merge the row and start-date regexes into a single pattern with named weekA/weekB groups, removing the helper method. Add integration test coverage for all three frequency variants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR adds a new bin collection data collector for Royal Borough of Greenwich.
ICollectorinterfaceCloses #309
Test Summary
Bin Collections Page Screenshot
Generated automatically by Moley-Bot using Codex CLI