Add `][` characters to page regex of `NY Slip Op` and other NY reporters #206

grossir · 2025-02-19T20:46:24Z

We may be failing to resolve (or resolve incorrectly) NY citations due to a variation in the page format in the following scenarios:

- `NY Slip Op` citations: we accept an optional `(U)`, but the opinion text may show an optional `[U]`
- `Misc 3d` citations: we accept an optional `(A)`, but the opinion text may show an optional `[A]`
- Probably the other editions of these reporters...

Currently the bracketed suffix is dropped from the extracted page, which may lead to incorrect matches:

    In [4]: get_citations("2024 NY Slip Op 51192(U),")[0].groups['page']
    Out[4]: '51192(U)'

    In [3]: get_citations("2024 NY Slip Op 51192[U],")[0].groups['page']
    Out[3]: '51192'

    In [6]: get_citations("11 Misc 3d 134[A]")[0].groups['page']
    Out[6]: '134'

    In [7]: get_citations("11 Misc 3d 134(A)")[0].groups['page']
    Out[7]: '134(A)'

However, how could we standardize the page after extraction?

reporters-db/reporters_db/data/reporters.json, line 18833 in d009aae:

    "2023 NYSlipOp 51350(U)"

Example: the NY Slip Op citation is highlighted; the Misc 3d citation is to its left.

Another

    courtlistener=> select count(*) from search_citation where reporter = 'NY Slip Op' and page like '%U%';
     count 
    -------
      6259
    (1 row)

    courtlistener=> select count(*) from search_citation where reporter = 'Misc 3d' and page like '%A%';
     count 
    -------
        11
    (1 row)


grossir · 2025-02-20T16:52:30Z

When we ingest a scraped opinion we do:

    return Citation(
        cluster=cluster,
        volume=citation_objs[0].groups["volume"],
        reporter=citation_objs[0].corrected_reporter(),
        page=citation_objs[0].groups["page"],
...)

When we try to match citations, we do:

    filters.append(
        Q(
            "match_phrase",
            **{"citation.exact": full_citation.corrected_citation()},
        )
    )

The correction lives in eyecite's models:

    def corrected_citation(self):
        """Return citation with corrected reporter."""
        if self.edition_guess:
            return self.matched_text().replace(
                self.groups["reporter"], self.edition_guess.short_name
            )
        return self.matched_text()

So we could probably add a `corrected_page` method that is also called:

- in `corrected_citation`
- when ingesting a scraped opinion

For now it would only standardize NY pages, so that we can actually match them.

What would we need in order to set a canonical page format in reporters-db? Alternatively, we could just hardcode in eyecite the values we know for NY and prefer `()` to `[]`.
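As a rough illustration of the hardcoded option, a page normalizer might look like the sketch below. Everything in it is an assumption for illustration and not eyecite's API: the stand-alone `corrected_page` function and its signature, the `NY_REPORTERS` set, and the single-letter suffix rule. It also presupposes that the page regex already captures the bracketed suffix, which the session in the first comment shows is not currently the case.

```python
import re

# Hypothetical: reporters whose pages may carry a "(U)" / "(A)" style suffix.
NY_REPORTERS = {"NY Slip Op", "Misc 3d"}

# A trailing single capital letter in square brackets, e.g. "[U]" or "[A]".
_BRACKET_SUFFIX = re.compile(r"\[([A-Z])\]$")


def corrected_page(reporter: str, page: str) -> str:
    """Return the page with a trailing "[X]" rewritten as "(X)".

    Pages from reporters outside NY_REPORTERS are returned unchanged.
    """
    if reporter not in NY_REPORTERS:
        return page
    return _BRACKET_SUFFIX.sub(r"(\1)", page)


if __name__ == "__main__":
    print(corrected_page("NY Slip Op", "51192[U]"))
    print(corrected_page("Misc 3d", "134(A)"))
```

Preferring `()` here matches the form already stored in the example from reporters.json, so existing rows would not need to be rewritten.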

flooie · 2025-02-20T17:47:58Z

See #207

flooie · 2025-02-20T17:48:54Z

I think the answer here is to improve the regexes to enable both options, which is what we did. We will just add more variations as we see them instead of hardcoding anything.
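A minimal sketch of what "enable both options" could mean for the page group, using a simplified stand-alone pattern. This is not the actual reporters-db regex; the `NY_SLIP_OP` pattern and the `page_of` helper are hypothetical names used only to show the idea.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern: the page is a run of digits optionally
# followed by "(U)" or "[U]".
NY_SLIP_OP = re.compile(
    r"(?P<volume>\d{4}) (?P<reporter>NY Slip Op) "
    r"(?P<page>\d+(?:\(U\)|\[U\])?)"
)


def page_of(text: str) -> Optional[str]:
    """Return the captured page group, or None if no citation is found."""
    match = NY_SLIP_OP.search(text)
    return match.group("page") if match else None


if __name__ == "__main__":
    for sample in (
        "2024 NY Slip Op 51192(U),",
        "2024 NY Slip Op 51192[U],",
        "2024 NY Slip Op 51192,",
    ):
        print(sample, "->", page_of(sample))
```

With both alternatives in the optional group, the bracketed suffix is kept in the page instead of being silently dropped.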

github-project-automation bot added this to Case Law Sprint Feb 19, 2025

grossir changed the title ~~Add ][ characters to page regex of NY Slip Op reporter~~ Add ][ characters to page regex of NY Slip Op and other NY reporters Feb 19, 2025

flooie assigned flooie and grossir Feb 20, 2025

grossir closed this as completed Feb 20, 2025

github-project-automation bot moved this to Done in Case Law Sprint Feb 20, 2025
