Add ][ characters to page regex of NY Slip Op and other NY reporters #206

Closed
grossir opened this issue Feb 19, 2025 · 3 comments
Contributor

grossir commented Feb 19, 2025

We may be failing to resolve NY citations, or resolving them incorrectly, because of a variation in the page format in the following scenarios:

  • NY Slip Op citations: We accept an optional (U), but the opinions text may show an optional [U]
  • Misc 3d citations: We accept an optional (A), but the opinions text may show an optional [A]
  • Probably the other editions of these reporters...

Currently, the bracketed suffix is dropped from the extracted page, which may lead to incorrect matches:

In [4]: get_citations("2024 NY Slip Op 51192(U),")[0].groups['page']
Out[4]: '51192(U)'

In [3]: get_citations("2024 NY Slip Op 51192[U],")[0].groups['page']
Out[3]: '51192'


In [6]: get_citations("11 Misc 3d 134[A]")[0].groups['page']
Out[6]: '134'

In [7]: get_citations("11 Misc 3d 134(A)")[0].groups['page']
Out[7]: '134(A)'

However, how could we standardize the page after extraction?

"2023 NYSlipOp 51350(U)"

Example: the NY Slip Op citation is highlighted; the Misc 3d citation is to its left.

[Image]

Another example:

[Image]


courtlistener=> select count(*) from search_citation where reporter = 'NY Slip Op' and page like '%U%';
 count 
-------
  6259
(1 row)

courtlistener=> select count(*) from search_citation where reporter = 'Misc 3d' and page like '%A%';
 count 
-------
    11
(1 row)
@grossir grossir changed the title Add ][ characters to page regex of NY Slip Op reporter Add ][ characters to page regex of NY Slip Op and other NY reporters Feb 19, 2025
Contributor Author

grossir commented Feb 20, 2025

When we ingest a scraped opinion we do:

    return Citation(
        cluster=cluster,
        volume=citation_objs[0].groups["volume"],
        reporter=citation_objs[0].corrected_reporter(),
        page=citation_objs[0].groups["page"],
...)

When we try to match citations, we do:

    filters.append(
        Q(
            "match_phrase",
            **{"citation.exact": full_citation.corrected_citation()},
        )
    )

The correction in the eyecite models is:

    def corrected_citation(self):
        """Return citation with corrected reporter."""
        if self.edition_guess:
            return self.matched_text().replace(
                self.groups["reporter"], self.edition_guess.short_name
            )
        return self.matched_text()

So we could probably add a corrected_page method

  • that is also called in corrected_citation
  • that is also called when ingesting a scraped opinion

For now, it would only standardize NY pages so that we can actually match them.

What would we need in order to set a canonical page format in reporters-db? Alternatively, we could just hardcode in eyecite the values we know for NY, and prefer () to [].
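To make the proposal concrete, here is a minimal sketch of the normalization step, assuming we prefer () to []. It is written as a standalone function rather than an eyecite method, and both the name `corrected_page` and the pattern are illustrative assumptions, not existing eyecite or reporters-db code.

```python
import re

# Hypothetical pattern: a numeric page followed by a single capital letter
# in square brackets, e.g. "51192[U]" or "134[A]".
_BRACKET_SUFFIX = re.compile(r"^(\d+)\[([A-Z])\]$")


def corrected_page(page: str) -> str:
    """Rewrite a bracketed suffix to the parenthesized form.

    Pages that do not match the bracketed pattern are returned unchanged.
    """
    return _BRACKET_SUFFIX.sub(r"\1(\2)", page)


print(corrected_page("51192[U]"))
print(corrected_page("134(A)"))
```

With this, a page extracted from either variant would be stored and matched in one form, at the cost of hardcoding the NY convention.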

Contributor

flooie commented Feb 20, 2025

See #207

Contributor

flooie commented Feb 20, 2025

I think the answer here is to improve the regexes to enable both options, which is what we did. We will just add more variations as we see them instead of hardcoding anything.
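As a rough illustration of a page pattern that accepts both options, the sketch below matches a numeric page with an optional one-letter suffix in either parentheses or square brackets. The pattern and the names `NY_PAGE_RE` and `parse_page` are assumptions made for this example; the actual regexes are the ones changed in #207.

```python
import re
from typing import Optional

# Hypothetical page pattern: digits, then an optional single capital letter
# wrapped in matching parentheses or matching square brackets.
NY_PAGE_RE = re.compile(r"(?P<page>\d+(?:\([A-Z]\)|\[[A-Z]\])?)")


def parse_page(page_text: str) -> Optional[str]:
    """Return the page, including any (X) or [X] suffix, from the start of page_text."""
    match = NY_PAGE_RE.match(page_text)
    return match.group("page") if match else None


print(parse_page("51192[U],"))
print(parse_page("134(A)"))
```

Because each delimiter pair is its own alternative, a mismatched suffix such as "134(A]" falls back to the bare page number instead of being captured.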
