Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align the grammar of the Decimal string constructor with float's #128185

Open
KommuSoft opened this issue Dec 22, 2024 · 8 comments
Open

Align the grammar of the Decimal string constructor with float's #128185

KommuSoft opened this issue Dec 22, 2024 · 8 comments
Labels
docs Documentation in the Doc dir

Comments

@KommuSoft
Copy link

KommuSoft commented Dec 22, 2024

The grammar does not cover the case with underscores between the digits of a Decimal. The PEP-515 introduced underscores for integers, floating-point numbers and decimals. The documentation for integers and floating-point numbers are updated to cover underscores, but the grammar documentation for the Decimal does not.

Linked PRs

@KommuSoft KommuSoft added the docs Documentation in the Doc dir label Dec 22, 2024
@picnixz
Copy link
Member

picnixz commented Dec 22, 2024

The grammar does not cover the case with underscores between the digits of a Decimal

Whtat do you mean digits of a decimal? a Decimal object is constructed from either a float or from a string. Are you talking about the string constructor? AFAIK Decimal("123_456") is allowed.

Do you mean it's not specified on the docs? namely that you can have _ in the string form?

@picnixz picnixz added the pending The issue will be closed if no feedback is provided label Dec 22, 2024
@KommuSoft
Copy link
Author

Yes, but the documentation on that seems outdated, since the grammar mentioned in the docs does not mention the underscore style.

@picnixz
Copy link
Member

picnixz commented Dec 22, 2024

Well.. it says (emphasis mine)

it should conform to the decimal numeric string syntax after leading and trailing whitespace characters, as well as underscores throughout, are removed:

So technically, it's fine? we don't want to overcharge the grammar I think.

@KommuSoft
Copy link
Author

Well the grammar for the float(..) includes the underscore, and except for signaling NaNs, it is identical for Decimals, so now it is inconsistent. I think the documentation should either mention the underscores, and do not add it to the grammar, or include it in the grammar, but not present it in two different ways.

@picnixz
Copy link
Member

picnixz commented Dec 22, 2024

I see. I think it would be fine if we remove the mention of underscores in the text and add it to the grammar. Concerning leading and trailing whitespaces, I think this should be outside the grammar. WDYT?

@picnixz picnixz removed the pending The issue will be closed if no feedback is provided label Dec 22, 2024
@picnixz picnixz changed the title The grammar of the Decimal seems to be outdated Align the grammar of the Decimal string constructor with float's Dec 22, 2024
@skirpichev
Copy link
Member

it is identical for Decimals

It's not.

>>> Decimal('1__.__2')
Decimal('1.2')
>>> float('1__.__2')
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    float('1__.__2')
    ~~~~~^^^^^^^^^^^
ValueError: could not convert string to float: '1__.__2'

That's known, see e.g. #88433

now it is inconsistent

It's ok. So far we have an important module in the stdlib (called "this").

I think the documentation should either mention the underscores, and do not add it to the grammar

That's what it does: "If value is a string, it should conform to the decimal numeric string syntax after leading and trailing whitespace characters, as well as underscores throughout, are removed: [grammar follows]".

@skirpichev skirpichev added the pending The issue will be closed if no feedback is provided label Dec 22, 2024
@picnixz
Copy link
Member

picnixz commented Dec 22, 2024

That's what it does: "If value is a string, it should conform to the decimal numeric string syntax after leading and trailing whitespace characters, as well as underscores throughout, are removed: [grammar follows]".

What I meant is to remove "as well as underscores throughout" and put the _ in the grammar definition. For floats, we have:

image

but for decimals, we have

image

so we could update decimal-part to include _ as for the floats. That would at least align the two grammars a bit more.

@skirpichev skirpichev removed the pending The issue will be closed if no feedback is provided label Dec 22, 2024
@picnixz picnixz added the easy label Dec 22, 2024
@skirpichev skirpichev removed the easy label Dec 29, 2024
@skirpichev
Copy link
Member

I still think it's not a documentation issue, at least. If we leave current parsing intact, IMO it's better to be more vague in description on how exactly underscores are removed.

Maybe it's ok to revisit of parsing decimal literals. Here is some motivation from Stefan Krah for current version:

I'd keep it simple for Decimal: Remove left and right whitespace (we're already doing this), then remove underscores from the remaining string (which must not contain any further whitespace), then use the IBM grammar.

We could add a clause to the PEP that only those strings that follow the spirit of the PEP are guaranteed to be accepted in the future.

One reason for keeping it simple is that I would not like to slow down string conversion, but thinking about two grammars is also a problem -- part of the string conversion in libmpdec is modeled in ACL2, which would be invalidated or at least complicated with two grammars.

But patch from #88433 doesn't look complex to me (if it's correct;-)). However, if current behavior will be changed - it should be properly deprecated first.

On another hand, even after this - the decimal grammar will be slightly different from floatvalue: Python floats doesn't support signaling nan. Nor "diagnostic" information for nans.

Also, current docs doesn't mention that case is insignificant. The specification says: "the characters in the strings accepted for infinity and nan may be in any case".

Removing "easy" label as it's not an easy issue at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir
Projects
Status: Todo
Development

No branches or pull requests

3 participants