PERF: fix performance regression from #62542 #62623
Changes from 19 commits
be21b2e
fc10a5f
ab2fab8
7e8033d
5219386
4ff07e3
c7fc292
4c8d770
35f075a
448f944
cf0a26d
2e5a47c
ca32c01
46c9883
69c35ee
40983dd
00be2c2
06297b6
4f6c9a8
832d99e
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -1907,7 +1907,9 @@ int64_t str_to_int64(const char *p_item, int *error, char tsep) { | |
| int64_t number = strtoll(p, &endptr, 10); | ||
|  | ||
| if (errno == ERANGE) { | ||
| - *error = ERROR_OVERFLOW; | ||
| + // Python's integers can handle pure overflow errors, | ||
| + // but for invalid characters, try using different conversion methods. | ||
| + *error = *endptr ? ERROR_INVALID_CHARS : ERROR_OVERFLOW; | ||

Are you sure that

It does, here is an example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  // 1 << 65 + "foo"
  const char *str = "36893488147419103232foo";
  char *endptr;
  long long int number = strtoll(str, &endptr, 10);
  printf("Original String: %s\nNumber: %lld\nEndPtr: %s\nError: %d\n", str,
         number, endptr, errno);
  return 0;
}
```

Output:

ERRNO 34 is ERANGE.

I don't know if this is the official implementation of gcc, but it looks like it only assigns errno to ERANGE. https://github.com/gcc-mirror/gcc/blob/master/libiberty/strtoll.c

| errno = 0; | ||
| return 0; | ||
| } | ||
|  | @@ -1967,7 +1969,9 @@ uint64_t str_to_uint64(uint_state *state, const char *p_item, int *error, | |
| uint64_t number = strtoull(p, &endptr, 10); | ||
|  | ||
| if (errno == ERANGE) { | ||
| - *error = ERROR_OVERFLOW; | ||
| + // Python's integers can handle pure overflow errors, | ||
| + // but for invalid characters, try using different conversion methods. | ||
| + *error = *endptr ? ERROR_INVALID_CHARS : ERROR_OVERFLOW; | ||
| errno = 0; | ||
| return 0; | ||
| } | ||
|  | ||

What does this comment mean?

Recently, I added a change so that, on overflow, it tries to convert to Python integers (PyLongObject).
pandas/pandas/_libs/parsers.pyx
Lines 1081 to 1084 in e95948f
Python supports big integers, and they are used to represent big integers in pandas.

The other part of the comment refers to the change in this PR, which sets `maybe_int` to `False` in `pandas/_libs/parsers.pyx`.

Maybe I'm just misunderstanding what a "pure overflow" is, since neither Python nor this operation is overflowing. Probably best just to remove this?

Maybe the comment is confusing. I will just remove it to avoid confusion.
What I mean by a "pure" overflow is an overflow that occurs when the word contains only digits. In other words, the word is definitely an integer.
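
The distinction drawn in this thread, between a "pure" overflow (digits only, value out of range) and an overflow followed by invalid characters, can be sketched in standalone C. This is a minimal illustration and not the pandas tokenizer: `classify_int64` and the `PARSE_*` codes are hypothetical names introduced here, and the sketch assumes `long long` is 64 bits wide. It relies only on the `strtoll` behavior shown earlier in the thread: on overflow, `errno` is set to `ERANGE` and `endptr` still points just past the digits that were consumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch only; the pandas tokenizer
 * defines its own ERROR_OVERFLOW and ERROR_INVALID_CHARS constants. */
enum parse_status { PARSE_OK = 0, PARSE_OVERFLOW, PARSE_INVALID_CHARS };

/* Classify a base-10 string. Assumes long long is 64 bits wide. */
enum parse_status classify_int64(const char *s, int64_t *out) {
  assert(s != NULL && out != NULL);

  char *endptr;
  errno = 0; /* clear any stale value so ERANGE can be attributed to this call */
  long long value = strtoll(s, &endptr, 10);

  if (endptr == s) {
    /* No digits were consumed at all. */
    return PARSE_INVALID_CHARS;
  }
  if (errno == ERANGE) {
    errno = 0;
    /* Same test as the patched line in the diff: trailing characters mean
     * the word is not an integer; otherwise it is a "pure" overflow. */
    return *endptr ? PARSE_INVALID_CHARS : PARSE_OVERFLOW;
  }
  if (*endptr) {
    /* In range, but followed by non-digit characters. */
    return PARSE_INVALID_CHARS;
  }
  *out = (int64_t)value;
  return PARSE_OK;
}
```

Under these assumptions, `"36893488147419103232"` is classified as `PARSE_OVERFLOW`, so a caller could fall back to an arbitrary-precision integer, while `"36893488147419103232foo"` is classified as `PARSE_INVALID_CHARS` and can be handed to a different conversion path.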