Matching against variables needs to be improved #31

@MarkFuller1

Description

Using the example below, changing the variable names caused the request to produce no match against the given document in the database.

localhost:8080/matcher
{
    "pathToScraper": "no",
    "currentError": {
        "errorMessage": "AttributeError: 'asdf' object has no attribute 'test'",
        "isExternal": false,
        "lineNumber": 2666,
        "source": "../logscraper/logs/from_prod_anonymized/ccx_data_pipeline_1_anonymized.log",
        "stackOverflow": null,
        "nestedError": null
    },
    "variance": 0.85
}
{
        "_id": {
            "$oid": "5ef2f350f2f11a6a2efddfc0"
        },
        "url": "https://stackoverflow.com/questions/6797984/how-do-i-lowercase-a-string-in-python",
        "title": "How do I lowercase a string in Python? - Stack Overflow",
        "code": [
            ".lower()",
            "s = \"Kilometer\"\nprint(s.lower())\n",
            "str.lower()",
            "str.lower()",
            ">>> 'Kilometers'.lower()\n'kilometers'\n",
            ">>> 'Kilometers'.casefold()\n'kilometers'\n",
            ">>> \"Maße\".casefold()\n'masse'\n>>> \"Maße\".lower()\n'maße'\n>>> \"MASSE\" == \"Maße\"\nFalse\n>>> \"MASSE\".lower() == \"Maße\".lower()\nFalse\n>>> \"MASSE\".casefold() == \"Maße\".casefold()\nTrue\n",
            ">>> string = 'Километр'\n>>> string\n'Километр'\n>>> string.lower()\n'километр'\n",
            "utf-8",
            "lower",
            ">>> string = 'Километр'\n>>> string\n'\\xd0\\x9a\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> string.lower()\n'\\xd0\\x9a\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> print string.lower()\nКилометр\n",
            "str",
            "unicode",
            "u",
            "u",
            ">>> unicode_literal = u'Километр'\n>>> print(unicode_literal.lower())\nкилометр\n",
            "str",
            "'\\u'",
            "unicode",
            ">>> unicode_literal\nu'\\u041a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n>>> unicode_literal.lower()\nu'\\u043a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n",
            "str",
            "unicode",
            "unicode",
            "str.decode",
            "str",
            "unicode",
            ">>> unicode_from_string = unicode(string, 'utf-8') # \"encoding\" unicode from string\n>>> print(unicode_from_string.lower())\nкилометр\n>>> string_to_unicode = string.decode('utf-8') \n>>> print(string_to_unicode.lower())\nкилометр\n>>> unicode_from_string == string_to_unicode == unicode_literal\nTrue\n",
            "str",
            "utf-8",
            ">>> print string\nКилометр\n>>> string\n'\\xd0\\x9a\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> string.decode('utf-8')\nu'\\u041a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n>>> string.decode('utf-8').lower()\nu'\\u043a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n>>> string.decode('utf-8').lower().encode('utf-8')\n'\\xd0\\xba\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> print string.decode('utf-8').lower().encode('utf-8')\nкилометр\n",
            ">>> \"raison d'être\".casefold(); \"raison d'être\"",
            "unidecode",
            "decode('utf-8')",
            ">>> s='Километр'\n>>> print s.lower()\nКилометр\n>>> print s.decode('utf-8').lower()\nкилометр\n",
            "decode('utf-8')",
            "$python3;     >>>s='Километр';   >>>print (s.lower);   #result: километр   >>>s.decode('utf-8').lower();   #result: ...

_AttributeError: 'str' object has no attribute 'decode'"_

            ">>>s.casefold()   #result: километр",
            "s = input('UPPER CASE')\nlower = s.lower()\n",
            "s = \"Kilometer\"\nprint(s.lower())     - kilometer\nprint(s)             - Kilometer\n",
            "s=s.lower()",
            "import string\ns='ABCD'\nprint(''.join([string.ascii_lowercase[string.ascii_uppercase.index(i)] for i in s]))\n",
            "abcd\n",
            "swapcase",
            "s='ABCD'\nprint(s.swapcase())\n",
            "abcd\n"
        ],
        "text": [
            "\nIs there a way to convert a string from uppercase, or even part uppercase to lowercase? \nFor example, \"Kilometers\" → \"kilometers\".\n",
            "\nUse .lower() - For example:\ns = \"Kilometer\"\nprint(s.lower())\n\nThe official 2.x documentation is here: str.lower()\nThe official 3.x documentation is here: str.lower()\n", "\n\nHow to convert string to lowercase in Python?\nIs there any way to convert an entire user inputted string from uppercase, or even part uppercase to lowercase?\nE.g. Kilometers --> kilometers\n\nThe canonical Pythonic way of doing this is\n>>> 'Kilometers'.lower()\n'kilometers'\n\nHowever, if the purpose is to do case insensitive matching, you should use case-folding:\n>>> 'Kilometers'.casefold()\n'kilometers'\n\nHere's why:\n>>> \"Maße\".casefold()\n'masse'\n>>> \"Maße\".lower()\n'maße'\n>>> \"MASSE\" == \"Maße\"\nFalse\n>>> \"MASSE\".lower() == \"Maße\".lower()\nFalse\n>>> \"MASSE\".casefold() == \"Maße\".casefold()\nTrue\n\nThis is a str method in Python 3, but in Python 2, you'll want to look at the PyICU or py2casefold - several answers address this here.\nUnicode Python 3\nPython 3 handles plain string literals as unicode:\n>>> string = 'Километр'\n>>> string\n'Километр'\n>>> string.lower()\n'километр'\n\nPython 2, plain string literals are bytes\nIn Python 2, the below, pasted into a shell, encodes the literal as a string of bytes, using utf-8.\nAnd lower doesn't map any changes that bytes would be aware of, so we get the same string.\n>>> string = 'Километр'\n>>> string\n'\\xd0\\x9a\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> string.lower()\n'\\xd0\\x9a\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> print string.lower()\nКилометр\n\nIn scripts, Python will object to non-ascii (as of Python 2.5, and warning in Python 2.4) bytes being in a string with no encoding given, since the intended coding would be ambiguous. 
For more on that, see the Unicode how-to in the docs and PEP 263\nUse Unicode literals, not str literals\nSo we need a unicode string to handle this conversion, accomplished easily with a unicode string literal, which disambiguates with a u prefix (and note the u prefix also works in Python 3):\n>>> unicode_literal = u'Километр'\n>>> print(unicode_literal.lower())\nкилометр\n\nNote that the bytes are completely different from the str bytes - the escape character is '\\u' followed by the 2-byte width, or 16 bit representation of these unicode letters:\n>>> unicode_literal\nu'\\u041a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n>>> unicode_literal.lower()\nu'\\u043a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n\nNow if we only have it in the form of a str, we need to convert it to unicode. Python's Unicode type is a universal encoding format that has many advantages relative to most other encodings. We can either use the unicode constructor or str.decode method with the codec to convert the str to unicode:\n>>> unicode_from_string = unicode(string, 'utf-8') # \"encoding\" unicode from string\n>>> print(unicode_from_string.lower())\nкилометр\n>>> string_to_unicode = string.decode('utf-8') \n>>> print(string_to_unicode.lower())\nкилометр\n>>> unicode_from_string == string_to_unicode == unicode_literal\nTrue\n\nBoth methods convert to the unicode type - and same as the unicode_literal.\nBest Practice, use Unicode\nIt is recommended that you always work with text in Unicode.\n\nSoftware should only work with Unicode strings internally, converting to a particular encoding on output.\n\nCan encode back when necessary\nHowever, to get the lowercase back in type str, encode the python string to utf-8 again:\n>>> print string\nКилометр\n>>> string\n'\\xd0\\x9a\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> string.decode('utf-8')\nu'\\u041a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n>>> string.decode('utf-8').lower()\nu'\\u043a\\u0438\\u043b\\u043e\\u043c\\u0435\\u0442\\u0440'\n>>> 
string.decode('utf-8').lower().encode('utf-8')\n'\\xd0\\xba\\xd0\\xb8\\xd0\\xbb\\xd0\\xbe\\xd0\\xbc\\xd0\\xb5\\xd1\\x82\\xd1\\x80'\n>>> print string.decode('utf-8').lower().encode('utf-8')\nкилометр\n\nSo in Python 2, Unicode can encode into Python strings, and Python strings can decode into the Unicode type.\n",
            "\nWith Python 2, this doesn't work for non-English words in UTF-8. In this case decode('utf-8') can help:\n>>> s='Километр'\n>>> print s.lower()\nКилометр\n>>> print s.decode('utf-8').lower()\nкилометр\n\n",
            "\nAlso, you can overwrite some variables:\ns = input('UPPER CASE')\nlower = s.lower()\n\nIf you use like this:\ns = \"Kilometer\"\nprint(s.lower())     - kilometer\nprint(s)             - Kilometer\n\nIt will work just when called.\n",
            "\nDon't try this, totally un-recommend, don't do this:\nimport string\ns='ABCD'\nprint(''.join([string.ascii_lowercase[string.ascii_uppercase.index(i)] for i in s]))\n\nOutput:\nabcd\n\nSince no one wrote it yet you can use swapcase (so uppercase letters will become lowercase, and vice versa) (and this one you should use in cases where i just mentioned (convert upper to lower, lower to upper)):\ns='ABCD'\nprint(s.swapcase())\n\nOutput:\nabcd\n\n"
        ],
        "tags": [
            "python",
            "string",
            "uppercase",
            "lowercase",
            "python",
            "string",
            "uppercase",
            "lowercase"
        ]
    },
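
One possible direction, sketched here only as an illustration: replace the quoted names in an error message with a placeholder before comparing, so that `'asdf'`/`'test'` and `'str'`/`'decode'` no longer count against the score. This is not the matcher's actual code. The helper names `normalize` and `similarity` are hypothetical, `difflib.SequenceMatcher` is a stand-in for whatever comparison the matcher really uses, and treating `variance` as a similarity threshold is an assumption.

```python
import re
from difflib import SequenceMatcher

# Matches anything wrapped in single quotes, e.g. 'asdf' or 'decode'.
QUOTED_NAME = re.compile(r"'[^']*'")


def normalize(message: str) -> str:
    """Replace every single-quoted name with a fixed placeholder (hypothetical helper)."""
    return QUOTED_NAME.sub("'<name>'", message)


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; the real matcher may score differently."""
    return SequenceMatcher(None, a, b).ratio()


# Error message from the request in this issue.
query = "AttributeError: 'asdf' object has no attribute 'test'"
# Error message that appears in the stored Stack Overflow document.
stored = "AttributeError: 'str' object has no attribute 'decode'"

raw_score = similarity(query, stored)
normalized_score = similarity(normalize(query), normalize(stored))

print(f"raw score:        {raw_score:.2f}")
print(f"normalized score: {normalized_score:.2f}")
```

After normalization both messages reduce to the same string, so the comparison no longer depends on which variable names appear. The trade-off is that messages differing only in a meaningful quoted name (a specific attribute, for instance) would also be treated as identical.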
