Issue28701
Created on 2016-11-15 19:50 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| _PyUnicode_EqualToASCIIString.patch | serhiy.storchaka, 2016-11-15 19:50 | review | ||
| PyUnicode_CompareWithASCIIString.cocci | serhiy.storchaka, 2016-11-15 19:51 | Coccinella patch for replacing the function | ||
| _PyUnicode_EqualToASCII-runtime-check.diff | serhiy.storchaka, 2016-11-16 16:10 | review | ||
| Messages (16) | |||
|---|---|---|---|
| msg280882 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-11-15 19:50 | |
Proposed patch replaces calls of public function PyUnicode_CompareWithASCIIString() with new private function _PyUnicode_EqualToASCIIString(). The problem with PyUnicode_CompareWithASCIIString() is that it returns -1 for the result "less than" and error, but the error case is never checked. The patch is purposed for following purposes: 1. Readability. ``_PyUnicode_EqualToASCIIString(...)`` looks more readable than ``PyUnicode_CompareWithASCIIString(...) == 0`` or ``!PyUnicode_CompareWithASCIIString(...)``, especially in large expression. I always have to make an effort to understand correctly the meaning of the latter expression. 2. Efficiency. If the strings are not equal, _PyUnicode_EqualToASCIIString() can quickly return false, but PyUnicode_CompareWithASCIIString() needs to check whether what string is larger. 3. Correctness. Since no caller checks the error of PyUnicode_CompareWithASCIIString(), it is incorrectly interpreted as "less then". Exception set by PyUnicode_CompareWithASCIIString() can be leaked in following code causing mystical error or crash in debug build. There are too many callers to add error checking for them all. These would be non-trivial error-prone changes that add new lines of the code, new variables and new returns or gotos. On other hand replacing PyUnicode_CompareWithASCIIString() with _PyUnicode_EqualToASCIIString() is done by simple script. _PyUnicode_EqualToASCIIString() returns true value (1) if strings are equal, false value (0) if they are different, and doesn't raise exceptions. Unlike to PyUnicode_CompareWithASCIIString() it works only with ASCII characters and returns false if any string contains non-ASCII characters. The patch also documents the return value of PyUnicode_CompareWithASCIIString() in case of error. See issue21449 for similar issue with _PyUnicode_CompareWithId(). |
|||
| msg280919 - (view) | Author: Xiang Zhang (xiang.zhang) * | Date: 2016-11-16 07:05 | |
LGTM. |
|||
| msg280925 - (view) | Author: Inada Naoki (methane) * | Date: 2016-11-16 07:56 | |
Patch LGTM. But I don't know it's OK to commit it on 2.7, 3.5 and 3.6. |
|||
| msg280926 - (view) | Author: Roundup Robot (python-dev) | Date: 2016-11-16 08:20 | |
New changeset 386c682dcd75 by Serhiy Storchaka in branch '3.5': Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString. https://hg.python.org/cpython/rev/386c682dcd75 New changeset 72d07d13869a by Serhiy Storchaka in branch '3.6': Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString. https://hg.python.org/cpython/rev/72d07d13869a New changeset 6f0f77333da5 by Serhiy Storchaka in branch 'default': Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString. https://hg.python.org/cpython/rev/6f0f77333da5 |
|||
| msg280928 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-11-16 08:25 | |
Thanks Xiang and Inada for your reviews. The patch fixes a bug: error is not checked after PyUnicode_CompareWithASCIIString(). The patch is not applicable to 2.7 since PyUnicode_CompareWithASCIIString() is Python 3 only. |
|||
| msg280934 - (view) | Author: STINNER Victor (vstinner) * | Date: 2016-11-16 10:55 | |
(I reopen the issue to ask my question :-)) +/* Test whether a unicode is equal to ASCII string. Return 1 if true, + 0 otherwise. Return 0 if any argument contains non-ASCII characters. + Any error occurs inside will be cleared before return. */ Can you please also document the behaviour if you pass two non-ASCII strings which are equal? I understand that it returns also 0, right? Maybe the API should be more strict and require right to be ASCII: "right string must be encoded to ASCII". I expect an assertion error or a fatal error if right is non-ASCII when Python is compiled in debug mode. |
|||
| msg280935 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-11-16 11:26 | |
> Can you please also document the behaviour if you pass two non-ASCII strings which are equal? What mean "equal"? The left argument is a Unicode string, but the right argument is a byte string. For comparing them we should decode right argument or encode left argument. The result depends on using encoding. _PyUnicode_EqualToASCIIString() uses ASCII (as shown from its name). Non-ASCII strings can't be equal. This is documented. If the documentation is not clear, could you provide better wording? > Maybe the API should be more strict and require right to be ASCII: "right string must be encoded to ASCII". I expect an assertion error or a fatal error if right is non-ASCII when Python is compiled in debug mode. I hesitated about adding an assertion error or a fatal error in a bug fix. But this can be added in develop version. I don't know what is better -- return 0 in all builds or return 0 in release build and crash in debug build? |
|||
| msg280945 - (view) | Author: STINNER Victor (vstinner) * | Date: 2016-11-16 13:39 | |
I suggest "return 0 in release build and crash in debug build". |
|||
| msg280946 - (view) | Author: Roundup Robot (python-dev) | Date: 2016-11-16 13:41 | |
New changeset faf04a995031 by Serhiy Storchaka in branch '3.5': Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId. https://hg.python.org/cpython/rev/faf04a995031 New changeset ff3dacc98b3a by Serhiy Storchaka in branch '3.6': Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId. https://hg.python.org/cpython/rev/ff3dacc98b3a New changeset 765013f71bc4 by Serhiy Storchaka in branch 'default': Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId. https://hg.python.org/cpython/rev/765013f71bc4 |
|||
| msg280947 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-11-16 13:43 | |
The correct issue for above commits is issue21449. |
|||
| msg280955 - (view) | Author: Roundup Robot (python-dev) | Date: 2016-11-16 14:13 | |
New changeset b607f835f170 by Serhiy Storchaka in branch '3.5': Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701). https://hg.python.org/cpython/rev/b607f835f170 New changeset 1369e51182b7 by Serhiy Storchaka in branch '3.6': Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701). https://hg.python.org/cpython/rev/1369e51182b7 New changeset ba14f8b61bd8 by Serhiy Storchaka in branch 'default': Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701). https://hg.python.org/cpython/rev/ba14f8b61bd8 |
|||
| msg280964 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-11-16 16:10 | |
Following patch adds checks in debug mode that the right argument of _PyUnicode_EqualToASCIIString and _PyUnicode_EqualToASCIIId is ASCII-only string. |
|||
| msg280975 - (view) | Author: STINNER Victor (vstinner) * | Date: 2016-11-16 17:56 | |
_PyUnicode_EqualToASCII-runtime-check.diff LGTM. |
|||
| msg280976 - (view) | Author: Roundup Robot (python-dev) | Date: 2016-11-16 18:03 | |
New changeset 6dd22ed7140e by Serhiy Storchaka in branch '3.6': Issue #28701: _PyUnicode_EqualToASCIIId and _PyUnicode_EqualToASCIIString now https://hg.python.org/cpython/rev/6dd22ed7140e New changeset 44874b20e612 by Serhiy Storchaka in branch 'default': Issue #28701: _PyUnicode_EqualToASCIIId and _PyUnicode_EqualToASCIIString now https://hg.python.org/cpython/rev/44874b20e612 |
|||
| msg284944 - (view) | Author: Stefan Krah (skrah) * | Date: 2017-01-07 23:30 | |
For the record: This is all that happened in decimal if a) you
pass a legacy string and b) force _PyUnicode_Ready() to throw
a MemoryError:
>>> from decimal import *
>>> import _testcapi
>>> context = Context()
>>> traps = _testcapi.unicode_legacy_string('traps')
>>> getattr(context, traps)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError
Both a) and b) are not trivial to accomplish at all and the result
is completely benign.
|
|||
| msg285037 - (view) | Author: Roundup Robot (python-dev) | Date: 2017-01-09 12:16 | |
New changeset 35334a4d41aa by Stefan Krah in branch '3.5': Issue #28701: Revert part of 5bdc8e1a50c8 for the following reasons: https://hg.python.org/cpython/rev/35334a4d41aa |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:39 | admin | set | github: 72887 |
| 2017-01-09 12:16:31 | python-dev | set | messages: + msg285037 |
| 2017-01-07 23:30:54 | skrah | set | nosy:
+ skrah messages: + msg284944 |
| 2016-11-16 18:04:01 | serhiy.storchaka | set | status: open -> closed resolution: fixed |
| 2016-11-16 18:03:24 | python-dev | set | messages: + msg280976 |
| 2016-11-16 17:56:03 | vstinner | set | messages: + msg280975 |
| 2016-11-16 16:10:17 | serhiy.storchaka | set | files:
+ _PyUnicode_EqualToASCII-runtime-check.diff messages: + msg280964 |
| 2016-11-16 14:13:31 | python-dev | set | messages: + msg280955 |
| 2016-11-16 13:59:39 | serhiy.storchaka | unlink | issue21449 dependencies |
| 2016-11-16 13:43:10 | serhiy.storchaka | set | messages: + msg280947 |
| 2016-11-16 13:41:49 | python-dev | set | messages: + msg280946 |
| 2016-11-16 13:39:37 | vstinner | set | messages: + msg280945 |
| 2016-11-16 11:26:13 | serhiy.storchaka | set | messages: + msg280935 |
| 2016-11-16 10:55:07 | vstinner | set | status: closed -> open resolution: fixed -> (no value) messages: + msg280934 |
| 2016-11-16 08:25:43 | serhiy.storchaka | set | status: open -> closed versions: - Python 2.7 messages: + msg280928 assignee: serhiy.storchaka |
| 2016-11-16 08:20:20 | python-dev | set | nosy:
+ python-dev messages: + msg280926 |
| 2016-11-16 07:56:23 | methane | set | nosy:
+ methane messages: + msg280925 |
| 2016-11-16 07:05:10 | xiang.zhang | set | messages: + msg280919 |
| 2016-11-15 19:53:34 | serhiy.storchaka | link | issue21449 dependencies |
| 2016-11-15 19:51:39 | serhiy.storchaka | set | files: + PyUnicode_CompareWithASCIIString.cocci |
| 2016-11-15 19:50:05 | serhiy.storchaka | create | |