cc @malemburg

Oops, the change was done in Python 3.9, not in Python 3.8! PR updated.

lgtm
@@ -439,6 +439,12 @@ Changes in the Python API
  :data:`~errno.EBADF` error.
  (Contributed by Victor Stinner in :issue:`39239`.)

* :func:`codecs.lookup` now normalizes the encoding name the same way than
There are other differences. For example, normalize_encoding("КОИ-8") returns "кои_8", but codecs.lookup normalizes it to "8".
The comment in the sources is also not correct.
encodings.normalize_encoding() says "Note that encoding names should be ASCII only." You're correct: "КОИ-8" is normalized to "8" by codecs.lookup() because the C function _Py_normalize_encoding() ignores non-ASCII letters.
I don't know which behavior is correct. It sounds strange to me to have a non-ASCII encoding name. Which encoding is supposed to be used to encode the encoding name?!? :-D Maybe encodings.normalize_encoding() should also ignore non-ASCII letters and be stricter.
I created bpo-39337: codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them.
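For anyone who wants to check the difference locally, here is a minimal sketch. The helper name `compare_normalization` is hypothetical (it is not part of the PR or of the stdlib); it only calls `encodings.normalize_encoding()` and `codecs.lookup()` and reports what each one does with the same name.

```python
import codecs
import encodings


def compare_normalization(name):
    """Return (normalize_encoding() result, codec name found by lookup() or None).

    Hypothetical helper for illustration only.
    """
    # Pure-Python normalization, called directly on the raw name.
    normalized = encodings.normalize_encoding(name)
    try:
        # codecs.lookup() applies its own normalization before searching
        # the codec registry, so it may see a different name.
        codec_name = codecs.lookup(name).name
    except LookupError:
        # No codec registered under whatever name the lookup ended up with.
        codec_name = None
    return normalized, codec_name
```

Calling `compare_normalization("КОИ-8")` is the case discussed above. The exact result depends on the Python version, since the handling of non-ASCII characters is what bpo-39337 is about, so no expected output is given here.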
- * :func:`codecs.lookup` now normalizes the encoding name the same way than
+ * :func:`codecs.lookup` now normalizes the encoding name the same way as
vstinner commented Jan 14, 2020
https://bugs.python.org/issue37751