Issue30566
Created on 2017-06-04 15:49 by Vikram Hegde, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Pull Requests
| URL | Status | Linked |
|---|---|---|
| PR 1986 | closed | vikhegde, 2017-06-07 16:18 |
| PR 18632 | merged | berker.peksag, 2020-02-24 01:51 |
| PR 18651 | merged | miss-islington, 2020-02-25 03:19 |
| PR 18652 | merged | miss-islington, 2020-02-25 03:19 |
Messages (9)
msg295127 - Author: Vikram Hegde (Vikram Hegde) - Date: 2017-06-04 15:49
Here is the relevant code snippet from decode_generalized_number() in punycode.py:
```python
try:
    char = ord(extended[extpos])
except IndexError:
    if errors == "strict":
        raise UnicodeError("incomplete punicode string")
    return extpos + 1, None
extpos += 1
if 0x41 <= char <= 0x5A: # A-Z
    digit = char - 0x41
elif 0x30 <= char <= 0x39:
    digit = char - 22 # 0x30-26
elif errors == "strict":
    # extpos was already advanced past `char`, so this indexes the next character
    raise UnicodeError("Invalid extended code point '%s'"
                       % extended[extpos])
```
When the UnicodeError in the last line above is raised, its message is built from extended[extpos]. However, extpos was already incremented by 1 a few lines earlier, so it no longer points at the offending character. This causes two problems:
1) The UnicodeError reports the wrong character (the one after the invalid character).
2) If the invalid character is the last one in the string, extended[extpos] is out of range, so an IndexError is raised instead of the intended UnicodeError.
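The off-by-one can be seen in isolation with a simplified, self-contained model of the digit-reading step. `read_digit` and its `fixed` flag are hypothetical names used only for this sketch: it is not the code in punycode.py, and it leaves out the `errors` handling and the rest of the generalized-integer loop. Building the message from `extended[extpos - 1]` is assumed here as one natural fix, for illustration. The inputs mirror the report: the codec upper-cases the part after the last '-', so b'xn--w&' reaches this step as 'W&', with the '&' at index 1.

```python
# Hypothetical, simplified model of the quoted digit-reading step.
# Not the real punycode.py: no `errors` argument, no surrounding loop.

def read_digit(extended, extpos, fixed=True):
    """Decode one base-36 digit of `extended` at index `extpos`.

    Returns (digit, new_extpos). Raises UnicodeError for a character that
    is not A-Z or 0-9. With fixed=False the error message is built the
    way the reported code builds it, reproducing the off-by-one.
    """
    try:
        char = ord(extended[extpos])
    except IndexError:
        raise UnicodeError("incomplete punycode string") from None
    extpos += 1  # from here on, extpos points one past `char`
    if 0x41 <= char <= 0x5A:  # 'A'-'Z' -> 0..25
        return char - 0x41, extpos
    if 0x30 <= char <= 0x39:  # '0'-'9' -> 26..35
        return char - 22, extpos
    # The offending character is extended[extpos - 1]; extended[extpos]
    # is the character after it, or out of range at the end of the string.
    bad = extended[extpos - 1] if fixed else extended[extpos]
    raise UnicodeError("Invalid extended code point '%s'" % bad)


if __name__ == "__main__":
    # Problem 1: the wrong character is reported.
    try:
        read_digit("&W", 0, fixed=False)
    except UnicodeError as exc:
        print(exc)  # Invalid extended code point 'W'

    # Problem 2: the invalid character is last, so building the message
    # itself fails with IndexError.
    try:
        read_digit("W&", 1, fixed=False)
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: string index out of range

    # Indexing extpos - 1 reports the actual bad character in both cases.
    for text, pos in (("&W", 0), ("W&", 1)):
        try:
            read_digit(text, pos)
        except UnicodeError as exc:
            print(exc)  # Invalid extended code point '&'
```

Keeping `extpos += 1` where it is and only changing the index used in the message leaves the successful path (the returned position) untouched.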
msg295149 - Author: R. David Murray (r.david.murray) - Date: 2017-06-04 23:22
Can you provide a reproducer, please?
msg295270 - Author: Vikram Hegde (Vikram Hegde) - Date: 2017-06-06 15:46
I have a patch for this problem, but my contributor agreement has not been accepted yet, so I can't open a pull request.
Use the Python package tldextract to trigger the bug. Here is a sample session:
```pycon
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract
>>> tldextract.extract("xn--w&")
Traceback (most recent call last):
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 207, in decode
    res = punycode_decode(input, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 194, in punycode_decode
    return insertion_sort(base, extended, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 165, in insertion_sort
    bias, errors)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 146, in decode_generalized_number
    % extended[extpos])
IndexError: string index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 358, in extract
    return TLD_EXTRACTOR(url)
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 237, in __call__
    translations = [decode_punycode(label).lower() for label in labels]
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 237, in <listcomp>
    translations = [decode_punycode(label).lower() for label in labels]
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 232, in decode_punycode
    return idna.decode(label.encode('ascii'))
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/core.py", line 384, in decode
    result.append(ulabel(label))
  File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/core.py", line 302, in ulabel
    label = label.decode('punycode')
IndexError: decoding with 'punycode' codec failed (IndexError: string index out of range)
>>>
```
msg295278 - Author: R. David Murray (r.david.murray) - Date: 2017-06-06 17:16
You don't need an external package: decoding b'xn--w&' with the punycode codec is enough to produce the traceback.
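To check a particular interpreter, the standard-library codec can be called directly through `bytes.decode()`. This is a minimal sketch and `try_decode` is a hypothetical helper. An interpreter without the fix is expected to raise IndexError for this input, while one that includes it should raise UnicodeError (or a subclass of it); the exact message can differ between versions, so no output is shown.

```python
def try_decode(data):
    """Decode `data` with the built-in 'punycode' codec and report the outcome."""
    try:
        return "decoded: " + data.decode("punycode")
    except UnicodeError as exc:
        # Expected behaviour: a UnicodeError (or subclass) for malformed input.
        return f"UnicodeError: {exc}"
    except IndexError as exc:
        # The behaviour described in this report.
        return f"IndexError (unfixed interpreter): {exc}"


print(try_decode(b"bcher-kva"))  # well-formed label for "bücher"
print(try_decode(b"xn--w&"))     # the malformed label from the report
```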
msg302070 - Author: Vikram Hegde (vikhegde) - Date: 2017-09-13 12:57
Could someone please review my PR? It has been pending for over three months.
msg362613 - Author: Berker Peksag (berker.peksag) - Date: 2020-02-25 03:19
New changeset ba22e8f174309979d90047c5dc64fcb63bc2c32e by Berker Peksag in branch 'master': bpo-30566: Fix IndexError when using punycode codec (GH-18632) https://github.com/python/cpython/commit/ba22e8f174309979d90047c5dc64fcb63bc2c32e
msg362617 - Author: Berker Peksag (berker.peksag) - Date: 2020-02-25 03:42
New changeset daef21ce7dfd3735101d85d6ebf7554187c33ab8 by Miss Islington (bot) in branch '3.8': bpo-30566: Fix IndexError when using punycode codec (GH-18632) https://github.com/python/cpython/commit/daef21ce7dfd3735101d85d6ebf7554187c33ab8
msg362618 - Author: Berker Peksag (berker.peksag) - Date: 2020-02-25 03:43
New changeset 55be9a6c09d4415f50b14212ce22eccefa83ca64 by Miss Islington (bot) in branch '3.7': bpo-30566: Fix IndexError when using punycode codec (GH-18632) https://github.com/python/cpython/commit/55be9a6c09d4415f50b14212ce22eccefa83ca64
msg362623 - Author: Berker Peksag (berker.peksag) - Date: 2020-02-25 04:10
Thanks for the report and for the initial patch!
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:47 | admin | set | github: 74751 |
| 2020-02-25 04:10:48 | berker.peksag | set | status: open -> closed versions: + Python 3.7, Python 3.8, Python 3.9, - Python 3.6 messages: + msg362623 resolution: fixed |
| 2020-02-25 03:43:49 | berker.peksag | set | messages: + msg362618 |
| 2020-02-25 03:42:45 | berker.peksag | set | messages: + msg362617 |
| 2020-02-25 03:19:39 | miss-islington | set | pull_requests: + pull_request18013 |
| 2020-02-25 03:19:32 | miss-islington | set | nosy: + miss-islington pull_requests: + pull_request18012 |
| 2020-02-25 03:19:07 | berker.peksag | set | messages: + msg362613 |
| 2020-02-24 01:51:46 | berker.peksag | set | keywords: + patch nosy: + berker.peksag pull_requests: + pull_request17998 |
| 2018-07-11 07:48:59 | serhiy.storchaka | set | type: crash -> behavior |
| 2017-09-13 12:57:03 | vikhegde | set | nosy: + vikhegde messages: + msg302070 |
| 2017-06-07 16:18:51 | vikhegde | set | pull_requests: + pull_request2053 |
| 2017-06-06 17:16:36 | r.david.murray | set | messages: + msg295278 |
| 2017-06-06 15:46:30 | Vikram Hegde | set | nosy: + Vikram Hegde messages: + msg295270 |
| 2017-06-04 23:22:53 | r.david.murray | set | nosy: + r.david.murray messages: + msg295149 |
| 2017-06-04 15:53:33 | Vikram Hegde | set | nosy: - Vikram Hegde -> (no value) |
| 2017-06-04 15:49:54 | Vikram Hegde | create | |