Issue23247
Created on 2015-01-16 06:17 by martin.panter, last changed 2015-07-16 20:25 by vstinner. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| cjk_codecs_reset.patch | vstinner, 2015-07-15 23:00 | review | ||
| fix-multibytecodec-segfault.patch | Aaron1011, 2015-07-16 03:29 | review | ||
| fix-multibytecodec-segfault-with-test.patch | Aaron1011, 2015-07-16 16:47 | review | ||
| Messages (13) | |||
|---|---|---|---|
| msg234112 - (view) | Author: Martin Panter (martin.panter) * | Date: 2015-01-16 06:17 | |
$ python3 -c 'import codecs; from io import BytesIO; codecs.getwriter("big5")(BytesIO()).reset()'
Segmentation fault (core dumped)
[Exit 139]
Happens for all the multibyte codecs:
broken_stream_codecs = {
"big5", "big5hkscs", "cp932", "cp949", "cp950",
"euc_jp", "euc_jis_2004", "euc_jisx0213", "euc_kr",
"gb2312", "gbk", "gb18030", "hz",
"iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004",
"iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr",
"johab", "shift_jis", "shift_jis_2004", "shift_jisx0213",
}
These codecs also share the property that their StreamReader.read() methods do not accept the second “chars” parameter:
>>> codecs.getreader("big5")(BytesIO()).read(1, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: read expected at most 1 arguments, got 2
|
|||
| msg246781 - (view) | Author: Aaron Hill (Aaron1011) * | Date: 2015-07-15 22:27 | |
This is also present in the latest Python 3.6. I'm going to work on providing a patch for this, unless someone else already is |
|||
| msg246782 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-07-15 23:00 | |
> Happens for all the multibyte codecs: (...) All these codecs share the same C implementation: the "CJK" codecs. The bug was introduced in Python 3.4 by my huge changeset d621bdaed7c3: Issue "#17693: CJK encoders now use the new Unicode API (PEP 393)". It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object. Attached patch should fix the crash. @Aaron1011: Can you please try to write a patch adding a unit test to test_codecs reproducing the crash? |
|||
| msg246783 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-07-15 23:02 | |
> It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object. Oops: StreamWriter, not StreamReader. |
|||
| msg246784 - (view) | Author: Martin Panter (martin.panter) * | Date: 2015-07-16 00:56 | |
Perhaps this test case proposed in my patch for Issue 13881 could be useful: +def test_writer_reuse(self): + """StreamWriter should be reusable after reset""" Looks like that is where my “broken_stream_codecs” list from the original post came from. |
|||
| msg246785 - (view) | Author: Aaron Hill (Aaron1011) * | Date: 2015-07-16 02:57 | |
The included patch fixes the issue, and modifies the existing unittest to prevent a future regression. The patch corrects an issue where the 'pending' struct field was NULL, but was used as the input to multibytecodec_encode anyay. |
|||
| msg246786 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2015-07-16 03:27 | |
cjk_codecs_reset.patch LGTM. |
|||
| msg246787 - (view) | Author: Aaron Hill (Aaron1011) * | Date: 2015-07-16 03:29 | |
The patch didn't get attached for some reason. It's attached now. |
|||
| msg246801 - (view) | Author: Martin Panter (martin.panter) * | Date: 2015-07-16 12:57 | |
Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly). I imagine this shortcut could be slightly more efficient, but is it always correct? In any case, Aaron’s test looks okay to me. |
|||
| msg246802 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-07-16 13:01 | |
> Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly). Aaron patch is better. I misunderstood the code. reset() always return None, the empty string is unused in my patch. fix-multibytecodec-segfault.patch: please write a _new_ test mentionning this issue to check for non regression. |
|||
| msg246810 - (view) | Author: Aaron Hill (Aaron1011) * | Date: 2015-07-16 16:47 | |
I've added a test case to exercise reset() |
|||
| msg246818 - (view) | Author: Roundup Robot (python-dev) | Date: 2015-07-16 20:22 | |
New changeset 35a6fe0e2b27 by Victor Stinner in branch '3.4': Closes #23247: Fix a crash in the StreamWriter.reset() of CJK codecs https://hg.python.org/cpython/rev/35a6fe0e2b27 |
|||
| msg246819 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-07-16 20:25 | |
@Aaron Hill: Thanks for your patch! I only kept the minimal test of your patch. If you want to enhance the test suite, you may write new test to test the behaviour of reset(). I prefer to only commit the minimum patch. @Martin: Thanks for your report. The issue is now fixed. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2015-07-16 20:25:47 | vstinner | set | messages: + msg246819 |
| 2015-07-16 20:22:58 | python-dev | set | status: open -> closed nosy:
+ python-dev resolution: fixed |
| 2015-07-16 16:47:36 | Aaron1011 | set | files:
+ fix-multibytecodec-segfault-with-test.patch messages: + msg246810 |
| 2015-07-16 13:01:32 | vstinner | set | messages: + msg246802 |
| 2015-07-16 12:57:06 | martin.panter | set | messages:
+ msg246801 stage: test needed -> patch review |
| 2015-07-16 03:29:46 | Aaron1011 | set | files:
+ fix-multibytecodec-segfault.patch messages: + msg246787 |
| 2015-07-16 03:27:43 | serhiy.storchaka | set | assignee: serhiy.storchaka -> vstinner messages: + msg246786 stage: test needed |
| 2015-07-16 02:57:43 | Aaron1011 | set | messages: + msg246785 |
| 2015-07-16 00:56:58 | martin.panter | set | messages: + msg246784 |
| 2015-07-15 23:02:01 | vstinner | set | messages: + msg246783 |
| 2015-07-15 23:00:54 | vstinner | set | files:
+ cjk_codecs_reset.patch versions: + Python 3.5 title: Multibyte codec StreamWriter.reset() crashes -> Crash in the reset() method of StreamWriter of CJK codecs messages: + msg246782 keywords: + patch |
| 2015-07-15 22:27:47 | Aaron1011 | set | nosy:
+ Aaron1011 messages:
+ msg246781 |
| 2015-02-28 13:34:48 | serhiy.storchaka | set | priority: normal -> high assignee: serhiy.storchaka nosy: + serhiy.storchaka |
| 2015-01-16 06:17:50 | martin.panter | create | |