Issue31852
Created on 2017-10-23 19:22 by Alexandre Hamelin, last changed 2017-10-31 11:01 by vstinner. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| async_parser_crash.py | vstinner, 2017-10-25 15:56 | |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 4122 | merged | pablogsal, 2017-10-25 22:21 |
| Messages (7) | |||
|---|---|---|---|
| msg304835 - (view) | Author: Alexandre Hamelin (Alexandre Hamelin) | Date: 2017-10-23 19:22 | |
Hi.

Python 3.6.2 crashes when interpreting lines with the text "async \" (future keyword 'async' and ending with a backslash). Tested in a docker environment (debian jessie). (see github.com/0xquad/docker-python36 if needed)

Examples:

    $ docker run -ti --rm python36
    root@4c09392f83c8:/# python3.6
    Python 3.6.2 (default, Aug 4 2017, 14:35:04)
    [GCC 6.4.0 20170724] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> async \
    ...
      File "<stdin>", line 1
        \ufffd\ufffdF\ufffd\ufffd
        ^
    SyntaxError: invalid syntax
    >>> async \
    Segmentation fault
    root@4c09392f83c8:/#

Also,

    ----- file: test.py
    #/usr/bin/python3.6
    async \
    <repeated 30000 times>
    -----

    $ ./test.py
    Segmentation fault
    $

Haven't taken the time to produce a backtrace or investigate with the latest dev versions or any further. Let me know if I can assist in any way. |
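The file-based reproduction above can be scripted. What follows is only a minimal sketch under stated assumptions: the helper names `build_source`, `describe_exit`, and `run_candidate` are hypothetical, the interpreter path is supplied by the caller, and the "negative return code means killed by a signal" convention holds on POSIX systems only.

```python
import signal
import subprocess
import tempfile
from pathlib import Path


def build_source(repeats: int = 30000) -> str:
    """Return `repeats` lines, each the name async followed by a line continuation."""
    return "async \\\n" * repeats


def describe_exit(returncode: int) -> str:
    """Summarise a return code; on POSIX a negative value means 'killed by signal'."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number unknown to this platform: report it numerically.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_candidate(interpreter: str, repeats: int = 30000) -> str:
    """Write the test file to a temporary directory and run it with `interpreter`."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "test.py"
        script.write_text(build_source(repeats))
        result = subprocess.run(
            [interpreter, str(script)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return describe_exit(result.returncode)
```

Calling `run_candidate("python3.6")` against an affected build would be expected to report a signal, while an interpreter that rejects the input cleanly should report an ordinary non-zero exit status.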
|||
| msg304975 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2017-10-25 10:16 | |
This issue is fixed in the master branch (version 3.7.0 alpha 2). The issue was fixed in this PR: https://github.com/python/cpython/pull/1669

The cause is that async was not a proper keyword and the parser segfaults when checking for the new token and parsing the newline. In particular, this happens here:

    translate_newlines at Parser/tokenizer.c:713
    713 buf = PyMem_MALLOC(needed_length);

This is the stack trace:

    #0 _PyObject_Alloc (ctx=<optimized out>, elsize=10, nelem=1, use_calloc=0) at Objects/obmalloc.c:806
    #1 _PyObject_Malloc (ctx=<optimized out>, nbytes=10) at Objects/obmalloc.c:985
    #2 0x0000000000453020 in translate_newlines (tok=0x9187b0, exec_input=0, s=0x7ffff7fa40e0 "async \\\n") at Parser/tokenizer.c:713
    #3 tok_nextc (tok=tok@entry=0x9187b0) at Parser/tokenizer.c:943
    #4 0x0000000000454948 in tok_get (tok=tok@entry=0x9187b0, p_start=p_start@entry=0x7fffffffdc40, p_end=p_end@entry=0x7fffffffdc50) at Parser/tokenizer.c:1382
    #5 0x0000000000455749 in PyTokenizer_Get (tok=tok@entry=0x9187b0, p_start=p_start@entry=0x7fffffffdc40, p_end=p_end@entry=0x7fffffffdc50) at Parser/tokenizer.c:1902
    #6 0x000000000045158d in parsetok (tok=0x9187b0, g=<optimized out>, start=256, err_ret=err_ret@entry=0x7fffffffdce0, flags=flags@entry=0x7fffffffdcd0) at Parser/parsetok.c:208
    #7 0x0000000000452280 in PyParser_ParseFileObject (fp=<optimized out>, filename=filename@entry=0x7ffff7f1b848, enc=<optimized out>, g=<optimized out>, start=<optimized out>, ps1=<optimized out>, ps2=0x7ffff7e63648 "... ", err_ret=err_ret@entry=0x7fffffffdce0, flags=flags@entry=0x7fffffffdcd0) at Parser/parsetok.c:134
    #8 0x0000000000433949 in PyParser_ASTFromFileObject (fp=<optimized out>, filename=0x7ffff7f1b848, enc=<optimized out>, start=<optimized out>, ps1=<optimized out>, ps2=<optimized out>, flags=0x7fffffffde90, errcode=0x7fffffffdd80, arena=0x7ffff7fe2168) at Python/pythonrun.c:1166
    #9 0x0000000000433b5b in PyRun_InteractiveOneObject (fp=fp@entry=0x7ffff74b2640 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7f1b848, flags=flags@entry=0x7fffffffde90) at Python/pythonrun.c:218
    #10 0x0000000000433eae in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff74b2640 <_IO_2_1_stdin_>, filename_str=filename_str@entry=0x5dd7a4 "<stdin>", flags=flags@entry=0x7fffffffde90) at Python/pythonrun.c:115
    #11 0x0000000000433fbc in PyRun_AnyFileExFlags (fp=0x7ffff74b2640 <_IO_2_1_stdin_>, filename=0x5dd7a4 "<stdin>", closeit=0, flags=0x7fffffffde90) at Python/pythonrun.c:77
    #12 0x00000000004476fa in run_file (p_cf=0x7fffffffde90, filename=<optimized out>, fp=0x7ffff74b2640 <_IO_2_1_stdin_>) at Modules/main.c:341
    #13 Py_Main (argc=argc@entry=1, argv=argv@entry=0x910010) at Modules/main.c:895
    #14 0x000000000041e17a in main (argc=1, argv=<optimized out>) at ./Programs/python.c:102

After applying commit ac317700ce7439e38a8b420218d9a5035bba92ed the issue is fixed. Does it make sense to backport ac317700ce7439e38a8b420218d9a5035bba92ed to 3.6? |
|||
| msg304995 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-10-25 15:56 | |
> Does it make sense to backport ac317700ce7439e38a8b420218d9a5035bba92ed to 3.6?

No, async was not a keyword in Python 3.6 on purpose. Making it a keyword can break a lot of code.

I confirm that Python 3.6 still crashes with a very high number of "async " prefixes: try the attached async_parser_crash.py.

Extract of the gdb traceback on a crash:

    (...)
    #665 0x0000000000454867 in tok_get (tok=0x7fffff8b98c0, p_start=0x7fffff8b9cb8, p_end=0x7fffff8b9cb0) at Parser/tokenizer.c:1571
    #666 0x0000000000454867 in tok_get (tok=0x7fffff8b9d40, p_start=0x7fffff8ba138, p_end=0x7fffff8ba130) at Parser/tokenizer.c:1571
    #667 0x0000000000454867 in tok_get (tok=0x7fffff8ba1c0, p_start=0x7fffff8ba5b8, p_end=0x7fffff8ba5b0) at Parser/tokenizer.c:1571
    #668 0x0000000000454867 in tok_get (tok=0x7fffff8ba640, p_start=0x7fffff8baa38, p_end=0x7fffff8baa30) at Parser/tokenizer.c:1571
    #669 0x0000000000454867 in tok_get (tok=0x7fffff8baac0, p_start=0x7fffff8baeb8, p_end=0x7fffff8baeb0) at Parser/tokenizer.c:1571
    #670 0x0000000000454867 in tok_get (tok=0x7fffff8baf40, p_start=0x7fffff8bb338, p_end=0x7fffff8bb330) at Parser/tokenizer.c:1571
    #671 0x0000000000454867 in tok_get (tok=0x7fffff8bb3c0, p_start=0x7fffff8bb7b8, p_end=0x7fffff8bb7b0) at Parser/tokenizer.c:1571
    (...)

It looks like a stack overflow. The tokenizer may fail earlier on "async async ". |
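The repeated tok_get frames suggest one nested call per "async" prefix. The toy model below illustrates why that pattern exhausts the stack; it is not the CPython tokenizer. The functions `lookahead_depth` and `survives` are hypothetical, and the one-call-per-prefix behaviour is an assumption read off the traceback. Python raises RecursionError where the C code would overflow the native stack.

```python
def lookahead_depth(words, index=0):
    """Toy model: every 'async' name peeks at the next token with a nested call.

    Returns how many nested calls are made before a word other than
    'async' (or the end of input) stops the look-ahead chain.
    """
    if index >= len(words) or words[index] != "async":
        return 0
    # One more stack frame per consecutive 'async', mirroring the
    # repeated tok_get frames in the traceback (assumed behaviour).
    return 1 + lookahead_depth(words, index + 1)


def survives(prefix_count):
    """Report whether the toy model handles `prefix_count` 'async' prefixes."""
    try:
        lookahead_depth(["async"] * prefix_count + ["def"])
    except RecursionError:
        # Python's guard fires here; C code has no such guard and crashes.
        return False
    return True
```

In this model the depth grows linearly with the number of prefixes, so any fixed stack budget is eventually exceeded. Bounding the look-ahead to a single non-recursive peek would keep the depth constant.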
|||
| msg305265 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-10-31 00:46 | |
New changeset 690c36f2f1085145d364a89bfed5944dd2470308 by Victor Stinner (Pablo Galindo) in branch '3.6': [3.6] bpo-31852: Fix segfault caused by using the async soft keyword (GH-4122) https://github.com/python/cpython/commit/690c36f2f1085145d364a89bfed5944dd2470308 |
|||
| msg305266 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-10-31 00:50 | |
Thank you Alexandre Hamelin for the bug report and Pablo Galindo for the fix ;-) |
|||
| msg305267 - (view) | Author: Alexandre Hamelin (Alexandre Hamelin) | Date: 2017-10-31 02:44 | |
Awesome work, thanks to you! Would it also be the case for 'await'? |
|||
| msg305286 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-10-31 11:01 | |
> Would it also be the case for 'await'?

"async" requires maintaining an "async_def" state. It seems that await doesn't need a state of its own, but relies on the "async_def" state, which has been fixed.

Extract of Parser/tokenizer.c:

    /* Current token length is 5. */
    if (tok->async_def) {
        /* We're inside an 'async def' function. */
        if (memcmp(tok->start, "async", 5) == 0) {
            return ASYNC;
        }
        if (memcmp(tok->start, "await", 5) == 0) {
            return AWAIT;
        }
    }
|
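The extract can be restated as a small Python sketch to show how both names depend on the same flag. This models only the quoted lines, not the rest of the tokenizer. The function name `classify_name` and the string return values are hypothetical stand-ins for the C token constants.

```python
def classify_name(text: str, async_def: bool) -> str:
    """Classify a name token the way the quoted C extract does.

    `async_def` stands in for tok->async_def: it is true only while the
    tokenizer is inside an 'async def' function.
    """
    if async_def and len(text) == 5:
        if text == "async":
            return "ASYNC"
        if text == "await":
            return "AWAIT"
    # Outside 'async def', both words remain ordinary identifiers.
    return "NAME"
```

Because 'await' is only promoted when the flag is already set, it never needs state of its own, which matches the explanation above.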
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2018-09-24 15:32:03 | xtreak | link | issue26000 superseder |
| 2017-10-31 11:01:46 | vstinner | set | messages: + msg305286 |
| 2017-10-31 02:44:00 | Alexandre Hamelin | set | messages: + msg305267 |
| 2017-10-31 00:50:32 | vstinner | set | messages: + msg305266 |
| 2017-10-31 00:49:59 | vstinner | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2017-10-31 00:46:38 | vstinner | set | messages: + msg305265 |
| 2017-10-25 22:21:52 | pablogsal | set | keywords: + patch stage: patch review pull_requests: + pull_request4091 |
| 2017-10-25 15:56:06 | vstinner | set | files: + async_parser_crash.py nosy: + vstinner messages: + msg304995 |
| 2017-10-25 15:49:50 | vstinner | set | nosy: + yselivanov |
| 2017-10-25 10:16:39 | pablogsal | set | nosy: + pablogsal messages: + msg304975 |
| 2017-10-23 19:22:17 | Alexandre Hamelin | create | |