Issue35014
Created on 2018-10-18 08:50 by natim, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| demo.py | natim, 2018-10-18 10:53 | |
| Messages (15) | |||
|---|---|---|---|
| msg327945 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 08:50 | |
asyncio.create_subprocess_exec accepts a list of str as parameters, which can lead to a UnicodeEncodeError. I think it should accept only bytes, shouldn't it? |
|||
| msg327953 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2018-10-18 09:50 | |
A list of strings works both on my local Linux box and in the CPython test suite. Please provide more info about the error. A stack trace would help. |
|||
| msg327954 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-10-18 09:54 | |
Hi Remy,
> Asyncio.create_subprocess_exec accepts a list of str as parameter which lead to UnicodeEncodeError I think it should accept only bytes shouldn't it?
Can you elaborate? On which OS? What is your error message? Can you paste a traceback? |
|||
| msg327955 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 10:04 | |
> List of strings works on both my local Linux box and CPython test suite.
Indeed, that's why I posted this bug report: in my opinion it should work only with byte strings.
> Can you elaborate? On which OS? What is your error message? Can you paste a traceback?
If you try to send a UTF-8 string on a Linux box, for instance, you might get a UnicodeEncodeError. Let me try to provide you with a script to reproduce this error. |
|||
| msg327962 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 10:53 | |
I thought this would be sufficient to actually reproduce the issue.
However it seems that if the system encoding is UTF-8 it does work properly.
Here is the traceback I had:
```
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 69: ordinal not in range(128)
File "worker.py", line 393, in <module>
return_code = loop.run_until_complete(main(loop))
File "asyncio/base_events.py", line 467, in run_until_complete
return future.result()
File "worker.py", line 346, in main
'-f mp4', '-o', '{}/{}.mp4'.format(download_tempdir, video_id))
File "worker.py", line 268, in run_command
proc = await create
File "asyncio/subprocess.py", line 225, in create_subprocess_exec
stderr=stderr, **kwds)
File "asyncio/base_events.py", line 1191, in subprocess_exec
bufsize, **kwargs)
File "asyncio/unix_events.py", line 191, in _make_subprocess_transport
**kwargs)
File "asyncio/base_subprocess.py", line 39, in __init__
stderr=stderr, bufsize=bufsize, **kwargs)
File "asyncio/unix_events.py", line 697, in _start
universal_newlines=False, bufsize=bufsize, **kwargs)
File "python3.6/subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "python3.6/subprocess.py", line 1267, in _execute_child
restore_signals, start_new_session, preexec_fn)
```
|
|||
| msg327964 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 11:03 | |
I am adding the following info: if I run the following on the Docker image where I got the error:
```
import sys
import locale
print(sys.getdefaultencoding())
print(locale.getpreferredencoding())
```
I get:
utf-8
ANSI_X3.4-1968
While if I run it on my machine I get:
utf-8
UTF-8
I don't know how to force the use of the latter locally to reproduce. Setting LC_ALL=C and LANG=C didn't do the trick. |
|||
| msg327965 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 11:06 | |
Here we go:
```
$ python3.7 demo.py
utf-8
UTF-8
Traceback (most recent call last):
File "demo.py", line 21, in <module>
asyncio.run(main())
File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.7/asyncio/base_events.py", line 568, in run_until_complete
return future.result()
File "demo.py", line 14, in main
sys.stdout.write(out.decode('utf-8'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
```
|
|||
| msg327966 - (view) | Author: Andrew Svetlov (asvetlov) * | Date: 2018-10-18 11:07 | |
I think you'll get the same error on a `subprocess.run()` call if your current locale is not UTF-8. I don't recall the details, but the Internet has a lot of info about setting the locale per user and system-wide. |
|||
| msg327967 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 11:08 | |
I believe Python 3.7 brings explicit Unicode encoding/decoding. If the create_subprocess_exec method can fail depending on the environment, I believe we should not try to encode the command-line arguments but rather require them to be bytes. |
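For illustration, a caller can already encode the arguments explicitly instead of relying on the locale. The following is a minimal sketch, assuming a POSIX system and Python 3.7+; the helper name `echo_argument` and the sample file name are hypothetical and do not come from the reporter's script.

```python
import asyncio
import sys

# Child program: write its first argument back as raw bytes, undoing the
# decoding that the child interpreter applied to sys.argv.
CHILD_CODE = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"


async def echo_argument(arg: bytes) -> bytes:
    """Spawn a child Python with an already-encoded argument (hypothetical helper)."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE, arg,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out


# Encode explicitly, so the parent never needs the locale encoding
# to convert this argument.
encoded = "vid\xe9o.mp4".encode("utf-8")
echoed = asyncio.run(echo_argument(encoded))
print(echoed)
```

Because the argument is already bytes, the parent process does not have to encode it, so an ASCII filesystem encoding in the parent should not trigger the UnicodeEncodeError for that argument.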
|||
| msg327970 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-10-18 11:57 | |
I added the UTF-8 Mode for you, for the Docker use case: python3.7 -X utf8. Using that, Python ignores your locale and speaks UTF-8. What is your locale? Try the "locale" command. |
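A rough way to check what the UTF-8 Mode changes is to start a child interpreter with and without `-X utf8` and compare what it reports. This is only a sketch, assuming Python 3.7 or later; the helper name `probe` is hypothetical.

```python
import subprocess
import sys

# One-liner run by the child interpreter: report whether UTF-8 Mode is on and
# which encoding is used for file names and command-line arguments.
CHILD_CODE = "import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())"


def probe(*interpreter_options: str) -> str:
    """Run a child Python with the given options and return its report (hypothetical helper)."""
    result = subprocess.run(
        [sys.executable, *interpreter_options, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


default_report = probe()           # depends on the current locale
utf8_report = probe("-X", "utf8")  # UTF-8 Mode forced on

print("default:", default_report)
print("-X utf8:", utf8_report)
```

Setting the environment variable `PYTHONUTF8=1` is the other documented way to enable the same mode, which may be more convenient in a Docker image.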
|||
| msg327974 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 12:40 | |
Here are the locale settings:
```
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
```
|
|||
| msg327975 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-10-18 12:41 | |
> LC_CTYPE="POSIX"
I modified Python 3.7.1 to enable the UTF-8 Mode when the LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if the LC_CTYPE is "C". |
|||
| msg327976 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 12:43 | |
Unicode is complicated; the answer is somewhere here: https://unicodebook.readthedocs.io/ Sorry for the bother, I thought it was a bug but apparently it's a feature. Thank you for your help, and thank you for making Python better. |
|||
| msg327977 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-10-18 12:44 | |
This issue is not an asyncio bug: the error occurs in subprocess. It is not a subprocess bug either: subprocess works as expected, it encodes Unicode with sys.getfilesystemencoding() (see os.fsencode()). The bug is that you use non-ASCII strings whereas your filesystem encoding is ASCII.
You have several options to fix *your* issue:
* Use a different locale, one that uses UTF-8
* Enable the Python 3.7 UTF-8 Mode
* Wait for Python 3.7.1 (which automatically enables the UTF-8 Mode for LC_CTYPE="POSIX")
Note: You might want to read my ebook http://unicodebook.readthedocs.io/ which explains how to deal with Unicode. |
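The encoding step described above can be approximated without changing the locale. On POSIX, os.fsencode() encodes str with the filesystem encoding and the surrogateescape error handler, so the sketch below applies that handler with an explicitly chosen encoding. The helper name `encode_like_fsencode` and the sample argument are hypothetical.

```python
import sys


def encode_like_fsencode(arg: str, encoding: str) -> bytes:
    """Approximate os.fsencode() on POSIX for a chosen filesystem encoding (hypothetical helper)."""
    return arg.encode(encoding, "surrogateescape")


arg = "vid\xe9o.mp4"  # contains the non-ASCII character U+00E9

print("actual filesystem encoding:", sys.getfilesystemencoding())

# With a UTF-8 filesystem encoding the argument encodes cleanly.
utf8_bytes = encode_like_fsencode(arg, "utf-8")

# With an ASCII filesystem encoding (POSIX or C locale without UTF-8 Mode)
# the same argument cannot be encoded.
try:
    encode_like_fsencode(arg, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

print(utf8_bytes)
print(ascii_error)
```

The failure in the ASCII case is the same kind of UnicodeEncodeError shown in the traceback earlier in the thread, raised before the child process is ever started.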
|||
| msg327978 - (view) | Author: Rémy Hubscher [:natim] (natim) * | Date: 2018-10-18 12:44 | |
> I modified Python 3.7.1 to enable the UTF-8 Mode when the LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if the LC_CTYPE is "C"
OK, works for me, thanks :) |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:07 | admin | set | github: 79195 |
| 2018-10-18 12:44:24 | natim | set | messages: + msg327978 |
| 2018-10-18 12:44:22 | vstinner | set | messages: + msg327977 |
| 2018-10-18 12:43:29 | natim | set | status: open -> closed resolution: not a bug messages: + msg327976 stage: resolved |
| 2018-10-18 12:41:17 | vstinner | set | messages: + msg327975 |
| 2018-10-18 12:40:04 | natim | set | messages: + msg327974 |
| 2018-10-18 11:57:42 | vstinner | set | messages: + msg327970 |
| 2018-10-18 11:08:40 | natim | set | messages: + msg327967 |
| 2018-10-18 11:07:30 | asvetlov | set | messages: + msg327966 |
| 2018-10-18 11:06:49 | natim | set | messages: + msg327965 |
| 2018-10-18 11:03:03 | natim | set | messages: + msg327964 |
| 2018-10-18 10:53:44 | natim | set | files: + demo.py messages: + msg327962 |
| 2018-10-18 10:04:25 | natim | set | messages: + msg327955 |
| 2018-10-18 09:54:30 | vstinner | set | nosy: + vstinner messages: + msg327954 |
| 2018-10-18 09:50:45 | asvetlov | set | messages: + msg327953 |
| 2018-10-18 08:50:17 | natim | create | |