Issue 37348: Optimize PyUnicode_FromString for short ASCII strings [proxy]

The Wayback Machine - https://web.archive.org/web/20210125014653/https://bugs.python.org/issue37348

Issue37348

classification

Title:	Optimize PyUnicode_FromString for short ASCII strings
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.9

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	methane, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2019-06-20 11:40 by methane, last changed 2019-06-24 07:08 by methane. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 14264	closed	methane, 2019-06-20 11:41
PR 14273	closed	methane, 2019-06-20 16:00
PR 14283	merged	methane, 2019-06-21 10:28
PR 14291	closed	methane, 2019-06-21 16:53

Messages (15)
msg346115 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-20 11:40
_PyUnicode_FromASCII(s, len) is faster than PyUnicode_FromString(s) because PyUnicode_FromString() uses temporary _PyUnicodeWriter to support UTF-8. But _PyUnicode_FromASCII("hello", strlen("hello"))` is not as easy as `PyUnicode_FromString("hello")`. _PyUnicode_FROM_ASCII() is simple macro which wraps _PyUnicode_FromASCII which calls strlen() automatically: ``` /* Convenient wrapper for _PyUnicode_FromASCII */ #define _PyUnicode_FROM_ASCII(s) _PyUnicode_FromASCII((s), strlen(s)) ``` I believe recent compilers optimize away calls of strlen().
msg346116 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-06-20 11:58
> #define _PyUnicode_FROM_ASCII(s) _PyUnicode_FromASCII((s), strlen(s)) LGTM.
msg346117 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-20 12:27
I confirmed at least one measurable speedup: ./python -m pyperf timeit -s 'd={}' -- 'repr(d)' FromString: Mean +- std dev: 157 ns +- 3 ns FROM_ASCII: Mean +- std dev: 132 ns +- 3 ns >>> (157-132)/157 0.1592356687898089
msg346118 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-20 12:39
Should we make these APIs public? * rename `_PyUnicode_FromASCII` to `PyUnicode_FromASCIIAndSize`. * add `PyUnicode_FromASCII` as static inline function (without ABI). * add `#define _PyUnicode_FromASCII PyUnicode_FromASCIIAndSize` for source code compatibility.
msg346121 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-06-20 14:17
Most of changes are in not performance sensitive code. I do not think there is a benefit of using new macro there. If PyUnicode_FromString() is slow I prefer to optimize it instead of adding yet one esoteric private function for internal needs. In case of dict.__repr__() we can get even more gain by using _Py_IDENTIFIER or more general API proposed by Victor.
msg346127 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-20 16:03
> Most of changes are in not performance sensitive code. I do not think there is a benefit of using new macro there. Because I used just sed. > If PyUnicode_FromString() is slow I prefer to optimize it instead of adding yet one esoteric private function for internal needs. OK. There are some code like `PyUnicode_FromString(name)`. Optimizing PyUnicode_FromString will be more effective. I created PR 14273. > In case of dict.__repr__() we can get even more gain by using _Py_IDENTIFIER or more general API proposed by Victor. Of course, I used it just for micro benchmarking. Optimizing it is not a goal. In case of PR 14273: $ ./python -m pyperf timeit -s 'd={}' -- 'repr(d)' ..................... Mean +- std dev: 138 ns +- 2 ns
msg346132 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-20 16:32
Oh, wait. Why we used _PyUnicodeWriter here? Decoding UTF-8 must not require it. 2-pass is enough.
msg346134 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-06-20 16:33
> _PyUnicode_FromASCII(s, len) is faster than PyUnicode_FromString(s) because PyUnicode_FromString() uses temporary _PyUnicodeWriter to support UTF-8. I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just wrapper to implement efficient memory management. > Oh, wait. Why we used _PyUnicodeWriter here? To optimize decoding errors: the error handler can use replacement string longer than 1 character. Overallocation is used in this case.
msg346202 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-21 10:30
> I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just wrapper to implement efficient memory management. I misunderstood _PyUnicodeWriter. I thought it caused one more allocation, but it doesn't. But _PyUnicodeWriter is still slow, because gcc and clang are not smart enough to optimize _PyUnicodeWriter_Init() & _PyUnicodeWriter_Prepare(). See this example: ``` #define PY_SSIZE_T_CLEAN #include <Python.h> #define S(s) (s),strlen(s) int main(int argc, char argv[]) { Py_Initialize(); for (int i=0; i<100000000; i++) { //PyObject s = PyUnicode_FromString("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"); PyObject *s = _PyUnicode_FromASCII(S("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")); Py_DECREF(s); } return 0; } ``` PyUnicode_FromString() takes about 4 sec on my machine. _PyUnicode_FromASCII() is about 2 sec. By skipping _PyUnicodeWriter for ASCII string (GH-14283), PyUnicode_FromString() takes about 3 sec. ``` $ time ./x # PyUnicode_FromString real 0m4.085s user 0m4.081s sys 0m0.004s $ time ./y # PyUnicode_FromString (skip _PyUnicode_Writer, GH-14283) real 0m2.988s user 0m2.988s sys 0m0.000s $ time ./z # _PyUnicode_FromASCII $ time ./z real 0m1.975s user 0m1.975s sys 0m0.000s ```
msg346203 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-21 10:37
Another micro benchmark: ``` $ ./python-master -m pyperf timeit -o m1.json 'b=b"foobar"' -- 'b.decode()' ..................... Mean +- std dev: 93.1 ns +- 2.4 ns $ ./python -m pyperf timeit -o m2.json 'b=b"foobar"' -- 'b.decode()' ..................... Mean +- std dev: 83.1 ns +- 2.6 ns $ ./python -m pyperf compare_to m1.json m2.json Mean +- std dev: [m1] 93.1 ns +- 2.4 ns -> [m2] 83.1 ns +- 2.6 ns: 1.12x faster (-11%) ```
msg346237 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-21 17:10
PR 14291 seems simpler than PR 14283. But PR 14283 is faster because _PyUnicodeWriter is a learge struct. master: 3.7sec PR 14283: 2.9sec PR 14291: 3.45sec compiler: gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 without LTO, without PGO
msg346247 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-06-21 21:03
I'm confused by the issue title: PyUnicode_GetString() doesn't exist, it's PyUnicode_FromString() :-) I changed the title.
msg346351 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-24 03:30
New changeset 770847a7db33b3d4c451b42372b6942687aa6121 by Inada Naoki in branch 'master': bpo-37348: optimize decoding ASCII string (GH-14283) https://github.com/python/cpython/commit/770847a7db33b3d4c451b42372b6942687aa6121
msg346354 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-06-24 06:13
Could you please measure the performance for long strings (1000, 10000 and 100000 characters): a long ASCII string and a long ASCII string ending with a non-ASCII character?
msg346356 - (view)	Author: Inada Naoki (methane) *	Date: 2019-06-24 07:08
This optimization is only for short strings. There is no significant difference for long and non-ASCII strings. ``` # 1000 ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"1000' -- 'b.decode()' python-master: ..................... 196 ns +- 1 ns python: ..................... 185 ns +- 1 ns Mean +- std dev: [python-master] 196 ns +- 1 ns -> [python] 185 ns +- 1 ns: 1.06x faster (-6%) # 10000 ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"10000' -- 'b.decode()' python-master: ..................... 671 ns +- 4 ns python: ..................... 662 ns +- 1 ns Mean +- std dev: [python-master] 671 ns +- 4 ns -> [python] 662 ns +- 1 ns: 1.01x faster (-1%) # 100000 ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"100000' -- 'b.decode()' python-master: ..................... 5.86 us +- 0.03 us python: ..................... 5.85 us +- 0.02 us Mean +- std dev: [python-master] 5.86 us +- 0.03 us -> [python] 5.85 us +- 0.02 us: 1.00x faster (-0%) # 1 non-ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b="あ".encode()' -- 'b.decode()' python-master: ..................... 138 ns +- 2 ns python: ..................... 136 ns +- 2 ns Mean +- std dev: [python-master] 138 ns +- 2 ns -> [python] 136 ns +- 2 ns: 1.02x faster (-2%) # 1000 ASCII + 1 non-ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"x"1000 + "あ".encode()' -- 'b.decode()' python-master: ..................... 361 ns +- 9 ns python: ..................... 360 ns +- 5 ns Mean +- std dev: [python-master] 361 ns +- 9 ns -> [python] 360 ns +- 5 ns: 1.00x faster (-0%) Not significant! # 10000 ASCII + 1 non-ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"x"*10000 + "あ".encode()' -- 'b.decode()' python-master: ..................... 2.83 us +- 0.02 us python: ..................... 2.83 us +- 0.03 us Mean +- std dev: [python-master] 2.83 us +- 0.02 us -> [python] 2.83 us +- 0.03 us: 1.00x slower (+0%) Not significant! ```

History
Date	User	Action	Args
2019-06-24 07:08:11	methane	set	messages: + msg346356
2019-06-24 06:13:28	serhiy.storchaka	set	messages: + msg346354
2019-06-24 03:30:43	methane	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2019-06-24 03:30:30	methane	set	messages: + msg346351
2019-06-21 21:03:51	vstinner	set	messages: + msg346247 title: Optimize PyUnicode_GetString for short ASCII strings -> Optimize PyUnicode_FromString for short ASCII strings
2019-06-21 17:10:02	methane	set	messages: + msg346237
2019-06-21 16:53:37	methane	set	pull_requests: + pull_request14114
2019-06-21 10:37:24	methane	set	messages: + msg346203
2019-06-21 10:30:07	methane	set	messages: + msg346202
2019-06-21 10:28:36	methane	set	pull_requests: + pull_request14105
2019-06-20 16:33:54	vstinner	set	messages: + msg346134
2019-06-20 16:32:09	methane	set	messages: + msg346132
2019-06-20 16:03:34	methane	set	messages: + msg346127 title: add _PyUnicode_FROM_ASCII macro -> Optimize PyUnicode_GetString for short ASCII strings
2019-06-20 16:00:33	methane	set	pull_requests: + pull_request14097
2019-06-20 14:17:35	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg346121
2019-06-20 12:39:17	methane	set	messages: + msg346118
2019-06-20 12:27:06	methane	set	messages: + msg346117
2019-06-20 11:58:17	vstinner	set	nosy: + vstinner messages: + msg346116
2019-06-20 11:41:20	methane	set	keywords: + patch stage: patch review pull_requests: + pull_request14091
2019-06-20 11:40:57	methane	create