Issue37348
Created on 2019-06-20 11:40 by methane, last changed 2019-06-24 07:08 by methane. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 14264 | closed | methane, 2019-06-20 11:41 | |
| PR 14273 | closed | methane, 2019-06-20 16:00 | |
| PR 14283 | merged | methane, 2019-06-21 10:28 | |
| PR 14291 | closed | methane, 2019-06-21 16:53 | |
| Messages (15) | |||
|---|---|---|---|
| msg346115 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-20 11:40 | |
_PyUnicode_FromASCII(s, len) is faster than PyUnicode_FromString(s) because PyUnicode_FromString() uses temporary _PyUnicodeWriter to support UTF-8.
But _PyUnicode_FromASCII("hello", strlen("hello"))` is not as easy as `PyUnicode_FromString("hello")`.
_PyUnicode_FROM_ASCII() is simple macro which wraps _PyUnicode_FromASCII which calls strlen() automatically:
```
/* Convenient wrapper for _PyUnicode_FromASCII */
#define _PyUnicode_FROM_ASCII(s) _PyUnicode_FromASCII((s), strlen(s))
```
I believe recent compilers optimize away calls of strlen().
|
|||
| msg346116 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-06-20 11:58 | |
> #define _PyUnicode_FROM_ASCII(s) _PyUnicode_FromASCII((s), strlen(s)) LGTM. |
|||
| msg346117 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-20 12:27 | |
I confirmed at least one measurable speedup:
./python -m pyperf timeit -s 'd={}' -- 'repr(d)'
FromString: Mean +- std dev: 157 ns +- 3 ns
FROM_ASCII: Mean +- std dev: 132 ns +- 3 ns
>>> (157-132)/157
0.1592356687898089
|
|||
| msg346118 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-20 12:39 | |
Should we make these APIs public? * rename `_PyUnicode_FromASCII` to `PyUnicode_FromASCIIAndSize`. * add `PyUnicode_FromASCII` as static inline function (without ABI). * add `#define _PyUnicode_FromASCII PyUnicode_FromASCIIAndSize` for source code compatibility. |
|||
| msg346121 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-06-20 14:17 | |
Most of changes are in not performance sensitive code. I do not think there is a benefit of using new macro there. If PyUnicode_FromString() is slow I prefer to optimize it instead of adding yet one esoteric private function for internal needs. In case of dict.__repr__() we can get even more gain by using _Py_IDENTIFIER or more general API proposed by Victor. |
|||
| msg346127 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-20 16:03 | |
> Most of changes are in not performance sensitive code. I do not think there is a benefit of using new macro there. Because I used just sed. > If PyUnicode_FromString() is slow I prefer to optimize it instead of adding yet one esoteric private function for internal needs. OK. There are some code like `PyUnicode_FromString(name)`. Optimizing PyUnicode_FromString will be more effective. I created PR 14273. > In case of dict.__repr__() we can get even more gain by using _Py_IDENTIFIER or more general API proposed by Victor. Of course, I used it just for micro benchmarking. Optimizing it is not a goal. In case of PR 14273: $ ./python -m pyperf timeit -s 'd={}' -- 'repr(d)' ..................... Mean +- std dev: 138 ns +- 2 ns |
|||
| msg346132 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-20 16:32 | |
Oh, wait. Why we used _PyUnicodeWriter here? Decoding UTF-8 must not require it. 2-pass is enough. |
|||
| msg346134 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-06-20 16:33 | |
> _PyUnicode_FromASCII(s, len) is faster than PyUnicode_FromString(s) because PyUnicode_FromString() uses temporary _PyUnicodeWriter to support UTF-8. I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just wrapper to implement efficient memory management. > Oh, wait. Why we used _PyUnicodeWriter here? To optimize decoding errors: the error handler can use replacement string longer than 1 character. Overallocation is used in this case. |
|||
| msg346202 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-21 10:30 | |
> I don't understand how _PyUnicodeWriter could be slow. It does not overallocate by default. It's just wrapper to implement efficient memory management.
I misunderstood _PyUnicodeWriter. I thought it caused one more allocation, but it doesn't.
But _PyUnicodeWriter is still slow, because gcc and clang are not smart enough to optimize _PyUnicodeWriter_Init() & _PyUnicodeWriter_Prepare().
See this example:
```
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#define S(s) (s),strlen(s)
int
main(int argc, char *argv[])
{
Py_Initialize();
for (int i=0; i<100000000; i++) {
//PyObject *s = PyUnicode_FromString("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
PyObject *s = _PyUnicode_FromASCII(S("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));
Py_DECREF(s);
}
return 0;
}
```
PyUnicode_FromString() takes about 4 sec on my machine. _PyUnicode_FromASCII() is about 2 sec.
By skipping _PyUnicodeWriter for ASCII string (GH-14283), PyUnicode_FromString() takes about 3 sec.
```
$ time ./x # PyUnicode_FromString
real 0m4.085s
user 0m4.081s
sys 0m0.004s
$ time ./y # PyUnicode_FromString (skip _PyUnicode_Writer, GH-14283)
real 0m2.988s
user 0m2.988s
sys 0m0.000s
$ time ./z # _PyUnicode_FromASCII
$ time ./z
real 0m1.975s
user 0m1.975s
sys 0m0.000s
```
|
|||
| msg346203 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-21 10:37 | |
Another micro benchmark: ``` $ ./python-master -m pyperf timeit -o m1.json 'b=b"foobar"' -- 'b.decode()' ..................... Mean +- std dev: 93.1 ns +- 2.4 ns $ ./python -m pyperf timeit -o m2.json 'b=b"foobar"' -- 'b.decode()' ..................... Mean +- std dev: 83.1 ns +- 2.6 ns $ ./python -m pyperf compare_to m1.json m2.json Mean +- std dev: [m1] 93.1 ns +- 2.4 ns -> [m2] 83.1 ns +- 2.6 ns: 1.12x faster (-11%) ``` |
|||
| msg346237 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-21 17:10 | |
PR 14291 seems simpler than PR 14283. But PR 14283 is faster because _PyUnicodeWriter is a learge struct. master: 3.7sec PR 14283: 2.9sec PR 14291: 3.45sec compiler: gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 without LTO, without PGO |
|||
| msg346247 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-06-21 21:03 | |
I'm confused by the issue title: PyUnicode_GetString() doesn't exist, it's PyUnicode_FromString() :-) I changed the title. |
|||
| msg346351 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-24 03:30 | |
New changeset 770847a7db33b3d4c451b42372b6942687aa6121 by Inada Naoki in branch 'master': bpo-37348: optimize decoding ASCII string (GH-14283) https://github.com/python/cpython/commit/770847a7db33b3d4c451b42372b6942687aa6121 |
|||
| msg346354 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-06-24 06:13 | |
Could you please measure the performance for long strings (1000, 10000 and 100000 characters): a long ASCII string and a long ASCII string ending with a non-ASCII character? |
|||
| msg346356 - (view) | Author: Inada Naoki (methane) * | Date: 2019-06-24 07:08 | |
This optimization is only for short strings. There is no significant difference for long and non-ASCII strings. ``` # 1000 ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"*1000' -- 'b.decode()' python-master: ..................... 196 ns +- 1 ns python: ..................... 185 ns +- 1 ns Mean +- std dev: [python-master] 196 ns +- 1 ns -> [python] 185 ns +- 1 ns: 1.06x faster (-6%) # 10000 ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"*10000' -- 'b.decode()' python-master: ..................... 671 ns +- 4 ns python: ..................... 662 ns +- 1 ns Mean +- std dev: [python-master] 671 ns +- 4 ns -> [python] 662 ns +- 1 ns: 1.01x faster (-1%) # 100000 ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"f"*100000' -- 'b.decode()' python-master: ..................... 5.86 us +- 0.03 us python: ..................... 5.85 us +- 0.02 us Mean +- std dev: [python-master] 5.86 us +- 0.03 us -> [python] 5.85 us +- 0.02 us: 1.00x faster (-0%) # 1 non-ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b="あ".encode()' -- 'b.decode()' python-master: ..................... 138 ns +- 2 ns python: ..................... 136 ns +- 2 ns Mean +- std dev: [python-master] 138 ns +- 2 ns -> [python] 136 ns +- 2 ns: 1.02x faster (-2%) # 1000 ASCII + 1 non-ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"x"*1000 + "あ".encode()' -- 'b.decode()' python-master: ..................... 361 ns +- 9 ns python: ..................... 360 ns +- 5 ns Mean +- std dev: [python-master] 361 ns +- 9 ns -> [python] 360 ns +- 5 ns: 1.00x faster (-0%) Not significant! # 10000 ASCII + 1 non-ASCII $ ./python -m pyperf timeit --compare-to=./python-master -s 'b=b"x"*10000 + "あ".encode()' -- 'b.decode()' python-master: ..................... 2.83 us +- 0.02 us python: ..................... 2.83 us +- 0.03 us Mean +- std dev: [python-master] 2.83 us +- 0.02 us -> [python] 2.83 us +- 0.03 us: 1.00x slower (+0%) Not significant! ``` |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2019-06-24 07:08:11 | methane | set | messages: + msg346356 |
| 2019-06-24 06:13:28 | serhiy.storchaka | set | messages: + msg346354 |
| 2019-06-24 03:30:43 | methane | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2019-06-24 03:30:30 | methane | set | messages: + msg346351 |
| 2019-06-21 21:03:51 | vstinner | set | messages:
+ msg346247 title: Optimize PyUnicode_GetString for short ASCII strings -> Optimize PyUnicode_FromString for short ASCII strings |
| 2019-06-21 17:10:02 | methane | set | messages: + msg346237 |
| 2019-06-21 16:53:37 | methane | set | pull_requests: + pull_request14114 |
| 2019-06-21 10:37:24 | methane | set | messages: + msg346203 |
| 2019-06-21 10:30:07 | methane | set | messages: + msg346202 |
| 2019-06-21 10:28:36 | methane | set | pull_requests: + pull_request14105 |
| 2019-06-20 16:33:54 | vstinner | set | messages: + msg346134 |
| 2019-06-20 16:32:09 | methane | set | messages: + msg346132 |
| 2019-06-20 16:03:34 | methane | set | messages:
+ msg346127 title: add _PyUnicode_FROM_ASCII macro -> Optimize PyUnicode_GetString for short ASCII strings |
| 2019-06-20 16:00:33 | methane | set | pull_requests: + pull_request14097 |
| 2019-06-20 14:17:35 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg346121 |
| 2019-06-20 12:39:17 | methane | set | messages: + msg346118 |
| 2019-06-20 12:27:06 | methane | set | messages: + msg346117 |
| 2019-06-20 11:58:17 | vstinner | set | nosy:
+ vstinner messages: + msg346116 |
| 2019-06-20 11:41:20 | methane | set | keywords:
+ patch stage: patch review pull_requests: + pull_request14091 |
| 2019-06-20 11:40:57 | methane | create | |