Created on 2020-04-17 20:58 by Antony.Lee, last changed 2020-04-21 00:32 by vstinner. This issue is now closed.
Consider the following example, linewrapping 10^4 bytes in hex form to 128 characters per line, on Py 3.8.2 (Arch Linux repo package):
In [1]: import numpy as np, math
In [2]: data = np.random.randint(0, 256, (100, 100), dtype=np.uint8).tobytes()
In [3]: %timeit data.hex("\n", -64)
123 µs ± 5.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [4]: %timeit h = data.hex(); "\n".join([h[n * 128 : (n+1) * 128] for n in range(math.ceil(len(h) / 128))])
45.4 µs ± 746 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: h = data.hex(); "\n".join([h[n * 128 : (n+1) * 128] for n in range(math.ceil(len(h) / 128))]) == data.hex("\n", -64)
Out[5]: True
(The last line checks that both approaches produce identical output.)
It appears that a naive manual wrap is nearly 3x faster than the builtin functionality.
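The gap can also be reproduced without NumPy, using only the standard library; this is a minimal sketch of the same comparison (10^4 random bytes wrapped at 128 hex characters), not the exact benchmark above:

```python
import os
import timeit

data = os.urandom(10_000)  # 10^4 random bytes, as in the example above

def builtin_wrap():
    # bytes.hex(sep, bytes_per_sep): a negative group size groups from the left
    return data.hex("\n", -64)

def manual_wrap():
    h = data.hex()
    return "\n".join(h[n:n + 128] for n in range(0, len(h), 128))

# Sanity check: both strategies must produce identical output.
assert builtin_wrap() == manual_wrap()

for fn in (builtin_wrap, manual_wrap):
    t = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {t * 1000:.1f} us/loop")
```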
I reproduced this behavior. This looks like the relevant loop in pystrhex.c:
    for (i = j = 0; i < arglen; ++i) {
        assert((j + 1) < resultlen);
        unsigned char c;
        c = (argbuf[i] >> 4) & 0x0f;
        retbuf[j++] = Py_hexdigits[c];
        c = argbuf[i] & 0x0f;
        retbuf[j++] = Py_hexdigits[c];
        if (bytes_per_sep_group && i < arglen - 1) {
            Py_ssize_t anchor;
            anchor = (bytes_per_sep_group > 0) ? (arglen - 1 - i) : (i + 1);
            if (anchor % abs_bytes_per_sep == 0) {
                retbuf[j++] = sep_char;
            }
        }
    }
It looks like this can be refactored a bit for a tighter inner loop with fewer if-tests. I can work on a PR.
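The idea behind the refactoring can be modeled in Python (a sketch of the strategy, not the actual C patch; the function name is illustrative): rather than testing every byte for a separator, encode one whole separator group at a time, so the inner loop carries no per-byte branch.

```python
def hex_with_sep(data: bytes, sep: str, bytes_per_sep: int) -> str:
    """Model of a chunked strategy: hex-encode whole separator groups,
    eliminating the per-byte separator test (illustrative only)."""
    if bytes_per_sep == 0:
        return data.hex()
    group = abs(bytes_per_sep)
    if bytes_per_sep < 0:
        # Negative: group from the left, short remainder at the end.
        chunks = [data[i:i + group] for i in range(0, len(data), group)]
    else:
        # Positive: group from the right, short remainder at the start.
        first = len(data) % group or group
        chunks = [data[:first]] + [data[i:i + group]
                                   for i in range(first, len(data), group)]
    return sep.join(c.hex() for c in chunks)

assert hex_with_sep(b"\xde\xad\xbe\xef", ":", 2) == "dead:beef"
```

In the C version the same structure lets the compiler keep a tight two-digits-per-byte loop between separator writes.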
========== Master ==========
.\python.bat -m pyperf timeit -s "import random, math; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "temp = data.hex(); '\n'.join(temp[n:n+128] for n in range(0, len(temp), 128))"
Mean +- std dev: 74.3 ms +- 1.1 ms
.\python.bat -m pyperf timeit -s "import random; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "data.hex('\n', -64)"
Mean +- std dev: 44.0 ms +- 0.3 ms
========== PR 19594 ==========
.\python.bat -m pyperf timeit -s "import random, math; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "temp = data.hex(); '\n'.join(temp[n:n+128] for n in range(0, len(temp), 128))"
Mean +- std dev: 65.2 ms +- 0.6 ms
.\python.bat -m pyperf timeit -s "import random; data=random.getrandbits(8*10_000_000).to_bytes(10_000_000, 'big')" "data.hex('\n', -64)"
Mean +- std dev: 18.1 ms +- 0.1 ms
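For context on the benchmarked call: the `sep`/`bytes_per_sep` parameters of `bytes.hex()` (added in Python 3.8) group from the right for positive values and from the left for negative ones, which is why `-64` yields fixed 128-character lines. A small illustration:

```python
data = bytes(range(8))  # b'\x00\x01\x02\x03\x04\x05\x06\x07'

# Positive group size: bytes are grouped from the right,
# so any short group ends up at the front.
print(data.hex(":", 3))   # 0001:020304:050607

# Negative group size: bytes are grouped from the left,
# so any short group ends up at the end -- hence
# data.hex("\n", -64) wraps into fixed 128-character lines.
print(data.hex(":", -3))  # 000102:030405:0607
```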
New changeset 6a9e80a93148b13e4d3bceaab5ea1804ab0e64d5 by sweeneyde in branch 'master': bpo-40313: speed up bytes.hex() (GH-19594) https://github.com/python/cpython/commit/6a9e80a93148b13e4d3bceaab5ea1804ab0e64d5
Thanks Dennis for the optimization! FYI I also pushed another optimization recently:

commit 455df9779873b8335b20292b8d0c43d66338a4db
Author: Victor Stinner <vstinner@python.org>
Date: Wed Apr 15 14:05:24 2020 +0200

    Optimize _Py_strhex_impl() (GH-19535)

    Avoid a temporary buffer to create a bytes string: use
    PyBytes_FromStringAndSize() to directly allocate a bytes object.
nosy: + vstinner
messages: + msg366761, + msg366904
resolution: fixed
stage: patch review -> resolved
versions: + Python 3.9, - Python 3.8