Issue24874
Created on 2015-08-15 21:23 by rhettinger, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| time_cycle.py | rhettinger, 2015-08-15 22:30 | Simple timing suite for cycle() | ||
| cycle5_brokensetstate.diff | rhettinger, 2015-08-15 23:28 | Partial patch -- still needs work on __setstate__ | review | |
| cycle9.diff | rhettinger, 2015-08-16 05:05 | More complete patch that passes all tests | review | |
| cycle_reduce_2.patch | serhiy.storchaka, 2015-08-16 08:44 | Simpler, faster and more memory efficient pickling | review | |
| cycle_reduce_3.patch | serhiy.storchaka, 2015-08-16 08:44 | Faster unpickled cycle object | review | |
| Messages (8) | |||
|---|---|---|---|
| msg248662 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-08-15 21:32 | |
When a cycle object has fully consumed its input iterable, __reduce__ method uses the returns a space-inefficient result when space-efficient alternative is available.
# Current way of restoring a cycle object with excess info in setstate:
>>> c = cycle(iter('de'))
>>> c.__setstate__((['a', 'b', 'c', 'd', 'e'], 1))
>>> ''.join(next(c) for i in range(20)) # next 20 values
'deabcdeabcdeabcdeabc'
# The same result can be achieved with less information:
>>> c = cycle(iter('de'))
>>> c.__setstate__((['a', 'b', 'c'], 0))
>>> ''.join(next(c) for i in range(20)) # next 20 values
'deabcdeabcdeabcdeabc'
|
|||
| msg248664 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-08-15 21:54 | |
Also, looking at the source for itertools.cycle(), it looks like the overall speed could be boosted considerably by looping over the saved list directly rather than allocating a new list iterator every time the cycle loops around. |
|||
| msg248669 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-08-15 23:28 | |
Attaching a partial patch: * More than doubles the speed of cycle() * Cuts size of __reduce__ result by about a third (on average) * Still needs work on __setstate__ for a correct restore. |
|||
| msg248674 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2015-08-16 04:57 | |
Current cycle implementation is simple and clever, but can be optimized. The part about iterating LGTM (but looks the firstpass field can be eliminated at all). But __reduce__ doesn't look so optimal. It makes a copy of a list and makes iterating an unpickled cycle object slow. It would be more optimal if create new list with rotated content or even rotate original list inplace. |
|||
| msg248675 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-08-16 05:52 | |
Added an updated patch that passes all tests. |
|||
| msg248677 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2015-08-16 08:44 | |
Original Raymonds reason in msg248662 is not valid. Pickling a cycle object that fully consumed its input iterable is already space-inefficient. >>> import itertools, pickle, pickletools >>> c = itertools.cycle(iter('abcde')) >>> [next(c) for i in range(8)] ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c'] >>> pickle.dumps(c) b'\x80\x03citertools\ncycle\nq\x00cbuiltins\niter\nq\x01]q\x02(X\x01\x00\x00\x00aq\x03X\x01\x00\x00\x00bq\x04X\x01\x00\x00\x00cq\x05X\x01\x00\x00\x00dq\x06X\x01\x00\x00\x00eq\x07e\x85q\x08Rq\tK\x03b\x85q\nRq\x0bh\x02K\x01\x86q\x0cb.' >>> pickletools.dis(pickle.dumps(c)) 0: \x80 PROTO 3 2: c GLOBAL 'itertools cycle' 19: q BINPUT 0 21: c GLOBAL 'builtins iter' 36: q BINPUT 1 38: ] EMPTY_LIST 39: q BINPUT 2 41: ( MARK 42: X BINUNICODE 'a' 48: q BINPUT 3 50: X BINUNICODE 'b' 56: q BINPUT 4 58: X BINUNICODE 'c' 64: q BINPUT 5 66: X BINUNICODE 'd' 72: q BINPUT 6 74: X BINUNICODE 'e' 80: q BINPUT 7 82: e APPENDS (MARK at 41) 83: \x85 TUPLE1 84: q BINPUT 8 86: R REDUCE 87: q BINPUT 9 89: K BININT1 3 91: b BUILD 92: \x85 TUPLE1 93: q BINPUT 10 95: R REDUCE 96: q BINPUT 11 98: h BINGET 2 100: K BININT1 1 102: \x86 TUPLE2 103: q BINPUT 12 105: b BUILD 106: . STOP highest protocol among opcodes = 2 An internal iterator is not pickled as iter("de"), but as an iterator of the list ["a", "b", "c", "d", "e"] with 3 items consumed. This list also saved as a part of a cycle object state, but not as a copy, but as a reference. There are two alternative patches. Both keep Raymonds optimization of cycle iterating, but have advantages. cycle_reduce_2.patch makes __reduce__ faster and more memory efficient than Raymonds variant. cycle_reduce_3.patch makes unpickled cycle object so optimized as original. |
|||
| msg248692 - (view) | Author: Roundup Robot (python-dev) | Date: 2015-08-16 21:49 | |
New changeset 17b5c8ba6875 by Raymond Hettinger in branch 'default': Issue #24874: Speed-up itertools and make it pickles more compact. https://hg.python.org/cpython/rev/17b5c8ba6875 |
|||
| msg248693 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-08-16 21:51 | |
Applied the cycle2 patch but kept the signature the same as the original reduce (using a number instead of a boolean). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:19 | admin | set | github: 69062 |
| 2015-08-16 21:52:00 | rhettinger | set | status: open -> closed resolution: fixed messages: + msg248693 stage: patch review -> resolved |
| 2015-08-16 21:49:42 | python-dev | set | nosy:
+ python-dev messages: + msg248692 |
| 2015-08-16 15:45:20 | rhettinger | set | priority: normal -> low |
| 2015-08-16 08:44:41 | serhiy.storchaka | set | files: + cycle_reduce_3.patch |
| 2015-08-16 08:44:29 | serhiy.storchaka | set | files: + cycle_reduce_2.patch |
| 2015-08-16 08:44:10 | serhiy.storchaka | set | messages: + msg248677 |
| 2015-08-16 05:52:39 | rhettinger | set | messages: + msg248675 |
| 2015-08-16 05:05:26 | rhettinger | set | files: + cycle9.diff |
| 2015-08-16 04:57:52 | serhiy.storchaka | set | messages: + msg248674 |
| 2015-08-16 04:03:17 | serhiy.storchaka | set | nosy:
+ pitrou, alexandre.vassalotti, serhiy.storchaka stage: patch review |
| 2015-08-15 23:28:54 | rhettinger | set | files:
+ cycle5_brokensetstate.diff keywords: + patch messages: + msg248669 |
| 2015-08-15 22:30:16 | rhettinger | set | files: + time_cycle.py |
| 2015-08-15 21:54:15 | rhettinger | set | messages: + msg248664 |
| 2015-08-15 21:32:35 | rhettinger | set | messages: - msg248657 |
| 2015-08-15 21:32:26 | rhettinger | set | messages: + msg248662 |
| 2015-08-15 21:23:07 | rhettinger | create | |