Issue25856
Created on 2015-12-13 16:17 by serhiy.storchaka, last changed 2016-09-09 21:58 by serhiy.storchaka. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| intern___module__.patch | serhiy.storchaka, 2015-12-18 16:49 | review | ||
| intern_and_cache___module__.patch | serhiy.storchaka, 2015-12-29 08:23 | review | ||
| intern_and_cache___module__2.patch | serhiy.storchaka, 2016-09-07 11:50 | review | ||
| intern_and_cache___module__3.patch | serhiy.storchaka, 2016-09-07 18:19 | review | ||
| Messages (15) | |||
|---|---|---|---|
| msg256322 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2015-12-13 16:17 | |
One of purposes of the STACK_GLOBAL opcode introduced in pickle protocol 4 is to avoid repeating module name for different globals of the same module.
>>> pickletools.dis(pickletools.optimize(pickle.dumps([sys.getsizeof, sys.intern], 4)))
0: \x80 PROTO 4
2: \x95 FRAME 33
11: ] EMPTY_LIST
12: ( MARK
13: \x8c SHORT_BINUNICODE 'sys'
18: \x94 MEMOIZE (as 0)
19: \x8c SHORT_BINUNICODE 'getsizeof'
30: \x93 STACK_GLOBAL
31: h BINGET 0
33: \x8c SHORT_BINUNICODE 'intern'
41: \x93 STACK_GLOBAL
42: e APPENDS (MARK at 12)
43: . STOP
highest protocol among opcodes = 4
But this doesn't work with the itertools module.
>>> pickletools.dis(pickletools.optimize(pickle.dumps([itertools.chain, itertools.accumulate], 4)))
0: \x80 PROTO 4
2: \x95 FRAME 47
11: ] EMPTY_LIST
12: ( MARK
13: \x8c SHORT_BINUNICODE 'itertools'
24: \x8c SHORT_BINUNICODE 'chain'
31: \x93 STACK_GLOBAL
32: \x8c SHORT_BINUNICODE 'itertools'
43: \x8c SHORT_BINUNICODE 'accumulate'
55: \x93 STACK_GLOBAL
56: e APPENDS (MARK at 12)
57: . STOP
highest protocol among opcodes = 4
That is because the __module__ attribute of itertools members is not interned.
>>> sys.getsizeof.__module__ is sys.intern.__module__
True
>>> itertools.chain.__module__ is itertools.chain.__module__
False
In addition to inefficient pickle this perhaps leads to small performance hit on accessing the __module__ attribute or using its value as dictionary key.
|
|||
| msg256694 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2015-12-18 16:49 | |
Actually this is not specific to itertools. Every non-heap class not from the builtins module is affected. Proposed patch just interns the __module__ value. The patch also cleans up the code. |
|||
| msg257171 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2015-12-29 07:43 | |
Interning the __module__ string causes small performance hit: $ ./python -m timeit -s "from itertools import chain" -- "chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__" Unpatched: 1.93 usec per loop Patched: 4.09 usec per loop This can be avoided if cache created string in type's __dict__. Following patch makes __module__ retrieving for non-heap types as fast as for heap types: Patched2: 0.871 usec per loop |
|||
| msg274799 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-09-07 11:43 | |
Could anyone please make a review? |
|||
| msg274842 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2016-09-07 17:18 | |
I don't understand this code: type->tp_dict && PyDict_Check(type->tp_dict) since the code explicitly assume it's not NULL and access it as a dict earlier in the function |
|||
| msg274857 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-09-07 18:19 | |
Good catch Benjamin! There is yet one bug -- the type type already has the __module__ attribute, but it is a descriptor, not a string. Updated patch fixes these bugs. |
|||
| msg274861 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2016-09-07 18:30 | |
I don't understand why you need to check the validity of tp_dict at all. We generally assume it's a dict. |
|||
| msg274862 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-09-07 18:36 | |
Indeed, the PyDict_Check() check can be omitted. |
|||
| msg274863 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2016-09-07 18:40 | |
I don't think it can be NULL either. On Wed, Sep 7, 2016, at 11:36, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > Indeed, the PyDict_Check() check can be omitted. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue25856> > _______________________________________ |
|||
| msg274865 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-09-07 18:45 | |
It can be NULL in very rare cases. See for example issue26906. |
|||
| msg275166 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2016-09-08 21:29 | |
I think it's better not write into the type dict in a getter. You might just use PyUnicode_InternFromString every time. FWIW, __name__ also has this behavior. |
|||
| msg275175 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-09-08 21:39 | |
This is what my first path does. But this slows down retrieving the __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I haven't bother. Interning __name__ and __qualname__ is less important, because different functions and classes usually have different names. |
|||
| msg275176 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2016-09-08 21:42 | |
I'm not too worried about slowing down __module__ especially since it's not any slower for heap types or types in builtins. On Thu, Sep 8, 2016, at 14:39, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > This is what my first path does. But this slows down retrieving the > __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I > haven't bother. > > Interning __name__ and __qualname__ is less important, because different > functions and classes usually have different names. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue25856> > _______________________________________ |
|||
| msg275460 - (view) | Author: Roundup Robot (python-dev) | Date: 2016-09-09 21:55 | |
New changeset 861ddad3e0c1 by Serhiy Storchaka in branch 'default': Issue #25856: The __module__ attribute of extension classes and functions https://hg.python.org/cpython/rev/861ddad3e0c1 |
|||
| msg275464 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2016-09-09 21:58 | |
Thank you for helpful review Benjamin. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2016-09-09 21:58:55 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg275464 stage: patch review -> resolved |
| 2016-09-09 21:55:21 | python-dev | set | nosy:
+ python-dev messages: + msg275460 |
| 2016-09-08 21:42:50 | benjamin.peterson | set | messages: + msg275176 |
| 2016-09-08 21:39:35 | serhiy.storchaka | set | messages: + msg275175 |
| 2016-09-08 21:29:36 | benjamin.peterson | set | messages: + msg275166 |
| 2016-09-07 18:45:24 | serhiy.storchaka | set | messages: + msg274865 |
| 2016-09-07 18:40:12 | benjamin.peterson | set | messages: + msg274863 |
| 2016-09-07 18:36:12 | serhiy.storchaka | set | messages: + msg274862 |
| 2016-09-07 18:30:00 | benjamin.peterson | set | messages: + msg274861 |
| 2016-09-07 18:19:55 | serhiy.storchaka | set | files:
+ intern_and_cache___module__3.patch messages: + msg274857 |
| 2016-09-07 17:18:43 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages: + msg274842 |
| 2016-09-07 11:50:55 | serhiy.storchaka | set | files: + intern_and_cache___module__2.patch |
| 2016-09-07 11:43:10 | serhiy.storchaka | set | assignee: serhiy.storchaka messages: + msg274799 |
| 2015-12-29 08:23:12 | serhiy.storchaka | set | files: + intern_and_cache___module__.patch |
| 2015-12-29 08:22:54 | serhiy.storchaka | set | files: - intern_and_cache___module__.patch |
| 2015-12-29 07:43:41 | serhiy.storchaka | set | files:
+ intern_and_cache___module__.patch type: performance -> enhancement messages: + msg257171 |
| 2015-12-18 16:49:06 | serhiy.storchaka | set | files:
+ intern___module__.patch title: The __module__ attribute of itertools members is not interned -> The __module__ attribute of non-heap classes is not interned messages: + msg256694 keywords:
+ patch |
| 2015-12-13 16:17:44 | serhiy.storchaka | create | |