Issue36012
Created on 2019-02-16 20:15 by rhettinger, last changed 2019-02-20 17:29 by rhettinger. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| perf-screenshot.png | nascheme, 2019-02-19 18:35 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 11907 | merged | scoder, 2019-02-17 14:31 | |
| Messages (11) | |||
|---|---|---|---|
| msg335714 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2019-02-16 20:15 | |
Benchmark show what writes to class variables are anomalously slow.
class A(object):
pass
A.x = 1 # This write is 3 to 5 times slower than other writes.
FWIW, the same operation for old-style classes in Python 2.7 was several times faster.
We should investigate to understand why the writes are so slow. There might be a good reason or there might be an opportunity for optimization.
-------------------------------------------------
$ python3.8 Tools/scripts/var_access_benchmark.py
Variable and attribute read access:
4.3 ns read_local
4.6 ns read_nonlocal
14.5 ns read_global
19.0 ns read_builtin
18.4 ns read_classvar_from_class
16.2 ns read_classvar_from_instance
24.7 ns read_instancevar
19.7 ns read_instancevar_slots
19.5 ns read_namedtuple
26.4 ns read_boundmethod
Variable and attribute write access:
4.4 ns write_local
5.1 ns write_nonlocal
18.2 ns write_global
103.9 ns write_classvar <== Outlier
35.4 ns write_instancevar
25.6 ns write_instancevar_slots
|
|||
| msg335733 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-02-16 23:34 | |
It seems 50% of the overhead (50ns) is due to two reasons: - 30-40% is due to the call to update_slot(type, name) after the item is set in the class dictionary. - 70-60% is due to all the extra work from _PyObject_GenericSetAttrWithDict until it ends calling _PyObjectDict_SetItem with the slot. If in the code of typeobject.c:type_setattro you change: res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL); by PyObject* dictptr = _PyObject_GetDictPtr(type); res = _PyObjectDict_SetItem(type, dictptr, name, value); and delete the update_slot(type, name) call afterwards, the times are reduced to 50ns. |
|||
| msg335747 - (view) | Author: Yasser Alshalaan (Yasser Alshalaan) * | Date: 2019-02-17 02:53 | |
can you include your python 2.7 runs? for me it looks similar |
|||
| msg335763 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2019-02-17 08:50 | |
> can you include your python 2.7 runs? for me it looks similar
It will give similar results unless you switch to old-style classes (edit out the inheritance from object).
class A:
pass
A.x = 1
---------------------------------------
$ python2.7 var_access_benchmark.py
Variable and attribute read access:
6.7 ns read_local
10.9 ns read_global
18.9 ns read_builtin
24.4 ns read_classvar_from_class
30.2 ns read_classvar_from_instance
22.7 ns read_instancevar
25.5 ns read_instancevar_slots
99.3 ns read_namedtuple
40.9 ns read_boundmethod
Variable and attribute write access:
8.2 ns write_local
18.7 ns write_global
32.9 ns write_classvar <== Not formerly an outlier
32.5 ns write_instancevar
31.2 ns write_instancevar_slots
|
|||
| msg335776 - (view) | Author: Stefan Behnel (scoder) * | Date: 2019-02-17 14:36 | |
It turns out that "update_slot()" is always called, even when we are not updating a slot name (which is always a special dunder-name). The linear search for names in "update_slots()" is a huge waste of time here, and short-circuiting out of it when the name does not start with "_" cuts the overall update time by 50%. I pushed a PR. Another improvement would be a sub-linear algorithm for searching the slot name, but that's a bigger change. |
|||
| msg335794 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2019-02-18 00:59 | |
This are the timings that I am measuring with PR 11907: Variable and attribute read access: 5.7 ns read_local 5.9 ns read_nonlocal 16.2 ns read_global 24.5 ns read_builtin 20.9 ns read_classvar_from_class 20.0 ns read_classvar_from_instance 29.7 ns read_instancevar 24.7 ns read_instancevar_slots 22.9 ns read_namedtuple 36.8 ns read_boundmethod Variable and attribute write access: 6.9 ns write_local 6.9 ns write_nonlocal 26.7 ns write_global 65.4 ns write_classvar <----- Down from 120.6 ns 49.5 ns write_instancevar 34.5 ns write_instancevar_slots |
|||
| msg335795 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2019-02-18 01:32 | |
Wow, I didn't expect to get an immediate win this of this magnitude :-) |
|||
| msg335992 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2019-02-19 18:25 | |
Some profiling using 'perf'. This is for cpython 63fa1cfece4912110ce3a0ff11fb3ade3ff5e756. children self [...] + 97.27% 0.00% run_mod (inlined) + 88.53% 6.33% PyObject_SetAttr + 79.34% 6.80% type_setattro + 43.92% 43.92% update_slot + 26.87% 5.84% _PyObject_GenericSetAttrWithDict + 14.62% 6.52% insertdict + 8.80% 8.80% lookdict_unicode_nodummy + 6.57% 0.00% _Py_DECREF (inlined) + 5.19% 5.19% PyUnicode_InternInPlace + 3.57% 3.57% _PyObjectDict_SetItem + 3.38% 3.38% _PyType_Lookup + 1.71% 0.00% _Py_INCREF (inlined) + 1.42% 0.00% _Py_XDECREF (inlined) [...] After applying PR 11907: children self [...] + 94.76% 0.00% run_mod (inlined) + 75.22% 6.77% PyObject_SetAttr + 64.65% 13.08% type_setattro + 47.51% 11.96% _PyObject_GenericSetAttrWithDict + 22.99% 13.95% insertdict + 10.10% 10.10% lookdict_unicode_nodummy + 9.47% 9.47% PyUnicode_InternInPlace + 7.10% 7.10% _PyObjectDict_SetItem + 7.02% 7.02% _PyType_Lookup + 6.52% 0.00% _Py_DECREF (inlined) + 3.17% 0.00% _Py_INCREF (inlined) + 3.16% 0.00% _Py_XDECREF (inlined) + 2.59% 0.00% PyDict_SetItem (inlined) + 1.50% 0.00% is_dunder_name (inlined) [...] The PyUnicode_InternInPlace() can mostly be eliminated by testing PyUnicode_CHECK_INTERNED() first (we already have called PyUnicode_Check() on it). That only gives a 7% speedup on my machine though. The is_dunder_name() is a much bigger optimization. |
|||
| msg335995 - (view) | Author: Neil Schemenauer (nascheme) * | Date: 2019-02-19 18:35 | |
BTW, 'perf report [...]' has a really neat annotated assembly view. Scroll to the function you are interested in and press 'a'. Press 't' to toggle the time units (left side numbers). I'm attaching a screenshot of the disassembly of the lookdict function. The time units are sample accounts. Each count is a point where the profiler woke up on that specific instruction. |
|||
| msg336031 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2019-02-20 00:45 | |
Thanks Neil. The tooling does indeed look nice. I added your PyUnicode_InternInPlace() suggestion to the PR. At this point, the PR looks ready-to-go unless any of you think we've missed some low-hanging fruit. |
|||
| msg336129 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2019-02-20 17:29 | |
New changeset d8b9e1fc2e45d2bc3f4a9737c375f2adb8a8c7de by Raymond Hettinger (Stefan Behnel) in branch 'master': bpo-36012: Avoid linear slot search for non-dunder methods (GH-11907) https://github.com/python/cpython/commit/d8b9e1fc2e45d2bc3f4a9737c375f2adb8a8c7de |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2019-02-20 17:29:55 | rhettinger | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2019-02-20 17:29:26 | rhettinger | set | messages: + msg336129 |
| 2019-02-20 00:45:18 | rhettinger | set | messages: + msg336031 |
| 2019-02-19 18:35:45 | nascheme | set | files:
+ perf-screenshot.png messages: + msg335995 |
| 2019-02-19 18:25:57 | nascheme | set | messages: + msg335992 |
| 2019-02-19 11:27:22 | vstinner | set | nosy:
- vstinner |
| 2019-02-18 01:32:29 | rhettinger | set | messages: + msg335795 |
| 2019-02-18 00:59:21 | pablogsal | set | messages: + msg335794 |
| 2019-02-17 14:36:15 | scoder | set | nosy:
+ scoder messages: + msg335776 |
| 2019-02-17 14:31:39 | scoder | set | keywords:
+ patch stage: patch review pull_requests: + pull_request11932 |
| 2019-02-17 08:50:24 | rhettinger | set | messages: + msg335763 |
| 2019-02-17 02:53:16 | Yasser Alshalaan | set | nosy:
+ Yasser Alshalaan messages: + msg335747 |
| 2019-02-16 23:34:44 | pablogsal | set | messages: + msg335733 |
| 2019-02-16 20:15:18 | rhettinger | create | |