Created on 2019-07-02 12:14 by jdemeyer, last changed 2020-11-01 13:28 by corona10. This issue is now closed.
As discussed in https://mail.python.org/archives/list/capi-sig@python.org/thread/X6HJOEX6RRFLNKAQSQR6HLD7K4QZ4OTZ/ it would be convenient to have a function PyObject_CallOneArg() for making calls with exactly 1 positional argument and no keyword arguments. Such calls are very common. This would be analogous to PyObject_CallNoArgs().
What do you think about a macro like this?
    #define _PyObject_CALL_WITH_ARGS(func, ...) \
        _PyObject_Vectorcall(func, (PyObject*[]){NULL, __VA_ARGS__} + 1, \
            (sizeof((PyObject*[]){__VA_ARGS__}) / sizeof(PyObject*)) | PY_VECTORCALL_ARGUMENTS_OFFSET, \
            NULL)
Pros: it can be used for 2 or 3 arguments too.
Cons: readability...
Variadic macros are not part of C89, so that would require changing PEP 7.
> Cons: readability...

It's a single, relatively short macro. I'm not worried about that. I'm more worried about compiler support for such macros. I'll make a PR with this idea and see what CI says.
> Variadic macros are not part of C89, so that would require changing PEP 7.

PEP 7 uses C99 since Python 3.6: https://www.python.org/dev/peps/pep-0007/#c-dialect

> What do you think about macro like this?

Macros cannot be used in all C extensions. For PEP 587, I was asked to replace macros with functions. I would prefer a PyObject_CallOneArg() function over yet another ugly and dangerous macro. Sometimes a regular function is better for stack consumption: it doesn't increase the stack consumption at the call site, and the stack memory is "released" once PyObject_CallOneArg() exits. In the past, I had to disable inlining in some functions to reduce stack consumption. (I'm not sure if that optimization is still around.)
AFAIK, gcc, clang, and MSVC support it. Another con is the general pitfall of macros:

    _PyObject_CALL_WITH_ARGS(func, PyDict_GetItem(d, key));  // PyDict_GetItem(d, key) is called twice.

If two or more arguments are not common, I prefer _PyObject_CallOneArg to a macro.
> _PyObject_CALL_WITH_ARGS(func, PyDict_GetItem(d, key)); // PyDict_GetItem(d, key) is called twice.

That's pretty bad and in my opinion a good reason to reject this idea.

> If two or more arguments are not common, I prefer _PyObject_CallOneArg to macro.

I posted some counts at https://mail.python.org/archives/list/capi-sig@python.org/message/3I76R27YMFSKB5JQIM3E4NA3BECVTZHP/
> _PyObject_CALL_WITH_ARGS(func, PyDict_GetItem(d, key)); // PyDict_GetItem(d, key) is called twice.

Actually, it's not a problem: sizeof() is special; it only looks at the type of its argument, it doesn't execute the code.
> PEP 7 uses C99 since Python 3.6: https://www.python.org/dev/peps/pep-0007/#c-dialect

That's not what the PEP says: "Python versions greater than or equal to 3.6 use C89 with several select C99 features". "Several select C99 features" is not the same as using C99. Plus, I'm also worried about C++. We also support compiling C++ extensions with the same header files, so it must be C++ compatible as well.
> Macros cannot be used in all C extensions. For the PEP 587, I was asked to replace macros with functions.

This is just a helper inline function / macro to ease calling _PyObject_Vectorcall. Extensions can port this helper themselves if they need it. I don't want to make this a non-inline function.
The motivation for this PR is "it would be convenient to have this function". This is probably true, but generally I would not change 47 files at once. Most of the locations are probably not performance sensitive.
This change doesn't make caller code complicated; it makes callers a little simpler. Choosing only the performance-sensitive code is much harder than replacing all occurrences. So I'm OK with changing all callers at once, except where a code owner opposes.
Exactly. I see no reason to prefer PyObject_CallFunctionObjArgs(func, arg, NULL) over _PyObject_CallOneArg(func, arg)
It adds yet another special case underscore function that one cannot use in external projects. So I would not say that is simpler. Has there been any performance measurement at all?
> It adds yet another special case underscore function that one cannot use in external projects. So I would not say that is simpler.

If you're worried about the underscore, I will make a separate PR to add a non-underscored version, similar to PyObject_CallNoArgs().
> Exactly. I see no reason to prefer PyObject_CallFunctionObjArgs(func, arg, NULL) over _PyObject_CallOneArg(func, arg)

In this case, maybe the whole idea of _PyObject_CallOneArg() is not worth it? Is there any benchmark showing whether it's faster or uses less stack?
> It adds yet another special case underscore function that one cannot use in external projects. So I would not say that is simpler.

I don't get what you mean. Do you care about *adding* API with an underscore? If so, it doesn't make caller code complex, and it cannot be a reason to reject changing many files. Or do you care about *using* API with an underscore? If so, I'm OK with not changing some callers which are not tightly coupled with Python (e.g. sqlite).
> Or do you care about *using* API with underscore? If so, I'm OK to stop changing some callers which are not tightly coupled with Python.

I care about this one. Indeed, I think underscore functions should be used in strategic places inside the core interpreter and not sprinkled across the whole code base by default. But in general I also care about not having too many new functions, to avoid churn.
> Is there any benchmark showing if it's faster
Here is one example:
    class D(dict):
        def __missing__(self, key):
            return None

    d = D()
and now benchmark d[0]
**before**: Mean +- std dev: 173 ns +- 1 ns
**after**: Mean +- std dev: 162 ns +- 1 ns
To be precise, I ran: ./python -m perf timeit --duplicate 200 -s 'class D(dict):' -s ' def __missing__(self, key):' -s ' return None' -s 'd = D()' 'd[0]'
Stefan: I used an underscore by analogy with PyObject_CallNoArgs()/_PyObject_CallNoArg(), where the first is in the limited API and the second is an inline function in the cpython API. But maybe we could revisit that decision.
I don't want to add many functions to the limited/public APIs. Note that public APIs defined outside of cpython/ are considered mandatory for all Python implementations supporting the Python/C API. When portability is more important than performance, there are many APIs already. Please keep _PyObject_CallOneArg() as just an easy wrapper of _PyObject_Vectorcall, which is CPython specific, in this issue. Candidates for revert: _csv, _decimal, _sqlite, pyexpat, _testbuffer, _xxtestfuzz. They don't use many private APIs, or they are tests which are useful for other interpreters.
Victor, what's your opinion on adding PyObject_CallOneArg() to the limited API?
Test of stack usage:
    from _testcapi import stack_pointer

    class D(dict):
        def __missing__(self, key):
            sp = stack_pointer()
            print(f"stack usage = {TOP - sp}")
            return None

    d = D()
    TOP = stack_pointer()
    d[0]
**before**: stack usage = 1296
**after**: stack usage = 976
New changeset 196a530e00d88a138973bf9182e013937e293f97 by Inada Naoki (Jeroen Demeyer) in branch 'master': bpo-37483: add _PyObject_CallOneArg() function (#14558) https://github.com/python/cpython/commit/196a530e00d88a138973bf9182e013937e293f97
The following commit introduced a reference leak:

    commit 196a530e00d88a138973bf9182e013937e293f97
    Author: Jeroen Demeyer <J.Demeyer@UGent.be>
    Date:   Thu Jul 4 12:31:34 2019 +0200

        bpo-37483: add _PyObject_CallOneArg() function (#14558)

Example: https://buildbot.python.org/all/#/builders/80/builds/642

    test_xml_etree leaked [1, 1, 1] references, sum=3
    test_threading leaked [6, 6, 6] references, sum=18
    test_atexit leaked [6, 6, 6] references, sum=18
    test_compile leaked [1, 1, 1] references, sum=3
    test_zipfile leaked [1, 1, 1] references, sum=3
    test_tokenize leaked [3, 3, 3] references, sum=9
    test_pydoc leaked [1, 1, 1] references, sum=3
    test_codecs leaked [43, 43, 43] references, sum=129
    test_codecs leaked [11, 11, 11] memory blocks, sum=33

    Re-running failed tests in verbose mode

    test__xxsubinterpreters leaked [136, 136, 136] references, sum=408
    test_getargs2 leaked [4, 4, 4] references, sum=12
    test_source_encoding leaked [3, 3, 3] references, sum=9
    test_capi leaked [4, 4, 4] references, sum=12
    test_email leaked [17, 17, 17] references, sum=51
    test_xml_etree leaked [1, 1, 1] references, sum=3
    test_threading leaked [6, 6, 6] references, sum=18
    test_atexit leaked [6, 6, 6] references, sum=18
    test_compile leaked [1, 1, 1] references, sum=3
    test_zipfile leaked [1, 1, 1] references, sum=3
    test_tokenize leaked [3, 3, 3] references, sum=9
    test_pydoc leaked [1, 1, 1] references, sum=3
    test_codecs leaked [43, 43, 43] references, sum=129
    test_codecs leaked [11, 11, 11] memory blocks, sum=33
    (...)
INADA-san, Jeroen: hum, you both proposed a similar fix :-) One fix is enough. See my review on PR 14600.
> Jeroen: hum, you both proposed a similar fix :-) It seems that I lost the race ;-) But seriously: if we both independently came up with the same solution, that's a good sign that the solution is correct.
New changeset 6e43d07324ca799118e805751a10a7eff71d5a04 by Inada Naoki (Jeroen Demeyer) in branch 'master': bpo-37483: fix reference leak in _PyCodec_Lookup (GH-14600) https://github.com/python/cpython/commit/6e43d07324ca799118e805751a10a7eff71d5a04
Thanks for the quick fix ;-)
Oh, PyObject_CallOneArg() is missing from the What's New in Python 3.9: https://docs.python.org/dev/whatsnew/3.9.html#id1 Can someone please propose a PR to add it? I am reopening the issue.
New changeset 7feb54a6348f6220b27986777786c812f110b53d by Dong-hee Na in branch 'master': bpo-37483: Add PyObject_CallOneArg() in the What's New in Python 3.9 (GH-23062) https://github.com/python/cpython/commit/7feb54a6348f6220b27986777786c812f110b53d
New changeset 0312efcd2bb3d7964dbfe2b4cbd5f5b440aed049 by Miss Skeleton (bot) in branch '3.9': bpo-37483: Add PyObject_CallOneArg() in the What's New in Python 3.9 (GH-23062) https://github.com/python/cpython/commit/0312efcd2bb3d7964dbfe2b4cbd5f5b440aed049