Created on 2007-01-15 10:26 by kylotan, last changed 2020-05-18 13:05 by vstinner.
This C code:
#include <Python.h>

int main(int argc, char *argv[])
{
    Py_Initialize(); Py_Finalize();
    Py_Initialize(); Py_Finalize();
    Py_Initialize(); Py_Finalize();
    Py_Initialize(); Py_Finalize();
    Py_Initialize(); Py_Finalize();
    Py_Initialize(); Py_Finalize();
    Py_Initialize(); Py_Finalize();
}
Produces this output:
[7438 refs]
[7499 refs]
[7550 refs]
[7601 refs]
[7652 refs]
[7703 refs]
[7754 refs]
A similar program that calls Py_Initialize()/Py_Finalize() 1000 times ends up with:
...
[58295 refs]
[58346 refs]
[58397 refs]
This is with a fresh debug build of Python 2.5.0 on Windows XP, using Visual C++ 2003.
Does the title of this issue accurately reflect the current status of the Python interpreter?
Yes, some objects are not cleaned up during finalization. This is not a problem in the usual case, though, where the interpreter is started only once.
> Does the title of this issue accurately reflect the current status of the Python interpreter?

Yes. Here is the result of running it on the latest 3.3 code:
[37182 refs]
[39415 refs]
[41607 refs]
[43799 refs]
[45991 refs]
[48183 refs]
[50375 refs]
This appears to be a known bug: according to the documentation (http://docs.python.org/dev/c-api/init.html?highlight=py_finalize#Py_Finalize), Py_Finalize() does not free all objects.
Interestingly enough, some of the leaked memory came from the finalize routine itself! Here's one example:
0:004> !heap -p -a 0x000000DB144346F0
address 000000db144346f0 found in
_HEAP @ db0cae0000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
000000db14434690 030a 0000 [00] 000000db144346c0 03074 - (busy)
7ffc55628b04 ntdll!RtlpCallInterceptRoutine+0x0000000000000040
7ffc555f9f36 ntdll!RtlAllocateHeap+0x0000000000079836
7ffc2a60c4da ucrtbased!calloc_base+0x000000000000123a
7ffc2a60c27d ucrtbased!calloc_base+0x0000000000000fdd
7ffc2a60f34f ucrtbased!malloc_dbg+0x000000000000002f
7ffc2a60fdde ucrtbased!malloc+0x000000000000001e
5a5e6ef9 python36_d!_PyMem_RawMalloc+0x0000000000000029
5a5e78c7 python36_d!_PyMem_DebugAlloc+0x0000000000000087
5a5e5e6f python36_d!_PyMem_DebugMalloc+0x000000000000001f
5a5e7230 python36_d!PyMem_Malloc+0x0000000000000030
5a582047 python36_d!new_keys_object+0x0000000000000077
5a57f7c5 python36_d!dictresize+0x0000000000000085
5a57a4b2 python36_d!PyDict_Merge+0x0000000000000112
5a57bf33 python36_d!PyDict_Update+0x0000000000000023
5a75fb1d python36_d!PyImport_Cleanup+0x000000000000045d
5a778f9e python36_d!Py_Finalize+0x000000000000005e
I tested on the master branch of Python:
---
#include <Python.h>

void func()
{
    Py_Initialize(); Py_Finalize();
    Py_ssize_t cnt = _Py_GetRefTotal();
    printf("sys.gettotalrefcount(): %zd\n", cnt);
}

int main(int argc, char *argv[])
{
    Py_SetProgramName(L"./_testembed");
    for (int i=0; i < 10; i++) {
        func();
    }
}
---
Each iteration leaks about 4,400 references (the total reference count grows by 4,414 per iteration):
---
sys.gettotalrefcount(): 15113
sys.gettotalrefcount(): 19527
sys.gettotalrefcount(): 23941
sys.gettotalrefcount(): 28355
sys.gettotalrefcount(): 32769
sys.gettotalrefcount(): 37183
sys.gettotalrefcount(): 41597
sys.gettotalrefcount(): 46011
sys.gettotalrefcount(): 50425
sys.gettotalrefcount(): 54839
---
I marked bpo-6741 as a duplicate of this issue.
I marked bpo-26888 as a duplicate of this issue.
I marked bpo-21387 as a duplicate of this issue.
One part of this issue is that all C extensions of the stdlib should be updated to implement the PEP 489 "Multi-phase extension module initialization".
I marked bpo-32026 as a duplicate of this issue.
> One part of this issue is that all C extensions of the stdlib should be updated to implement the PEP 489 "Multi-phase extension module initialization".

I tried to port the _json extension module to multiphase initialization, but the baseline (measured with Victor's code) did not change in my VM.
Compared to _Py_ForgetReference(), the _Py_INC_REFTOTAL in _Py_NewReference() looks redundant. Ref: https://github.com/python/cpython/blob/master/Include/object.h#L442

Baseline on the master branch in my VM:
```
sys.gettotalrefcount(): 18049
sys.gettotalrefcount(): 22463
```
After PR17883:
```
sys.gettotalrefcount(): 17589
sys.gettotalrefcount(): 22000
```
FWIW, I counted the per-file difference in refs after `Py_Finalize()`:
[('Objects/dictobject.c', 21434), ('Python/marshal.c', 8135), ('Objects/codeobject.c', 6245), ('Objects/listobject.c', 6037), ('Objects/tupleobject.c', 4169), ('Objects/boolobject.c', 2433), ('Objects/object.c', 2364), ('Objects/unicodeobject.c', 1541), ('Objects/longobject.c', 1387), ('Objects/funcobject.c', 528), ('Objects/classobject.c', 528), ('Objects/abstract.c', 463), ('Python/structmember.c', 369), ('./Include/objimpl.h', 277), ('Objects/stringlib/partition.h', 273), ('Python/import.c', 259), ('Python/codecs.c', 197), ('./Modules/signalmodule.c', 61), ('./Modules/_threadmodule.c', 59), ('Objects/exceptions.c', 15), ('Objects/bytesobject.c', 5), ('./Modules/_weakref.c', 4), ('Python/_warnings.c', 3), ('./Modules/timemodule.c', 1), ('./Modules/_codecsmodule.c', 1), ('Objects/bytearrayobject.c', 1), ('Python/compile.c', 1), ('Objects/sliceobject.c', 0), ('Objects/memoryobject.c', 0), ('Python/context.c', -1), ('Objects/clinic/longobject.c.h', -1), ('Objects/enumobject.c', -1), ('Modules/gcmodule.c', -1), ('Objects/namespaceobject.c', -1), ('Objects/stringlib/unicode_format.h', -2), ('Objects/rangeobject.c', -3), ('Python/pystate.c', -4), ('Objects/fileobject.c', -14), ('./Modules/_io/clinic/bufferedio.c.h', -17), ('./Modules/_io/iobase.c', -21), ('Python/modsupport.c', -28), ('./Modules/_io/fileio.c', -28), ('Python/pylifecycle.c', -37), ('./Modules/_io/textio.c', -39), ('Objects/genobject.c', -53), ('Objects/weakrefobject.c', -54), ('./Modules/_io/bufferedio.c', -56), ('./Python/sysmodule.c', -68), ('./Modules/_io/_iomodule.c', -82), ('Python/errors.c', -90), ('Objects/descrobject.c', -110), ('Objects/structseq.c', -113), ('Python/bltinmodule.c', -118), ('Objects/setobject.c', -339), ('Objects/moduleobject.c', -454), ('./Modules/posixmodule.c', -614), ('./Modules/_abc.c', -664), ('Objects/call.c', -755), ('Objects/typeobject.c', -2035), ('Objects/frameobject.c', -6538), ('Python/ceval.c', -7857), ('./Include/object.h', -48292)]
New changeset ed154c387efc5f978ec97900ec9e0ec6631d5498 by Victor Stinner (Hai Shi) in branch 'master': bpo-1635741: Port _json extension module to multiphase initialization (PEP 489) (GH-17835) https://github.com/python/cpython/commit/ed154c387efc5f978ec97900ec9e0ec6631d5498
I think that not checking the result of `PyModule_AddObject()` may also cause this problem. 1) python-ast.c has one such case; I fixed it in PR18358. 2) Most of the cases are in extension modules, for example: https://github.com/python/cpython/blob/master/Modules/gcmodule.c#L2019-L2022
Correction to the above: 1) python-ast.c has one such case; I fixed it in PR18365.
> 1) python-ast.c has one such case; I fixed it in PR18365.
> 2) Most of the cases are in extension modules, for example: https://github.com/python/cpython/blob/master/Modules/gcmodule.c#L2019-L2022

brandt has already done related work in PR17276 and PR38823.
New changeset 1ea45ae257971ee7b648e3b031603a31fc059f81 by Hai Shi in branch 'master': bpo-1635741: Port _codecs extension module to multiphase initialization (PEP 489) (GH-18065) https://github.com/python/cpython/commit/1ea45ae257971ee7b648e3b031603a31fc059f81
A note for myself: I roughly checked the remaining objects (through the dump_refs function); most of them are 'str', for example: '0x7f779cf88880 [13] str' -> '0x7f779cf88880 [26] str'. So far, I don't know which file and line number create those objects. Maybe I need to find a hacky way to tag these allocations? (not sure)
New changeset b2b6e27bcab44e914d0a0b170e915d6f1604a76d by Hai Shi in branch 'master': bpo-1635741: Port _crypt extension module to multiphase initialization (PEP 489) (GH-18404) https://github.com/python/cpython/commit/b2b6e27bcab44e914d0a0b170e915d6f1604a76d
New changeset 7d7956833cc37a9d42807cbfeb7dcc041970f579 by Hai Shi in branch 'master': bpo-1635741: Port _contextvars module to multiphase initialization (PEP 489) (GH-18374) https://github.com/python/cpython/commit/7d7956833cc37a9d42807cbfeb7dcc041970f579
New changeset 4c1b6a6f4fc46add0097efb3026cf3f0c89f88a2 by Hai Shi in branch 'master': bpo-1635741: Port _abc extension to multiphase initialization (PEP 489) (GH-18030) https://github.com/python/cpython/commit/4c1b6a6f4fc46add0097efb3026cf3f0c89f88a2
New changeset 5d38517aa1836542a5417b724c093bcb245f0f47 by Hai Shi in branch 'master': bpo-1635741: Port _bz2 extension module to multiphase initialization(PEP 489) (GH-18050) https://github.com/python/cpython/commit/5d38517aa1836542a5417b724c093bcb245f0f47
New changeset a158168a787e82c4b7b18f6833153188e93627a5 by Hai Shi in branch 'master': bpo-1635741: Port _locale extension module to multiphase initialization (PEP 489) (GH-18358) https://github.com/python/cpython/commit/a158168a787e82c4b7b18f6833153188e93627a5
New changeset 41fbf865a35d4fb64f047f98dc24690cb0c170fd by Hai Shi in branch 'master': bpo-1635741: Port audioop extension module to multiphase initialization (PEP 489) (GH-18608) https://github.com/python/cpython/commit/41fbf865a35d4fb64f047f98dc24690cb0c170fd
New changeset aa0c0808efbfdee813d2829e49030c667da44e72 by Hai Shi in branch 'master': bpo-1635741: Fix potential refleaks in binascii module (GH-18613) https://github.com/python/cpython/commit/aa0c0808efbfdee813d2829e49030c667da44e72
Thanks, Hai Shi, for your 3 latest PRs; I merged them.
New changeset 196f1eb6adcfc6a7239330ef508b8bf9dff9940f by Hai Shi in branch 'master': bpo-1635741: Fix refleaks of time module error handling (GH-18486) https://github.com/python/cpython/commit/196f1eb6adcfc6a7239330ef508b8bf9dff9940f
Hundreds of encoding names cannot be released in Py_Finalize(). For example:
```
0x7ff482f589e0 [1] 'iso_8859_1_1987'
0x7ff482f58970 [1] 'iso_8859_1'
```
-->
```
0x7ff482f589e0 [2] 'iso_8859_1_1987'
0x7ff482f58970 [2] 'iso_8859_1'
```
New changeset 356c878fbf2a97aa3ab7951fd7456d219ff0b466 by Dong-hee Na in branch 'master': bpo-1635741: Port _statistics module to multiphase initialization (GH-19015) https://github.com/python/cpython/commit/356c878fbf2a97aa3ab7951fd7456d219ff0b466
New changeset 2037502613471a0a0a0262085cc50adb378ebbad by Hai Shi in branch 'master': bpo-1635741: Port _ctypes_test extension to multiphase initialization (PEP 489) (GH-19012) https://github.com/python/cpython/commit/2037502613471a0a0a0262085cc50adb378ebbad
New changeset 514c469719f149e1722a91a9d0c63bf89dfefb2a by Dong-hee Na in branch 'master': bpo-1635741: Port itertools module to multiphase initialization (GH-19044) https://github.com/python/cpython/commit/514c469719f149e1722a91a9d0c63bf89dfefb2a
New changeset 4657a8a0d006c76699ba3d1d4d21a04860bb2586 by Dong-hee Na in branch 'master': bpo-1635741: Port _heapq module to multiphase initialization (GH19057) https://github.com/python/cpython/commit/4657a8a0d006c76699ba3d1d4d21a04860bb2586
New changeset 77248a28896d39cae0a7e084965b9ffc2624b7f4 by Dong-hee Na in branch 'master': bpo-1635741: Port _collections module to multiphase initialization (GH-19074) https://github.com/python/cpython/commit/77248a28896d39cae0a7e084965b9ffc2624b7f4
New changeset 8334f30a74abcf7e469b901afc307887aa85a888 by Hai Shi in branch 'master': bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19084) https://github.com/python/cpython/commit/8334f30a74abcf7e469b901afc307887aa85a888
About half of the remaining refs are related to encodings. I noticed that the caches in Lib/encodings/__init__.py and the codec_search_cache of PyInterpreterState are the places holding the refs. I removed those caches and the number went down: Before: 4382 refs left; After: 2344 refs left (-46%). The way codec_search_cache is destroyed was recently changed in #36854 and #38962. (I'm not proposing to merge this, but my changes are at https://github.com/python/cpython/compare/master...phsilva:remove-codec-caches.)
> I noticed that the caches in Lib/encodings/__init__.py and the codec_search_cache of PyInterpreterState are the places holding the refs. I removed those caches and the number went down.

Good catch, Paulo. IMHO, the caches are useful in codecs (they improve search efficiency). I have two humble ideas: 1. Clear all items of codec_search_xxx in `Py_Finalize()`; 2. Change the refcount mechanism (in this case, refcount+1 or refcount+2 makes no difference).
The last merged pull request, GH-19084, causes refleaks in importlib tests. Stable buildbots are failing; I can reproduce it on macOS Catalina. You can test it yourself by running:

$ ./python.exe -E -Wd -m test -uall,-gui -l -L -R: test_importlib

Master at 2de7ac9798 does not fail, while the next commit, 8334f30a74, introduces the failure.
> The last merged pull request, GH-19084, causes refleaks in importlib tests. Stable buildbots are failing, I can reproduce on macOS Catalina.

Thanks, Łukasz. I caught this problem in my CentOS VM too. I don't know the cause yet.
New changeset bd409bb5b78e7ccac5fcda9ab4cec770552f3090 by Paulo Henrique Silva in branch 'master': bpo-1635741: Port time module to multiphase initialization (PEP 489) (GH-19107) https://github.com/python/cpython/commit/bd409bb5b78e7ccac5fcda9ab4cec770552f3090
> The last merged pull request, GH-19084, causes refleaks in importlib tests. Stable buildbots are failing, I can reproduce on macOS Catalina.

I expect that the bug is non-trivial, so I prefer to open a separate issue: bpo-40050 "test_importlib leaked [6303, 6299, 6303] references".
New changeset 188078c39dec24aa5b3f2073bdc9a68ebaae42de by Victor Stinner in branch 'master': Revert "bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19084)" (#19128) https://github.com/python/cpython/commit/188078c39dec24aa5b3f2073bdc9a68ebaae42de
New changeset 93460d097f50db0870161a63911d61ce3c5f4583 by Victor Stinner in branch 'master': bpo-1635741: Port _weakref extension module to multiphase initialization (PEP 489) (GH-19140) https://github.com/python/cpython/commit/93460d097f50db0870161a63911d61ce3c5f4583
I managed to identify bpo-40050 (test_importlib reference leak) root issue and to fix it, so I reapplied Hai Shi's change for _weakref.
An update on my findings from msg364833: it looks like the encodings module is not being destroyed at all and keeps all the encoding refs alive. It looks like some reference cycle, but I am not sure yet how to solve it. To validate this, I:
- removed codec_search_cache from PyInterpreterState;
- called Py_DECREF(encodings) after loading it in codecs.c.

Before: 4376 refs left (37fcbb65d4)
After: 352 refs left (-92%)

I've updated the changes at https://github.com/python/cpython/compare/master...phsilva:remove-codec-caches (not a proposed patch, just to validate the idea).
New changeset f3d5ac47720045a72f7ef5af13046d9531e6007b by Paulo Henrique Silva in branch 'master': bpo-1635741: Port operator module to multiphase initialization (PEP 489) (GH-19150) https://github.com/python/cpython/commit/f3d5ac47720045a72f7ef5af13046d9531e6007b
New changeset 7dd549eb08939e1927fba818116f5202e76f8d73 by Paulo Henrique Silva in branch 'master': bpo-1635741: Port _functools module to multiphase initialization (PEP 489) (GH-19151) https://github.com/python/cpython/commit/7dd549eb08939e1927fba818116f5202e76f8d73
Hum, some clarification is needed here. "Port xxx extension module to multiphase initialization (PEP 489)" changes help to fix "Py_Finalize() doesn't clear all Python objects at exit", but on their own they don't fix all issues.

--

For example, if a module still uses C globals declared as "static ...", these globals will not be cleared magically. Example with _datetimemodule.c:

static PyObject *us_per_hour = NULL;    /* 1e6 * 3600 as Python int */
static PyObject *us_per_day = NULL;     /* 1e6 * 3600 * 24 as Python int */
static PyObject *us_per_week = NULL;    /* 1e6*3600*24*7 as Python int */

These variables are initialized once in PyInit__datetime():

us_per_hour = PyLong_FromDouble(3600000000.0);
us_per_day = PyLong_FromDouble(86400000000.0);
us_per_week = PyLong_FromDouble(604800000000.0);

Converting the module to multiphase initialization will not magically clear these variables at exit. The _datetime module should be modified to store these variables in a module state, which could then be cleared at exit.

The binascii module is a good example: it has a module state, traverse, clear and free methods, and it uses multiphase initialization. This module can be fully unloaded at exit. It's a "simple" module: for example, it doesn't define types.

--

Another issue is that converting a module to multiphase initialization doesn't magically fully isolate two instances of the module. For example, the _abc module still uses a statically defined type:

static PyTypeObject _abc_data_type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "_abc_data",                        /*tp_name*/
    sizeof(_abc_data),                  /*tp_basicsize*/
    .tp_dealloc = (destructor)abc_data_dealloc,
    .tp_flags = Py_TPFLAGS_DEFAULT,
    .tp_alloc = PyType_GenericAlloc,
    .tp_new = abc_data_new,
};

Example:

vstinner@apu$ ./python
Python 3.9.0a5+ (heads/pr/19122:0ac3031a80, Mar 25 2020, 02:25:19)
>>> import _abc
>>> class Bla: pass
...
>>> _abc._abc_init(Bla)
>>> type(Bla._abc_impl)
<class '_abc_data'>

# load a second instance of the module
>>> import sys; del sys.modules['_abc']
>>> import _abc as _abc2
>>> class Bla2: pass
...
>>> _abc._abc_init(Bla2)
>>> type(Bla2._abc_impl)
<class '_abc_data'>

# _abc and _abc2 have exactly the same type,
# they are not fully isolated
>>> type(Bla2._abc_impl) is type(Bla._abc_impl)
True

That's more an issue for subinterpreters: each interpreter should have its own fully isolated instance of a C extension module.
Thanks for the clarifications. I will keep looking for simple modules (no state, easy to migrate), but I will also dedicate more time to the more complex ones like datetime. I'm working on the PR19122 corrections.
> Thanks for the clarifications. I will keep looking for simple modules (no state, easy to migrate), but I will also dedicate more time to the more complex ones like datetime. I'm working on the PR19122 corrections.

I like changes that convert C extension modules to the multiphase initialization API, since they fix the error path: they implicitly ensure that the module is properly destroyed if something goes wrong. Moreover, they will ease the work to fix the other issues that I listed.
Sorry for the noise, but I just wanted to say thanks to the people working on this issue 13 years after I reported it. :) Far too many open-source projects arbitrarily close bugs just because they don't have time to fix them and they never get fixed, so I'm glad this wasn't the case here.
>Sorry for the noise, but I just wanted to say thanks to the people working on this issue 13 years after I reported it. :) Far too many open-source projects arbitrarily close bugs just because they don't have time to fix them and they never get fixed, so I'm glad this wasn't the case here. cpython is a big family ;)
> bpo-1635741: Port _functools module to multiphase initialization (PEP 489) (GH-19151) > https://github.com/python/cpython/commit/7dd549eb08939e1927fba818116f5202e76f8d73 This change introduced a regression: bpo-40071 "test__xxsubinterpreters leaked [1, 1, 1] references: test_ids_global()".
New changeset 1cb763b8808745b9a368c1158fda19d329f63f6f by Dong-hee Na in branch 'master': bpo-1635741: Port _uuid module to multiphase initialization (GH-19242) https://github.com/python/cpython/commit/1cb763b8808745b9a368c1158fda19d329f63f6f
New changeset 5be8241392453751beea21d2e32096c15a8d47db by Dong-hee Na in branch 'master': bpo-1635741: Port math module to multiphase initialization (GH-19243) https://github.com/python/cpython/commit/5be8241392453751beea21d2e32096c15a8d47db
I created bpo-40137: TODO list when PEP 573 "Module State Access from C Extension Methods" will be implemented. It tracks code that should be fixed once PEP 573 is implemented, like the _functools and _abc modules.
New changeset 45f7008a66a30cdf749ec03e580bd2692be9a8df by Hai Shi in branch 'master': bpo-1635741: Port resource extension module to multiphase initialization (PEP 489) (GH-19252) https://github.com/python/cpython/commit/45f7008a66a30cdf749ec03e580bd2692be9a8df
New changeset 7a6f3bcc43ed729f8038524528c0b326b5610506 by Hai Shi in branch 'master': bpo-1635741: Fix refleak in _locale init error handling (GH-19307) https://github.com/python/cpython/commit/7a6f3bcc43ed729f8038524528c0b326b5610506
New changeset 84724dd239c30043616487812f6a710b1d70cd4b by Dong-hee Na in branch 'master': bpo-1635741: Port _stat module to multiphase initialization (GH-19798) https://github.com/python/cpython/commit/84724dd239c30043616487812f6a710b1d70cd4b
New changeset b66c0ff8af0c1a4adc6908897b2d05afc78cc27e by Victor Stinner in branch 'master': bpo-1635741: Fix compiler warning in _stat.c (GH-19822) https://github.com/python/cpython/commit/b66c0ff8af0c1a4adc6908897b2d05afc78cc27e
New changeset 92a98ed97513c6e365ce8765550ea65d0ddc8cd7 by Dong-hee Na in branch 'master': bpo-1635741: Port syslog module to multiphase initialization (GH-19907) https://github.com/python/cpython/commit/92a98ed97513c6e365ce8765550ea65d0ddc8cd7
New changeset 3466922320d54a922cfe6d6d44e89e1cea4023ef by Dong-hee Na in branch 'master': bpo-1635741: Port errno module to multiphase initialization (GH-19923) https://github.com/python/cpython/commit/3466922320d54a922cfe6d6d44e89e1cea4023ef