Issue36560
Created on 2019-04-08 14:22 by vstinner, last changed 2020-06-11 14:23 by vstinner.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| test_functools.py | vstinner, 2019-04-08 15:33 | |||
| test_functools2.py | vstinner, 2019-04-08 17:11 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 12743 | closed | vstinner, 2019-04-09 10:55 | |
| PR 12744 | merged | vstinner, 2019-04-09 12:01 | |
| PR 12745 | merged | vstinner, 2019-04-09 12:42 | |
| PR 12747 | merged | vstinner, 2019-04-09 16:01 | |
| PR 12749 | merged | vstinner, 2019-04-09 16:31 | |
| Messages (17) | |||
|---|---|---|---|
| msg339643 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-08 14:22 | |
Sometimes, each run of test_functools leaks exactly 1 memory block, even when the whole test is "re-run in verbose mode". Sometimes, it doesn't leak. https://buildbot.python.org/all/#/builders/80/builds/550 test_functools leaked [1, 1, 1] memory blocks, sum=3 Re-running test 'test_functools' in verbose mode test_functools leaked [1, 1, 1] memory blocks, sum=3 Maybe the problem comes from Example on Linux: $ ./python -m test -F -r -j1 -R 3:3 test_functools Using random seed 3891892 Run tests in parallel using 1 child processes 0:00:01 load avg: 2.38 [ 1] test_functools passed beginning 6 repetitions 123456 ...... (...) 0:00:06 load avg: 2.27 [ 6] test_functools passed beginning 6 repetitions 123456 ...... 0:00:07 load avg: 2.27 [ 7/1] test_functools failed beginning 6 repetitions 123456 ...... test_functools leaked [1, 2, 1] memory blocks, sum=4 0:00:08 load avg: 2.27 [ 8/1] test_functools passed beginning 6 repetitions 123456 ...... == Tests result: FAILURE == 7 tests OK. 1 test failed: test_functools Total duration: 8 sec 333 ms Tests result: FAILURE |
|||
| msg339651 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-08 15:33 | |
Current bisection progress: I'm able to reproduce a failure with attached test_functools.py (211 lines) using: ./python -m test -F -j5 -R 3:3 test_functools It takes between 4 and 200 runs to reproduce the failure, I don't understand what triggers the bug. I failed to identify if the bug is more likely when the system is idle or when the system is busy. It looks purely random. |
|||
| msg339662 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-08 17:11 | |
Some progress. I'm now able to reproduce the issue with attached test_functools2.py (103 lines, a single test method): $ cp (...)test_functools2.py Lib/test/ $ ./configure -C --with-pydebug CFLAGS=-O0 $ make $ ./python -m test -F -j5 -R 3:3 test_functools2 (...) 0:00:02 load avg: 0.95 [ 19/1] test_functools failed beginning 6 repetitions 123456 ...... test_functools leaked [1, 1, 1] memory blocks, sum=3 (...) Tests result: FAILURE I disabled the C accelerator _functools. I'm now trying to disable the C accelerator of abc, but I had a small issue: bpo-36565. |
|||
| msg339663 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-08 17:14 | |
The issue comes maybe from the complex code in Lib/test/libregrtest/refleak.py to handle the ABC cache: abcs = {} for abc in [getattr(collections.abc, a) for a in collections.abc.__all__]: if not isabstract(abc): continue for obj in abc.__subclasses__() + [abc]: abcs[obj] = _get_dump(obj)[0] ... # clear type cache sys._clear_type_cache() # Clear ABC registries, restoring previously saved ABC registries. abs_classes = [getattr(collections.abc, a) for a in collections.abc.__all__] abs_classes = filter(isabstract, abs_classes) for abc in abs_classes: for obj in abc.__subclasses__() + [abc]: for ref in abcs.get(obj, set()): if ref() is not None: obj.register(ref()) obj._abc_caches_clear() clear_caches() |
|||
| msg339731 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 11:20 | |
This issue isn't a real memory leak: if I use -R 3:10 instead of -R 3:3, the test doesn't fail anymore. But the issue is still annoying since it makes Refleaks buildbot workers fail randomly :-/ This issue remembers me the unstable multiprocessing tests: * bpo-33735: test_multiprocessing_spawn leaked [1, 2, 1] memory blocks on AMD64 Windows8.1 Refleaks 3.7 * bpo-33984: test_multiprocessing_forkserver leaked [1, 2, 1] memory blocks on x86 Gentoo Refleaks 3.x Patch to always display memory allocations differences: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index d68ea63b5b..997be819fa 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -118,6 +118,8 @@ def dash_R(the_module, test, indirect_test, huntrleaks): print(msg, file=refrep) refrep.flush() failed = True + if not failed: + print(alloc_deltas[nwarmup:]) return failed Truncated output with the patch: vstinner@apu$ ./python -m test -F -r -j1 -R 3:10 test_functools Using random seed 4308771 Run tests in parallel using 1 child processes 0:00:04 load avg: 0.91 [ 1] test_functools passed [0, 1, 2, 0, 0, 0, 0, 0, 0, 0] ... 0:00:13 load avg: 0.92 [ 3] test_functools passed [2, 1, 0, 0, 0, 0, 0, 0, 0, 0] ... 0:00:17 load avg: 0.93 [ 4] test_functools passed [0, 3, 0, 0, 0, 0, 0, 0, 0, 0] ... 0:00:21 load avg: 0.93 [ 5] test_functools passed [0, 1, 0, 0, 2, 0, 0, 0, 0, 0] ... 0:00:26 load avg: 0.93 [ 6] test_functools passed [0, 4, 0, 0, 0, 0, 0, 0, 0, 0] ... 0:00:34 load avg: 0.87 [ 8] test_functools passed [0, 1, 0, 2, 0, 0, 0, 0, 0, 0] ... 0:01:06 load avg: 1.15 [ 15] test_functools passed [0, 1, 0, 2, 0, -1, 1, 0, 0, 0] ... 0:01:10 load avg: 1.46 [ 16] test_functools passed [0, 4, 0, 0, 0, 0, 0, 0, -1, 1] ... The maximum sum() of these list is around 5 on 10 runs: not every run leaks a memory block. It looks more like a internal cache which is "unstable" if you look at the number of allocated memory blocks. |
|||
| msg339732 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 11:22 | |
I wrote PR 12743 to modify _abc: use weakref.WeakSet rather than using directly a set of weak references. With this PR, calling _get_dump() should clear all broken weak references. I'm not sure if it's really useful for this specific issue, since dash_R_cleanup() calls obj._abc_caches_clear() on all abstract base classes (ABC): all caches must be removed before reading the total number of allocated memory blocks (sys.getallocatedblocks()). |
|||
| msg339733 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-04-09 11:28 | |
_abc was written for optimization. Your PR makes it using slower Python code. Could you please test how much this hits the performance of abc? |
|||
| msg339735 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 12:05 | |
If I modify libregrtest with the following patch: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index 0bb8a0a2bf..f0225a9768 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -128,7 +128,7 @@ def dash_R(ns, the_module, test_name, test_func): failed = False for deltas, item_name, checker in [ (rc_deltas, 'references', check_rc_deltas), - (alloc_deltas, 'memory blocks', check_rc_deltas), + (alloc_deltas, 'memory blocks', check_fd_deltas), (fd_deltas, 'file descriptors', check_fd_deltas) ]: # ignore warmup runs And I add the following file Lib/test/test_noop.py: import unittest class NoopTests(unittest.TestCase): def test_noop(self): pass regrtest detects a "leak": $ ./python -m test -R 3:3 test_noop Run tests sequentially 0:00:00 load avg: 0.55 [1/1] test_noop beginning 6 repetitions 123456 ...... test_noop leaked [0, 1, 0] memory blocks, sum=1 test_noop failed == Tests result: FAILURE == 1 test failed: test_noop Total duration: 113 ms Tests result: FAILURE The issue comes from this look in Lib/test/libregrtest/refleak.py: for i in range(repcount): indirect_test() alloc_after, rc_after, fd_after = dash_R_cleanup(fs, ps, pic, zdc, abcs) print('.', end='', file=sys.stderr, flush=True) if i >= nwarmup: rc_deltas[i] = get_pooled_int(rc_after - rc_before) alloc_deltas[i] = get_pooled_int(alloc_after - alloc_before) fd_deltas[i] = get_pooled_int(fd_after - fd_before) alloc_before = alloc_after rc_before = rc_after fd_before = fd_after Because of "if i >= nwarmup:", get_pooled_int() isn't call during "warmup", whereas the purpose of the warmup is to warmup *everything*. Maybe get_pooled_int() allocates one frame object and keeps it alive in its "zombi frame". Maybe something else is allocated and kept alive. Anything, removing "if i >= nwarmup:" to always compute deltas fix this specific issue. Attached PR 12744 fix this bug. |
|||
| msg339738 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 12:23 | |
New changeset 5aaac94eeb44697e92b0951385cd557bc27e0f6a by Victor Stinner in branch 'master': bpo-36560: Fix reference leak hunting in regrtest (GH-12744) https://github.com/python/cpython/commit/5aaac94eeb44697e92b0951385cd557bc27e0f6a |
|||
| msg339768 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 16:01 | |
New changeset 9c14061a2c2df9a9b84d0aab190a50c24a0d52f4 by Victor Stinner in branch '2.7': bpo-36560: Fix reference leak hunting in regrtest (GH-12744) (GH-12745) https://github.com/python/cpython/commit/9c14061a2c2df9a9b84d0aab190a50c24a0d52f4 |
|||
| msg339772 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 16:26 | |
New changeset bb4447897a5f141eecf42987a1191a3330c5d7ed by Victor Stinner in branch 'master': bpo-36560: regrtest: don't collect the GC twice (GH-12747) https://github.com/python/cpython/commit/bb4447897a5f141eecf42987a1191a3330c5d7ed |
|||
| msg339777 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 16:43 | |
> _abc was written for optimization. Your PR makes it using slower Python code. Could you please test how much this hits the performance of abc? I mostly wrote PR 12743 to debug this issue. Please see the discussion directly on the PR ;-) |
|||
| msg339781 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-09 16:56 | |
New changeset 86f0354fcb815312295b923c55e39364d85d0388 by Victor Stinner in branch '3.7': [3.7] bpo-36560: regrtest: don't collect the GC twice (GH-12747) (GH-12749) https://github.com/python/cpython/commit/86f0354fcb815312295b923c55e39364d85d0388 |
|||
| msg339965 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-11 11:21 | |
Patch making check on memory block leaks stricter: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index 235d6bfd3a..dfadabdef6 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -130,7 +130,7 @@ def dash_R(ns, the_module, test_name, test_func): failed = False for deltas, item_name, checker in [ (rc_deltas, 'references', check_rc_deltas), - (alloc_deltas, 'memory blocks', check_rc_deltas), + (alloc_deltas, 'memory blocks', check_fd_deltas), (fd_deltas, 'file descriptors', check_fd_deltas) ]: # ignore warmup runs Using this patch, at least the following tests fail: * test_asyncio * test_code * test_collections * test_contextlib * test_contextlib_async * test_ctypes * test_functools * test_multiprocessing_forkserver * test_multiprocessing_spawn * test_regrtest * test_statistics * test_typing * test_xml_etree_c I didn't analyze why yet. I guess that they are not real memory leaks, but more minor issue in the code checking for memory leaks. Sadly, it seems like such small glitch can cause a whole Refleak buildbot worker to fail :-( |
|||
| msg340386 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-17 10:16 | |
Another example: AMD64 Fedora Rawhide Refleaks 3.6 https://buildbot.python.org/all/#/builders/193/builds/5 test_contextlib leaked [8, 1, 1] memory blocks, sum=10 I cannot reproduce the issue on my ("idle") laptop. It sounds like a random failure (race condition). |
|||
| msg353071 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-09-24 10:45 | |
Recent fail on AMD64 Fedora Rawhide Refleaks 3.x: https://buildbot.python.org/all/#builders/189/builds/162 test_functools leaked [1, 2, 1] memory blocks, sum=4 |
|||
| msg371286 - (view) | Author: STINNER Victor (vstinner) * | Date: 2020-06-11 14:23 | |
AMD64 Fedora Rawhide Refleaks 3.8: https://buildbot.python.org/all/#/builders/79/builds/196 test_functools leaked [1, 1, 1] memory blocks, sum=3 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2020-06-11 14:23:37 | vstinner | set | messages: + msg371286 |
| 2019-09-24 10:45:21 | vstinner | set | messages: + msg353071 |
| 2019-04-17 10:16:39 | vstinner | set | messages: + msg340386 |
| 2019-04-11 11:21:06 | vstinner | set | messages: + msg339965 |
| 2019-04-09 16:56:07 | vstinner | set | messages: + msg339781 |
| 2019-04-09 16:43:06 | vstinner | set | messages: + msg339777 |
| 2019-04-09 16:31:39 | vstinner | set | pull_requests: + pull_request12676 |
| 2019-04-09 16:26:21 | vstinner | set | messages: + msg339772 |
| 2019-04-09 16:01:27 | vstinner | set | pull_requests: + pull_request12674 |
| 2019-04-09 16:01:19 | vstinner | set | messages: + msg339768 |
| 2019-04-09 12:42:33 | vstinner | set | pull_requests: + pull_request12668 |
| 2019-04-09 12:23:51 | vstinner | set | messages: + msg339738 |
| 2019-04-09 12:05:09 | vstinner | set | messages: + msg339735 |
| 2019-04-09 12:01:30 | vstinner | set | pull_requests: + pull_request12667 |
| 2019-04-09 11:28:48 | serhiy.storchaka | set | nosy:
+ methane, levkivskyi messages: + msg339733 |
| 2019-04-09 11:22:53 | vstinner | set | messages: + msg339732 |
| 2019-04-09 11:20:13 | vstinner | set | messages: + msg339731 |
| 2019-04-09 10:55:15 | vstinner | set | keywords:
+ patch stage: patch review pull_requests: + pull_request12666 |
| 2019-04-08 17:23:43 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka |
| 2019-04-08 17:14:54 | vstinner | set | messages: + msg339663 |
| 2019-04-08 17:11:59 | vstinner | set | files:
+ test_functools2.py messages: + msg339662 |
| 2019-04-08 15:33:24 | vstinner | set | files:
+ test_functools.py messages: + msg339651 |
| 2019-04-08 14:22:16 | vstinner | create | |