Issue31070
Created on 2017-07-28 15:03 by vstinner, last changed 2017-10-28 04:46 by ncoghlan. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 3029 | closed | vstinner, 2017-08-08 23:42 | |
| PR 3033 | merged | serhiy.storchaka, 2017-08-09 06:42 | |
| PR 3038 | merged | serhiy.storchaka, 2017-08-09 11:37 | |
| Messages (13) | |||
|---|---|---|---|
| msg299393 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-07-28 15:03 | |
See bpo-bpo-30891 and bpo-30876 for recent changes in importlib. http://buildbot.python.org/all/builders/AMD64%20Windows8.1%20Refleaks%203.x/builds/57/steps/test/logs/stdio 1:33:30 [312/406/2] test_threaded_import passed (30 sec) -- running: test_decimal (826 sec), test_set (84 sec), test_mmap (1151 sec) beginning 6 repetitions 123456 ...Exception ignored in: <function _get_module_lock.<locals>.cb at 0x000000813CA18EB8> Traceback (most recent call last): File "<frozen importlib._bootstrap>", line 176, in cb KeyError: ('random',) ... |
|||
| msg299951 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-08-08 22:59 | |
Oh, it's easy to reproduce the bug on the master branch. Example on Linux: haypo@selma$ ./python -m test test_threaded_import Run tests sequentially 0:00:00 load avg: 0.16 [1/1] test_threaded_import Exception ignored in: <function _get_module_lock.<locals>.cb at 0x7f53201e2f70> Traceback (most recent call last): File "<frozen importlib._bootstrap>", line 176, in cb KeyError: ('random',) 1 test OK. Total duration: 2 sec Tests result: SUCCESS |
|||
| msg299952 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-08-08 23:40 | |
The problem is the weakref callback of _module_locks. It seems like the callback is called after a second thread replaced _module_locks[name]
* Thread 1: _get_module_lock('random') sets _module_locks['random'] and registers a weakref callback on the newly created lock
* Thread 1: the lock is destroyed, the weakref callback will be called as soon as possible
* Thread 2: _get_module_lock('random') sees that the weakref is dead and so replaces _module_locks['random'] with a new lock, and registers a new weakref callback on the second lock
* Thread 1: Call the weakref callback removing _module_locks['random'] -- BUG: the callback is called too late, after Thread 2 already sets a new lock
* Thread 2: The lock is destroyed, the second weakref callback will be called as soon as possible
* Thread 2: The second callback is called and fails with KeyError since _module_locks['random'] was already called in the Thread 1 in the meanwhile
|
|||
| msg299953 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-08-08 23:43 | |
I wrote https://github.com/python/cpython/pull/3029 which seems to fix the issue. I don't know importlib well enough to understand why we need a weak reference to a lock. My PR adds a second lock per module lock (!) to be able to wait until _module_locks[name] is deleted when a thread detects that _module_locks[name] exists and the associated lock was destroyed. |
|||
| msg299971 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2017-08-09 06:45 | |
I think the solution can be simpler. PR 3033 uses the global import lock for guarding modification of the _module_locks dictionary. |
|||
| msg299980 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-08-09 09:23 | |
Serhiy: "I think the solution can be simpler. PR 3033 uses the global import lock for guarding modification of the _module_locks dictionary." I tried exactly that, but it wasn't enough. But your PR is correct because it also checks the current value of _module_locks[name] using the callback parameter. I disliked my own PR, so I'm happy that someone else wrote a simpler fix! (I already abandonned my PR.) |
|||
| msg299989 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2017-08-09 11:29 | |
New changeset 9b0d1d647e3d2ec9d299e5c9f49b02fbbb810a5a by Serhiy Storchaka in branch 'master': bpo-31070: Fix a race condition in importlib _get_module_lock(). (#3033) https://github.com/python/cpython/commit/9b0d1d647e3d2ec9d299e5c9f49b02fbbb810a5a |
|||
| msg299992 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2017-08-09 11:56 | |
New changeset f3b891718e104b6e7018b58bbcd86421a2837fb8 by Serhiy Storchaka in branch '3.6': [3.6] bpo-31070: Fix a race condition in importlib _get_module_lock(). (GH-3033). (#3038) https://github.com/python/cpython/commit/f3b891718e104b6e7018b58bbcd86421a2837fb8 |
|||
| msg299993 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2017-08-09 11:57 | |
Thank you for review Victor and Nick. |
|||
| msg299999 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-08-09 13:12 | |
I'm curious to know if this issue is a regression caused by recent changes: bpo-30891 and bpo-30876. But well, it doesn't matter as soon as it's fixed :-) |
|||
| msg300007 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2017-08-09 13:46 | |
I don't think the changes created the bug, but I do think the new tests to provoke the other race conditions made it easier for the test suite to hit this race as well. |
|||
| msg300009 - (view) | Author: STINNER Victor (vstinner) * | Date: 2017-08-09 14:01 | |
> I don't think the changes created the bug, but I do think the new tests to provoke the other race conditions made it easier for the test suite to hit this race as well. Oh ok, that's a good news! |
|||
| msg305147 - (view) | Author: Nick Coghlan (ncoghlan) * | Date: 2017-10-28 04:46 | |
A recent numpy threaded import bug report that appears in 3.6.1 but is absent in 3.6.3: https://github.com/numpy/numpy/issues/9931 So that's additional evidence that this really was a pre-existing race condition that the new tests for the SystemError cases revealed. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2017-10-28 04:46:37 | ncoghlan | set | messages: + msg305147 |
| 2017-08-09 14:01:24 | vstinner | set | messages: + msg300009 |
| 2017-08-09 13:46:14 | ncoghlan | set | messages: + msg300007 |
| 2017-08-09 13:12:19 | vstinner | set | messages: + msg299999 |
| 2017-08-09 11:57:47 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg299993 stage: patch review -> resolved |
| 2017-08-09 11:56:16 | serhiy.storchaka | set | messages: + msg299992 |
| 2017-08-09 11:37:26 | serhiy.storchaka | set | pull_requests: + pull_request3071 |
| 2017-08-09 11:29:15 | serhiy.storchaka | set | messages: + msg299989 |
| 2017-08-09 09:23:04 | vstinner | set | messages: + msg299980 |
| 2017-08-09 08:47:24 | vstinner | set | title: test_threaded_import: KeyError ignored in _get_module_lock.<locals>.cb on AMD64 Windows8.1 Refleaks 3.x -> test_threaded_import: KeyError ignored in _get_module_lock.<locals>.cb |
| 2017-08-09 06:45:12 | serhiy.storchaka | set | versions:
+ Python 3.6 messages: + msg299971 components:
+ Interpreter Core, - Tests |
| 2017-08-09 06:42:41 | serhiy.storchaka | set | pull_requests: + pull_request3068 |
| 2017-08-08 23:43:57 | vstinner | set | messages: + msg299953 |
| 2017-08-08 23:42:16 | vstinner | set | pull_requests: + pull_request3064 |
| 2017-08-08 23:40:56 | vstinner | set | messages: + msg299952 |
| 2017-08-08 22:59:16 | vstinner | set | nosy:
+ brett.cannon, ncoghlan messages: + msg299951 |
| 2017-07-28 15:03:41 | vstinner | create | |