/ cpython Public

Conversation

Contributor

rhettinger commented Feb 2, 2019

Hello Tim, if you have the time can you give your thoughts on this?

I've been through the code a few times and can't convince myself that the existing code won't fail in exotic cases (ones where the __eq__ call triggers arbitrary code). My thought is to mark links as unused when they are new or have been extracted from the doubly linked list, and then to use that information to skip invalid actions if another thread or reentrant call is also modifying the same link or the cache dictionary.
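The worry that an __eq__ call can run arbitrary code is easy to demonstrate from pure Python (a hypothetical key type, not code from this PR): give the key a constant hash so the cache's dict probe must fall back to __eq__, and have __eq__ re-enter the same cache:

```python
from functools import lru_cache

log = []

@lru_cache(maxsize=8)
def cached(x):
    log.append('miss')              # one entry per real call into the body
    return x

class ReentrantKey:
    """Hypothetical key whose __eq__ re-enters the cache that is probing it."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42                   # constant hash forces __eq__ on collisions
    def __eq__(self, other):
        log.append('eq')
        cached(0)                   # arbitrary code: a reentrant cache call
        return isinstance(other, ReentrantKey) and self.n == other.n

cached(ReentrantKey(1))             # miss: stored under the colliding hash
cached(ReentrantKey(2))             # the dict probe hits the stored key, so
                                    # __eq__ runs and re-enters the cache
                                    # before this outer call has finished
```

The log shows the reentrant call landing in the middle of the outer lookup, which is exactly the window in which the C code's linked-list bookkeeping is in flight.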

In the last PR, GH-11623, I fixed up the more obvious bugs and added extensive commentary that may help with the analysis.

https://bugs.python.org/issue35780

Contributor Author

rhettinger commented Feb 6, 2019

Alternatives:

  • Add a per-lru-instance state flag to allow reentrant or concurrent calls to be detected.
  • Temporarily INCREF a link that is being used so that another thread or reentrant call can't destroy the link.
  • Add an in-use flag to each link to preclude a double update or a free while the link is in use.
  • Go back to just the pure Python version with its reentrant lock and more extensive and robust refcount logic. Keep the make_key() step in C (while adding a field to make sure the hash only gets computed once).
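The first alternative can be sketched in pure Python (all names here — reentrancy_guard, busy — are hypothetical illustrations, not the patch itself). As written, the flag only detects same-thread reentrancy; handling concurrent threads would need per-thread state or a lock:

```python
from functools import lru_cache

def reentrancy_guard(func):
    """Hypothetical sketch: detect a reentrant call via a per-instance flag
    and bypass the cache entirely for it."""
    cached = lru_cache(maxsize=8)(func)
    busy = False
    def wrapper(*args):
        nonlocal busy
        if busy:                    # reentered while mid-update:
            return func(*args)      # call through, touch no cache state
        busy = True
        try:
            return cached(*args)
        finally:
            busy = False
    return wrapper

calls = []

@reentrancy_guard
def f(x):
    calls.append(x)
    if x:
        f(0)                        # re-enters the cache from inside a call
    return x

f(1)        # miss: runs the body; the inner f(0) bypasses the cache
f(1)        # hit: no new calls recorded
```

The bypassed reentrant call is simply not cached, which trades a little hit rate for never touching the data structures while they may be inconsistent.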

Other ideas:

  • Write a TLA+ spec for the lru_cache and use the model checker to identify the failing cases.

Contributor Author

Here's a scenario that I'm worried about but haven't proved can occur.

  • When a link is created, the lru_cache gets one reference but doesn't store it anywhere.
  • When the link is added to the linked list, it is also added to the cache dict, giving a second reference.
  • When a link is about to be evicted, it is first removed from the linked list.
  • If a dict access triggers a reentrant or concurrent call into the lru_cache, that new code path can use the dict to find the link (already removed from the linked list but still in the dict) and try to move it to the front of the cache by updating the link fields. That puts it back into the linked list.
  • The original code path resumes, believing that the link is removed from the linked list, and deletes the link from the dict, killing that last reference to the link.
  • Now, the linked list contains a link that is an orphan (no cache dict entry points at it) and that has a refcount of zero (deallocated but still reachable from the list, its memory ready for reuse). When it comes time to evict that link, the code will either segfault, overwrite memory that is in use, drive the refcount negative, or silently corrupt the output.
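The scenario above can be replayed with a toy pure-Python model of the C data structures (class and method names here are illustrative, not from the patch): pause the eviction between the list unlink and the dict delete, let a simulated reentrant hit run in the gap, and an orphaned link results:

```python
class Link:
    """One entry in the toy cache: a node of a circular doubly linked list."""
    __slots__ = ('prev', 'next', 'key')
    def __init__(self, key):
        self.key = key
        self.prev = self.next = None

class ToyLRU:
    def __init__(self):
        self.root = Link(None)                  # sentinel node
        self.root.prev = self.root.next = self.root
        self.cache = {}                         # key -> Link ("borrowed" refs)

    def add(self, key):
        link = Link(key)
        last = self.root.prev                   # insert at the MRU end
        last.next = self.root.prev = link
        link.prev, link.next = last, self.root
        self.cache[key] = link

    def unlink(self, link):
        """Splice a link out of the list (its own pointers are left stale)."""
        link.prev.next = link.next
        link.next.prev = link.prev

    def move_to_front(self, key):
        """What a reentrant cache hit does: find the link via the dict and
        splice it back in at the MRU end -- even if it was mid-eviction."""
        link = self.cache[key]
        self.unlink(link)
        last = self.root.prev
        last.next = self.root.prev = link
        link.prev, link.next = last, self.root

lru = ToyLRU()
lru.add('a')
lru.add('b')

victim = lru.cache['a']
lru.unlink(victim)          # eviction, step 1: remove from the linked list
lru.move_to_front('a')      # reentrant hit sneaks in and re-links the victim
del lru.cache['a']          # eviction, step 2: in C this kills the last reference

in_list = []
node = lru.root.next
while node is not lru.root:
    in_list.append(node.key)
    node = node.next
# 'a' is back in the linked list but gone from the dict: an orphaned link
```

In Python the orphan merely leaks out of the dict; in the C version, where the dict entry held the last strong reference, the same interleaving would leave the list pointing at freed memory.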

Possible solutions (if in fact there is a problem):

  • Don't use borrowed references in the links. Use actual references to establish the invariant: all links in the linked list have a ref count greater than zero, even if the links are orphans.
  • Adopt the rotating root technique used in the pure Python code so that links don't get removed and readded with intervening dictionary calls. For a cache miss, the pure Python code leaves the links in place and only updates the key/result fields in the new link and invalidates those fields in the old link. In other words, it never leaves the links in an inconsistent state.
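For reference, the rotating-root move in the pure-Python functools code looks roughly like this (a simplified sketch of the full-cache-miss branch; make_list is a hypothetical helper for the demo). The sentinel becomes the newest link and the oldest link becomes the new sentinel, so no link is ever detached while the dict calls run:

```python
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3    # field indices, as in functools

def rotate_in(cache, root, key, result):
    """Full-cache miss: install (key, result) without detaching any link.
    Links are 4-item lists, as in the pure Python code.  Returns the new root."""
    oldroot = root
    oldroot[KEY] = key                  # the sentinel becomes the newest link
    oldroot[RESULT] = result
    root = oldroot[NEXT]                # the oldest link becomes the sentinel
    oldkey = root[KEY]
    root[KEY] = root[RESULT] = None     # invalidated, but still in the list
    del cache[oldkey]                   # dict calls happen only while the
    cache[key] = oldroot                # linked list is fully consistent
    return root

def make_list(keys):
    """Hypothetical helper: build a full cache with one link per key."""
    root = [None, None, None, None]
    root[PREV] = root[NEXT] = root
    cache = {}
    for k in keys:
        last = root[PREV]
        link = [last, root, k, k.upper()]
        last[NEXT] = root[PREV] = link
        cache[k] = link
    return cache, root

cache, root = make_list(['a', 'b'])     # a full cache of size two
root = rotate_in(cache, root, 'c', 'C') # "evict" 'a', add 'c' -- no unlinking

order = []                              # walk the list from oldest to newest
node = root[NEXT]
while node is not root:
    order.append(node[KEY])
    node = node[NEXT]
```

Every link stays in the circular list for the whole operation, so a reentrant call that finds a link through the dict can never observe it half-removed.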

Member

tim-one commented Feb 13, 2019

On general principle, I'd urge looking at ways that are "obviously correct" instead of "so clever I'm not sure whether I can concoct a failing case".

Two possibly relevant things follow from that. Whenever other threads or reentrancy may occur:

  • Everything must be in a consistent state.

  • Any reference that's borrowed when threading/reentrancy may begin must never be referenced again after threading/reentrancy returns. The easiest way to ensure this is to avoid borrowed references entirely.

Those are big steps on the way to "obviously correct". And obviously so, right? Don't fight sanity 😄.

Member

https://bugs.python.org/issue35780 is closed. What is the status of this PR?

rhettinger closed this Apr 18, 2022

Labels

awaiting merge, DO-NOT-MERGE, skip news, type-bug (An unexpected behavior, bug, or error)
