/ cpython Public

Conversation

Contributor

rhettinger commented Feb 2, 2019

Hello Tim, if you have the time can you give your thoughts on this?

I've been through the code a few times and can't convince myself that the existing code won't fail in exotic cases (ones where the __eq__ call triggers arbitrary code). My thought is to mark links as unused when they are new or have been extracted from the doubly linked list, and then to use that information to skip invalid actions if another thread or reentrant call is also modifying the same link or the cache dictionary.
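The worry that an __eq__ call can run arbitrary code is easy to demonstrate from pure Python (a hypothetical key type, not code from this PR): give the key a constant hash so the cache's dict probe must fall back to __eq__, and have __eq__ re-enter the same cache:

```python
from functools import lru_cache

log = []

@lru_cache(maxsize=8)
def cached(x):
    log.append('miss')              # one entry per real call into the body
    return x

class ReentrantKey:
    """Hypothetical key whose __eq__ re-enters the cache that is probing it."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42                   # constant hash forces __eq__ on collisions
    def __eq__(self, other):
        log.append('eq')
        cached(0)                   # arbitrary code: a reentrant cache call
        return isinstance(other, ReentrantKey) and self.n == other.n

cached(ReentrantKey(1))             # miss: stored under the colliding hash
cached(ReentrantKey(2))             # the dict probe hits the stored key, so
                                    # __eq__ runs and re-enters the cache
                                    # before this outer call has finished
```

The log shows the reentrant call landing in the middle of the outer lookup, which is exactly the window in which the C code's linked-list bookkeeping is in flight.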

In the last PR, GH-11623, I fixed up the more obvious bugs and added extensive commentary that may help with the analysis.

https://bugs.python.org/issue35780

Contributor Author

rhettinger commented Feb 6, 2019

Alternatives:

  • Add a per-lru-instance state flag to allow reentrant or concurrent calls to be detected.
  • Temporarily INCREF a link that is being used so that another thread or reentrant call can't destroy the link.
  • Add an in-use flag to each link to preclude a double update or a free while the link is in use.
  • Go back to just the pure Python version with its reentrant lock and more extensive and robust refcount logic. Keep the make_key() step in C (while adding a field to make sure the hash only gets computed once).
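The first alternative can be sketched in pure Python (all names here — reentrancy_guard, busy — are hypothetical illustrations, not the patch itself). As written, the flag only detects same-thread reentrancy; handling concurrent threads would need per-thread state or a lock:

```python
from functools import lru_cache

def reentrancy_guard(func):
    """Hypothetical sketch: detect a reentrant call via a per-instance flag
    and bypass the cache entirely for it."""
    cached = lru_cache(maxsize=8)(func)
    busy = False
    def wrapper(*args):
        nonlocal busy
        if busy:                    # reentered while mid-update:
            return func(*args)      # call through, touch no cache state
        busy = True
        try:
            return cached(*args)
        finally:
            busy = False
    return wrapper

calls = []

@reentrancy_guard
def f(x):
    calls.append(x)
    if x:
        f(0)                        # re-enters the cache from inside a call
    return x

f(1)        # miss: runs the body; the inner f(0) bypasses the cache
f(1)        # hit: no new calls recorded
```

The bypassed reentrant call is simply not cached, which trades a little hit rate for never touching the data structures while they may be inconsistent.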

Other ideas:

  • Write a TLA+ spec for the lru_cache and use the model checker to identify the failing cases.

Contributor Author

Here's a scenario that I'm worried about but haven't proved can occur.

  • When a link is created, the lru_cache gets one reference but doesn't store it anywhere.
  • When the link is added to the linked list, it is also added to the cache dict, giving a second reference.
  • When a link is about to be evicted, it is first removed from the linked list.
  • If a dict access triggers a reentrant or concurrent call into the lru_cache, that new code path can use the dict to find the link (already removed from the linked list but still in the dict) and try to move it to the front of the cache by updating the link fields. That puts it back into the linked list.
  • The original code path resumes, believing that the link is removed from the linked list, and deletes the link from the dict, killing that last reference to the link.
  • Now, the linked list contains a link that is an orphan (no cache dict entry points at it) and that has a refcount of zero (deallocated but still reachable from the list, its memory ready for reuse). When it comes time to evict that link, the code will either segfault, overwrite memory that is in use, drive the refcount negative, or silently corrupt the output.
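The scenario above can be replayed with a toy pure-Python model of the C data structures (class and method names here are illustrative, not from the patch): pause the eviction between the list unlink and the dict delete, let a simulated reentrant hit run in the gap, and an orphaned link results:

```python
class Link:
    """One entry in the toy cache: a node of a circular doubly linked list."""
    __slots__ = ('prev', 'next', 'key')
    def __init__(self, key):
        self.key = key
        self.prev = self.next = None

class ToyLRU:
    def __init__(self):
        self.root = Link(None)                  # sentinel node
        self.root.prev = self.root.next = self.root
        self.cache = {}                         # key -> Link ("borrowed" refs)

    def add(self, key):
        link = Link(key)
        last = self.root.prev                   # insert at the MRU end
        last.next = self.root.prev = link
        link.prev, link.next = last, self.root
        self.cache[key] = link

    def unlink(self, link):
        """Splice a link out of the list (its own pointers are left stale)."""
        link.prev.next = link.next
        link.next.prev = link.prev

    def move_to_front(self, key):
        """What a reentrant cache hit does: find the link via the dict and
        splice it back in at the MRU end -- even if it was mid-eviction."""
        link = self.cache[key]
        self.unlink(link)
        last = self.root.prev
        last.next = self.root.prev = link
        link.prev, link.next = last, self.root

lru = ToyLRU()
lru.add('a')
lru.add('b')

victim = lru.cache['a']
lru.unlink(victim)          # eviction, step 1: remove from the linked list
lru.move_to_front('a')      # reentrant hit sneaks in and re-links the victim
del lru.cache['a']          # eviction, step 2: in C this kills the last reference

in_list = []
node = lru.root.next
while node is not lru.root:
    in_list.append(node.key)
    node = node.next
# 'a' is back in the linked list but gone from the dict: an orphaned link
```

In Python the orphan merely leaks out of the dict; in the C version, where the dict entry held the last strong reference, the same interleaving would leave the list pointing at freed memory.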

Possible solutions (if in fact there is a problem):

  • Don't use borrowed references in the links. Use actual references to establish the invariant: all links in the linked list have a ref count greater than zero, even if the links are orphans.
  • Adopt the rotating root technique used in the pure Python code so that links don't get removed and readded with intervening dictionary calls. For a cache miss, the pure Python code leaves the links in place and only updates the key/result fields in the new link and invalidates those fields in the old link. In other words, it never leaves the links in an inconsistent state.
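For reference, the rotating-root move in the pure-Python functools code looks roughly like this (a simplified sketch of the full-cache-miss branch; make_list is a hypothetical helper for the demo). The sentinel becomes the newest link and the oldest link becomes the new sentinel, so no link is ever detached while the dict calls run:

```python
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3    # field indices, as in functools

def rotate_in(cache, root, key, result):
    """Full-cache miss: install (key, result) without detaching any link.
    Links are 4-item lists, as in the pure Python code.  Returns the new root."""
    oldroot = root
    oldroot[KEY] = key                  # the sentinel becomes the newest link
    oldroot[RESULT] = result
    root = oldroot[NEXT]                # the oldest link becomes the sentinel
    oldkey = root[KEY]
    root[KEY] = root[RESULT] = None     # invalidated, but still in the list
    del cache[oldkey]                   # dict calls happen only while the
    cache[key] = oldroot                # linked list is fully consistent
    return root

def make_list(keys):
    """Hypothetical helper: build a full cache with one link per key."""
    root = [None, None, None, None]
    root[PREV] = root[NEXT] = root
    cache = {}
    for k in keys:
        last = root[PREV]
        link = [last, root, k, k.upper()]
        last[NEXT] = root[PREV] = link
        cache[k] = link
    return cache, root

cache, root = make_list(['a', 'b'])     # a full cache of size two
root = rotate_in(cache, root, 'c', 'C') # "evict" 'a', add 'c' -- no unlinking

order = []                              # walk the list from oldest to newest
node = root[NEXT]
while node is not root:
    order.append(node[KEY])
    node = node[NEXT]
```

Every link stays in the circular list for the whole operation, so a reentrant call that finds a link through the dict can never observe it half-removed.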

Member

tim-one commented Feb 13, 2019

On general principle, I'd urge looking at ways that are "obviously correct" instead of "so clever I'm not sure whether I can concoct a failing case".

Two possibly relevant things follow from that. Whenever other threads or reentrancy may occur:

  • Everything must be in a consistent state.

  • Any reference that's borrowed when threading/reentrancy may begin must never be referenced again after threading/reentrancy returns. The easiest way to ensure this is to avoid borrowed references entirely.

Those are big steps on the way to "obviously correct". And obviously so, right? Don't fight sanity 😄.

Member

https://bugs.python.org/issue35780 is closed. What is the status of this PR?

rhettinger closed this Apr 18, 2022

Labels

awaiting merge, DO-NOT-MERGE, skip news, type-bug (An unexpected behavior, bug, or error)
