[proxy] web.archive.org← back | site home | direct (HTTPS) ↗ | proxy home | ◑ dark◐ light
/ cpython Public
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32734: Fix asyncio.Lock multiple acquire safety issue #5466

Merged
merged 5 commits into from Feb 2, 2018

Conversation

Copy link
Contributor

bharel commented Jan 31, 2018

As promised, found and fixed in the same day due to the high importance.
This will cause almost non-existing overhead (one try statement) and eliminate the bug.

https://bugs.python.org/issue32734

Copy link
Contributor

asvetlov left a comment

Looks good, the change is correct.

Please fix minor issues for test.

t1.cancel()
lock.release()

asyncio.Task(lockandtrigger(), loop=self.loop)
Copy link
Contributor

asvetlov Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use loop.create_task() for a task creation

Copy link
Contributor Author

bharel Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Keep in mind asyncio.Task was used in the rest of the tests so I used it in order to be consistent.


# Make sure only one lock was taken
self.assertEqual(lock_count, 1)

Copy link
Contributor

asvetlov Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explicitly make sure that all four tasks are finished at the end of test

Copy link
Contributor Author

bharel Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

bedevere-bot commented Feb 1, 2018

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link

bedevere-bot commented Feb 1, 2018

GH-5477 is a backport of this pull request to the 3.6 branch.

Copy link

bedevere-bot commented Feb 1, 2018

GH-5479 is a backport of this pull request to the 3.7 branch.

Copy link
Contributor Author

bharel commented Feb 1, 2018

Alright. Done with changes and backports. Anything else I can help with? :-)

I have made the requested changes; please review again

Copy link

bedevere-bot commented Feb 1, 2018

Thanks for making the requested changes!

@asvetlov: please review the changes made to this pull request.

@@ -181,18 +181,22 @@ def locked(self):
self._locked = True
return True

_waiters = self._waiters
Copy link
Member

1st1 Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: just use self._waiters. No need for a new variable.

Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New variable was created in order to avoid the double attribute lookup, I'll remove it if you think it's more readable though.

await fut
self._locked = True
return True
try:
Copy link
Member

1st1 Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very subtle. Can you add a comment why we use two nested 'try' statements here? I.e. that it's important that 'finally' block is called before 'except CancelledError'

Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing :-)

if not fut.done():
fut.set_result(True)
break
"""Wake up the first waiter if it isn't done."""
Copy link
Member

1st1 Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it is done? Can there be a situation where you have [fut, fut] in the _waiters list?

Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be indeed, when 2 or more locks are waiting on acquire.
If the first future is done, it means that later on it will either get the lock, or it was cancelled in which case it will wake up the next waiter upon loop's next cycle.
The patch fixes the bug by avoiding multiple done waiters at once and cleaning up the waiters list asap.
The only way multiple done waiters can happen is if 2 futures get cancelled in which case only 1 waiter will wake up, and the second cancelled future will not wake up another one as lock would have already been taken (if not self._locked: self._wake_up_first()) or it wasn't taken yet it which case it's still sitting .done() in the waiter's list.

In summary:
.done necessarily means either the lock is immediately taken next loop or if it isn't, (.done occurred because of cancellation) next one wakes up. So if we have a done future not including the one currently running we shouldn't wake up another one as it will cause a double wakeup, hence, the almighty bug.

Copy link
Member

1st1 Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, makes sense. Could you please add this all as a comment to _wake_up_first? :)

Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phew, now that's a long comment, lemme see how can I shorten it while still managing to fully explain it. Challenge accepted :P

Copy link
Member

1st1 Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if you have time to write a short comment :)

Sorry for being anal about this. Every once in a while we find a new bug in asyncio.Lock or asyncio.Queue and it's always a challenge to wrap your head around how they are actually working. Every line of code is significant, sometimes it's not safe to do a simple refactoring because things break in very non-obvious ways.

Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, all good mate. It took some time to wrap my head around it as well.
Luckily I didn't actually encounter the bug at production but rather found it by reading the implementation which took a couple of hours ;-)

How's this comment? Attempting to be short and to the point as much as you can with synchronization.

Copy link
Member

1st1 left a comment

I've left a few review comments

Copy link

bedevere-bot commented Feb 2, 2018

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@@ -181,18 +181,22 @@ def locked(self):
self._locked = True
return True

_waiters = self._waiters
Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New variable was created in order to avoid the double attribute lookup, I'll remove it if you think it's more readable though.

if not fut.done():
fut.set_result(True)
break
"""Wake up the first waiter if it isn't done."""
Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be indeed, when 2 or more locks are waiting on acquire.
If the first future is done, it means that later on it will either get the lock, or it was cancelled in which case it will wake up the next waiter upon loop's next cycle.
The patch fixes the bug by avoiding multiple done waiters at once and cleaning up the waiters list asap.
The only way multiple done waiters can happen is if 2 futures get cancelled in which case only 1 waiter will wake up, and the second cancelled future will not wake up another one as lock would have already been taken (if not self._locked: self._wake_up_first()) or it wasn't taken yet it which case it's still sitting .done() in the waiter's list.

In summary:
.done necessarily means either the lock is immediately taken next loop or if it isn't, (.done occurred because of cancellation) next one wakes up. So if we have a done future not including the one currently running we shouldn't wake up another one as it will cause a double wakeup, hence, the almighty bug.

try:
await fut
finally:
_waiters.remove(fut)
except futures.CancelledError:
Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1st1 @asvetlov I'll be honest, the only problem I have (which originates from the original code anyway) is what will happen if the exception isn't a CancelledError (such as task.__step(self, exc=OutOfCoconutsException)). Apart from Monty Python having no horse, the code will enter a deadlock as no one will wake up the next waiter. I'm not entirely sure what exceptions can pass into the task's __step as task doesn't allow set_exception but still better be safe than sorry no?

await fut
self._locked = True
return True
try:
Copy link
Contributor Author

bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing :-)

Copy link
Contributor Author

bharel commented Feb 2, 2018

I have made the requested changes; please review again

Copy link

bedevere-bot commented Feb 2, 2018

Thanks for making the requested changes!

@1st1, @asvetlov: please review the changes made to this pull request.

Copy link
Contributor

asvetlov left a comment

LGTM.
@1st1 please review again

1st1 approved these changes Feb 2, 2018
Copy link
Contributor Author

bharel commented Feb 2, 2018

Alright, I guess that's a green light for my backports?

Copy link
Member

1st1 commented Feb 2, 2018

Alright, I guess that's a green light for my backports?

Let's see if our bots can do them automagically first ;)

1st1 merged commit 2f79c01 into python:master Feb 2, 2018
Copy link
Contributor

miss-islington commented Feb 2, 2018

Thanks @bharel for the PR, and @1st1 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 2, 2018
)

(cherry picked from commit 2f79c01)

Co-authored-by: Bar Harel <bzvi7919@gmail.com>
Copy link

bedevere-bot commented Feb 2, 2018

GH-5501 is a backport of this pull request to the 3.7 branch.

Copy link
Contributor

miss-islington commented Feb 2, 2018

Sorry, @bharel and @1st1, I could not cleanly backport this to 3.6 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2f79c014931cbb23b08a7d16c534a3cc9607ae14 3.6

Copy link

bedevere-bot commented Feb 2, 2018

GH-5502 is a backport of this pull request to the 3.6 branch.

Copy link
Contributor Author

bharel commented Feb 2, 2018

Thanks @1st1 :-)
Done with 3.6 backport.
Btw, doesn't it apply to 3.5 as well or did we stop supporting it?

Copy link
Member

1st1 commented Feb 2, 2018

3.5 is in security-only fixes mode IIRC.

1st1 pushed a commit that referenced this pull request Feb 2, 2018
…5501)

(cherry picked from commit 2f79c01)

Co-authored-by: Bar Harel <bzvi7919@gmail.com>
1st1 pushed a commit that referenced this pull request Feb 2, 2018
bharel deleted the bpo-32734 branch Dec 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants