[proxy] web.archive.org← back | site home | direct (HTTPS) ↗ | proxy home | ◑ dark◐ light

Issue 30064: BaseSelectorEventLoop.sock_{recv,sendall}() don't remove their callbacks when canceled

Created on 2017-04-13 08:55 by abacabadabacaba, last changed 2020-06-15 16:37 by vstinner.

Messages (41) msg291593 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2017-04-13 08:55
Code:

    import asyncio as a
    import socket as s

    @a.coroutine
    def coro():
        s1, s2 = s.socketpair()
        s1.setblocking(False)
        s2.setblocking(False)
        try:
            yield from a.wait_for(loop.sock_recv(s2, 1), 1)
        except a.TimeoutError:
            pass
        yield from loop.sock_sendall(s1, b'\x00')
        yield
        s1.close()
        s2.close()

    loop = a.get_event_loop()
    loop.run_until_complete(coro())

Result:

    Exception in callback BaseSelectorEventLoop._sock_recv(<Future cancelled>, True, <socket.socke...2049, proto=0>, 1)
    handle: <Handle BaseSelectorEventLoop._sock_recv(<Future cancelled>, True, <socket.socke...2049, proto=0>, 1)>
    Traceback (most recent call last):
      File "/usr/lib/python3.6/asyncio/events.py", line 127, in _run
        self._callback(*self._args)
      File "/usr/lib/python3.6/asyncio/selector_events.py", line 378, in _sock_recv
        self.remove_reader(fd)
      File "/usr/lib/python3.6/asyncio/selector_events.py", line 342, in remove_reader
        return self._remove_reader(fd)
      File "/usr/lib/python3.6/asyncio/selector_events.py", line 279, in _remove_reader
        key = self._selector.get_key(fd)
      File "/usr/lib/python3.6/selectors.py", line 189, in get_key
        return mapping[fileobj]
      File "/usr/lib/python3.6/selectors.py", line 70, in __getitem__
        fd = self._selector._fileobj_lookup(fileobj)
      File "/usr/lib/python3.6/selectors.py", line 224, in _fileobj_lookup
        return _fileobj_to_fd(fileobj)
      File "/usr/lib/python3.6/selectors.py", line 41, in _fileobj_to_fd
        raise ValueError("Invalid file descriptor: {}".format(fd))
    ValueError: Invalid file descriptor: -1
msg326294 - (view) Author: Yury Selivanov (yselivanov) * Date: 2018-09-24 21:57
See also https://bugs.python.org/issue34795
msg329746 - (view) Author: Andrew Svetlov (asvetlov) * Date: 2018-11-12 17:00
New changeset 74387926072abf338a4c1cec1bf0501fc65bbee5 by Andrew Svetlov in branch 'master':
bpo-30064: Refactor sock_* asyncio API (#10419)
https://github.com/python/cpython/commit/74387926072abf338a4c1cec1bf0501fc65bbee5
msg355272 - (view) Author: Yury Selivanov (yselivanov) * Date: 2019-10-23 22:42
I think this broke asyncio a bit.  These two uvloop sock_* API tests no longer pass on asyncio 3.8:

* https://github.com/MagicStack/uvloop/blob/master/tests/test_sockets.py#L192

* https://github.com/MagicStack/uvloop/blob/master/tests/test_sockets.py#L238

Andrew, since this was your patch, please take a look.
msg355273 - (view) Author: Yury Selivanov (yselivanov) * Date: 2019-10-23 22:58
I'll elevate the status so that we don't forget before 3.8.1 is too close
msg357989 - (view) Author: Ned Deily (ned.deily) * Date: 2019-12-07 20:48
> I'll elevate the status so that we don't forget before 3.8.1 is too close

Andrew, Yury: ping. 3.8.1 cutoff is approaching.
msg358011 - (view) Author: Andrew Svetlov (asvetlov) * Date: 2019-12-08 16:37
Not sure if failed uvloop tests are correct.
The scenario is like the following:
1. Suppose we have an unblocking socket connected to peer.
2. Create a task for reading data: 
  task = asyncio.create_task(loop.sock_read(sock, 1))
 Note, the task is not started yet.
3. Just after it, cancel the task without waiting for the actual cancellation
  task.cancel()
4. Finally, read from the socket again:
  data = await loop.sock_read(sock, 1)

If I put context switch (await asyncio.sleep(0)) between 3) and 4) *or* replace direct sock_read() call in 4) with creation a task (data = await asyncio.create_task(loop.sock_read(sock, 1))) the cancellation of former read is performed and test passes.

I very doubt if any sane code is organizing like this test: start delayed reading, cancel it and read again.

The worse, neither previous not current sock_read() implementation doesn't prevent the concurrent reading which basically delivers data in an unpredictable order. Honestly, I'm not sure if we need to provide the concurrency guarantee for such low-level functions.  The code built on top of these low-level primitives should handle the cuncurrent access problem IMHO.
msg358098 - (view) Author: Łukasz Langa (lukasz.langa) * Date: 2019-12-09 14:19
Note: this is going to miss Python 3.8.1 as I'm releasing 3.8.1rc1 right now.  Please address this before 3.8.2 (due in February).
msg358108 - (view) Author: Andrew Svetlov (asvetlov) * Date: 2019-12-09 14:59
As I wrote, I don't think that we should fix the code but the failed uvloop test is slightly weird instead.

Maybe I'm wrong, I need to communicate with Yuri anyway.
msg362637 - (view) Author: Łukasz Langa (lukasz.langa) * Date: 2020-02-25 12:10
Downgrading priority on this one.
msg362671 - (view) Author: Yury Selivanov (yselivanov) * Date: 2020-02-25 23:38
> I very doubt if any sane code is organizing like this test: start delayed reading, cancel it and read again.

Hm, cancellation should work correctly no matter how "sane" or "insane" the user code is.

> The worse, neither previous not current sock_read() implementation doesn't prevent the concurrent reading which basically delivers data in an unpredictable order.

But we're not discussing using a socket concurrently -- asyncio explicitly does not support that for the sock_ api. 

AFAICT this issue is about consequent cancel operation not working as expected in asyncio, no?
msg362690 - (view) Author: Andrew Svetlov (asvetlov) * Date: 2020-02-26 12:39
This is a very interesting question.
In asyncio, task cancellation is not executed on `task.cancel()` call immediately but the task is *marked* for cancellation. 
The actual cancellation requires a context switch (await asyncio.sleep(0) or similar) to propagate `asyncio.CancelledError` into the task code and unwind the exception.

As I said, the test needs a context switch between `cancel()` call and analyzing the state after the task cancellation.

Futures are canceled immediately, that's why the previous implementation passed uvloop's test suite untouched. Unfortunately, it had own flaws.
Also please note, sock_connect()/sock_accept() were implemented this way about two years before my changes, they also suffer from cancel-without-actual-cancellation problems as well.
msg368553 - (view) Author: Fantix King (fantix) * Date: 2020-05-09 22:52
Following the discussion - looks like `add_reader()` will cancel the previous `handle` on the same `fd` if found, so I think it is reasonable to solve this cancellation-race issue by checking if the `handle` is cancelled as shown in the attached patch (tests in uvloop passed with this patch). WDTY?
msg368701 - (view) Author: Joongi Kim (achimnol) * Date: 2020-05-12 05:13
I just encountered this issue when doing "sys.exit(1)" on a Click-based CLI program that internally uses asyncio event loop via wrapped via a context manager, on Python 3.8.2.

Using uvloop or adding "time.sleep(0.1)" before "sys.exit(1)" removes the error.
msg368702 - (view) Author: Joongi Kim (achimnol) * Date: 2020-05-12 05:17
And I suspect that this issue is something simliar to what I did in a recent janus PR:
https://github.com/aio-libs/janus/blob/ec8592b91254971473b508313fb91b01623f13d7/janus/__init__.py#L84
to give a chance for specific callbacks to execute via an extra context switch.
msg370123 - (view) Author: Yury Selivanov (yselivanov) * Date: 2020-05-27 19:47
New changeset 210a137396979d747c2602eeef46c34fc4955448 by Fantix King in branch 'master':
bpo-30064: Fix asyncio loop.sock_* race condition issue (#20369)
https://github.com/python/cpython/commit/210a137396979d747c2602eeef46c34fc4955448
msg370131 - (view) Author: miss-islington (miss-islington) Date: 2020-05-27 20:39
New changeset 3a2667d91e33493ccde113ddf5092afefc3c89fa by Miss Islington (bot) in branch '3.9':
bpo-30064: Fix asyncio loop.sock_* race condition issue (GH-20369)
https://github.com/python/cpython/commit/3a2667d91e33493ccde113ddf5092afefc3c89fa
msg370143 - (view) Author: STINNER Victor (vstinner) * Date: 2020-05-27 22:17
The new test_sock_client_racing() test seems to have... a race condition...

AMD64 FreeBSD Non-Debug 3.9:
https://buildbot.python.org/all/#/builders/750/builds/11

ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.KqueueEventLoopTests)
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.PollEventLoopTests)
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.SelectEventLoopTests)

Example:

ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.PollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/test/test_asyncio/test_sock_lowlevel.py", line 216, in _basetest_sock_connect_racing
    await self.loop.sock_connect(sock, addr)
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 500, in sock_connect
    return await fut
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 505, in _sock_connect
    sock.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/test/test_asyncio/test_sock_lowlevel.py", line 253, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/tasks.py", line 496, in wait_for
    return fut.result()
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/test/test_asyncio/test_sock_lowlevel.py", line 219, in _basetest_sock_connect_racing
    await self.loop.sock_connect(sock, addr)
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 500, in sock_connect
    return await fut
  File "/usr/home/buildbot/python/3.9.koobs-freebsd-9e36.nondebug/build/Lib/asyncio/selector_events.py", line 505, in _sock_connect
    sock.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
msg370147 - (view) Author: Fantix King (fantix) * Date: 2020-05-27 22:51
Ouch ... looks like FreeBSD also needs a few more retries than a single retry. I'll test on a FreeBSD and create a PR for that.
msg370159 - (view) Author: Fantix King (fantix) * Date: 2020-05-28 00:50
Not a simple one - FreeBSD is returning ECONNREFUSED immediately when trying to connect to a non-listening port on the loopback address:

https://lists.freebsd.org/pipermail/freebsd-current/2005-May/049876.html

Then all following connect attempts on the same socket return ECONNREFUSED regardless of whether the port is listening or not. I think I'll have to skip this test on FreeBSD.
msg370161 - (view) Author: Kyle Stanley (aeros) * Date: 2020-05-28 02:21
> The new test_sock_client_racing() test seems to have... a race condition...

I'm also a bit skeptical about relying on `time.sleep(0.01)` at the end of the loop in `_basetest_sock_recv_racing()`, but I don't have an idea for a better approach at the moment.
msg370205 - (view) Author: STINNER Victor (vstinner) * Date: 2020-05-28 13:48
I created https://github.com/python/cpython/pull/20485 to skip the unstable tests until someone fix them. IMO it's better to skip the test than reverting the fix, since the fix does fix a race condition.

The test is still failing on buildbots which makes analysis of buildbot failures more difficult. For example, a SQLite change got 3 different 4 different bugs on buildbots:
https://github.com/python/cpython/pull/20448#issuecomment-635350054

See the "revert on fail" policy for buildbots:
https://pythondev.readthedocs.io/ci.html#revert-on-fail


> I'm also a bit skeptical about relying on `time.sleep(0.01)` at the end of the loop in `_basetest_sock_recv_racing()`

Yeah, please avoid sleep as a synchronization primitive:
https://pythondev.readthedocs.io/unstable_tests.html#don-t-use-sleep-as-synchronization
msg370215 - (view) Author: STINNER Victor (vstinner) * Date: 2020-05-28 14:08
New changeset 84ee7e1573d166fe7a9be676813e12523b62ab24 by Victor Stinner in branch 'master':
bpo-30064: Fix unstable asyncio "racing" socket tests (GH-20485)
https://github.com/python/cpython/commit/84ee7e1573d166fe7a9be676813e12523b62ab24
msg370218 - (view) Author: miss-islington (miss-islington) Date: 2020-05-28 14:28
New changeset 1d82f003678816ff8dd822452ec91669844d2d09 by Miss Islington (bot) in branch '3.9':
bpo-30064: Fix unstable asyncio "racing" socket tests (GH-20485)
https://github.com/python/cpython/commit/1d82f003678816ff8dd822452ec91669844d2d09
msg370219 - (view) Author: STINNER Victor (vstinner) * Date: 2020-05-28 14:29
> bpo-30064: Fix unstable asyncio "racing" socket tests (GH-20485)

Sorry. You should read "Skip" tests, not "Fix" tests :-p
msg370234 - (view) Author: Fantix King (fantix) * Date: 2020-05-28 16:11
Thanks for the comments! Added PR 20494 to properly fix/skip the test for (hopefully) all platforms.
msg370268 - (view) Author: Yury Selivanov (yselivanov) * Date: 2020-05-28 21:56
New changeset dc4eee9e266267498a6b783a0abccc23c06f2b87 by Fantix King in branch 'master':
bpo-30064: Properly skip unstable loop.sock_connect() racing test (GH-20494)
https://github.com/python/cpython/commit/dc4eee9e266267498a6b783a0abccc23c06f2b87
msg370269 - (view) Author: miss-islington (miss-islington) Date: 2020-05-28 22:17
New changeset 6637bd45163024f2187e40d2cc12c473f78bf5da by Miss Islington (bot) in branch '3.9':
bpo-30064: Properly skip unstable loop.sock_connect() racing test (GH-20494)
https://github.com/python/cpython/commit/6637bd45163024f2187e40d2cc12c473f78bf5da
msg371240 - (view) Author: STINNER Victor (vstinner) * Date: 2020-06-10 20:14
Another kind of failure on x86 Gentoo Installed with X 3.x:
https://buildbot.python.org/all/#/builders/128/builds/1030

ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.EPollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/test/test_asyncio/test_sock_lowlevel.py", line 200, in _basetest_sock_send_racing
    await self.loop.sock_sendall(sock, b'world')
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/selector_events.py", line 460, in sock_sendall
    return await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/tasks.py", line 507, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.installed/build/target/lib/python3.10/asyncio/tasks.py", line 509, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
msg371247 - (view) Author: Fantix King (fantix) * Date: 2020-06-10 20:40
I'm checking now
msg371467 - (view) Author: Fantix King (fantix) * Date: 2020-06-13 19:38
I've been testing on local VMs with Gentoo and Ubuntu, but the issue cannot be reproduced. The error log is not really revealing anything useful, other than that the send was blocked when it was supposed to be successful. As it is not constantly failing on buildbot (https://buildbot.python.org/all/#/builders/128), I don't know if it is worth it to add some more debug code to this test in master and wait for the next hang. Please kindly advise.
msg371478 - (view) Author: Kyle Stanley (aeros) * Date: 2020-06-14 00:17
I was able to replicate the failure locally by running the large number of jobs in parallel. We typically use this to test for tricky race conditions, to simulate maximum load:

[aeros:~/repos/cpython]$ ./python -m test test_asyncio.test_sock_lowlevel --match test_sock_client_racing -j100 -F -v
# [snip]
======================================================================
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.EPollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 200, in _basetest_sock_send_racing
    await self.loop.sock_sendall(sock, b'world')
  File "/home/aeros/repos/cpython/Lib/asyncio/selector_events.py", line 460, in sock_sendall
    return await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 507, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 509, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

======================================================================
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.PollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 200, in _basetest_sock_send_racing
    await self.loop.sock_sendall(sock, b'world')
  File "/home/aeros/repos/cpython/Lib/asyncio/selector_events.py", line 460, in sock_sendall
    return await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 507, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/aeros/repos/cpython/Lib/asyncio/tasks.py", line 509, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

======================================================================
ERROR: test_sock_client_racing (test.test_asyncio.test_sock_lowlevel.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/cpython/Lib/test/test_asyncio/test_sock_lowlevel.py", line 256, in test_sock_client_racing
    self.loop.run_until_complete(asyncio.wait_for(
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 629, in run_until_complete
    self.run_forever()
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/home/aeros/repos/cpython/Lib/asyncio/base_events.py", line 1854, in _run_once
    event_list = self._selector.select(timeout)
  File "/home/aeros/repos/cpython/Lib/selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
OSError: [Errno 9] Bad file descriptor

I'll try to add some debug lugs to find the source of the failure(s). This one might be a bit tricky to assess though.

@Fantix I'd recommend trying to run the local test with the above command, and increasing the parallel job count until it occurs. If it takes more than a minute or two for the tests to run, slowly lower the number. At the moment, the failure has only occurred once in the buildbots, but we might need to temporarily disable it if it occurs repeatedly or find some form of workaround (as it will potentially mask other bugs).
msg371479 - (view) Author: Kyle Stanley (aeros) * Date: 2020-06-14 01:14
I believe that I might have identified a simple fix for the issue, but first I'd like to clarify on something:

What exactly is the reason that we need to use asyncio.wait_for() with a timeout of 10 seconds in these tests? We typically avoid using short duration timeouts on individual tests, largely because we don't want them to fail based on the performance of the device they're ran on. This can be a concern for slower devices running the test suite w/ parallel jobs.

If asyncio.wait_for() is necessary in this case, I'd advise increasing the timeout substantially. I didn't experiment with a wide variety of numbers, but increasing the duration from 10 to 60 seconds seems to have fixed the failure locally, so that might be a decent starting point. If it's not necessary, I'd prefer to simply remove the asyncio.wait_for()s in test_sock_client_racing.
msg371485 - (view) Author: Fantix King (fantix) * Date: 2020-06-14 04:06
Oh thank you for the clue, that's super helpful! Let me try locally.

This fix PR was originally done to solve a hanging call in a race condition, so the test would hang forever if the fix is not working as expected. The 10 seconds wait_for() is to make sure that we don't end up with a test hanging forever.

If enlarging the timeout works in some of your cases, that really inspires me on a potential cause of the issue in the test. To simulate the race condition specifically for loop.sock_sendall(), I needed a connected socket with a full send buffer so that the next sock_sendall() call would block. To achieve this, I wrote something like this:

    with self.assertRaises(BlockingIOError):
        while True:
            sock.send(b' ' * 5)

And another loop in a subtask to consume the data:

    while not data:
        data = await self.loop.sock_recv(server, 1024)
        data = data.strip()

It might be that it cost too much time in these loops (especially the 2nd one because the logs showed that it was blocked on calling sock_sendall()). But that's just a guess yet, let me add some more debug logs and test locally.
msg371489 - (view) Author: Kyle Stanley (aeros) * Date: 2020-06-14 06:20
No problem at all, and thanks for clarifying.

> If enlarging the timeout works in some of your cases, that really inspires me on a potential cause of the issue in the test. To simulate the race condition specifically for loop.sock_sendall(), I needed a connected socket with a full send buffer so that the next sock_sendall() call would block. To achieve this, I wrote something like this:

> It might be that it cost too much time in these loops (especially the 2nd one because the logs showed that it was blocked on calling sock_sendall()). But that's just a guess yet, let me add some more debug logs and test locally.

That seems reasonable to me and would make the most sense at a glance. If this is confirmed to be the source of the issue, we should be able to just reduce the size of the socket's send buffer via SO_SNDBUF before sending to it so that it fills up much more quickly.
msg371490 - (view) Author: Fantix King (fantix) * Date: 2020-06-14 06:31
OK I think I solved the problem by improving the two loops using smarter buffer sizes:

            # fill the buffer until sending 5 chars would block
            size = 8192
            while size > 5:
                with self.assertRaises(BlockingIOError):
                    while True:
                        sock.send(b' ' * size)
                size = int(size / 2)

and

            # receive everything that is not a space
            async def recv_all():
                rv = b''
                while True:
                    buf = await self.loop.sock_recv(server, 8192)
                    if not buf:
                        return rv
                    rv += buf.strip()

This test is now running ~25x faster, and the load test crashed my laptop a few times before it hits a timeout.

Last question before PR pls: given the fact that this test is to cover a fixed case when loop.sock_*() was hanging forever, should I keep the wait_for(..., timeout=10)?
msg371491 - (view) Author: Fantix King (fantix) * Date: 2020-06-14 06:41
and yes, updated the SO_SNDBUF too
msg371492 - (view) Author: Kyle Stanley (aeros) * Date: 2020-06-14 06:45
> Last question before PR pls: given the fact that this test is to cover a fixed case when loop.sock_*() was hanging forever, should I keep the wait_for(..., timeout=10)?

For now, I think we can keep the asyncio.wait_for() w/ 10 the sec timeout. I'll have to run the tests locally though w/ the patch to double check (once the PR is opened), and it may still end up needing to be increased if the test fails again in the future. It should be okay though.
msg371495 - (view) Author: miss-islington (miss-islington) Date: 2020-06-14 07:44
New changeset 8f04a84755babe516ebb5304904ea7c15b865c80 by Fantix King in branch 'master':
bpo-30064: Fix slow asyncio sock test (GH-20868)
https://github.com/python/cpython/commit/8f04a84755babe516ebb5304904ea7c15b865c80
msg371498 - (view) Author: miss-islington (miss-islington) Date: 2020-06-14 08:08
New changeset 83cd968f5cc59ad60b665080087f50fefac17a07 by Miss Islington (bot) in branch '3.9':
bpo-30064: Fix slow asyncio sock test (GH-20868)
https://github.com/python/cpython/commit/83cd968f5cc59ad60b665080087f50fefac17a07
msg371499 - (view) Author: Kyle Stanley (aeros) * Date: 2020-06-14 08:13
Thanks again for working on this @Fantix, and for the continued vigilance on the issue after the test failures occurred in the buildbots. I think this recent PR-20868 will do the trick. At the very least, it will make the failure much more rare than before, which is definitely an improvement.
History Date User Action Args 2020-06-15 16:37:16vstinnersetnosy: - vstinner
2020-06-14 08:13:38aerossetstage: patch review -> commit review
messages: + msg371499
versions: + Python 3.9, Python 3.10, - Python 3.8 2020-06-14 08:08:29miss-islingtonsetmessages: + msg371498 2020-06-14 07:49:04miss-islingtonsetpull_requests: + pull_request20058 2020-06-14 07:44:10miss-islingtonsetmessages: + msg371495 2020-06-14 07:05:37fantixsetpull_requests: + pull_request20057 2020-06-14 06:45:04aerossetmessages: + msg371492 2020-06-14 06:41:45fantixsetmessages: + msg371491 2020-06-14 06:31:21fantixsetmessages: + msg371490 2020-06-14 06:20:47aerossetmessages: + msg371489 2020-06-14 04:06:58fantixsetmessages: + msg371485 2020-06-14 01:14:40aerossetmessages: + msg371479 2020-06-14 00:17:45aerossetmessages: + msg371478 2020-06-13 19:38:46fantixsetmessages: + msg371467 2020-06-10 20:40:42fantixsetmessages: + msg371247 2020-06-10 20:14:49vstinnersetmessages: + msg371240 2020-05-28 22:17:40miss-islingtonsetmessages: + msg370269 2020-05-28 21:56:57miss-islingtonsetpull_requests: + pull_request19749 2020-05-28 21:56:48yselivanovsetmessages: + msg370268 2020-05-28 16:11:10fantixsetmessages: + msg370234 2020-05-28 16:08:06fantixsetpull_requests: + pull_request19743 2020-05-28 14:29:04vstinnersetmessages: + msg370219 2020-05-28 14:28:23miss-islingtonsetmessages: + msg370218 2020-05-28 14:09:09miss-islingtonsetpull_requests: + pull_request19736 2020-05-28 14:08:57vstinnersetmessages: + msg370215 2020-05-28 13:48:52vstinnersetmessages: + msg370205 2020-05-28 13:47:27vstinnersetpull_requests: + pull_request19734 2020-05-28 02:21:40aerossetmessages: + msg370161 2020-05-28 00:50:44fantixsetmessages: + msg370159 2020-05-27 22:51:03fantixsetmessages: + msg370147 2020-05-27 22:17:25vstinnersetnosy: + vstinner
messages: + msg370143
2020-05-27 20:39:07miss-islingtonsetmessages: + msg370131 2020-05-27 19:47:47miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request19713
2020-05-27 19:47:37yselivanovsetmessages: + msg370123 2020-05-24 22:58:32fantixsetstage: patch review
pull_requests: + pull_request19633 2020-05-12 05:28:07corona10setnosy: + aeros
2020-05-12 05:17:25achimnolsetmessages: + msg368702 2020-05-12 05:13:01achimnolsetnosy: + achimnol
messages: + msg368701
2020-05-09 22:52:06fantixsetfiles: + cancelled-sock-recv-race.patch
nosy: + fantix
messages: + msg368553 2020-02-26 12:39:03asvetlovsetmessages: + msg362690 2020-02-25 23:38:54yselivanovsetmessages: + msg362671 2020-02-25 12:10:38lukasz.langasetpriority: deferred blocker -> high

messages: + msg362637

2019-12-09 14:59:33asvetlovsetmessages: + msg358108 2019-12-09 14:19:33lukasz.langasetnosy: + lukasz.langa
messages: + msg358098
2019-12-08 16:37:21asvetlovsetmessages: + msg358011 2019-12-07 20:48:28ned.deilysetnosy: + ned.deily
messages: + msg357989
2019-10-23 22:58:45yselivanovsetpriority: normal -> deferred blocker

messages: + msg355273

2019-10-23 22:42:58yselivanovsetstatus: closed -> open
resolution: fixed ->
messages: + msg355272

stage: resolved -> (no value)

2018-11-12 17:02:04asvetlovsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved 2018-11-12 17:00:25asvetlovsetmessages: + msg329746 2018-11-08 22:41:33asvetlovsetkeywords: + patch
stage: patch review
pull_requests: + pull_request9699 2018-10-09 08:57:06asvetlovsetassignee: asvetlov 2018-09-24 21:57:11yselivanovsetnosy: + asvetlov

messages: + msg326294
versions: + Python 3.8, - Python 3.5, Python 3.6

2017-04-13 08:55:09abacabadabacabacreate