Issue38449
Created on 2019-10-11 14:06 by Yaroslav.Halchenko, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 16724 | merged | maxking, 2019-10-12 00:42 | |
| PR 16725 | closed | miss-islington, 2019-10-12 05:41 | |
| PR 16727 | merged | maxking, 2019-10-12 15:56 | |
| PR 16728 | merged | maxking, 2019-10-12 16:04 | |
| PR 16729 | merged | corona10, 2019-10-12 17:04 | |
| PR 17431 | merged | miss-islington, 2019-12-01 23:06 | |
| PR 17432 | merged | miss-islington, 2019-12-01 23:07 | |
| Messages (24) | |||
|---|---|---|---|
| msg354455 - (view) | Author: Yaroslav Halchenko (Yaroslav.Halchenko) | Date: 2019-10-11 14:06 | |
Our tests in DataLad started to fail while building on Debian with Python 3.7.5rc1 whenever they passed just fine previously with 3.7.3rc1. Analysis boiled down to mimetypes
$> ./python3.9 -c 'import mimetypes; mimedb = mimetypes.MimeTypes(strict=False); print(mimedb.guess_type(";1.tar.gz"))'
(None, None)
$> ./python3.9 -c 'import mimetypes; mimedb = mimetypes.MimeTypes(strict=False); print(mimedb.guess_type("1.tar.gz"))'
('application/x-tar', 'gzip')
$> git describe
v3.8.0b1-1174-g2b7dc40b2af
Ref:
- original issue in DataLad: https://github.com/datalad/datalad/issues/3769
|
|||
| msg354456 - (view) | Author: Yaroslav Halchenko (Yaroslav.Halchenko) | Date: 2019-10-11 14:08 | |
FWIW, our more complete test filename is # python3 -c 'import patoolib.util as ut; print(ut.guess_mime(r" \"\`;b&b&c |.tar.gz"))' (None, None) which works fine with older versions |
|||
| msg354465 - (view) | Author: Kyle Meyer (kyleam) * | Date: 2019-10-11 16:13 | |
I've performed a bisect the issue with the following script:
#!/bin/sh
make -j3 || exit 125
./python <<\EOF || exit 1
import sys
import mimetypes
res = mimetypes.MimeTypes(strict=False).guess_type(";1.tar.gz")
if res[0] is None:
sys.exit(1)
EOF
That points to 87bd2071c7 (bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522), 2019-09-05). That commit was included in 3.7.5rc1 when it was cherry picked by 8873bff287.
|
|||
| msg354470 - (view) | Author: Ned Deily (ned.deily) * | Date: 2019-10-11 17:17 | |
Marking as regression release blocker for 3.7.5 final and 3.8.0 final. |
|||
| msg354488 - (view) | Author: Abhilash Raj (maxking) * | Date: 2019-10-11 20:11 | |
I am looking into the issue. |
|||
| msg354500 - (view) | Author: Abhilash Raj (maxking) * | Date: 2019-10-11 21:26 | |
The bug is interesting due to some of the implementation details of "guess_type". The documentation says that it can parse either a URL or a filename.
Switching from urllib.parse._splittype to urllib.parse.urlparse changed what a valid "path" is. _splittype doesn't care about the rest of the URL except the scheme, but urlparse does. Previously, we used to split things like:
>>> print(urllib.parse._splittype(';1.tar.gz')
(None, ';1.tar.gz')
Then, we'd just treat the 2nd part as a filesystem path, which would rightfully guess the extension as .tar.gz
However, switching to using parsing via urllib.parse.urlparse, we get:
>>> print(urllib.parse.urlparse(';1.tar.gz')
ParseResult(scheme='', netloc='', path='', params='1.tar.gz', query='', fragment='')
And then we get the ".path" attribute for further processing, which being empty, returns (None, None).
The format of all these parts is:
scheme://netloc/path;parameters?query#fragment
A simple fix would be to just merge path, parameters, query and fragment together (with appropriate delimiters) and the proceed with further processing. That would fix parsing of Filesystem paths but would break (again) parsing of URLs like:
>>> mimetypes.guess_type('http://example.com/index.html;1.tar.gz')
('application/x-tar', 'gzip')
It should return 'text/html' as the type, since this is a URL and everything after the ';' should not be used to determine the mimetype. But, if there is no scheme provided, we should treat it as a filesystem path and in that case 'application/x-tar' is the right type.
I hope I am not confusing everyone here.
The right fix IMO would be to make "guess_type" not treat URLs and filesytem paths alike.
|
|||
| msg354505 - (view) | Author: Ned Deily (ned.deily) * | Date: 2019-10-11 21:38 | |
Thanks for looking into this, @maxking. With both 3.8.0 final and 3.7.5 final scheduled for just a few days away, I wonder if the best thing to do at this point is to revert them and work on a more robust fix targeted for the next maintenance releases since the original issue was not identified as being a security issue or otherwise critical. |
|||
| msg354513 - (view) | Author: Abhilash Raj (maxking) * | Date: 2019-10-12 00:38 | |
Yeah, I agree. I'll submit a PR for reverting the commits. |
|||
| msg354522 - (view) | Author: miss-islington (miss-islington) | Date: 2019-10-12 05:41 | |
New changeset 19a3d873005e5730eeabdc394c961e93f2ec02f0 by Miss Islington (bot) (Abhilash Raj) in branch 'master': bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15522)" (GH-16724) https://github.com/python/cpython/commit/19a3d873005e5730eeabdc394c961e93f2ec02f0 |
|||
| msg354523 - (view) | Author: Dong-hee Na (corona10) * | Date: 2019-10-12 06:04 | |
I'd like to suggest add unit test for the report case. So that we can detect future regression issue :) |
|||
| msg354524 - (view) | Author: Dong-hee Na (corona10) * | Date: 2019-10-12 06:13 | |
And I aplogize for my patch which makes regrssion issue. |
|||
| msg354536 - (view) | Author: Abhilash Raj (maxking) * | Date: 2019-10-12 16:42 | |
corona10: That's okay, it happens. I missed it too. There was really no way to foresee all the use cases, which is why we have beta and rc period to catch bugs. Yes, we should add a test case definitely, do you want to work on a PR? |
|||
| msg354537 - (view) | Author: Dong-hee Na (corona10) * | Date: 2019-10-12 16:49 | |
> Yes, we should add a test case definitely, do you want to work on a PR? Sure, I want to finalize this issue :) |
|||
| msg354539 - (view) | Author: Abhilash Raj (maxking) * | Date: 2019-10-12 16:58 | |
New changeset 5a638a805503131f4a9cc2bbc5944611295c1500 by Abhilash Raj in branch '3.8': [3.8] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs" (GH-16724) (GH-16728) https://github.com/python/cpython/commit/5a638a805503131f4a9cc2bbc5944611295c1500 |
|||
| msg354544 - (view) | Author: miss-islington (miss-islington) | Date: 2019-10-12 18:50 | |
New changeset 164bee296ab1f87cc05566b39ee8fb9fb64b3e5a by Miss Islington (bot) (Abhilash Raj) in branch '3.7': [3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15685)" (GH-16724) (GH-16727) https://github.com/python/cpython/commit/164bee296ab1f87cc05566b39ee8fb9fb64b3e5a |
|||
| msg354545 - (view) | Author: Ned Deily (ned.deily) * | Date: 2019-10-12 18:52 | |
Thanks everyone for the quick action on this! |
|||
| msg354546 - (view) | Author: Ned Deily (ned.deily) * | Date: 2019-10-12 18:52 | |
(On second thought, I'll leave this open as a release blocker until we've cherry-picked the fixes for 3.8.0 final and 3.7.5 final.) |
|||
| msg354665 - (view) | Author: Ćukasz Langa (lukasz.langa) * | Date: 2019-10-14 21:45 | |
(3.8.0 is released with this fix) |
|||
| msg354696 - (view) | Author: Ned Deily (ned.deily) * | Date: 2019-10-15 07:30 | |
New changeset 2a405598bbccbc42710dc5ecf3d44c8de4c16582 by Ned Deily (Abhilash Raj) in branch '3.7': [3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15685)" (GH-16724) (GH-16727) https://github.com/python/cpython/commit/2a405598bbccbc42710dc5ecf3d44c8de4c16582 |
|||
| msg354705 - (view) | Author: Ned Deily (ned.deily) * | Date: 2019-10-15 07:45 | |
(fix also released in 3.7.5) |
|||
| msg357694 - (view) | Author: Abhilash Raj (maxking) * | Date: 2019-12-01 23:06 | |
New changeset 2fe4c48917c2d1b40cf063c6ed22ae2e71f4cb62 by Abhilash Raj (Dong-hee Na) in branch 'master': bpo-38449: Add URL delimiters test cases (#16729) https://github.com/python/cpython/commit/2fe4c48917c2d1b40cf063c6ed22ae2e71f4cb62 |
|||
| msg357695 - (view) | Author: miss-islington (miss-islington) | Date: 2019-12-01 23:23 | |
New changeset 926eabb6b46106e677d5e1ea25b7bab918da4110 by Miss Islington (bot) in branch '3.7': bpo-38449: Add URL delimiters test cases (GH-16729) https://github.com/python/cpython/commit/926eabb6b46106e677d5e1ea25b7bab918da4110 |
|||
| msg357696 - (view) | Author: miss-islington (miss-islington) | Date: 2019-12-01 23:24 | |
New changeset 4f1eaf028058cc357030dfaa5e611c90662539f0 by Miss Islington (bot) in branch '3.8': bpo-38449: Add URL delimiters test cases (GH-16729) https://github.com/python/cpython/commit/4f1eaf028058cc357030dfaa5e611c90662539f0 |
|||
| msg358302 - (view) | Author: Dong-hee Na (corona10) * | Date: 2019-12-12 15:24 | |
@ned.deily @maxking I close this issue since all PRs were merged. Thanks, everyone for actions for this issue :) Have a warm and happy holiday and a hopeful new year. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:21 | admin | set | github: 82630 |
| 2019-12-12 15:24:35 | corona10 | set | status: open -> closed resolution: fixed messages: + msg358302 stage: patch review -> resolved |
| 2019-12-01 23:24:21 | miss-islington | set | messages: + msg357696 |
| 2019-12-01 23:23:36 | miss-islington | set | messages: + msg357695 |
| 2019-12-01 23:07:11 | miss-islington | set | pull_requests: + pull_request16911 |
| 2019-12-01 23:06:42 | miss-islington | set | pull_requests: + pull_request16910 |
| 2019-12-01 23:06:39 | maxking | set | messages: + msg357694 |
| 2019-10-15 07:45:01 | ned.deily | set | messages: + msg354705 |
| 2019-10-15 07:30:25 | ned.deily | set | messages: + msg354696 |
| 2019-10-14 21:45:09 | lukasz.langa | set | priority: release blocker -> normal messages: + msg354665 |
| 2019-10-14 12:43:44 | vstinner | set | nosy:
- vstinner |
| 2019-10-12 18:52:59 | ned.deily | set | priority: normal -> release blocker messages: + msg354546 |
| 2019-10-12 18:52:00 | ned.deily | set | priority: release blocker -> normal messages: + msg354545 |
| 2019-10-12 18:50:07 | miss-islington | set | messages: + msg354544 |
| 2019-10-12 17:04:04 | corona10 | set | pull_requests: + pull_request16311 |
| 2019-10-12 16:58:15 | maxking | set | messages: + msg354539 |
| 2019-10-12 16:49:08 | corona10 | set | messages: + msg354537 |
| 2019-10-12 16:42:59 | maxking | set | messages: + msg354536 |
| 2019-10-12 16:04:59 | maxking | set | pull_requests: + pull_request16309 |
| 2019-10-12 15:56:15 | maxking | set | pull_requests: + pull_request16308 |
| 2019-10-12 06:13:56 | corona10 | set | messages: + msg354524 |
| 2019-10-12 06:04:23 | corona10 | set | nosy:
+ corona10 messages: + msg354523 |
| 2019-10-12 05:41:53 | miss-islington | set | pull_requests: + pull_request16304 |
| 2019-10-12 05:41:50 | miss-islington | set | nosy:
+ miss-islington messages: + msg354522 |
| 2019-10-12 00:42:58 | maxking | set | keywords:
+ patch stage: patch review pull_requests: + pull_request16302 |
| 2019-10-12 00:38:51 | maxking | set | messages: + msg354513 |
| 2019-10-11 21:38:52 | ned.deily | set | messages: + msg354505 |
| 2019-10-11 21:26:30 | maxking | set | messages: + msg354500 |
| 2019-10-11 20:11:09 | maxking | set | nosy:
+ maxking messages: + msg354488 |
| 2019-10-11 17:17:31 | ned.deily | set | priority: normal -> release blocker nosy:
+ ned.deily, martin.panter, lukasz.langa, vstinner keywords: + 3.7regression |
| 2019-10-11 16:13:34 | kyleam | set | nosy:
+ kyleam messages: + msg354465 |
| 2019-10-11 14:08:39 | Yaroslav.Halchenko | set | messages: + msg354456 |
| 2019-10-11 14:06:40 | Yaroslav.Halchenko | create | |