Issue23801
Created on 2015-03-29 02:22 by dstufft, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| reproduce.py | dstufft, 2015-03-29 02:22 | |||
| cgi-read-until-boundary.diff | dstufft, 2015-03-29 02:30 | review | ||
| Messages (11) | |||
|---|---|---|---|
| msg239471 - (view) | Author: Donald Stufft (dstufft) * | Date: 2015-03-29 02:22 | |
While working on PyPI 2.0 (which is currently running Python 3) I discovered that ``setup.py upload`` was causing an exception. After tracing things I determined that the reason for this is that Python 3 fails to handle leading whitespace in a multipart body. I've attached a minimum reproducer that runs without error on Python 2.6 and Python 2.7 which fails on Python 3.2, 3.3, and 3.4. If I go into the cgi.py module and add a print() statement that will print the header of each part, I get output that looks like: b'----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\r\nContent-Disposition: form-data; name="protcol_version"\r\n\r\n' b'Content-Disposition: form-data; name="summary"\r\n\r\n' b'Content-Disposition: form-data; name="home_page"\r\n\r\n' b'Content-Disposition: form-data; name="filetype"\r\n\r\n' b'Content-Disposition: form-data; name="content"; filename="jasmin-13.13.13.tar.gz"\r\n\r\n' The first line of that is obviously suspicious since it includes the inner boundary marker. Looking at the Python 3.x code it throws away the first line off the fp and then proceeds to process the rest of the fp. However in this case the first line is just a blank b'\r\n'. Looking at the Python 2.7 code it throws away an entire first part, not just the first line. I'm guessing that the "read first line and throw it away code" needs to continue reading lines until it's read enough lines to get to the boundary marker. |
|||
| msg239472 - (view) | Author: Donald Stufft (dstufft) * | Date: 2015-03-29 02:30 | |
Added a patch that fixes this issue by reading lines until we find the line that is our expected boundary marker. |
|||
| msg239478 - (view) | Author: Donald Stufft (dstufft) * | Date: 2015-03-29 07:50 | |
Added R David Murray to the nosy list because this is kinda similar to the email stuff and there doesn't seem to be anyone better to look at this patch that I can find... |
|||
| msg239480 - (view) | Author: Donald Stufft (dstufft) * | Date: 2015-03-29 07:52 | |
Also adding Berker Peksag because they've touched this module recently :) |
|||
| msg239490 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2015-03-29 14:27 | |
The loop might be more elegantly expressed as
for line in self.fp:
self.bytes_read += len(line)
if line.strip() != b"--" + self.innerboundary:
break
but otherwise lgtm
|
|||
| msg239497 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-03-29 16:42 | |
The code you are modifying looks completely wrong-headed to me. Unfortunately I don't have time to rewrite the cgi module right now :). Given that, your fix looks fine: the first part of a multipart (up to the first inner boundary) is the 'preamble', which can contain pretty much arbitrary content, and in the context of cgi should indeed be completely ignored. Benjamin's suggested modification has the if logic reversed. |
|||
| msg239516 - (view) | Author: Donald Stufft (dstufft) * | Date: 2015-03-29 20:31 | |
@Benjamin The reason I didn't do that to begin with, was the code currently checks if the first line is a bytes object or not in order to be able to raise an error if it's returning str instead of bytes. I didn't want to redo that check on every iteration, so I left the original part alone and then used the while loop to handle doing more if needed. Would you prefer the code written your way and either drop the bytes check or incur the cost of doing the type check on every line? |
|||
| msg239517 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2015-03-29 20:39 | |
Oh, I see. I think your way is fine then. |
|||
| msg239518 - (view) | Author: Roundup Robot (python-dev) | Date: 2015-03-29 20:43 | |
New changeset 3af77d1c5c47 by Donald Stufft in branch '3.4': Closes #23801 - Ignore entire preamble to multipart in cgi.FieldStorage https://hg.python.org/cpython/rev/3af77d1c5c47 |
|||
| msg239519 - (view) | Author: Roundup Robot (python-dev) | Date: 2015-03-29 20:45 | |
New changeset b9364903d91c by Benjamin Peterson in branch 'default': merge 3.4 (#23801) https://hg.python.org/cpython/rev/b9364903d91c |
|||
| msg239520 - (view) | Author: Donald Stufft (dstufft) * | Date: 2015-03-29 20:50 | |
Thanks everyone for taking a look at this! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:14 | admin | set | github: 67989 |
| 2015-03-29 20:51:22 | dstufft | set | stage: resolved |
| 2015-03-29 20:50:07 | dstufft | set | messages:
+ msg239520 stage: resolved -> (no value) |
| 2015-03-29 20:45:25 | python-dev | set | messages: + msg239519 |
| 2015-03-29 20:43:32 | python-dev | set | status: open -> closed nosy:
+ python-dev resolution: fixed |
| 2015-03-29 20:39:27 | benjamin.peterson | set | messages: + msg239517 |
| 2015-03-29 20:31:42 | dstufft | set | messages: + msg239516 |
| 2015-03-29 16:42:53 | r.david.murray | set | messages: + msg239497 |
| 2015-03-29 14:27:31 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages: + msg239490 |
| 2015-03-29 07:52:50 | dstufft | set | nosy:
+ berker.peksag messages: + msg239480 |
| 2015-03-29 07:50:10 | dstufft | set | nosy:
+ r.david.murray messages: + msg239478 |
| 2015-03-29 02:30:39 | dstufft | set | files:
+ cgi-read-until-boundary.diff keywords: + patch messages: + msg239472 |
| 2015-03-29 02:22:20 | dstufft | create | |