Issue34424
Created on 2018-08-17 23:02 by _savage, last changed 2019-05-14 14:36 by r.david.murray. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 8803 | merged | python-dev, 2018-08-18 04:56 | |
| PR 13302 | merged | miss-islington, 2019-05-14 01:45 | |
| Messages (13) | |||
|---|---|---|---|
| msg323686 - (view) | Author: Jens Troeger (_savage) * | Date: 2018-08-17 23:02 | |
See also this comment and ensuing conversation: https://bugs.python.org/issue24218?#msg322761 Consider an email message with the following: message = EmailMessage() message["From"] = Address(addr_spec="bar@foo.com", display_name="Jens Troeger") message["To"] = Address(addr_spec="foo@bar.com", display_name="Martín Córdoba") It’s important here that the email itself is `ascii` encodable, but the names are not. Flattening the object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple linefeeds, thus breaking the email header, thus mangling the entire email: flatmsg: b'From: Jens Troeger <bar@foo.com>\r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…' After an initial investigation into the BytesGenerator (used to flatten an EmailMessage object), here is what’s happening. Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry, which builds the flattened and encoded final string by descending into the parse tree and encoding & concatenating the individual “parts” (tokens of the header entry). Given the parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string: '=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\n' at the bottom of the recursion for this “Mailbox” part. The recursive callstack is then: _refold_parse_tree _header_value_parser.py:2687 fold [Mailbox] _header_value_parser.py:144 _refold_parse_tree _header_value_parser.py:2630 fold [Address] _header_value_parser.py:144 _refold_parse_tree _header_value_parser.py:2630 fold [AddressList] _header_value_parser.py:144 _refold_parse_tree _header_value_parser.py:2630 fold [Header] _header_value_parser.py:144 fold [_UniqueAddressHeader] headerregistry.py:258 _fold [EmailPolicy] policy.py:205 fold_binary [EmailPolicy] policy.py:199 _write_headers [BytesGenerator] generator.py:418 _write [BytesGenerator] generator.py:195 The problem now arises from the interplay of # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629 encoded_part = part.fold(policy=policy)[:-1] # strip nl which strips the '\n' from the returned string, and # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686 return policy.linesep.join(lines) + policy.linesep which adds the policy’s line separation string linesep="\r\n" to the end of the flattened string upon unrolling the recursion. I am not sure about a proper fix here, but considering that the linesep policy can be any string length (in this case len("\r\n") == 2) a fixed truncation of one character [:-1] seems wrong. Instead, using: encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl seems to work for entries with and without Unicode characters in their display names. |
|||
| msg323691 - (view) | Author: Jens Troeger (_savage) * | Date: 2018-08-18 05:08 | |
Pull request https://github.com/python/cpython/pull/8803/ |
|||
| msg328381 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2018-10-24 17:07 | |
I've requested some small changes on the PR. If Jens doesn't respond in another week or so someone else could pick it up. |
|||
| msg328382 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2018-10-24 17:10 | |
Michael, if you could check if Jens patch fixes your problem I would appreciate it. |
|||
| msg328425 - (view) | Author: Michael Thies (michael.thies) | Date: 2018-10-25 10:46 | |
Thanks for pointing me to this issue. :) > Michael, if you could check if Jens patch fixes your problem I would appreciate it. Jens PR does exactly, what I proposed in #35057, so it fixes my problem, indeed. |
|||
| msg334096 - (view) | Author: Jens Troeger (_savage) * | Date: 2019-01-20 18:22 | |
Can somebody please review and merge https://github.com/python/cpython/pull/8803 ? I am still waiting for this fix the become mainstream. |
|||
| msg342315 - (view) | Author: Cheryl Sabella (cheryl.sabella) * | Date: 2019-05-13 12:41 | |
This looks like it was just pending final approval after changes. Please let me know if I can help move it along. Thanks! |
|||
| msg342388 - (view) | Author: Jens Troeger (_savage) * | Date: 2019-05-13 21:25 | |
Cheryl, if you can find somebody to approve and merge this fix, that would be greatly appreciated! Anything I can do, please let me know. |
|||
| msg342409 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2019-05-14 01:07 | |
New changeset 45b2f8893c1b7ab3b3981a966f82e42beea82106 by R. David Murray (Jens Troeger) in branch 'master': bpo-34424: Handle different policy.linesep lengths correctly. (#8803) https://github.com/python/cpython/commit/45b2f8893c1b7ab3b3981a966f82e42beea82106 |
|||
| msg342411 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2019-05-14 01:13 | |
Approved and merged. Cheryl, can you shepherd this through the backport process, please? I'm contributing infrequently enough that I'm not even sure which version we are on :) |
|||
| msg342416 - (view) | Author: Cheryl Sabella (cheryl.sabella) * | Date: 2019-05-14 01:54 | |
David, Thank you for the review and merging into master! I'd be glad to backport this. I believe the only backport would be to 3.7 (master is currently 3.8), but please let me know if it would need to go into a security branch. |
|||
| msg342418 - (view) | Author: miss-islington (miss-islington) | Date: 2019-05-14 02:02 | |
New changeset c0abd0c8e9b62bc008fc74cfbfbbab36ef7d7617 by Miss Islington (bot) in branch '3.7': bpo-34424: Handle different policy.linesep lengths correctly. (GH-8803) https://github.com/python/cpython/commit/c0abd0c8e9b62bc008fc74cfbfbbab36ef7d7617 |
|||
| msg342469 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2019-05-14 14:36 | |
Thank you. I don't believe this is a security issue. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2019-07-13 14:36:15 | SilentGhost | link | issue37572 superseder |
| 2019-05-14 14:36:45 | r.david.murray | set | messages: + msg342469 |
| 2019-05-14 02:04:55 | cheryl.sabella | set | status: open -> closed stage: patch review -> resolved resolution: fixed versions: + Python 3.7 |
| 2019-05-14 02:02:50 | miss-islington | set | nosy:
+ miss-islington messages: + msg342418 |
| 2019-05-14 01:54:19 | cheryl.sabella | set | messages: + msg342416 |
| 2019-05-14 01:45:27 | miss-islington | set | pull_requests: + pull_request13213 |
| 2019-05-14 01:13:49 | r.david.murray | set | messages: + msg342411 |
| 2019-05-14 01:07:49 | r.david.murray | set | messages: + msg342409 |
| 2019-05-13 21:25:36 | _savage | set | messages: + msg342388 |
| 2019-05-13 12:41:35 | cheryl.sabella | set | nosy:
+ cheryl.sabella messages:
+ msg342315 |
| 2019-02-10 02:37:38 | Celelibi | set | nosy:
+ Celelibi |
| 2019-01-20 18:22:25 | _savage | set | messages: + msg334096 |
| 2018-10-25 10:46:11 | michael.thies | set | messages: + msg328425 |
| 2018-10-24 17:10:07 | r.david.murray | set | nosy:
+ michael.thies messages: + msg328382 |
| 2018-10-24 17:07:59 | r.david.murray | set | messages: + msg328381 |
| 2018-08-18 05:08:14 | _savage | set | messages: + msg323691 |
| 2018-08-18 04:56:44 | python-dev | set | keywords:
+ patch stage: patch review pull_requests: + pull_request8279 |
| 2018-08-17 23:02:02 | _savage | create | |