Issue25047
Created on 2015-09-09 19:43 by zimeon, last changed 2015-09-23 02:14 by martin.panter. This issue is now closed.
Files
| File name | Uploaded |
|---|---|
| etree-encoding.patch | martin.panter, 2015-09-18 02:39 |
Messages (7)
msg250328 | Author: Simeon Warner (zimeon) | Date: 2015-09-09 19:43
It seems that in Python 3 the XML encoding declaration written by xml.etree.ElementTree has changed from 2.x: the encoding name is now lowercased, e.g. 'utf-8', even when 'UTF-8' is passed explicitly. While the XML spec [1] says that processors _SHOULD_ understand this, the declared encoding string _SHOULD_ be 'UTF-8'. Keeping to the standard, in the spirit of being strict when encoding and lenient when decoding, seems to give maximum compatibility.
It also seems like an unhelpful change for 2.x to 3.x migration, though that is perhaps a minor issue (but it is how I noticed it).
This can be shown with:
>cat a.py
from xml.etree.ElementTree import ElementTree, Element
import os, sys
print(sys.version_info)
if sys.version_info > (3, 0):
    fp = os.fdopen(sys.stdout.fileno(), 'wb')
else:
    fp = sys.stdout
root = Element('hello',{'beer':'good'})
ElementTree(root).write(fp, encoding='UTF-8', xml_declaration=True)
fp.write(b"\n")
>python a.py
sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
<?xml version='1.0' encoding='UTF-8'?>
<hello beer="good" />
>python3 a.py
sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
<?xml version='1.0' encoding='utf-8'?>
<hello beer="good" />
Cheers,
Simeon
[1] <http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncName> "In an encoding declaration, the values "UTF-8", "UTF-16", ... should be used for the various encodings and transformations of Unicode" and then later "XML processors should match character encoding names in a case-insensitive way".
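For readers on a current Python 3, the reproduction above can be sketched without `os.fdopen` by serializing into an `io.BytesIO` buffer. This is an illustrative sketch, not code from the issue; the helper names `serialize` and `declaration` are made up here. It shows both halves of the "strict when encoding, lenient when decoding" argument: on a Python version that includes the changesets listed at the end of this issue, the declaration keeps the requested spelling, and the parser and codec registry accept either spelling.

```python
import codecs
import io
import xml.etree.ElementTree as ET


def serialize(encoding):
    """Hypothetical helper: write a small tree with an XML declaration, return the bytes."""
    root = ET.Element("hello", {"beer": "good"})
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding=encoding, xml_declaration=True)
    return buf.getvalue()


def declaration(data):
    """Hypothetical helper: return the first line of the output (the XML declaration)."""
    return data.split(b"\n", 1)[0].decode("ascii")


# Strict when encoding: the declaration echoes the caller's spelling.
print(declaration(serialize("UTF-8")))  # <?xml version='1.0' encoding='UTF-8'?>

# Lenient when decoding: either spelling parses to the same element.
upper = b"<?xml version='1.0' encoding='UTF-8'?><hello beer='good'/>"
lower = b"<?xml version='1.0' encoding='utf-8'?><hello beer='good'/>"
assert ET.fromstring(upper).attrib == ET.fromstring(lower).attrib == {"beer": "good"}

# Python's codec registry also treats the two names as the same codec.
assert codecs.lookup("UTF-8").name == codecs.lookup("utf-8").name == "utf-8"
```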
msg250345 | Author: Martin Panter (martin.panter) | Date: 2015-09-10 02:14
I agree that Python should not be converting the supplied encoding name to lowercase, although I guess reverting this could upset people who depend on the exact output (e.g. a checksum of it).
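One way downstream code could avoid depending on the declaration's letter case is to checksum a canonical form rather than the raw bytes. This is a minimal sketch, assuming Python 3.8 or later (where `xml.etree.ElementTree.canonicalize` is available); the helper names are hypothetical. Canonical XML (C14N 2.0) output carries no XML declaration, so the canonical digest is the same whichever spelling was written, while the raw digest differs on a Python version that preserves the caller's case.

```python
import hashlib
import io
import xml.etree.ElementTree as ET


def serialize(encoding):
    """Hypothetical helper: write a small tree with an XML declaration, return the bytes."""
    root = ET.Element("hello", {"beer": "good"})
    buf = io.BytesIO()
    ET.ElementTree(root).write(buf, encoding=encoding, xml_declaration=True)
    return buf.getvalue()


def raw_digest(data):
    """SHA-256 of the exact bytes, declaration included."""
    return hashlib.sha256(data).hexdigest()


def canonical_digest(data):
    """SHA-256 of the C14N 2.0 form, which has no XML declaration."""
    canonical = ET.canonicalize(data)  # returns a str when no `out` is given
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


upper, lower = serialize("UTF-8"), serialize("utf-8")
print(raw_digest(upper) == raw_digest(lower))              # False where case is preserved
print(canonical_digest(upper) == canonical_digest(lower))  # True
```

The trade-off is that a canonical digest ignores every serialization detail C14N normalizes away, not just the declaration, which may or may not be what a given consumer wants.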
msg250930 | Author: Martin Panter (martin.panter) | Date: 2015-09-18 02:39
Here is a patch which changes the code to respect the letter case specified by the user, although it still compares the special strings "unicode", "us-ascii", and "utf-8" case-insensitively, and the default encoding name, 'us-ascii', is still lowercase. Let me know what you think.
>>> from sys import stdout
>>> from xml.etree.ElementTree import ElementTree, Element
>>> tree = ElementTree(Element('hello', {'beer': 'good'}))
>>> tree.write(stdout.buffer, encoding="UTF-8", xml_declaration=True); print()
<?xml version='1.0' encoding='UTF-8'?>
<hello beer="good" />
>>> tree.write(stdout.buffer, encoding="UTF-8"); print()
<hello beer="good" />
>>> tree.write(stdout.buffer, xml_declaration=True); print()
<?xml version='1.0' encoding='us-ascii'?>
<hello beer="good" />
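The rule described in this message can be restated as a small function. This is a hypothetical helper written for illustration, not the code in etree-encoding.patch: it writes the caller's spelling into the declaration, compares against "unicode", "us-ascii" and "utf-8" case-insensitively to decide whether a declaration is written by default, and treats "us-ascii" as the default encoding. Text output (encoding="unicode" with an explicit declaration), where ElementTree has to choose the declared name itself, is deliberately left out of the sketch.

```python
SPECIAL_ENCODINGS = ("unicode", "us-ascii", "utf-8")  # compared case-insensitively


def xml_declaration_for(encoding=None, xml_declaration=None):
    """Hypothetical helper: return the declaration write() would emit, or None.

    Restates the rule described in msg250930; it is not the patch's actual code.
    """
    if not encoding:
        encoding = "us-ascii"  # default encoding for XML output
    enc_lower = encoding.lower()  # used only for comparisons, never written out
    if xml_declaration is None:
        # Default: declare only encodings other than the special names.
        xml_declaration = enc_lower not in SPECIAL_ENCODINGS
    if not xml_declaration:
        return None
    if enc_lower == "unicode":
        # Out of scope: for text output ElementTree picks the declared name itself.
        raise NotImplementedError("declaring a name for 'unicode' output is not modelled")
    return "<?xml version='1.0' encoding='%s'?>" % encoding  # caller's spelling kept


if __name__ == "__main__":
    for args in (("UTF-8", True), ("UTF-8", None), (None, True), ("ISO-8859-1", None)):
        print(args, "->", xml_declaration_for(*args))
```

Called with the arguments from the session above, it gives the same three outcomes: `xml_declaration_for("UTF-8", True)` keeps `UTF-8`, `xml_declaration_for("UTF-8")` returns `None`, and `xml_declaration_for(xml_declaration=True)` declares `us-ascii`.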
msg251211 | Author: Stefan Behnel (scoder) | Date: 2015-09-21 07:14
LGTM
msg251223 | Author: Simeon Warner (zimeon) | Date: 2015-09-21 13:00
Path looks fine and seems to work as expected -- Simeon
msg251224 | Author: Simeon Warner (zimeon) | Date: 2015-09-21 13:00
s/Path/Patch/
msg251392 | Author: Roundup Robot (python-dev) | Date: 2015-09-23 02:08
New changeset ff7aba08ada6 by Martin Panter in branch '3.4': Issue #25047: Respect case writing XML encoding declarations https://hg.python.org/cpython/rev/ff7aba08ada6
New changeset 9c248233754c by Martin Panter in branch '3.5': Issue #25047: Merge Element Tree encoding from 3.4 into 3.5 https://hg.python.org/cpython/rev/9c248233754c
New changeset 409bab2181d3 by Martin Panter in branch 'default': Issue #25047: Merge Element Tree encoding from 3.5 https://hg.python.org/cpython/rev/409bab2181d3
History
| Date | User | Action | Args |
|---|---|---|---|
| 2015-09-23 02:14:46 | martin.panter | set | status: open -> closed; resolution: fixed; stage: commit review -> resolved |
| 2015-09-23 02:08:32 | python-dev | set | nosy: + python-dev; messages: + msg251392 |
| 2015-09-23 02:07:47 | martin.panter | set | assignee: martin.panter; nosy: + berker.peksag |
| 2015-09-21 13:00:35 | zimeon | set | messages: + msg251224 |
| 2015-09-21 13:00:15 | zimeon | set | messages: + msg251223 |
| 2015-09-21 12:16:41 | Arfrever | set | nosy: + Arfrever |
| 2015-09-21 07:14:49 | scoder | set | nosy: + scoder; messages: + msg251211 |
| 2015-09-18 02:39:41 | martin.panter | set | files: + etree-encoding.patch; keywords: + patch; messages: + msg250930; stage: needs patch -> patch review |
| 2015-09-11 02:26:28 | martin.panter | set | stage: needs patch; versions: + Python 3.5, Python 3.6 |
| 2015-09-10 02:14:02 | martin.panter | set | nosy: + martin.panter; messages: + msg250345 |
| 2015-09-09 19:43:38 | zimeon | create | |