Issue36407
Created on 2019-03-23 15:38 by vsurjaninov, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 12514 | merged | vsurjaninov, 2019-03-23 16:00 | |
| PR 12578 | closed | miss-islington, 2019-03-27 06:19 | |
| Messages (5) | |||
|---|---|---|---|
| msg338681 | Author: Vladimir Surjaninov (vsurjaninov) * | Date: 2019-03-23 15:38 | |
If we write XML containing a CDATA section with non-empty indentation and newline parameters, the parent node of the section will contain extra indentation, which is then parsed as text.
Example:
>>> from xml.dom import minidom
>>> doc = minidom.Document()
>>> root = doc.createElement('root')
>>> doc.appendChild(root)
>>> node = doc.createElement('node')
>>> root.appendChild(node)
>>> data = doc.createCDATASection('</data>')
>>> node.appendChild(data)
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node>
<![CDATA[</data>]]>    </node>
</root>
If we try to parse this output, we won’t get the CDATA value correctly.
The following code returns a string that contains only indentation characters:
>>> doc = minidom.parseString(xml_text)
>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue
The following returns a string with the CDATA value plus the indentation characters:
>>> doc.getElementsByTagName('node')[0].firstChild.wholeText
But we have a workaround:
>>> data.nodeType = data.TEXT_NODE
…
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node><![CDATA[</data>]]></node>
</root>
It will be parsed correctly:
>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue
</data>
But I think it would be better to fix the writing function so that this becomes the default behavior.
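The steps above can be collected into one self-contained script. This is only a sketch for reproducing the report: `build_doc` and the variable names are illustrative and not part of minidom, and the exact whitespace printed depends on whether the Python in use already includes a fix for this issue.

```python
from xml.dom import minidom


def build_doc():
    """Build <root><node><![CDATA[</data>]]></node></root> (hypothetical helper)."""
    doc = minidom.Document()
    root = doc.createElement('root')
    doc.appendChild(root)
    node = doc.createElement('node')
    root.appendChild(node)
    node.appendChild(doc.createCDATASection('</data>'))
    return doc


# Pretty-print the document, then parse the result back.
xml_text = build_doc().toprettyxml(indent=' ' * 4)
reparsed = minidom.parseString(xml_text)
node = reparsed.getElementsByTagName('node')[0]

# On an affected version, firstChild is a whitespace-only text node and the
# CDATA section is a later sibling; otherwise firstChild is the CDATA section.
first_value = node.firstChild.nodeValue
whole_text = node.firstChild.wholeText

print(repr(xml_text))
print(repr(first_value))
print(repr(whole_text))
```

Printing with `repr` makes the stray newline and indentation visible, which a plain `print` hides.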
|
|||
| msg338701 | Author: Stefan Behnel (scoder) * | Date: 2019-03-23 21:33 | |
Yes, this case is incorrect. Pretty printing should not change character content inside of a simple tag. The PR looks good to me. |
|||
| msg338936 | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-03-27 05:59 | |
New changeset 384b81d923addd52125e94470b11d2574ca266a9 by Serhiy Storchaka (Vladimir Surjaninov) in branch 'master': bpo-36407: Fix writing indentations of CDATA section (xml.dom.minidom). (GH-12514) https://github.com/python/cpython/commit/384b81d923addd52125e94470b11d2574ca266a9 |
|||
| msg338939 | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-03-27 06:19 | |
Should we backport this change? I am not sure. |
|||
| msg338943 | Author: Stefan Behnel (scoder) * | Date: 2019-03-27 07:04 | |
I don't think this should be backported. Pretty-printing is not a production-relevant feature; it is more of a "debugging, diffing and help users see what they get" kind of feature. It's good to have it fixed for the future, but we shouldn't bother users with it during a point release. |
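Without a backport, code that reads pretty-printed output produced by older releases cannot rely on `firstChild` being the CDATA section. One possible reader-side sketch, assuming the element holds at most one CDATA section; `cdata_value` is a hypothetical helper name, not a minidom API, and the sample input is hand-written to match the shape of the output in the original report:

```python
from xml.dom import Node, minidom


def cdata_value(element):
    """Return the data of the first CDATA child of element, or None."""
    for child in element.childNodes:
        if child.nodeType == Node.CDATA_SECTION_NODE:
            return child.data
    return None


# Input shaped like the pretty-printed output shown in the first message.
xml_text = (
    '<?xml version="1.0" ?>\n'
    '<root>\n'
    '    <node>\n'
    '<![CDATA[</data>]]>    </node>\n'
    '</root>\n'
)
node = minidom.parseString(xml_text).getElementsByTagName('node')[0]

# Skips the whitespace-only text nodes surrounding the CDATA section.
print(repr(cdata_value(node)))
```

Selecting by `nodeType` avoids mutating the tree, unlike the `data.nodeType = data.TEXT_NODE` workaround on the writing side.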
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:12 | admin | set | github: 80588 |
| 2019-03-27 12:08:27 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2019-03-27 07:04:43 | scoder | set | messages: + msg338943 |
| 2019-03-27 06:19:42 | serhiy.storchaka | set | messages: + msg338939 |
| 2019-03-27 06:19:22 | miss-islington | set | pull_requests: + pull_request12522 |
| 2019-03-27 05:59:02 | serhiy.storchaka | set | messages: + msg338936 |
| 2019-03-23 21:33:28 | scoder | set | messages: + msg338701 versions: + Python 3.8 |
| 2019-03-23 16:00:14 | vsurjaninov | set | keywords: + patch stage: patch review pull_requests: + pull_request12465 |
| 2019-03-23 15:40:39 | xtreak | set | nosy: + scoder, eli.bendersky, serhiy.storchaka |
| 2019-03-23 15:38:49 | vsurjaninov | create | |