Issue35502
Created on 2018-12-14 19:59 by jess.j, last changed 2019-04-27 18:02 by scoder. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| run2.py | vstinner, 2018-12-14 20:11 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 11170 | merged | serhiy.storchaka, 2018-12-14 21:59 | |
| PR 11169 | closed | vstinner, 2018-12-14 22:10 | |
| PR 11217 | merged | miss-islington, 2018-12-18 20:29 | |
| Messages (10) | |||
|---|---|---|---|
| msg331861 - (view) | Author: Jess Johnson (jess.j) | Date: 2018-12-14 19:59 | |
When given xml that that would raise a ParseError, but parsing is stopped before the ParseError is raised, xml.etree.ElementTree.iterparse leaks memory.
Example:
import gc
from io import StringIO
import xml.etree.ElementTree as etree
import objgraph
def parse_xml():
xml = """
<LEVEL1>
</LEVEL1>
</ROOT>
"""
parser = etree.iterparse(StringIO(initial_value=xml))
for _, elem in parser:
if elem.tag == 'LEVEL1':
break
def run():
parse_xml()
gc.collect()
uncollected_elems = objgraph.by_type('Element')
print(uncollected_elems)
objgraph.show_backrefs(uncollected_elems, max_depth=15)
if __name__ == "__main__":
run()
Output:
[<Element 'LEVEL1' at 0x10df712c8>]
Also see this gist which has an image showing the objects that are retained in memory: https://gist.github.com/grokcode/f89d5c5f1831c6bc373be6494f843de3
|
|||
| msg331863 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 20:11 | |
I wrote attached run.py which confirms a leak using tracemalloc: $ python3 run.py 1 calls: 15.3B / call (total: 15.3 kB) 100 calls: 15.3B / call (total: 1527.7 kB) 1000 calls: 15.3B / call (total: 15265.0 kB) |
|||
| msg331864 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-14 20:11 | |
Oops, there was a typo, you should read kB: 1 calls: 15.3 kB / call (total: 15.3 kB) 100 calls: 15.3 kB / call (total: 1527.7 kB) 1000 calls: 15.3 kB / call (total: 15265.0 kB) |
|||
| msg331875 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-12-14 22:06 | |
The problem was with detecting a reference cycle containing a TreeBuilder. |
|||
| msg332051 - (view) | Author: STINNER Victor (vstinner) * | Date: 2018-12-18 11:20 | |
Oops, my PR 11169 used the wrong issue number: bpo-35257 instead of bpo-35502. Anyway, I closed it, the change is too complex. -- IMHO the root issue is the handling of the SyntaxError exception in XMLPullParser.feed(). I wrote a fix, but I don't have the bandwidth to write an unit test checking that the reference cycle is broken. commit 9f3354d36a89d7898bdb631e5119cc37e9a74840 (fix_etree_leak) Author: Victor Stinner <vstinner@redhat.com> Date: Fri Dec 14 22:03:16 2018 +0100 bpo-35257: Fix memory leak in XMLPullParser.feed() Fix memory leak in XMLPullParser.feed() of xml.etree: on syntax error, clear the traceback to break a reference cycle. diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c1cf483cf5..f17c52541b 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1266,6 +1266,8 @@ class XMLPullParser: try: self._parser.feed(data) except SyntaxError as exc: + # bpo-35502: Break reference cycle + #exc.__traceback__ = None self._events_queue.append(exc) def _close_and_return_root(self): I don't see any behavior difference in XMLPullParser.read_events() which raise again the exception: events = self._events_queue while events: event = events.popleft() if isinstance(event, Exception): raise event else: yield event -- PR 11170 is also a nice enhancement (fix treebuilder_gc_traverse()), but maybe we should also prevent creating reference cycles in the first place? |
|||
| msg332052 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-12-18 11:53 | |
It is not easy to avoid reference cycles if use a generator function. And generator function is much faster than an implementation as a class with the __next__ method. We need to access the iterator object from the code of the generator function, and this creates a cycle. |
|||
| msg332078 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2018-12-18 20:29 | |
New changeset d2a75c67830d7c9f59e4e9b60f36974234c829ef by Serhiy Storchaka in branch 'master': bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170) https://github.com/python/cpython/commit/d2a75c67830d7c9f59e4e9b60f36974234c829ef |
|||
| msg332082 - (view) | Author: miss-islington (miss-islington) | Date: 2018-12-18 21:40 | |
New changeset 60c919b58bd3cf8730947a00ddc6a527d6922ff1 by Miss Islington (bot) in branch '3.7': bpo-35502: Fix reference leaks in ElementTree.TreeBuilder. (GH-11170) https://github.com/python/cpython/commit/60c919b58bd3cf8730947a00ddc6a527d6922ff1 |
|||
| msg340991 - (view) | Author: Stefan Behnel (scoder) * | Date: 2019-04-27 15:39 | |
This ticket looks like it's done for 3.7/8. Can it be closed? I guess 3.6 isn't relevant anymore, right? |
|||
| msg341005 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-04-27 18:00 | |
The 3.6 branch no longer accept bugfixes. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2019-04-27 18:02:59 | scoder | set | status: open -> closed stage: patch review -> resolved resolution: fixed versions: - Python 3.6 |
| 2019-04-27 18:00:05 | vstinner | set | messages: + msg341005 |
| 2019-04-27 15:39:05 | scoder | set | messages: + msg340991 |
| 2018-12-18 21:40:27 | miss-islington | set | nosy:
+ miss-islington messages: + msg332082 |
| 2018-12-18 20:29:31 | miss-islington | set | pull_requests: + pull_request10453 |
| 2018-12-18 20:29:16 | serhiy.storchaka | set | messages: + msg332078 |
| 2018-12-18 11:53:29 | serhiy.storchaka | set | messages: + msg332052 |
| 2018-12-18 11:20:42 | vstinner | set | messages: + msg332051 |
| 2018-12-15 00:15:44 | jess.j | set | versions: + Python 3.6 |
| 2018-12-14 22:10:01 | vstinner | set | pull_requests: + pull_request10408 |
| 2018-12-14 22:06:17 | serhiy.storchaka | set | nosy:
+ scoder, eli.bendersky messages:
+ msg331875 |
| 2018-12-14 21:59:01 | serhiy.storchaka | set | keywords:
+ patch stage: patch review pull_requests: + pull_request10407 |
| 2018-12-14 21:28:39 | serhiy.storchaka | set | assignee: serhiy.storchaka nosy: + serhiy.storchaka |
| 2018-12-14 20:11:56 | vstinner | set | files:
+ run2.py messages: + msg331864 |
| 2018-12-14 20:11:24 | vstinner | set | files: - run.py |
| 2018-12-14 20:11:04 | vstinner | set | files:
+ run.py nosy: + vstinner messages: + msg331863 |
| 2018-12-14 19:59:00 | jess.j | create | |