[proxy] github.com← back | site home | direct (HTTPS) ↗ | proxy home | ◑ dark◐ light
/ cpython Public

Conversation

Copy link
Member

serhiy-storchaka commented Jan 11, 2018

PyMemoryView_FromMemory() created a memoryview referring to the internal data of the string. When the string is destroyed the memoryview become referring to a freed memory.

https://bugs.python.org/issue31993

PyMemoryView_FromMemory() created a memoryview referring to
the internal data of the string.  When the string is destroyed
the memoryview become referring to a freed memory.
Copy link
Contributor

ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I am concerned about the high memory usage incurred by the use of PyBytes_FromStringAndSize but off-course I agree that the priority is to fix the safety issue.

@@ -2179,57 +2179,60 @@ def write(self, chunk):
def concatenate_chunks(self):
# Some chunks can be memoryview instances, we need to convert
# them to bytes to be able to call join
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should now be removed.


# Actually read the binary content of the chunks after the end
# of the call to dump: ant memoryview passed to write should not
# of the call to dump: and memoryview passed to write should not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should read "any memoryview" instead.

self.assertGreaterEqual(9, chunk_size)
self.assertLess(chunk_size, 2 * self.FRAME_SIZE_TARGET,
chunk_sizes)
# There shouldn't bee too much small chunks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment: the protocol header, the frame headers and the large string headers are written in small chunks.

if (payload == NULL) {
payload = mem = PyMemoryView_FromMemory((char *) data, data_size,
PyBUF_READ);
payload = mem = PyBytes_FromStringAndSize(data, data_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This forces a memory copy. Wouldn't it be possible to create a read-only memoryview that keeps a reference to the original ascii str object to avoid the large memory overhead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not easy. Currently there is no API for creating a memoryview to the area of memory with linked Python object. Designing such API is a separate issue. There should be other issues on the tracker that needs this API. For example bytes IO seems suffers from this issue.

Here I just try to fix a regression introduced by #4353.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

Copy link
Member Author

serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review @ogrisel.

if (payload == NULL) {
payload = mem = PyMemoryView_FromMemory((char *) data, data_size,
PyBUF_READ);
payload = mem = PyBytes_FromStringAndSize(data, data_size);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not easy. Currently there is no API for creating a memoryview to the area of memory with linked Python object. Designing such API is a separate issue. There should be other issues on the tracker that needs this API. For example bytes IO seems suffers from this issue.

Here I just try to fix a regression introduced by #4353.

Copy link
Member

pitrou commented Jan 11, 2018

This looks fine to me. Does the previous NEWS file need updating?

The picklers no longer allocate temporary memory when dumping large
``bytes`` and ``str`` objects into a file object. Instead the data is
directly streamed into the underlying file object.
The pickler now uses less memory when serialize large bytes and str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when serializing

serhiy-storchaka merged commit 5b76bdb into python:master Jan 12, 2018
serhiy-storchaka deleted the pickle-large-str-memoryview branch January 12, 2018 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip news type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants