PyMemoryView_FromMemory() created a memoryview referring to the internal data of the string. When the string is destroyed the memoryview become referring to a freed memory.
There was a problem hiding this comment.
Some comments. I am concerned about the high memory usage incurred by the use of PyBytes_FromStringAndSize but off-course I agree that the priority is to fix the safety issue.
| @@ -2179,57 +2179,60 @@ def write(self, chunk): | |||
| def concatenate_chunks(self): | |||
| # Some chunks can be memoryview instances, we need to convert | |||
| # them to bytes to be able to call join | |||
There was a problem hiding this comment.
This comment should now be removed.
|
|
||
| # Actually read the binary content of the chunks after the end | ||
| # of the call to dump: ant memoryview passed to write should not | ||
| # of the call to dump: and memoryview passed to write should not |
There was a problem hiding this comment.
I think this should read "any memoryview" instead.
| self.assertGreaterEqual(9, chunk_size) | ||
| self.assertLess(chunk_size, 2 * self.FRAME_SIZE_TARGET, | ||
| chunk_sizes) | ||
| # There shouldn't bee too much small chunks. |
There was a problem hiding this comment.
Maybe add a comment: the protocol header, the frame headers and the large string headers are written in small chunks.
| if (payload == NULL) { | ||
| payload = mem = PyMemoryView_FromMemory((char *) data, data_size, | ||
| PyBUF_READ); | ||
| payload = mem = PyBytes_FromStringAndSize(data, data_size); |
There was a problem hiding this comment.
This forces a memory copy. Wouldn't it be possible to create a read-only memoryview that keeps a reference to the original ascii str object to avoid the large memory overhead?
There was a problem hiding this comment.
This is not easy. Currently there is no API for creating a memoryview to the area of memory with linked Python object. Designing such API is a separate issue. There should be other issues on the tracker that needs this API. For example bytes IO seems suffers from this issue.
Here I just try to fix a regression introduced by #4353.
There was a problem hiding this comment.
Thank you for your review @ogrisel.
| if (payload == NULL) { | ||
| payload = mem = PyMemoryView_FromMemory((char *) data, data_size, | ||
| PyBUF_READ); | ||
| payload = mem = PyBytes_FromStringAndSize(data, data_size); |
There was a problem hiding this comment.
This is not easy. Currently there is no API for creating a memoryview to the area of memory with linked Python object. Designing such API is a separate issue. There should be other issues on the tracker that needs this API. For example bytes IO seems suffers from this issue.
Here I just try to fix a regression introduced by #4353.
|
This looks fine to me. Does the previous NEWS file need updating? |
| The picklers no longer allocate temporary memory when dumping large | ||
| ``bytes`` and ``str`` objects into a file object. Instead the data is | ||
| directly streamed into the underlying file object. | ||
| The pickler now uses less memory when serialize large bytes and str |
PyMemoryView_FromMemory()created a memoryview referring to the internal data of the string. When the string is destroyed the memoryview become referring to a freed memory.https://bugs.python.org/issue31993