Created on 2014-10-24 17:00 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
pprint() sorts the contents of sets and dicts to get stable output that doesn't depend on the iteration order of the set or dict, which depends not only on the values of the elements but also on the history of the set or dict.
But in some cases the output differs for equal sets or dicts that differ only in their history.
>>> import pprint
>>> class A:  # string 'A' < 'int'
...     def __lt__(self, other): return False
...     def __gt__(self, other): return self != other
...     def __le__(self, other): return self == other
...     def __ge__(self, other): return True
...     def __eq__(self, other): return self is other
...     def __ne__(self, other): return self is not other
...     def __hash__(self): return 1  # == hash(1)
...
>>> a = A()
>>> sorted([1, a])
[1, <__main__.A object at 0xb700c64c>]
>>> sorted([a, 1])
[1, <__main__.A object at 0xb700c64c>]
>>> # set
>>> pprint.pprint({1, a})
{<__main__.A object at 0xb700c64c>, 1}
>>> pprint.pprint({a, 1})
{1, <__main__.A object at 0xb700c64c>}
>>> # dict
>>> pprint.pprint({1: 1, a: 1})
{1: 1, <__main__.A object at 0xb700c64c>: 1}
>>> pprint.pprint({a: 1, 1: 1})
{<__main__.A object at 0xb700c64c>: 1, 1: 1}
This happens because _safe_key's __lt__() calls the __lt__() method of its left argument and doesn't use the special methods of its right argument. a.__lt__(1) succeeds, but (1).__lt__(a) fails (it returns NotImplemented).
I think that instead of `self.obj.__lt__(other.obj)` there should be `self.obj < other.obj`. Or maybe call other.obj.__gt__(self.obj) if the result of self.obj.__lt__(other.obj) is NotImplemented.
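To make the difference concrete, here is a minimal sketch contrasting the two approaches. `DirectKey` and `OperatorKey` are hypothetical names used only for illustration; they are simplified stand-ins and not the actual `_safe_key` code or either attached patch.

```python
class A:
    """Reduced version of the class from the report: sorts after everything."""
    def __lt__(self, other): return False
    def __gt__(self, other): return self is not other
    def __eq__(self, other): return self is other
    def __hash__(self): return 1


def type_and_id(obj):
    # Last-resort ordering by type name and identity (an assumption of this sketch).
    return (str(type(obj)), id(obj))


class DirectKey:
    """Hypothetical wrapper: asks only the left operand via obj.__lt__()."""
    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        rv = self.obj.__lt__(other.obj)
        if rv is NotImplemented:
            # The right operand's reflected __gt__ is never consulted.
            rv = type_and_id(self.obj) < type_and_id(other.obj)
        return rv


class OperatorKey:
    """Hypothetical wrapper: uses the < operator, which also tries the
    reflected method of the right operand before giving up."""
    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            return self.obj < other.obj
        except TypeError:
            return type_and_id(self.obj) < type_and_id(other.obj)


a = A()
print((1).__lt__(a))   # NotImplemented: int does not know about A
print(1 < a)           # True: the operator falls back to a.__gt__(1)

# With the operator-based key, both input orders sort the same way.
print(sorted([1, a], key=OperatorKey) == sorted([a, 1], key=OperatorKey))

# With the direct call, the result can depend on which object ends up on the left.
print(sorted([1, a], key=DirectKey), sorted([a, 1], key=DirectKey))
```

The last line is left without an expected result on purpose: the fallback compares type names, so its outcome depends on the module in which `A` is defined.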
_safe_key was introduced in issue3976.
Hmm... is it important?
Stability in output order from pprint is very useful in doctests (yes, some people write documentation that they test). I think fixing any output stability issues would be very worthwhile.
> Hmm... is it important?
No more than sorting pprint output at all. This looks like a low-priority issue to me, but the fix looks pretty easy. Here is a patch. I hope Raymond will review it; maybe I missed some details.
And here is an alternative patch in case the first patch is not correct. It is more complicated and, I suppose, less efficient in the common case.
What if [some flavor of] pprint sorted items not by value, but by their repr() string? It's probably faster than any other algorithm, and guaranteed to produce consistent results. Or use this idea only for ambiguous cases?
Sorting by the repr sounds good, but if some dict keys or set members are strings containing single-quotes, the primary sort will be on the type of quote used for the repr, which would be surprising and significantly less useful.
> the primary sort will be on the type of quote used for the repr,
> which would be surprising and significantly less useful.
How about: repr(obj).strip("'\"") ?
Overall, the idea of using repr() in some fashion is appealing because it sorts on what the user actually sees.
> How about: repr(obj).strip("'\"") ?
A string can start or end with quotes. And a string repr can be part of the repr of another type (e.g. a short list).
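The quoting concerns raised above can be seen in a short sketch. `stripped_repr` is a hypothetical helper name for the `repr(obj).strip("'\"")` idea; nothing here is taken from a patch.

```python
keys = ["a", "b'c", "c"]

# Ordinary string order.
print(sorted(keys))            # ['a', "b'c", 'c']

# repr("b'c") is written with double quotes, and '"' sorts before "'",
# so the quote character becomes the primary sort key.
print(sorted(keys, key=repr))  # ["b'c", 'a', 'c']


def stripped_repr(obj):
    """Hypothetical key: repr with leading/trailing quote characters removed."""
    return repr(obj).strip("'\"")


# A string that itself starts with a quote collides with one that does not.
print(stripped_repr("x") == stripped_repr("'x"))   # True

# Inside a container, the quotes are not at the ends, so nothing is stripped.
print(stripped_repr(["a"]))                        # ['a']
```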
I think it'd be nice if the solution kept the current order when all keys are orderable (which is a very common case). So IMO repr() should only be used as a fallback when the object comparison fails.
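A minimal sketch of that fallback idea, assuming a hypothetical wrapper named `ReprFallbackKey` (not part of pprint or of either patch): the normal comparison is tried first, and repr() is used only when the objects are unorderable.

```python
class ReprFallbackKey:
    """Hypothetical sort key: natural order when possible, repr() otherwise."""
    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            # Orderable objects keep their usual order.
            return self.obj < other.obj
        except TypeError:
            # Unorderable pair: compare what the user would see printed.
            return repr(self.obj) < repr(other.obj)


# All keys orderable: same result as a plain sort.
print(sorted([3, 1, 2], key=ReprFallbackKey))     # [1, 2, 3]

# Mixed types: int vs. str falls back to comparing "2" with "'a'".
print(sorted([2, "a", 1], key=ReprFallbackKey))   # ['a', 1, 2]
```

Mixing two different orderings like this does not guarantee a consistent total order for every possible input, so this is only an illustration of the approach.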
My question to Raymond is: should we use the "<" operator or the special methods __lt__ and __gt__ (this is the difference between the alternative patches)? Using repr instead of id is a different issue.
Ping.
Ping.
Sorry for the delay. pprint_safe_key.patch looks good to me.
New changeset c8815035116b by Serhiy Storchaka in branch 'default': Issue #22721: An order of multiline pprint output of set or dict containing https://hg.python.org/cpython/rev/c8815035116b
Thank you for your review, Fred.