|
I do like having the inverse of shlex.split in the standard library however do be careful with the documentation. The returned value is not safe to execute when built up from user input just as using shlex.split(user_input)[0] and feeding it to a shell is not safe :) |
There was a problem hiding this comment.
One small comment on the docs, otherwise this implementation looks fine to me. No comment on whether or not it should be added to the module, since I don't know the shlex use cases enough to weigh in.
With regards to the test strategy, I think it's probably fine as is, but I've given a suggestion anyway because I've been going back and forth on it. You are only testing the fact that join is an inverse of split, not testing the behavior of join itself (which would be useful even if split did not exist), but honestly it's probably fine to define join in terms of split and then say that it's sufficient to test split and the fact that join is the inverse of split. Still, I don't think a few explicit join tests would hurt.
There was a problem hiding this comment.
Could you also add a note to Doc/whatsnew/3.8.rst?
Thank you!
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
I think this is a great idea! Always nice to have symmetry in an API. Speaking of which, one thing I noticed about this implementation is that it doesn't mirror the new >>> from shlex import join, split
>>> join(split('test ê test'))
"test 'ê' test"Not sure if you're up to adding another flag, but figured I'd mention. |
|
@berkerpeksag, I think I've made the changes you requested. @mahmoud, do you have an implementation suggestion? The |
|
I guess the problem is more going the other way, right? >>> split("test 'ê' test", posix=True)
['test', 'ê', 'test']
>>> split("test 'ê' test", posix=False)
['test', "'ê'", 'test']
>>> split("test 'ê' test", posix=False)
['test', "'ê'", 'test']
>>> join(split("test 'ê' test", posix=False))
'test \'\'"\'"\'ê\'"\'"\'\' test'I'm not sure how much of a bug this is, but it does mean that |
|
@pganssle Bingo! Agreed that full roundtrippability is nice to have, but not necessarily a blocker. (when I've needed this, I've done exactly what @bbayles This may be outside the scope of this change, but I think it makes sense to add the If there's an appetite for that, then I think we could make the string constants inside the I haven't done the venn diagram, but there may be other corners lurking in quote's "exclusive" regex of unsafe chars ( self.wordchars = ('abcdfeghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_')
if self.posix:
self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')You can see how, instead of the regex, you could check if any of the word characters aren't those characters. Similar thing could happen for |
|
I'll defer to @berkerpeksag on whether to increase the scope of this PR. |
|
I think we want If |
|
I think we need to add a |
|
@berkerpeksag I'm not sure that >>> shlex.split("one 'two' three")
['one', 'two', 'three']
>>> shlex.split("one two three")
['one', 'two', 'three']If you turn off POSIX mode, the splitting is no longer lossy (at least in this example): >>> shlex.split("one 'two' three", posix=False)
['one', "'two'", 'three']
>>> shlex.split("one two three", posix=False)
['one', 'two', 'three']So right now, you have the situation where: split(join(split(cmd, posix=True)), posix=True) == split(cmd, posix=True)Which is I think as good as we can hope for with a default behavior. I think we can go ahead with this and add a Is this accurate @mahmoud? |
|
@pganssle Just caught this, and yeah I agree. The sooner we get the functionality, the fewer |
There was a problem hiding this comment.
My other comments notwithstanding, this looks like a solid shlex addition I wish would've existed long ago. Thanks for adding it!
|
@berkerpeksag The feature freeze is coming up pretty soon, have you had a chance to review Mahmoud and my comments about the feature? It would be nice to get this merged for Python 3.8. |
|
I agree that "Let's merge this and we'll fix other cases later" never works in practice and I'm not comfortable adding an unfinished API to the standard library. Let's see what @vsajip thinks as he has been working on maintaining the shlex module lately. |
If I understand you correctly, it sounds like you have the opposite understanding from me as to what needs to be done. When I said that The new feature that we're blocking on is support for lossless round trips when |
|
I'm not available to increase the scope of this PR currently. So if |
|
Thanks @bbayles for putting in the work on this, and thanks @berkerpeksag and @mahmoud for the reviews and discussion! Thanks @vsajip for the review and merge! |
This PR adds
shlex.join(), using the implementation suggested in the issue.This is the opposite of
shlex.split(), and is intended to prevent the naive use of' '.join(list_of_shell_tokens), which can be unsafe.https://bugs.python.org/issue22454