bpo-22454: Add shlex.join() (the opposite of shlex.split()) by bbayles · Pull Request #7605 · python/cpython

bbayles · 2018-06-10T22:59:06Z

This PR adds shlex.join(), using the implementation suggested in the issue.

This is the opposite of shlex.split(), and is intended to prevent the naive use of ' '.join(list_of_shell_tokens), which can be unsafe.

https://bugs.python.org/issue22454

Doc/library/shlex.rst

supakeen · 2018-09-20T13:09:33Z

I do like having the inverse of shlex.split in the standard library however do be careful with the documentation. The returned value is not safe to execute when built up from user input just as using shlex.split(user_input)[0] and feeding it to a shell is not safe :)

pganssle

One small comment on the docs, otherwise this implementation looks fine to me. No comment on whether or not it should be added to the module, since I don't know the shlex use cases enough to weigh in.

With regards to the test strategy, I think it's probably fine as is, but I've given a suggestion anyway because I've been going back and forth on it. You are only testing the fact that join is an inverse of split, not testing the behavior of join itself (which would be useful even if split did not exist), but honestly it's probably fine to define join in terms of split and then say that it's sufficient to test split and the fact that join is the inverse of split. Still, I don't think a few explicit join tests would hurt.

Doc/library/shlex.rst

Lib/test/test_shlex.py

berkerpeksag

Could you also add a note to Doc/whatsnew/3.8.rst?

Thank you!

Doc/library/shlex.rst

Lib/shlex.py

Lib/test/test_shlex.py

Misc/NEWS.d/next/Library/2018-06-10-17-48-07.bpo-22454.qeiy_X.rst

bedevere-bot · 2019-04-17T16:06:05Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

mahmoud · 2019-04-17T16:22:28Z

I think this is a great idea! Always nice to have symmetry in an API. Speaking of which, one thing I noticed about this implementation is that it doesn't mirror the new posix flag in the split API. So you end up with slightly overzealous quoting:

>>> from shlex import join, split                                                                                                         
>>> join(split('test ê test'))                                                                                                          
"test 'ê' test"

Not sure if you're up to adding another flag, but figured I'd mention.

bbayles · 2019-04-17T16:51:07Z

@berkerpeksag, I think I've made the changes you requested.

@mahmoud, do you have an implementation suggestion? The quote function doesn't have the posix flag, and this is just ' '.join(quote(arg) for arg in split_command).

pganssle · 2019-04-17T18:14:06Z

I guess the problem is more going the other way, right?

>>> split("test 'ê' test", posix=True)
['test', 'ê', 'test']

>>> split("test 'ê' test", posix=False)
['test', "'ê'", 'test']

>>> split("test 'ê' test", posix=False)
['test', "'ê'", 'test']
>>> join(split("test 'ê' test", posix=False))
'test \'\'"\'"\'ê\'"\'"\'\' test'

I'm not sure how much of a bug this is, but it does mean that split(join(split(x, posix=False)), posix=False) is not an idempotent operation. I'm not sure how much it needs to be, though.

mahmoud · 2019-04-17T22:56:22Z

@pganssle Bingo! Agreed that full roundtrippability is nice to have, but not necessarily a blocker. (when I've needed this, I've done exactly what join() does now, but that was before this new posix flag).

@bbayles This may be outside the scope of this change, but I think it makes sense to add the posix flag for symmetry.

If there's an appetite for that, then I think we could make the string constants inside the shlex class into global constants, shared between shlex and quote.

I haven't done the venn diagram, but there may be other corners lurking in quote's "exclusive" regex of unsafe chars (re.compile(r'[^\w@%+=:,./-]', re.ASCII)) and shlex's "inclusive" specification of safe chars:

        self.wordchars = ('abcdfeghijklmnopqrstuvwxyz'
                          'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_')
        if self.posix:
            self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
                               'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')

You can see how, instead of the regex, you could check if any of the word characters aren't those characters. Similar thing could happen for punctuation_chars.

bbayles · 2019-04-17T23:07:16Z

I'll defer to @berkerpeksag on whether to increase the scope of this PR.

pganssle · 2019-04-18T14:12:42Z

I think we want join to default to using the POSIX behavior, right? So if quote defaults to the correct behavior with posix=True (which it seems to), then I think posix=False can be added to join in a separate PR.

If quote defaults to posix=False, then I think it needs to be added into this PR (since we want the default for join to be posix=True, and it would be best if this were not a backwards-incompatible change).

berkerpeksag · 2019-04-18T14:22:05Z

I think we need to add a posix parameter because split()'s posix defaults to True and users would expect join(split(cmd)) == cmd.

pganssle · 2019-04-18T14:31:07Z

@berkerpeksag I'm not sure that join(split(cmd)) == cmd will ever be possible for all cmd when posix=True, because split is lossy in POSIX mode:

>>> shlex.split("one 'two' three")
['one', 'two', 'three']

>>> shlex.split("one two three")
['one', 'two', 'three']

If you turn off POSIX mode, the splitting is no longer lossy (at least in this example):

>>> shlex.split("one 'two' three", posix=False)
['one', "'two'", 'three']

>>> shlex.split("one two three", posix=False)
['one', 'two', 'three']

So right now, you have the situation where:

split(join(split(cmd, posix=True)), posix=True) == split(cmd, posix=True)

Which is I think as good as we can hope for with a default behavior. I think we can go ahead with this and add a posix parameter (defaulting to True) to both join and quote as a separate feature.

Is this accurate @mahmoud?

mahmoud · 2019-04-26T06:34:23Z

@pganssle Just caught this, and yeah I agree. The sooner we get the functionality, the fewer ' '.join(args) will be written. Corner cases in a new ticket sounds good to me, too!

mahmoud

My other comments notwithstanding, this looks like a solid shlex addition I wish would've existed long ago. Thanks for adding it!

pganssle · 2019-05-18T18:50:57Z

@berkerpeksag The feature freeze is coming up pretty soon, have you had a chance to review Mahmoud and my comments about the feature? It would be nice to get this merged for Python 3.8.

berkerpeksag · 2019-05-18T22:38:41Z

I agree that shlex.split() is lossy in POSIX mode and I'm fine with it. I'd merge this PR now if the default mode of shlex.split() wasn't POSIX (it has been the default mode for more than a decade)

"Let's merge this and we'll fix other cases later" never works in practice and I'm not comfortable adding an unfinished API to the standard library.

Let's see what @vsajip thinks as he has been working on maintaining the shlex module lately.

pganssle · 2019-05-18T22:50:44Z

I agree that shlex.split() is lossy in POSIX mode and I'm fine with it. I'd merge this PR now if the default mode of shlex.split() wasn't POSIX (it has been the default mode for more than a decade)

If I understand you correctly, it sounds like you have the opposite understanding from me as to what needs to be done. When I said that shlex.split is lossy in POSIX mode, what I mean is that this PR already does as well as it possibly can do with the default (POSIX) mode.

The new feature that we're blocking on is support for lossless round trips when posix=False is specified, and since the default is posix=True, I think it's a sufficiently clean separation that adding support for it can be considered a separate feature (especially considering that quote also has no way to specify posix=False).

bbayles · 2019-05-19T01:43:43Z

I'm not available to increase the scope of this PR currently. So if shlex.join needs to be more than it is here, then I'll close this and allow someone else to contribute.

pganssle · 2019-05-29T15:46:54Z

Thanks @bbayles for putting in the work on this, and thanks @berkerpeksag and @mahmoud for the reviews and discussion!

Thanks @vsajip for the review and merge!

…-7605)

Add shlex.join (bpo-22454)

e90cd7c

the-knights-who-say-ni added the CLA signed label Jun 10, 2018

bedevere-bot added the awaiting review label Jun 10, 2018

bbayles commented Jun 10, 2018

View reviewed changes

Doc/library/shlex.rst Outdated Show resolved Hide resolved

pganssle reviewed Apr 11, 2019

View reviewed changes

Doc/library/shlex.rst Outdated Show resolved Hide resolved

Lib/test/test_shlex.py Show resolved Hide resolved

Update shlex.rst

d1f1edc

berkerpeksag requested changes Apr 17, 2019

View reviewed changes

bedevere-bot removed the awaiting review label Apr 17, 2019

bedevere-bot added the awaiting changes label Apr 17, 2019

bbayles added 4 commits April 17, 2019 11:30

Tweaks for docs

2583779

Add more tests for shlex.join

7865679

Add What's New entry

1410371

Add short docstring

792f3d0

Merge branch 'master' into shlex-join-22454

9feabd2

mahmoud approved these changes Apr 26, 2019

View reviewed changes

pganssle approved these changes Apr 26, 2019

View reviewed changes

vsajip merged commit ca80495 into python:master May 29, 2019

bedevere-bot removed the awaiting changes label May 29, 2019

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020

bpo-22454: Add shlex.join() (the opposite of shlex.split()) (pythonGH…

80cfd47

…-7605)

Uh oh!

Conversation

bbayles commented Jun 10, 2018 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

supakeen commented Sep 20, 2018

Uh oh!

pganssle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

berkerpeksag left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bedevere-bot commented Apr 17, 2019

Uh oh!

mahmoud commented Apr 17, 2019

Uh oh!

bbayles commented Apr 17, 2019

Uh oh!

pganssle commented Apr 17, 2019

Uh oh!

mahmoud commented Apr 17, 2019

Uh oh!

bbayles commented Apr 17, 2019

Uh oh!

pganssle commented Apr 18, 2019

Uh oh!

berkerpeksag commented Apr 18, 2019

Uh oh!

pganssle commented Apr 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mahmoud commented Apr 26, 2019

Uh oh!

mahmoud left a comment

Choose a reason for hiding this comment

Uh oh!

pganssle commented May 18, 2019

Uh oh!

berkerpeksag commented May 18, 2019

Uh oh!

pganssle commented May 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bbayles commented May 19, 2019

Uh oh!

pganssle commented May 29, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

bbayles commented Jun 10, 2018 •

edited by bedevere-bot

Loading

pganssle commented Apr 18, 2019 •

edited

Loading

pganssle commented May 18, 2019 •

edited

Loading