python/cpython

Conversation

Member

lysnikolaou commented Apr 14, 2020

When there is a SyntaxError after reading the last input character from
the tokenizer and if no newline follows it, the error message used to be
`unexpected EOF while parsing`, which is wrong.

https://bugs.python.org/issue40267

Member Author

CC: @gvanrossum @pablogsal

Member

gvanrossum left a comment


This looks straightforward enough. I have one niggling thought. Why is tok->done set to E_EOF in the first place?

Member Author

Take an example where the last character produces a SyntaxError, like `x+@`. When it tokenizes the `@` character, the tokenizer checks for a two- or three-character token and thus reaches EOF. Upon doing so, the tokenizer state gets updated so that `tok->done` is `E_EOF`.

Note that if the tokenizer reaches EOF, it cannot back up, because it would go into an endless loop if it did.

Member

I'm not sure I entirely believe that:

>>> eval('+and ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    +and 
     ^
SyntaxError: invalid syntax
>>> eval('+and')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    +and
     ^
SyntaxError: unexpected EOF while parsing
>>> 

But it does look like it always has to do with the final operator ending the file, so you're close.

Member Author

lysnikolaou commented Apr 15, 2020

Ohh, you're right! It's not the last character, it's the last token. In your example, `and` just gets tokenized as a NAME, which means all of its characters are consumed until a character is found that is not a valid identifier character. In your first example that's the space; in the second it's EOF. So `tok->done` gets the value `E_EOF` there.
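(Editor's note: the pure-Python `tokenize` module mirrors this behavior and makes it easy to see `and` being consumed as a single NAME token. The trailing newline is added so the example works uniformly across versions.)

```python
import io
import token
import tokenize

# Tokenize "+and\n": the "and" keyword is emitted as one NAME token whose
# characters are consumed up to the first non-identifier character (here
# the newline), matching the description of the C tokenizer above.
toks = [
    (token.tok_name[t.type], t.string)
    for t in tokenize.generate_tokens(io.StringIO("+and\n").readline)
]
print(toks)
```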

I still think this fix catches all these cases (I tested your example and a few more; should I maybe add tests for these?) and does not create any new problems, since `E_SYNTAX` is what gets propagated up anyway if it's not `E_EOF`. Right?

Member

OK, so the `done` field is set to `E_EOF` when the tokenizer sees the EOF after the last token. This is harmless if the program is valid, since then the token just gets processed, but when there's a syntax error on the last token, the EOF state incorrectly modifies the error message.

I'll merge now.

gvanrossum merged commit 9a4b38f into python:master Apr 15, 2020
lysnikolaou deleted the tokenizer-bug branch April 24, 2020 00:30