[proxy] web.archive.org← back | site home | direct (HTTPS) ↗ | proxy home | ◑ dark◐ light
/ cpython Public
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32285: Add unicodedata.is_normalized to check the current norma… #4806

Merged
merged 11 commits into from Nov 4, 2018

Conversation

Copy link
Contributor

maxbelanger commented Dec 12, 2017

Introduces unicodedata.is_normalized, which can check whether a unistr is in a given normal form.

This makes use of the internal helper (also called is_normalized) that can "quick check" normalization, but falls back on creating a normalized copy and comparing when necessary.

https://bugs.python.org/issue32285

Copy link
Contributor Author

maxbelanger commented Dec 12, 2017

This was tested locally using ./python.exe -E -Wd -m test -u urlfetch -v test_normalization on macOS 10.13.

Copy link
Member

vstinner left a comment

I like the new function, but it should be documented in Doc/library/unicodedata.rst.

You may also add it to Doc/whatsnew/3.7.rst, in a "unicodedata" section of Improved Modules.

Modules/unicodedata.c Outdated Show resolved Hide resolved
Modules/unicodedata.c Outdated Show resolved Hide resolved
Copy link
Member

vstinner commented Dec 12, 2017

Please add also a NEWS entry for the Changelog using the "blurb" tool:
https://devguide.python.org/committing/#what-s-new-and-news-entries

Copy link
Contributor Author

maxbelanger commented Jan 8, 2018

@vstinner any other changes you'd like to see here? Just made a tiny signature change to ensure consistency with the rest of the module, otherwise I think this is good to go.

Copy link
Contributor Author

maxbelanger commented Sep 14, 2018

@vstinner should I rebase this patch for 3.8?

Modules/unicodedata.c Outdated Show resolved Hide resolved
self.assertTrue(is_normalized("NFC", c2))
self.assertTrue(is_normalized("NFD", c3))
self.assertTrue(is_normalized("NFKC", c4))
self.assertTrue(is_normalized("NFKD", c5))
Copy link
Contributor

benjaminp Nov 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some negative cases, too. Make sure the MAYBE case is being exercised.

Copy link
Contributor Author

maxbelanger Nov 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increased coverage + confirmed that this is exercising the MAYBE path.

Copy link
Member

vstinner Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add also tests when it returns False. If the function always returns True, the test still pass ;-)


PyObject *result;
int nfc = 0;
int k = 0;
Copy link
Contributor

benjaminp Nov 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be bool.

Copy link
Contributor Author

maxbelanger Nov 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant to conform to the existing implementation of is_normalized, which takes in ints. Could change is_normalized, but I preferred to avoid making changes outside the scope of my own.

benjaminp merged commit 2810dd7 into python:master Nov 4, 2018
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants