unicodedata.is_normalized to check the current normalization of a unistr #4806
This was tested locally using …
Please also add a NEWS entry for the changelog using the "blurb" tool.
@vstinner, any other changes you'd like to see here? I just made a tiny signature change for consistency with the rest of the module; otherwise, I think this is good to go.
@vstinner, should I rebase this patch for 3.8?
```python
self.assertTrue(is_normalized("NFC", c2))
self.assertTrue(is_normalized("NFD", c3))
self.assertTrue(is_normalized("NFKC", c4))
self.assertTrue(is_normalized("NFKD", c5))
```
There should be some negative cases, too. Make sure the MAYBE case is being exercised.
Increased coverage and confirmed that the tests exercise the MAYBE path.
Maybe also add tests for when it returns False. If the function always returned True, the tests would still pass ;-)
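Following the review suggestions, negative cases and the MAYBE path could be covered along these lines. This is a sketch, not the tests that were merged: the class name `IsNormalizedNegativeTests` and the sample strings are illustrative, and it assumes Python 3.8+, where `unicodedata.is_normalized` exists. U+0301 COMBINING ACUTE ACCENT has NFC_Quick_Check=Maybe in the Unicode database, so NFC checks on `"e\u0301"` or `"\u0301"` should go through the normalize-and-compare fallback.

```python
import unittest
from unicodedata import is_normalized, normalize  # is_normalized: Python 3.8+


class IsNormalizedNegativeTests(unittest.TestCase):
    """Illustrative tests; names and sample strings are hypothetical."""

    def test_negative_cases(self):
        # U+00E9 (precomposed "é") is in NFC but not in NFD.
        self.assertTrue(is_normalized("NFC", "\u00e9"))
        self.assertFalse(is_normalized("NFD", "\u00e9"))
        # "e" + U+0301 is decomposed: in NFD but not in NFC.
        self.assertTrue(is_normalized("NFD", "e\u0301"))
        self.assertFalse(is_normalized("NFC", "e\u0301"))
        # U+FB01 (ligature "fi") has only a compatibility decomposition,
        # so it is unchanged by NFC but not by NFKC/NFKD.
        self.assertTrue(is_normalized("NFC", "\ufb01"))
        self.assertFalse(is_normalized("NFKC", "\ufb01"))
        self.assertFalse(is_normalized("NFKD", "\ufb01"))

    def test_maybe_path(self):
        # U+0301 is NFC_QC=Maybe, so the quick check cannot decide these;
        # one result is False and one is True, so both outcomes are covered.
        self.assertFalse(is_normalized("NFC", "e\u0301"))
        self.assertTrue(is_normalized("NFC", "\u0301"))

    def test_agrees_with_normalize(self):
        # is_normalized(form, s) should always equal normalize(form, s) == s.
        for form in ("NFC", "NFD", "NFKC", "NFKD"):
            for s in ("abc", "\u00e9", "e\u0301", "\ufb01", "\u0301"):
                with self.subTest(form=form, s=s):
                    self.assertEqual(is_normalized(form, s),
                                     normalize(form, s) == s)


if __name__ == "__main__":
    unittest.main()
```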
```c
PyObject *result;
int nfc = 0;
int k = 0;
```
These could be bool.
This is meant to conform to the existing internal `is_normalized` helper, which takes ints. I could change `is_normalized`, but I preferred to avoid changes outside the scope of this PR.
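The two int flags in the C snippet appear to select which of the four normal forms is checked: `nfc` chooses composition (C) over decomposition (D), and `k` chooses compatibility (K) over canonical equivalence. The Python sketch below assumes that reading; `flags_for_form` and `form_for_flags` are hypothetical helpers, not part of `unicodedata`. Because each flag only ever takes two values, storing them as `bool` would lose nothing.

```python
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def flags_for_form(form):
    """Hypothetical: map a normal form name to (nfc, k) flags."""
    if form not in FORMS:
        raise ValueError("invalid normalization form")
    nfc = form.endswith("C")    # composed (C) rather than decomposed (D)
    k = form.startswith("NFK")  # compatibility (K) rather than canonical
    return nfc, k


def form_for_flags(nfc, k):
    """Hypothetical inverse: rebuild the form name from the two flags."""
    return "NF" + ("K" if k else "") + ("C" if nfc else "D")


for form in FORMS:
    print(form, flags_for_form(form))
# NFC (True, False)
# NFD (False, False)
# NFKC (True, True)
# NFKD (False, True)
```

The round trip through both helpers returns the original name for every form, so the pair of flags fully identifies the form being checked.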
Introduces `unicodedata.is_normalized`, which can check whether a `unistr` is in a given normal form. This makes use of the internal helper (also called `is_normalized`) that can "quick check" normalization, but falls back on creating a normalized copy and comparing when necessary.

https://bugs.python.org/issue32285
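A minimal Python sketch of the quick-check-then-fallback idea follows. It is not the C implementation. `QuickCheck`, `quick_check`, and `is_normalized_sketch` are hypothetical names, and the quick check is deliberately simplified because `unicodedata` does not expose the per-character quick-check properties that the real helper consults. It relies only on two facts that hold for all four forms: ASCII-only text is already normalized, and combining marks in non-canonical order mean the text is not normalized.

```python
import unicodedata
from enum import Enum


class QuickCheck(Enum):
    """Three-valued result of a normalization quick check."""
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def quick_check(s):
    """Hypothetical, simplified quick check valid for all four forms."""
    if s.isascii():
        return QuickCheck.YES  # ASCII has no decompositions or compositions
    prev_ccc = 0
    for ch in s:
        ccc = unicodedata.combining(ch)  # canonical combining class
        if ccc and prev_ccc > ccc:
            return QuickCheck.NO  # marks out of canonical order
        prev_ccc = ccc
    return QuickCheck.MAYBE  # cannot be decided cheaply


def is_normalized_sketch(form, s):
    if form not in FORMS:
        raise ValueError("invalid normalization form")
    result = quick_check(s)
    if result is QuickCheck.MAYBE:
        # Fall back on creating a normalized copy and comparing.
        return unicodedata.normalize(form, s) == s
    return result is QuickCheck.YES


print(is_normalized_sketch("NFC", "\u00e9"))  # True
print(is_normalized_sketch("NFD", "\u00e9"))  # False
```

The point of the design is that YES and NO answers avoid building a normalized copy; only MAYBE pays for `unicodedata.normalize`. On Python 3.8 and later, the sketch's results can be cross-checked against `unicodedata.is_normalized(form, s)`.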