bpo-32285: Add `unicodedata.is_normalized` to check the current norma… #4806

maxbelanger · 2017-12-12T01:19:58Z

Introduces unicodedata.is_normalized, which can check whether a unistr is in a given normal form.

This makes use of the internal helper (also called is_normalized) that can "quick check" normalization, but falls back on creating a normalized copy and comparing when necessary.
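As a rough illustration of the behavior described above, here is a minimal sketch, assuming Python 3.8 or later (where `unicodedata.is_normalized(form, unistr)` is available). The helper `is_normalized_slow` is a hypothetical name that shows only the fallback path (normalize a copy and compare). The quick check itself lives in C and is not reproduced here.

```python
import unicodedata


def is_normalized_slow(form, unistr):
    """Hypothetical reference: normalize a copy and compare (the fallback path)."""
    return unicodedata.normalize(form, unistr) == unistr


composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    for s in (composed, decomposed, "plain ascii"):
        # The new function should agree with the normalize-and-compare reference.
        assert unicodedata.is_normalized(form, s) == is_normalized_slow(form, s)

print(unicodedata.is_normalized("NFC", composed))    # True
print(unicodedata.is_normalized("NFC", decomposed))  # False
```

The point of the dedicated function is that it can often answer without building the normalized copy that `is_normalized_slow` always allocates.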

https://bugs.python.org/issue32285

…lization of a unistr

maxbelanger · 2017-12-12T01:21:02Z

This was tested locally using ./python.exe -E -Wd -m test -u urlfetch -v test_normalization on macOS 10.13.

vstinner

I like the new function, but it should be documented in Doc/library/unicodedata.rst.

You may also add it to Doc/whatsnew/3.7.rst, in a "unicodedata" section of Improved Modules.

Modules/unicodedata.c

vstinner · 2017-12-12T10:59:24Z

Please also add a NEWS entry for the changelog using the "blurb" tool:
https://devguide.python.org/committing/#what-s-new-and-news-entries

maxbelanger · 2018-01-08T22:32:27Z

@vstinner any other changes you'd like to see here? Just made a tiny signature change to ensure consistency with the rest of the module, otherwise I think this is good to go.

maxbelanger · 2018-09-14T18:12:39Z

@vstinner should I rebase this patch for 3.8?

Modules/unicodedata.c

benjaminp · 2018-11-03T23:11:27Z

Lib/test/test_normalization.py

+            self.assertTrue(is_normalized("NFC", c2))
+            self.assertTrue(is_normalized("NFD", c3))
+            self.assertTrue(is_normalized("NFKC", c4))
+            self.assertTrue(is_normalized("NFKD", c5))


There should be some negative cases, too. Make sure the MAYBE case is being exercised.

Increased coverage + confirmed that this is exercising the MAYBE path.

Maybe also add tests for when it returns False. If the function always returns True, the tests still pass ;-)
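A sketch of what such negative cases could look like, not the tests that were actually committed. The class name and the chosen strings are assumptions for illustration. A string containing a combining mark such as U+0301, whose NFC quick-check property is "Maybe" in the Unicode data, is a plausible way to reach the fallback path.

```python
import unittest
import unicodedata


class IsNormalizedNegativeTest(unittest.TestCase):
    """Hypothetical negative cases for unicodedata.is_normalized (Python 3.8+)."""

    def test_canonical_forms(self):
        composed = "\u00e9"     # single precomposed code point
        decomposed = "e\u0301"  # base letter plus combining acute accent
        # A decomposed sequence is not in a composed form, and vice versa.
        self.assertFalse(unicodedata.is_normalized("NFC", decomposed))
        self.assertFalse(unicodedata.is_normalized("NFKC", decomposed))
        self.assertFalse(unicodedata.is_normalized("NFD", composed))
        self.assertFalse(unicodedata.is_normalized("NFKD", composed))

    def test_compatibility_forms(self):
        ligature = "\ufb01"  # LATIN SMALL LIGATURE FI, compatibility-decomposable
        # Canonical forms leave the ligature alone; compatibility forms expand it.
        self.assertTrue(unicodedata.is_normalized("NFC", ligature))
        self.assertTrue(unicodedata.is_normalized("NFD", ligature))
        self.assertFalse(unicodedata.is_normalized("NFKC", ligature))
        self.assertFalse(unicodedata.is_normalized("NFKD", ligature))
```

Saved to a file, this can be run with `python -m unittest <file>`.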

benjaminp · 2018-11-03T23:11:27Z

Modules/unicodedata.c

+
+    PyObject *result;
+    int nfc = 0;
+    int k = 0;


These could be bool.

This is meant to conform to the existing internal is_normalized helper, which takes ints. I could change the helper, but I preferred to avoid making changes outside the scope of this PR.
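For readers unfamiliar with the two flags, here is a Python sketch of how a form name could map onto them. It assumes that `nfc` means "composed form" and `k` means "compatibility form", which is an interpretation of the variable names in the hunk above, not something the diff states. `form_to_flags` is a hypothetical helper and does not exist in the module.

```python
def form_to_flags(form):
    """Hypothetical mapping from a normalization form name to (nfc, k) int flags."""
    flags = {
        "NFC": (1, 0),   # composed, canonical
        "NFKC": (1, 1),  # composed, compatibility
        "NFD": (0, 0),   # decomposed, canonical (matches the defaults nfc = 0, k = 0)
        "NFKD": (0, 1),  # decomposed, compatibility
    }
    try:
        return flags[form]
    except KeyError:
        # Assumed error behavior for an unknown form name.
        raise ValueError("invalid normalization form") from None


print(form_to_flags("NFKC"))  # (1, 1)
```

Since each flag only ever holds 0 or 1, a bool would carry the same information; the ints here simply mirror the existing helper's signature.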

bpo-32285: Add unicodedata.is_normalized to check the current norma…

eea1543

…lization of a unistr

the-knights-who-say-ni added the CLA signed label Dec 12, 2017

bedevere-bot added the awaiting review label Dec 12, 2017

vstinner reviewed Dec 12, 2017

maxbelanger added 4 commits Dec 12, 2017

add docs + cr fixes

7a4076c

properly attribute internal work

591abc0

Merge branch 'master' into add-normalization-check

1ecc284

fix signature to match normalize

8db1a3c

maxbelanger and others added 5 commits Sep 14, 2018

Merge branch 'master' into add-normalization-check

fb401d5

move to 3.8

cf2a177

merge master

15be04f

make verb imperative per normal style

697e35b

tweak to conform to style

25f0623

benjaminp reviewed Nov 3, 2018

some changes based on CR

bd823e5

benjaminp merged commit 2810dd7 into python:master Nov 4, 2018
5 checks passed

bedevere-bot removed the awaiting review label Nov 4, 2018

benjaminp Nov 3, 2018

maxbelanger Nov 4, 2018

vstinner Nov 5, 2018

benjaminp Nov 3, 2018

maxbelanger Nov 4, 2018
