Message221161
| Author |
ezio.melotti |
| Recipients |
ezio.melotti, lemburg, loewis, taleinat, terry.reedy |
| Date |
2014-06-21.08:33:39 |
| Message-id |
<1403339619.71.0.612807156117.issue21765@psf.upfronthosting.co.za> |
| In-reply-to |
|
| Content |
> _ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
>                         "Other_ID_Start"}
> _ID_CATEGORIES = _ID_FIRST_CATEGORIES | {"Mn", "Mc", "Nd", "Pc",
>                                          "Other_ID_Continue"}
Note that "Other_ID_Start" and "Other_ID_Continue" are not general categories; they are properties, and unicodedata.category() never returns them, so adding them to these sets has no effect. I don't think there's a way to check whether a character has one of those properties, but as I said in my previous message, it's probably safe to ignore them (nothing will explode even in the unlikely case that those characters are used, right?).
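To illustrate the point, here is a minimal sketch, not part of the patch. It assumes U+2118 (SCRIPT CAPITAL P) as a sample character listed under Other_ID_Start in PropList.txt, and it uses a hypothetical helper, starts_identifier_by_category(), that applies only the category-based test. The final call to str.isidentifier() is an indirect probe that relies on the interpreter's own identifier rules, not a direct property lookup.

```python
import unicodedata

# General categories allowed at the start of an identifier, as in the quoted
# sets but without the two property names, which category() never returns.
ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def starts_identifier_by_category(char):
    """Hypothetical helper: category-only test for an identifier start char."""
    return unicodedata.category(char) in ID_FIRST_CATEGORIES


# Assumption: U+2118 SCRIPT CAPITAL P carries the Other_ID_Start property.
char = "\u2118"

# category() yields a two-letter general category, never a property name.
print(unicodedata.category(char))

# The category-only test therefore misses this character...
print(starts_identifier_by_category(char))

# ...while the interpreter's own identifier rules can still be probed indirectly.
print(char.isidentifier())
```

If the last two lines disagree, the character is one of the rare cases the category sets cannot cover, which is the situation described above as probably safe to ignore.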
> def is_id_char(char):
>     return char in _ASCII_ID_CHARS or (
>         ord(char) >= 128 and
What's the reason for checking whether the ord is >= 128?
>         category(normalize(char)[0]) in _ID_CATEGORIES
>     ) |
|