
Issue 2052: Allow changing difflib._file_template character encoding.

The Wayback Machine - http://web.archive.org/web/20210126001350/https://bugs.python.org/issue2052

Issue2052

classification
Title: Allow changing difflib._file_template character encoding.
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: berker.peksag Nosy List: berker.peksag, ezio.melotti, hashimo, josephoenix, python-dev, r.david.murray, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2008-02-08 21:21 by josephoenix, last changed 2015-03-14 23:19 by berker.peksag. This issue is now closed.

Files
File name | Uploaded
issue2052.diff | berker.peksag, 2014-05-13 19:25
issue2052_html5.diff | berker.peksag, 2014-05-13 19:26
issue2052_html5_v2.diff | berker.peksag, 2014-05-18 01:20
issue2052_v2.diff | berker.peksag, 2015-03-13 19:50
issue2052_v3.diff | berker.peksag, 2015-03-14 17:08
issue2052_v4.diff | berker.peksag, 2015-03-14 17:50
Messages (15)
msg62208 - (view) Author: (josephoenix) Date: 2008-02-08 21:21
When passed unicode strings, difflib.HtmlDiff.make_file and make_table
fail with a UnicodeEncodeError. Also, the HTML output by make_file
seems to be hardcoded to use charset=ISO-8859-1 (line 1584 of difflib.py).
msg62209 - (view) Author: (josephoenix) Date: 2008-02-08 21:34
Oops, please close this. Apparently this was fixed in 2.5.1, and I'm just
behind.
msg62211 - (view) Author: (josephoenix) Date: 2008-02-08 21:51
After installing 2.5.1, the UnicodeEncodeError is gone, but the charset is 
still hardcoded in difflib._file_template. So, I guess this is still a 
separate bug.
msg116949 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-20 14:59
The charset in difflib._file_template is still hard-coded in py3k SVN. I'm unsure whether this is a feature request, a behaviour issue, or not an issue at all. Can someone please advise? Thanks.
msg117234 - (view) Author: R. David Murray (r.david.murray) * Date: 2010-09-23 21:11
I believe that charset is the standard default for html, which would make this a feature request.
msg184726 - (view) Author: Terry J. Reedy (terry.reedy) * Date: 2013-03-20 02:57
In 3.2, it is line 1629:
          content="text/html; charset=ISO-8859-1" />

That charset was only standard for Western European documents limited to that charset. Now, even such limited-char docs often use 'utf-8' (python.org does). The result of putting an incorrect charset designation in an html file is that the browser will not display the file correctly.

For instance, I tried an input sequence containing the line 'c\u3333', which displays in IDLE as 'c㌳'. The string from HtmlDiff.make_file() must be written to a file opened with encoding='utf-8', not ISO-8859-1 or an equivalent, which cannot encode that character. Firefox then reads the three bytes of the UTF-8 encoding of '㌳' as three separate characters and displays mojibake instead of 'c㌳'. To check:
>>> 'c㌳'.encode().decode(encoding='Latin-1')
'cã\x8c³'
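
The check above can be spelled out byte by byte. This is a minimal sketch, not part of the original report, and the variable names are illustrative only. It shows that the single character U+3333 becomes three UTF-8 bytes, that a Latin-1 reader turns each byte into its own character, and that the damage is reversible as long as the misread text is re-encoded with the same wrong charset.

```python
# Illustrative names only; nothing here comes from difflib itself.
text = "c\u3333"                      # 'c' followed by U+3333

# What ends up in the file when it is written with encoding='utf-8'.
raw = text.encode("utf-8")            # b'c\xe3\x8c\xb3'

# What a reader sees if it trusts a charset=ISO-8859-1 declaration:
# every byte is mapped to exactly one character.
misread = raw.decode("latin-1")
print(len(text), len(misread))        # 2 characters become 4

# Undoing the mistake: re-encode as Latin-1, then decode as UTF-8.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)
```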

To me the clear implication of "returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted." is that the resulting file will display correctly. The current template charset prevents that; changing it to 'utf-8' results in a file that displays correctly (tested). So the current behavior, and the code that causes it, is to me clearly a bug. I would like to fix it before 2.7.4 comes out.
msg184751 - (view) Author: Terry J. Reedy (terry.reedy) * Date: 2013-03-20 11:15
After thinking about it more, the real problem is that the charset setting must match the chars used and how they are encoded, so no one setting is right for all uses. An alternative to changing the default in existing versions is to at least document what it is and explain how to work around it with .replace -- for instance output.replace('ISO-8859-1', 'utf-8'). I agree that adding a parameter (charset=xxx) is a new feature.
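
The .replace workaround mentioned above might look like the following sketch. The helper name declare_charset and the sample page fragment are hypothetical; the fragment only imitates the meta line quoted earlier in this thread, so that the example does not depend on which Python version generated the page. The helper is slightly narrower than a bare output.replace('ISO-8859-1', 'utf-8'): it touches only the first "charset=..." declaration, so the same text appearing inside the diffed content is left alone.

```python
def declare_charset(html, old="ISO-8859-1", new="utf-8"):
    """Hypothetical helper: rewrite the charset named in the page's meta tag.

    Only the first 'charset=<old>' occurrence is replaced.
    """
    return html.replace("charset=" + old, "charset=" + new, 1)


# Stand-in for the head of a page produced by an affected version
# (assumed shape, modelled on the line quoted from difflib.py).
page = (
    '<meta http-equiv="Content-Type"\n'
    '      content="text/html; charset=ISO-8859-1" />'
)

fixed = declare_charset(page)

# The bytes written out must use the same charset the page now declares.
data = fixed.encode("utf-8")
```

In real use the fixed string would be written to a file opened with encoding='utf-8', so that the declaration and the actual bytes agree.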
msg184755 - (view) Author: Ezio Melotti (ezio.melotti) * Date: 2013-03-20 12:13
I haven't looked at the code, but if an HTML page is generated it should probably be updated to use HTML5 and <meta charset="utf-8">.
msg218479 - (view) Author: Berker Peksag (berker.peksag) * Date: 2014-05-13 19:25
Attaching two patches:

issue2052.diff adds a "charset" keyword argument to HtmlDiff.make_file().

issue2052_html5.diff also adds a "charset" keyword argument to HtmlDiff.make_file() and updates the markup of HtmlDiff() to HTML5. I tested it with Firefox 29 and Chrome 34.
msg218726 - (view) Author: Berker Peksag (berker.peksag) * Date: 2014-05-18 01:20
Attaching a new version of issue2052_html5.diff. Changes:
- Switch from px to em in CSS
- Clean up the markup a bit (e.g. delete redundant colgroup tags)
msg237383 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * Date: 2015-03-06 21:26
Maybe updating the markup to HTML5 should be a different issue. issue2052_html5_v2.diff not only adds charset in HTML5 format, it totally changes the template. This is definitely a separate issue.
msg238050 - (view) Author: Berker Peksag (berker.peksag) * Date: 2015-03-13 19:50
Here is an updated patch. Thanks for the review, Serhiy. I will open a new issue for the HTML 5 part of the patch.
msg238097 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * Date: 2015-03-14 19:37
LGTM
msg238104 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-14 23:18
New changeset e058423d3ca4 by Berker Peksag in branch 'default':
Issue #2052: Add charset parameter to HtmlDiff.make_file().
https://hg.python.org/cpython/rev/e058423d3ca4
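
For readers landing here, a minimal usage sketch of the new parameter follows. It assumes Python 3.5 or later, the version this issue lists for the fix; the sample lines and descriptions are made up for illustration. The charset passed to make_file() should match the encoding used when the page is written out.

```python
import difflib

# Made-up sample input containing a non-Latin-1 character (U+3333).
before = ["c\u3333"]
after = ["c\u3333 changed"]

html = difflib.HtmlDiff().make_file(
    before,
    after,
    fromdesc="before",
    todesc="after",
    charset="utf-8",   # keyword argument added by this issue
)

# Encode with the same charset the page declares.
data = html.encode("utf-8")
```

Writing data to a file in binary mode, or writing html to a file opened with encoding='utf-8', should then give a page whose declared charset matches its bytes.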
msg238105 - (view) Author: Berker Peksag (berker.peksag) * Date: 2015-03-14 23:19
Thanks Serhiy.
History
Date | User | Action | Args
2015-03-14 23:19:45 | berker.peksag | set | status: open -> closed
    resolution: fixed
    messages: + msg238105
    stage: commit review -> resolved
2015-03-14 23:18:50 | python-dev | set | nosy: + python-dev
    messages: + msg238104
2015-03-14 19:37:49 | serhiy.storchaka | set | assignee: berker.peksag
    messages: + msg238097
    stage: patch review -> commit review
2015-03-14 17:50:11 | berker.peksag | set | files: + issue2052_v4.diff
2015-03-14 17:08:49 | berker.peksag | set | files: + issue2052_v3.diff
2015-03-13 19:50:32 | berker.peksag | set | files: + issue2052_v2.diff
    messages: + msg238050
2015-03-06 21:26:14 | serhiy.storchaka | set | nosy: + serhiy.storchaka
    messages: + msg237383
2014-07-30 10:41:30 | hashimo | set | nosy: + hashimo
2014-05-18 01:20:43 | berker.peksag | set | files: + issue2052_html5_v2.diff
    messages: + msg218726
2014-05-13 19:26:11 | berker.peksag | set | files: + issue2052_html5.diff
2014-05-13 19:25:57 | berker.peksag | set | files: + issue2052.diff
    versions: + Python 3.5, - Python 3.2
    keywords: + patch
    nosy: + berker.peksag
    messages: + msg218479
    stage: needs patch -> patch review
2014-04-19 16:00:57 | orsenthil | set | nosy: - orsenthil
2014-02-03 18:38:49 | BreamoreBoy | set | nosy: - BreamoreBoy
2013-03-20 12:13:17 | ezio.melotti | set | messages: + msg184755
2013-03-20 11:15:31 | terry.reedy | set | messages: + msg184751
2013-03-20 02:57:37 | terry.reedy | set | nosy: + ezio.melotti, terry.reedy, orsenthil, - tim.peters
    messages: + msg184726
2010-09-23 21:11:26 | r.david.murray | set | assignee: tim.peters -> (no value)
    type: behavior -> enhancement
    versions: + Python 3.2, - Python 2.6
    nosy: + r.david.murray
    messages: + msg117234
    stage: test needed -> needs patch
2010-09-20 14:59:10 | BreamoreBoy | set | nosy: + BreamoreBoy
    messages: + msg116949
2010-01-29 04:40:41 | brian.curtin | set | stage: test needed
    versions: + Python 2.6, - Python 2.5
2008-03-18 19:03:06 | jafo | set | priority: normal
    assignee: tim.peters
    nosy: + tim.peters
    title: Lack of difflib.HtmlDiff unicode support -> Allow changing difflib._file_template character encoding.
2008-02-08 21:51:40 | josephoenix | set | messages: + msg62211
2008-02-08 21:34:08 | josephoenix | set | messages: + msg62209
2008-02-08 21:21:53 | josephoenix | create |