- Author:
- Raymond Hettinger <python at rcn.com>
- Status:
- Final
- Type:
- Standards Track
- Created:
- 12-Mar-2009
- Python-Version:
- 2.7, 3.1
- Post-History:
- 12-Mar-2009
Provide a simple, non-locale-aware way to format a number
with a thousands separator.
Adding thousands separators is one of the simplest ways to
humanize a program’s output, improving its professional appearance
and readability.
In the finance world, output with thousands separators is the norm.
Finance users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.
The locale module presents two other challenges. First, it is
a global setting and not suitable for multi-threaded apps that
need to serve requests in multiple locales. Second, the
name of a relevant locale (such as “de_DE”) can vary from
platform to platform or may not be defined at all. The docs
for the locale module describe these and many other challenges
in detail.
It is not the goal to replace the locale module, to perform
internationalization tasks, or accommodate every possible
convention. Such tasks are better suited to robust tools like
Babel. Instead, the goal is to make a common, everyday
task easier for many users.
A comma will be added to the format() specifier mini-language:
[[fill]align][sign][#][0][width][,][.precision][type]
The ‘,’ option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
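As a brief illustration of the ',' option, the sketch below formats an int, a float, and a Decimal. It assumes a Python version that implements this proposal (2.7, 3.1, or later); the sample values and variable names are illustrative only.

```python
from decimal import Decimal

# The ',' sits between the width and the precision in the format spec.
as_int = format(1234567, ",d")                    # '1,234,567'
as_float = format(1234567.891, ",.2f")            # '1,234,567.89'
as_decimal = format(Decimal("1234567.89"), ",")   # '1,234,567.89'

# The same spec works inside str.format() replacement fields.
in_template = "Total: {:,}".format(1234567)       # 'Total: 1,234,567'

print(as_int, as_float, as_decimal, in_template, sep="\n")
```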
The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:
format(n, "6,d").replace(",", "_")
This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
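One way to avoid the placeholder character in the swap above is a single-pass translation table. This is only a sketch: swap_separators is a hypothetical helper name, not something this proposal defines.

```python
def swap_separators(text):
    """Exchange commas and periods in an already formatted number."""
    # str.maketrans builds a one-pass mapping, so no temporary 'X' is needed.
    return text.translate(str.maketrans(",.", ".,"))


n = 1234567.891
comma_grouped = format(n, ",.2f")               # '1,234,567.89'
dot_grouped = swap_separators(comma_grouped)    # '1.234.567,89'

# A plain replace() remains enough when only the grouping character changes.
underscored = format(1234567, ",d").replace(",", "_")  # '1_234_567'

print(comma_grouped, dot_grouped, underscored, sep="\n")
```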
The width argument means the total length including the commas
and decimal point:
format(1234, "08,d") --> '0001,234'
format(1234.5, "08,.1f") --> '01,234.5'
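The width rule can be checked by measuring the results. The sketch below uses one space-padded integer and the zero-padded float from the text; the variable names are illustrative.

```python
# Width counts every character: digits, commas, and the decimal point.
padded_int = format(1234, "8,d")          # '   1,234'
padded_float = format(1234.5, "08,.1f")   # '01,234.5'

for text in (padded_int, padded_float):
    print(repr(text), len(text))
```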
The ‘,’ option is defined as shown above for types ‘d’, ‘e’,
‘f’, ‘g’, ‘E’, ‘G’, ‘%’, ‘F’ and ‘’. To allow future extensions, it is
undefined for other types: binary, octal, hex, character,
etc.
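The sketch below exercises the ',' option across several of the listed types and then probes a few types for which it is undefined. The try_format helper is hypothetical. That CPython raises ValueError for the undefined cases is an observation about the current implementation, not something the text above promises.

```python
def try_format(value, spec):
    """Return the formatted text, or None if the spec is rejected."""
    try:
        return format(value, spec)
    except ValueError:
        return None


# Types for which the ',' option is defined.
grouped = {
    "d": format(1234567, ",d"),        # '1,234,567'
    "f": format(1234567.0, ",.1f"),    # '1,234,567.0'
    "g": format(123456.0, ",g"),       # '123,456'
    "%": format(12.5, ",.0%"),         # '1,250%'
    "": format(1234567.5, ","),        # '1,234,567.5'
}

# Types for which it is undefined: hex, binary, octal.
rejected = [try_format(255, spec) for spec in (",x", ",b", ",o")]

print(grouped)
print(rejected)
```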
This proposal has the virtue of being simpler than the alternative
proposal but is much less flexible and meets the needs of fewer
users right out of the box. It is expected that some other
solution will arise for specifying alternative separators.
Scanning the web, I’ve found that thousands separators are
usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.
C-Sharp provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator:
String.Format("{0:n}", 12400) ==> "12,400"
String.Format("{0:0,0}", 12400) ==> "12,400"
Common Lisp uses a COLON before the ~D decimal type specifier to
emit a COMMA as a thousands separator. The general form of ~D is
~mincol,padchar,commachar,commaintervalD. The padchar defaults
to SPACE. The commachar defaults to COMMA. The commainterval
defaults to three.
(format nil "~:D" 229345007) => "229,345,007"
The Ada language allows UNDERSCORES in its numeric literals.
Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like:
COBOL uses picture clauses like:
Java offers a DecimalFormat class that uses picture patterns (one
for positive numbers and an optional one for negatives) such as:
"#,##0.00;(#,##0.00)". It allows arbitrary groupings including
hundreds and ten-thousands and uneven groupings. The special pattern
characters are non-localized (using a DOT for a decimal separator and
a COMMA for a grouping separator). The user can supply an alternate
set of symbols using the formatter’s DecimalFormatSymbols object.
Make both the thousands separator and the decimal separator
user-specifiable but not locale aware. For simplicity, limit the
choices to a COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.
The SPACE can be either U+0020 or U+00A0.
Whenever a separator is followed by a precision, it is a
decimal separator and an optional separator preceding it is a
thousands separator. When the precision is absent, a lone
specifier means a thousands separator:
[[fill]align][sign][#][0][width][tsep][dsep precision][type]
Examples:
format(1234, "8.1f") --> '  1234.0'
format(1234, "8,1f") --> '  1234,0'
format(1234, "8.,1f") --> ' 1.234,0'
format(1234, "8 ,1f") --> ' 1 234,0'
format(1234, "8d") --> '    1234'
format(1234, "8,d") --> '   1,234'
format(1234, "8_d") --> '   1_234'
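The rule that a separator followed by a precision is a decimal separator can be sketched as a small parser. Everything here is hypothetical: format_alt, the regular expression, and the restriction to the width, tsep, dsep, precision, and type fields for the 'd' and 'f' types are assumptions made for illustration. The sketch renders with the ',' option and then swaps in the requested separators.

```python
import re

# Separators allowed by the alternative proposal:
# COMMA, DOT, SPACE (U+0020 or U+00A0), APOSTROPHE, UNDERSCORE.
_SEP = r"[,. \u00a0'_]"

_ALT_SPEC = re.compile(
    rf"(?P<width>\d+)?"
    rf"(?P<tsep>{_SEP})?"
    rf"(?:(?P<dsep>{_SEP})(?P<precision>\d+))?"
    rf"(?P<type>[df])"
)


def format_alt(value, spec):
    """Hypothetical emulation of the alternative spec (no fill/align/sign)."""
    m = _ALT_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported spec for this sketch: {spec!r}")
    tsep = m["tsep"] or ""    # no thousands separator requested
    dsep = m["dsep"] or "."   # default decimal separator
    if m["type"] == "d":
        body = format(value, ",d")
    else:
        body = format(value, f",.{m['precision'] or '6'}f")
    # Swap both separators in one pass so they cannot clobber each other.
    body = body.translate(str.maketrans({",": tsep, ".": dsep}))
    return body.rjust(int(m["width"] or 0))


for spec in ("8.1f", "8,1f", "8.,1f", "8 ,1f", "8d", "8,d", "8_d"):
    print(spec, "-->", repr(format_alt(1234, spec)))
```

The regular expression relies on backtracking: a lone separator is first tried as a thousands separator and is reinterpreted as a decimal separator only when digits follow it.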
This proposal meets most needs, but it comes at the expense
of taking a bit more effort to parse. Not every possible
convention is covered, but at least one of the options (spaces
or underscores) should be readable, understandable, and useful
to folks from many diverse backgrounds.
As shown in the examples, the width argument means the total
length including the thousands separators and decimal separators.
No change is proposed for the locale module.
The thousands separator is defined as shown above for types
‘d’, ‘e’, ‘f’, ‘g’, ‘%’, ‘E’, ‘G’ and ‘F’. To allow future
extensions, it is undefined for other types: binary, octal,
hex, character, etc.
The drawback to this alternative proposal is the difficulty
of mentally parsing whether a single separator is a thousands
separator or decimal separator. Perhaps it is too arcane
to link the decimal separator with the precision specifier.
This document has been placed in the public domain.