Created on 2019-03-02 23:04 by rhettinger, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 12149 | merged | rhettinger, 2019-03-03 21:58 |
| Messages (3) | |||
|---|---|---|---|
| msg337020 | Author: Raymond Hettinger (rhettinger) * | Date: 2019-03-02 23:04 | |
------ How to use it ------
What percentage of men and women will have the same height in two normally distributed populations with known means and standard deviations?
# http://www.usablestats.com/lessons/normal
>>> from statistics import NormalDist
>>> men = NormalDist(70, 4)
>>> women = NormalDist(65, 3.5)
>>> men.overlap(women)
0.5028719270195425
The result can be confirmed empirically with a Monte Carlo simulation:
>>> from collections import Counter
>>> n = 100_000
>>> overlap = Counter(map(round, men.samples(n))) & Counter(map(round, women.samples(n)))
>>> sum(overlap.values()) / n
0.50349
The result can also be confirmed by numeric integration of the probability density function:
>>> dx = 0.10
>>> heights = [h * dx for h in range(500, 860)]
>>> sum(min(men.pdf(h), women.pdf(h)) for h in heights) * dx
0.5028920586287203
------ Code ------
def overlap(self, other):
    '''Compute the overlap coefficient (OVL) between two normal distributions.

    Measures the agreement between two normal probability distributions.
    Returns a value between 0.0 and 1.0 giving the overlapping area in
    the two underlying probability density functions.
    '''
    # See: "The overlapping coefficient as a measure of agreement between
    # probability distributions and point estimation of the overlap of two
    # normal densities" -- Henry F. Inman and Edwin L. Bradley Jr
    # http://dx.doi.org/10.1080/03610928908830127
    # Also see:
    # http://www.iceaaonline.com/ready/wp-content/uploads/2014/06/MM-9-Presentation-Meet-the-Overlapping-Coefficient-A-Measure-for-Elevator-Speeches.pdf
    if not isinstance(other, NormalDist):
        return NotImplemented
    X, Y = self, other
    X_var, Y_var = X.variance, Y.variance
    if not X_var or not Y_var:
        raise StatisticsError('overlap() not defined when sigma is zero')
    dv = Y_var - X_var
    if not dv:
        return 2.0 * NormalDist(fabs(Y.mu - X.mu), 2.0 * X.sigma).cdf(0)
    a = X.mu * Y_var - Y.mu * X_var
    b = X.sigma * Y.sigma * sqrt((X.mu - Y.mu)**2 + dv * log(Y_var / X_var))
    x1 = (a + b) / dv
    x2 = (a - b) / dv
    return 1.0 - (fabs(Y.cdf(x1) - X.cdf(x1)) + fabs(Y.cdf(x2) - X.cdf(x2)))
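The crossing points x1 and x2 in the code above can be recovered by setting the two normal densities equal. The derivation below is a sketch, not taken from the patch; the symbols \mu_X, \sigma_X, \mu_Y, \sigma_Y stand for X.mu, X.sigma, Y.mu, Y.sigma, and F_X, F_Y for the two cdf methods.

```latex
\begin{align*}
f_X(x) = f_Y(x)
  &\iff \frac{(x-\mu_X)^2}{2\sigma_X^2} + \ln\sigma_X
      = \frac{(x-\mu_Y)^2}{2\sigma_Y^2} + \ln\sigma_Y \\
  &\iff \underbrace{(\sigma_Y^2-\sigma_X^2)}_{dv}\,x^2
      - 2\underbrace{(\mu_X\sigma_Y^2-\mu_Y\sigma_X^2)}_{a}\,x
      + \mu_X^2\sigma_Y^2 - \mu_Y^2\sigma_X^2
      - \sigma_X^2\sigma_Y^2 \ln\frac{\sigma_Y^2}{\sigma_X^2} = 0 \\
x_{1,2} &= \frac{a \pm b}{dv},
  \qquad
  b = \sigma_X\sigma_Y
      \sqrt{(\mu_X-\mu_Y)^2 + dv\,\ln\frac{\sigma_Y^2}{\sigma_X^2}} \\
\mathrm{OVL} &= 1 - \Bigl(\bigl|F_Y(x_1)-F_X(x_1)\bigr|
      + \bigl|F_Y(x_2)-F_X(x_2)\bigr|\Bigr)
\end{align*}
```

When the variances are equal (dv = 0) the quadratic term vanishes and the densities cross once, at the midpoint of the two means, which gives OVL = 2\,\Phi\bigl(-|\mu_Y-\mu_X| / (2\sigma)\bigr). That is the value the early-return branch computes.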
------ Future ------
The concept of an overlap coefficient (OVL) is not specific to normal distributions, so it is possible to extend this idea to work with other distributions if needed.
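As a rough illustration of that generalization, the sketch below approximates the OVL of any two density functions by midpoint-rule integration of min(f, g). The helper name overlap_numeric and its parameters are hypothetical and not part of the statistics module; the caller is assumed to supply an interval [lo, hi] that holds essentially all of the probability mass of both distributions.

```python
from math import exp
from statistics import NormalDist


def overlap_numeric(pdf_a, pdf_b, lo, hi, steps=100_000):
    """Approximate the overlap coefficient of two densities on [lo, hi].

    Hypothetical helper, not part of the statistics module.  Integrates
    min(pdf_a(x), pdf_b(x)) with the midpoint rule, so the result is only
    as good as the chosen interval and step count.
    """
    if steps <= 0 or hi <= lo:
        raise ValueError('need steps > 0 and lo < hi')
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx          # midpoint of the i-th slice
        total += min(pdf_a(x), pdf_b(x))
    return total * dx


# Normal case from the example above: should land close to men.overlap(women).
men = NormalDist(70, 4)
women = NormalDist(65, 3.5)
print(overlap_numeric(men.pdf, women.pdf, 30.0, 110.0))

# Non-normal case: exponential densities with rates 1 and 2.
print(overlap_numeric(lambda x: exp(-x), lambda x: 2.0 * exp(-2.0 * x), 0.0, 50.0))
```

For the two exponential densities the curves cross at x = ln 2, and integrating the lower curve on each side gives an exact overlap of 0.75, which makes a handy sanity check for the numeric result. A closed form, where one exists, will be faster and more accurate than this brute-force approach.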
|
|||
| msg337026 | Author: Raymond Hettinger (rhettinger) * | Date: 2019-03-03 06:16 | |
Another cross-check can be had with this nomogram: https://www.rasch.org/rmt/rmt101r.htm |
|||
| msg337367 | Author: Raymond Hettinger (rhettinger) * | Date: 2019-03-07 06:59 | |
New changeset 318d537daabf2bd5f781255c7e25bfce260cf227 by Raymond Hettinger in branch 'master': bpo-36169: Add overlap() method to statistics.NormalDist (GH-12149) https://github.com/python/cpython/commit/318d537daabf2bd5f781255c7e25bfce260cf227 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:11 | admin | set | github: 80350 |
| 2019-03-07 07:00:03 | rhettinger | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2019-03-07 06:59:43 | rhettinger | set | messages: + msg337367 |
| 2019-03-03 21:58:37 | rhettinger | set | keywords: + patch stage: patch review pull_requests: + pull_request12149 |
| 2019-03-03 06:16:31 | rhettinger | set | messages: + msg337026 |
| 2019-03-02 23:04:32 | rhettinger | create | |