Social scientists and wonk readers are familiar with Gini coefficients, and while I’ve used them a few times myself, I have no idea what they really mean.

Okay, fine, I know what a Lorenz curve is, and I can explain it mathematically. But ultimately any *understanding *of the Gini coefficient is inherently relative, e.g. “They say .3 is good for income distribution, but anything above .5 is bad. The U.S. is .45, so that’s bad … ish.” It doesn’t say anything about the actual distribution.

Indeed, the first two *Wonkblog* posts I found in a brief Internet search describe the Gini coefficient essentially as a (well-known) measure of inequality. No details.

*Vox* is slightly better: “[I]t’s an incredibly abstract idea that’s difficult to verbally describe. [Thanks, *Vox*.] The advantage to using a gini coefficent is that in principle it summarizes *all* the information about the distribution of income and thus facilitates easy comparisons.”

Easy comparisons? To what? Other Gini coefficients? Whatever.

Still, aside from eyeballing a bunch of Lorenz curves—fun fact: I’m going to make you eyeball a bunch of Lorenz curves today—a Gini coefficient is totally undescriptive because it’s a decimal without any units.

This post tries to remedy that, but first, per the title here’s a big ol’ array formula for calculating Gini coefficients in MS Excel. The source is *Excel and UDF Performance Stuff*. I chose the Angus Deaton version and improved on it by replacing the ROW formulae with COUNTIFs because those account for blank cells in the data. It also avoids the pitfall of negative data numbers, which mess up Lorenz curves. However, as an array formula, it must be entered with CTRL-SHIFT-ENTER otherwise it won’t work. Behold:

((COUNTIF(Datarange,”>=0″)+1)/(COUNTIF(Datarange,”>=0″)-1)-2/(COUNTIF(Datarange,”>=0″)*(COUNTIF(Datarange,”>=0″)-1)*AVERAGEIF(Datarange,”>=0″))*SUM((IF(Datarange>=0,RANK(Datarange,Datarange))+((COUNT(Datarange)+1-IF(Datarange>=0,RANK(Datarange,Datarange,0))-IF(Datarange>=0,RANK(Datarange,Datarange,1)))/2))*IF(Datarange>=0,Datarange)))/(COUNTIF(Datarange,”>=0″)/(COUNTIF(Datarange,”>=0″)-1))

*Where “Datarange” is something like, “$B$1:$B$1000”.

The Web site contains a few other methods, but this one is the most comprehensive and isn’t limited to 4,000 data points. I’ve tested this version with hand-cranked Ginis, that is, sorting the data by size, creating a cumulative sum of them (the Lorenz curve), calculating a bunch of trapezoids among them to find the area under the Lorenz curve, adding those up, subtracting them by equivalent triangle of equality, and then dividing them by that triangle.

So what does the Gini coefficient mean? In other words, what kind of distribution can one imagine when provided with a given Gini number?

It’s a damn good question that I figured out an answer to. I created a bunch of columns in Excel with a bunch of RAND() formulae, gathered some statistics, and then averaged the results. The RAND() formula produces a random number from 0 to 1. The more of them you use, the bigger, more robust the distribution you get. A .1 is as likely as a .9, etc. The mean and median will be .5. But what’s the Gini coefficient of a large number of random numbers within a set range?

Answer: One third.

This is huge for me because now I know when someone says that the Gini coefficient for the income distribution in some country is .333333, I can visualize that it’s equivalent to a random income distribution. Because one third is such a round number, I had a hunch that taking each RAND() formula to different powers would yield other round Gini coefficients, so, here’s a table of Gini coefficients as a power of each data point from a random distribution. (And yes, this blog endorses the divisible properties of the number 12.) I’m adding similar information for the resulting Lorenz curves. It’s quite intriguing—and quite tedious because it takes a few minutes for my computer to spit out the numbers. But this blog wouldn’t be anything if it didn’t hog some clock cycles:

DATA POINTS SHARE OF MAXIMUM AT GIVEN GINI COEFFICIENT | |||||||||
---|---|---|---|---|---|---|---|---|---|

POWER | GINI | MINIMUM | 10TH PERCENTILE | 25TH PERCENTILE | MEDIAN | 75TH PERCENTILE | 90TH PERCENTILE | MAXIMUM | MEAN |

1/12 | .040 | 48.9% | 82.5% | 89.1% | 94.4% | 97.6% | 99.1% | 100.0% | 92.3% |

2/12 | .077 | 24.2% | 68.0% | 79.4% | 89.1% | 95.3% | 98.3% | 100.0% | 85.7% |

3/12 | .111 | 12.1% | 56.1% | 70.7% | 84.1% | 93.1% | 97.4% | 100.0% | 80.0% |

4/12 | .143 | 06.1% | 46.3% | 63.0% | 79.4% | 90.9% | 96.6% | 100.0% | 75.0% |

5/12 | .173 | 03.1% | 38.2% | 56.1% | 74.9% | 88.8% | 95.7% | 100.0% | 70.6% |

6/12 | .200 | 01.6% | 31.5% | 50.0% | 70.7% | 86.7% | 94.9% | 100.0% | 66.7% |

7/12 | .226 | 00.8% | 26.0% | 44.5% | 66.8% | 84.6% | 94.0% | 100.0% | 63.2% |

8/12 | .250 | 00.4% | 21.4% | 39.7% | 63.0% | 82.6% | 93.2% | 100.0% | 60.0% |

9/12 | .273 | 00.2% | 17.7% | 35.3% | 59.5% | 80.7% | 92.4% | 100.0% | 57.2% |

10/12 | .294 | 00.1% | 14.6% | 31.5% | 56.2% | 78.8% | 91.6% | 100.0% | 54.6% |

11/12 | .315 | 00.1% | 12.0% | 28.1% | 53.0% | 76.9% | 90.8% | 100.0% | 52.2% |

12/12 | .334 | 00.0% | 09.9% | 25.0% | 50.0% | 75.1% | 90.0% | 100.0% | 50.0% |

16/12 | .400 | 00.0% | 04.6% | 15.7% | 39.7% | 68.3% | 86.9% | 100.0% | 42.9% |

20/12 | .454 | 00.0% | 02.1% | 09.9% | 31.5% | 62.1% | 83.9% | 99.9% | 37.6% |

24/12 | .500 | 00.0% | 01.0% | 06.3% | 25.0% | 56.4% | 81.0% | 99.9% | 33.4% |

30/12 | .555 | 00.0% | 00.3% | 03.1% | 17.7% | 48.9% | 76.9% | 99.9% | 28.6% |

36/12 | .600 | 00.0% | 00.1% | 01.6% | 12.5% | 42.4% | 72.9% | 99.9% | 25.1% |

42/12 | .636 | 00.0% | 00.0% | 00.8% | 08.9% | 36.8% | 69.2% | 99.9% | 22.3% |

48/12 | .666 | 00.0% | 00.0% | 00.4% | 06.3% | 31.9% | 65.6% | 99.9% | 20.1% |

54/12 | .692 | 00.0% | 00.0% | 00.2% | 04.4% | 27.6% | 62.3% | 99.9% | 18.2% |

60/12 | .714 | 00.0% | 00.0% | 00.1% | 03.1% | 24.0% | 59.1% | 99.8% | 16.7% |

66/12 | .733 | 00.0% | 00.0% | 00.0% | 02.2% | 20.8% | 56.1% | 99.8% | 15.4% |

72/12 | .749 | 00.0% | 00.0% | 00.0% | 01.6% | 18.0% | 53.2% | 99.8% | 14.3% |

84/12 | .777 | 00.0% | 00.0% | 00.0% | 00.8% | 13.5% | 47.9% | 99.8% | 12.6% |

96/12 | .799 | 00.0% | 00.0% | 00.0% | 00.4% | 10.2% | 43.1% | 99.7% | 11.2% |

120/12 | .833 | 00.0% | 00.0% | 00.0% | 00.1% | 05.8% | 35.0% | 99.7% | 09.1% |

144/12 | .856 | 00.0% | 00.0% | 00.0% | 00.0% | 03.3% | 28.3% | 99.6% | 07.7% |

240/12 | .909 | 00.0% | 00.0% | 00.0% | 00.0% | 00.3% | 12.3% | 99.4% | 04.8% |

And here is the same information for the Lorenz curves.

CUMULATIVE DATA POINTS SHARE OF MAXIMUM AT GIVEN GINI COEFFICIENT (LORENZ CURVES) | |||||||||
---|---|---|---|---|---|---|---|---|---|

POWER | GINI | MINIMUM | 10TH PERCENTILE | 25TH PERCENTILE | MEDIAN | 75TH PERCENTILE | 90TH PERCENTILE | MAXIMUM | MEAN |

1/12 | .040 | 00.0% | 08.2% | 22.3% | 47.2% | 73.3% | 89.2% | 100.0% | 48.0% |

2/12 | .077 | 00.0% | 06.8% | 19.8% | 44.5% | 71.5% | 88.5% | 100.0% | 46.2% |

3/12 | .111 | 00.0% | 05.6% | 17.7% | 42.0% | 69.8% | 87.7% | 100.0% | 44.4% |

4/12 | .143 | 00.0% | 04.6% | 15.7% | 39.7% | 68.2% | 86.9% | 100.0% | 42.9% |

5/12 | .173 | 00.0% | 03.8% | 14.0% | 37.4% | 66.6% | 86.2% | 100.0% | 41.4% |

6/12 | .200 | 00.0% | 03.1% | 12.5% | 35.3% | 65.0% | 85.4% | 100.0% | 40.0% |

7/12 | .226 | 00.0% | 02.6% | 11.1% | 33.4% | 63.5% | 84.7% | 100.0% | 38.7% |

8/12 | .250 | 00.0% | 02.1% | 09.9% | 31.5% | 62.0% | 83.9% | 100.0% | 37.5% |

9/12 | .273 | 00.0% | 01.8% | 08.8% | 29.7% | 60.5% | 83.2% | 100.0% | 36.4% |

10/12 | .294 | 00.0% | 01.5% | 07.8% | 28.1% | 59.1% | 82.5% | 100.0% | 35.3% |

11/12 | .315 | 00.0% | 01.2% | 07.0% | 26.5% | 57.7% | 81.8% | 100.0% | 34.3% |

12/12 | .334 | 00.0% | 01.0% | 06.2% | 25.0% | 56.3% | 81.1% | 100.0% | 33.3% |

16/12 | .400 | 00.0% | 00.5% | 03.9% | 19.8% | 51.2% | 78.3% | 100.0% | 30.0% |

20/12 | .454 | 00.0% | 00.2% | 02.5% | 15.8% | 46.5% | 75.6% | 100.0% | 27.3% |

24/12 | .500 | 00.0% | 00.1% | 01.6% | 12.5% | 42.3% | 73.0% | 100.0% | 25.0% |

30/12 | .555 | 00.0% | 00.0% | 00.8% | 08.9% | 36.7% | 69.3% | 100.0% | 22.3% |

36/12 | .600 | 00.0% | 00.0% | 00.4% | 06.3% | 31.8% | 65.8% | 100.0% | 20.0% |

42/12 | .636 | 00.0% | 00.0% | 00.2% | 04.4% | 27.5% | 62.4% | 100.0% | 18.2% |

48/12 | .666 | 00.0% | 00.0% | 00.1% | 03.1% | 23.9% | 59.3% | 100.0% | 16.7% |

54/12 | .692 | 00.0% | 00.0% | 00.0% | 02.2% | 20.7% | 56.3% | 100.0% | 15.4% |

60/12 | .714 | 00.0% | 00.0% | 00.0% | 01.6% | 18.0% | 53.4% | 100.0% | 14.3% |

66/12 | .733 | 00.0% | 00.0% | 00.0% | 01.1% | 15.6% | 50.7% | 100.0% | 13.4% |

72/12 | .749 | 00.0% | 00.0% | 00.0% | 00.8% | 13.5% | 48.1% | 100.0% | 12.6% |

84/12 | .777 | 00.0% | 00.0% | 00.0% | 00.4% | 10.2% | 43.3% | 100.0% | 11.2% |

96/12 | .799 | 00.0% | 00.0% | 00.0% | 00.2% | 07.6% | 39.1% | 100.0% | 10.1% |

120/12 | .833 | 00.0% | 00.0% | 00.0% | 00.1% | 04.3% | 31.7% | 100.0% | 08.4% |

144/12 | .856 | 00.0% | 00.0% | 00.0% | 00.0% | 02.4% | 25.7% | 100.0% | 07.2% |

240/12 | .909 | 00.0% | 00.0% | 00.0% | 00.0% | 00.3% | 11.2% | 100.0% | 04.6% |

And as promised, some Lorenz curves:

Ooooh, the colors.

And for a fun digression, here’s a Lorenz curve of U.S. earnings.

(Source: Social Security Administration)

The data aren’t presented in a way that can be properly sorted, so I can’t calculate the area under the curve, but at the median, the Gini coefficient is probably about .5. At the 90th percentile, it’s more like .8. I wouldn’t be surprised if U.S. income polarization is worse than .45.

So, now you know how to estimate a Gini coefficient in MS Excel, and if someone throws Gini numbers at you, you can look at the tables to get a sense of what they mean in terms of the maximum data point. It’s still a number without any units, but at least you can see the relationships among the data points more clearly.

## One comment