Gini Coefficients: How to Calculate Them in Excel and What The F They Mean

Social scientists and wonk readers are familiar with Gini coefficients, and while I’ve used them a few times myself, I have no idea what they really mean.

Okay, fine, I know what a Lorenz curve is, and I can explain it mathematically. But ultimately any understanding of the Gini coefficient is inherently relative, e.g. “They say .3 is good for income distribution, but anything above .5 is bad. The U.S. is .45, so that’s bad … ish.” It doesn’t say anything about the actual distribution.

Indeed, the first two Wonkblog posts I found in a brief Internet search describe the Gini coefficient essentially as a (well-known) measure of inequality. No details.

Vox is slightly better: “[I]t’s an incredibly abstract idea that’s difficult to verbally describe. [Thanks, Vox.] The advantage to using a gini coefficent is that in principle it summarizes all the information about the distribution of income and thus facilitates easy comparisons.”

Easy comparisons? To what? Other Gini coefficients? Whatever.

Still, aside from eyeballing a bunch of Lorenz curves—fun fact: I’m going to make you eyeball a bunch of Lorenz curves today—a Gini coefficient is totally undescriptive because it’s a decimal without any units.

This post tries to remedy that, but first, per the title here’s a big ol’ array formula for calculating Gini coefficients in MS Excel. The source is Excel and UDF Performance Stuff. I chose the Angus Deaton version and improved on it by replacing the ROW formulae with COUNTIFs because those account for blank cells in the data. It also avoids the pitfall of negative data numbers, which mess up Lorenz curves. However, as an array formula, it must be entered with CTRL-SHIFT-ENTER otherwise it won’t work. Behold:

((COUNTIF(Datarange,”>=0″)+1)/(COUNTIF(Datarange,”>=0″)-1)-2/(COUNTIF(Datarange,”>=0″)*(COUNTIF(Datarange,”>=0″)-1)*AVERAGEIF(Datarange,”>=0″))*SUM((IF(Datarange>=0,RANK(Datarange,Datarange))+((COUNT(Datarange)+1-IF(Datarange>=0,RANK(Datarange,Datarange,0))-IF(Datarange>=0,RANK(Datarange,Datarange,1)))/2))*IF(Datarange>=0,Datarange)))/(COUNTIF(Datarange,”>=0″)/(COUNTIF(Datarange,”>=0″)-1))

*Where “Datarange” is something like, “$B$1:$B$1000”.

The Web site contains a few other methods, but this one is the most comprehensive and isn’t limited to 4,000 data points. I’ve tested this version with hand-cranked Ginis, that is, sorting the data by size, creating a cumulative sum of them (the Lorenz curve), calculating a bunch of trapezoids among them to find the area under the Lorenz curve, adding those up, subtracting them by equivalent triangle of equality, and then dividing them by that triangle.

So what does the Gini coefficient mean? In other words, what kind of distribution can one imagine when provided with a given Gini number?

It’s a damn good question that I figured out an answer to. I created a bunch of columns in Excel with a bunch of RAND() formulae, gathered some statistics, and then averaged the results. The RAND() formula produces a random number from 0 to 1. The more of them you use, the bigger, more robust the distribution you get. A .1 is as likely as a .9, etc. The mean and median will be .5. But what’s the Gini coefficient of a large number of random numbers within a set range?

Answer: One third.

This is huge for me because now I know when someone says that the Gini coefficient for the income distribution in some country is .333333, I can visualize that it’s equivalent to a random income distribution. Because one third is such a round number, I had a hunch that taking each RAND() formula to different powers would yield other round Gini coefficients, so, here’s a table of Gini coefficients as a power of each data point from a random distribution. (And yes, this blog endorses the divisible properties of the number 12.) I’m adding similar information for the resulting Lorenz curves. It’s quite intriguing—and quite tedious because it takes a few minutes for my computer to spit out the numbers. But this blog wouldn’t be anything if it didn’t hog some clock cycles:

DATA POINTS SHARE OF MAXIMUM AT GIVEN GINI COEFFICIENT
POWER	GINI	MINIMUM	10TH PERCENTILE	25TH PERCENTILE	MEDIAN	75TH PERCENTILE	90TH PERCENTILE	MAXIMUM	MEAN
1/12	.040	48.9%	82.5%	89.1%	94.4%	97.6%	99.1%	100.0%	92.3%
2/12	.077	24.2%	68.0%	79.4%	89.1%	95.3%	98.3%	100.0%	85.7%
3/12	.111	12.1%	56.1%	70.7%	84.1%	93.1%	97.4%	100.0%	80.0%
4/12	.143	06.1%	46.3%	63.0%	79.4%	90.9%	96.6%	100.0%	75.0%
5/12	.173	03.1%	38.2%	56.1%	74.9%	88.8%	95.7%	100.0%	70.6%
6/12	.200	01.6%	31.5%	50.0%	70.7%	86.7%	94.9%	100.0%	66.7%
7/12	.226	00.8%	26.0%	44.5%	66.8%	84.6%	94.0%	100.0%	63.2%
8/12	.250	00.4%	21.4%	39.7%	63.0%	82.6%	93.2%	100.0%	60.0%
9/12	.273	00.2%	17.7%	35.3%	59.5%	80.7%	92.4%	100.0%	57.2%
10/12	.294	00.1%	14.6%	31.5%	56.2%	78.8%	91.6%	100.0%	54.6%
11/12	.315	00.1%	12.0%	28.1%	53.0%	76.9%	90.8%	100.0%	52.2%
12/12	.334	00.0%	09.9%	25.0%	50.0%	75.1%	90.0%	100.0%	50.0%
16/12	.400	00.0%	04.6%	15.7%	39.7%	68.3%	86.9%	100.0%	42.9%
20/12	.454	00.0%	02.1%	09.9%	31.5%	62.1%	83.9%	99.9%	37.6%
24/12	.500	00.0%	01.0%	06.3%	25.0%	56.4%	81.0%	99.9%	33.4%
30/12	.555	00.0%	00.3%	03.1%	17.7%	48.9%	76.9%	99.9%	28.6%
36/12	.600	00.0%	00.1%	01.6%	12.5%	42.4%	72.9%	99.9%	25.1%
42/12	.636	00.0%	00.0%	00.8%	08.9%	36.8%	69.2%	99.9%	22.3%
48/12	.666	00.0%	00.0%	00.4%	06.3%	31.9%	65.6%	99.9%	20.1%
54/12	.692	00.0%	00.0%	00.2%	04.4%	27.6%	62.3%	99.9%	18.2%
60/12	.714	00.0%	00.0%	00.1%	03.1%	24.0%	59.1%	99.8%	16.7%
66/12	.733	00.0%	00.0%	00.0%	02.2%	20.8%	56.1%	99.8%	15.4%
72/12	.749	00.0%	00.0%	00.0%	01.6%	18.0%	53.2%	99.8%	14.3%
84/12	.777	00.0%	00.0%	00.0%	00.8%	13.5%	47.9%	99.8%	12.6%
96/12	.799	00.0%	00.0%	00.0%	00.4%	10.2%	43.1%	99.7%	11.2%
120/12	.833	00.0%	00.0%	00.0%	00.1%	05.8%	35.0%	99.7%	09.1%
144/12	.856	00.0%	00.0%	00.0%	00.0%	03.3%	28.3%	99.6%	07.7%
240/12	.909	00.0%	00.0%	00.0%	00.0%	00.3%	12.3%	99.4%	04.8%

And here is the same information for the Lorenz curves.

CUMULATIVE DATA POINTS SHARE OF MAXIMUM AT GIVEN GINI COEFFICIENT (LORENZ CURVES)
POWER	GINI	MINIMUM	10TH PERCENTILE	25TH PERCENTILE	MEDIAN	75TH PERCENTILE	90TH PERCENTILE	MAXIMUM	MEAN
1/12	.040	00.0%	08.2%	22.3%	47.2%	73.3%	89.2%	100.0%	48.0%
2/12	.077	00.0%	06.8%	19.8%	44.5%	71.5%	88.5%	100.0%	46.2%
3/12	.111	00.0%	05.6%	17.7%	42.0%	69.8%	87.7%	100.0%	44.4%
4/12	.143	00.0%	04.6%	15.7%	39.7%	68.2%	86.9%	100.0%	42.9%
5/12	.173	00.0%	03.8%	14.0%	37.4%	66.6%	86.2%	100.0%	41.4%
6/12	.200	00.0%	03.1%	12.5%	35.3%	65.0%	85.4%	100.0%	40.0%
7/12	.226	00.0%	02.6%	11.1%	33.4%	63.5%	84.7%	100.0%	38.7%
8/12	.250	00.0%	02.1%	09.9%	31.5%	62.0%	83.9%	100.0%	37.5%
9/12	.273	00.0%	01.8%	08.8%	29.7%	60.5%	83.2%	100.0%	36.4%
10/12	.294	00.0%	01.5%	07.8%	28.1%	59.1%	82.5%	100.0%	35.3%
11/12	.315	00.0%	01.2%	07.0%	26.5%	57.7%	81.8%	100.0%	34.3%
12/12	.334	00.0%	01.0%	06.2%	25.0%	56.3%	81.1%	100.0%	33.3%
16/12	.400	00.0%	00.5%	03.9%	19.8%	51.2%	78.3%	100.0%	30.0%
20/12	.454	00.0%	00.2%	02.5%	15.8%	46.5%	75.6%	100.0%	27.3%
24/12	.500	00.0%	00.1%	01.6%	12.5%	42.3%	73.0%	100.0%	25.0%
30/12	.555	00.0%	00.0%	00.8%	08.9%	36.7%	69.3%	100.0%	22.3%
36/12	.600	00.0%	00.0%	00.4%	06.3%	31.8%	65.8%	100.0%	20.0%
42/12	.636	00.0%	00.0%	00.2%	04.4%	27.5%	62.4%	100.0%	18.2%
48/12	.666	00.0%	00.0%	00.1%	03.1%	23.9%	59.3%	100.0%	16.7%
54/12	.692	00.0%	00.0%	00.0%	02.2%	20.7%	56.3%	100.0%	15.4%
60/12	.714	00.0%	00.0%	00.0%	01.6%	18.0%	53.4%	100.0%	14.3%
66/12	.733	00.0%	00.0%	00.0%	01.1%	15.6%	50.7%	100.0%	13.4%
72/12	.749	00.0%	00.0%	00.0%	00.8%	13.5%	48.1%	100.0%	12.6%
84/12	.777	00.0%	00.0%	00.0%	00.4%	10.2%	43.3%	100.0%	11.2%
96/12	.799	00.0%	00.0%	00.0%	00.2%	07.6%	39.1%	100.0%	10.1%
120/12	.833	00.0%	00.0%	00.0%	00.1%	04.3%	31.7%	100.0%	08.4%
144/12	.856	00.0%	00.0%	00.0%	00.0%	02.4%	25.7%	100.0%	07.2%
240/12	.909	00.0%	00.0%	00.0%	00.0%	00.3%	11.2%	100.0%	04.6%

And as promised, some Lorenz curves:

gini-coefficients-at-given-powers-of-random-distribution

Ooooh, the colors.

And for a fun digression, here’s a Lorenz curve of U.S. earnings.

lorenz-curve-of-u-s-earnings-2015

(Source: Social Security Administration)

The data aren’t presented in a way that can be properly sorted, so I can’t calculate the area under the curve, but at the median, the Gini coefficient is probably about .5. At the 90th percentile, it’s more like .8. I wouldn’t be surprised if U.S. income polarization is worse than .45.

So, now you know how to estimate a Gini coefficient in MS Excel, and if someone throws Gini numbers at you, you can look at the tables to get a sense of what they mean in terms of the maximum data point. It’s still a number without any units, but at least you can see the relationships among the data points more clearly.

6 comments

Pingback: Full-Time Law-School Application Inequality Up in 2016 | The Last Gen X American
Pingback: CLASS OF 2016 EMPLOYMENT REPORT (CORRECTED) | The Last Gen X American
Volker Schweisfurth says:

2017/09/29 at 08:42

Matt, this Looks useful for me. I am working in visusalizing the “new Split” : The WEF recently underlined that nowadays the gap between countries is met by the gaps ( economically) within nearly all societies; this is extremely dangerous. With your table I intend to visually show the gaps weithin 150 countries. Best regards Volker Schweisfurth http://www.meliesart.de

1. Matt Leichter says:
  
  2017/09/29 at 11:17
  
  Glad I could help, Volker.
  
Regi says:

2018/06/29 at 07:05

Thank you so much! You are a genius
It works perfectly and with slight changes helped me to complete my work

Pingback: CLASS OF 2018 EMPLOYMENT REPORT: About the Same | Sportal Template

Share this:

Related

6 comments

Leave a comment Cancel reply