Linguistic Distance

Differences in language amongst countries are measured using three scales:

L1 is a 5-point scale which quantifies the difference between the dominant languages of any two countries, i and j,

L2 is a 5-point scale based on the incidence of country i’s dominant language(s) in country j, and

L3 is a 5-point scale based on the incidence of country j’s dominant language(s) in country i.

The scores for each of these indicators and the resultant factor (see below concerning the confirmatory factor analysis) can be found in an Excel spreadsheet attached at the bottom of this page. This spreadsheet contains the values for 22,350 country pairs (i.e. n × (n − 1) ordered pairs for n = 150 countries) for three specific time periods (1995, 2005 and 2015). The precise coding for these variables is explained below.

L1  – Distance Between Major Languages

The first language indicator is the distance between the two closest major languages for each pair of countries. This distance is based on the ‘Family of Languages’ (attached at the bottom of the page) and is coded as follows:  

5 – if in different families of languages
4 – if in the same family, but in different branches
3 – if in the same branch, but different at the 1st sub-branch level
2 – if in the same 1st level sub-branch, but different at lower levels
1 – if the same language (i.e. same 3 letter language code)
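
The coding above can be sketched in a few lines of Python. This is only an illustration, assuming each language is stored with its family, branch and 1st level sub-branch; the `Language` record, the function names and the sample classifications are hypothetical and are not taken from the attached files.

```python
from itertools import product
from typing import NamedTuple, Optional


class Language(NamedTuple):
    code: str                          # 3 letter language code
    family: str
    branch: Optional[str] = None
    sub_branch: Optional[str] = None   # 1st level sub-branch


def l1_score(a: Language, b: Language) -> int:
    """Code the distance between two languages on the 5 point L1 scale."""
    if a.code == b.code:
        return 1   # same language
    if a.family != b.family:
        return 5   # different families
    if a.branch != b.branch:
        return 4   # same family, different branches
    if a.sub_branch != b.sub_branch:
        return 3   # same branch, different 1st level sub-branch
    return 2       # same 1st level sub-branch, different at lower levels


def l1_for_countries(langs_i, langs_j) -> int:
    """L1 for a country pair: the distance between the two closest major languages."""
    return min(l1_score(a, b) for a, b in product(langs_i, langs_j))


# Illustrative entries only; the real hierarchy is in the attached file.
eng = Language("eng", "Indo-European", "Germanic", "West")
deu = Language("deu", "Indo-European", "Germanic", "West")
swe = Language("swe", "Indo-European", "Germanic", "North")
spa = Language("spa", "Indo-European", "Italic", "Romance")
jpn = Language("jpn", "Japonic")
```

With these sample entries, `l1_score(eng, swe)` gives 3, and a country pair takes the lowest score found across all combinations of its major languages.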

L2 & L3 – Incidence of One Country’s Major Language(s) in Other Countries

The second and third language indicators measure the proportion of the population in one country that is able to speak the major language(s) of another country.

L2 concerns the incidence of country i’s major language(s) in country j, and
L3 concerns the incidence of country j’s major language(s) in country i.

The indicators are coded as follows:  

5 – if less than 1%
4 – if greater than or equal to 1% but less than 5%
3 – if greater than or equal to 5% but less than 50%
2 – if greater than or equal to 50% but less than 90%
1 – if greater than or equal to 90%  

Where a country has more than one major language, a weighted average is calculated.  
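
The thresholds above map a percentage to a score as in the following minimal sketch. The function name is hypothetical, and the sketch covers a single percentage only, since the weights used for the weighted average are not specified here.

```python
def incidence_score(pct: float) -> int:
    """Code a percentage of speakers (0 to 100) on the 5 point L2/L3 scale."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    if pct < 1:
        return 5
    if pct < 5:
        return 4
    if pct < 50:
        return 3
    if pct < 90:
        return 2
    return 1
```

Each lower bound is inclusive, so exactly 5% scores 3 and exactly 90% scores 1.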

Lang f is the single-factor solution, using principal component analysis, for L1, L2 and L3

Lang Dist – is Lang f re-scaled such that the lowest possible score (1, 1, 1) equals zero (0), and the maximum possible score (5, 5, 5) equals ten (10)
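
A re-scaling of this kind can be sketched as follows, assuming it is linear. Here `f_min` and `f_max` stand for the Lang f scores obtained at (1, 1, 1) and (5, 5, 5); the function name, the parameter names and the values in the usage note are hypothetical.

```python
def rescale_lang_dist(lang_f: float, f_min: float, f_max: float) -> float:
    """Linearly map a factor score onto 0-10.

    f_min: factor score for the lowest possible inputs (1, 1, 1)
    f_max: factor score for the highest possible inputs (5, 5, 5)
    Linearity is an assumption of this sketch.
    """
    if f_max == f_min:
        raise ValueError("f_min and f_max must differ")
    return 10.0 * (lang_f - f_min) / (f_max - f_min)
```

For example, with made-up end points of -1.5 and 2.5, a factor score of 0.5 lies halfway between them and maps to 5.0.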


A major language for a given country is defined as any language which can be spoken by more than 20% of the population, or a language which holds an official status within the country (e.g. an official second language or a de facto working language).

If only one language exceeds the 20% threshold but covers less than 50% of the population, then the next most common language is also deemed a major language.

If no language exceeds the 20% threshold, then the two most common languages are deemed to be major languages.
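
The three rules above can be sketched as a single function. This is an illustration under stated assumptions: `shares` is a hypothetical mapping from language to the percentage of the population able to speak it, `official` is a hypothetical collection of languages with official status, and the "only one language" rule is assumed to count only languages above the 20% threshold, not official ones.

```python
def major_languages(shares, official=()):
    """Return the set of major languages for one country.

    shares:   dict mapping language -> percentage of population able to speak it
    official: languages holding official status (always counted as major)
    """
    # Languages ordered from most to least commonly spoken.
    ranked = sorted(shares, key=shares.get, reverse=True)
    above = [lang for lang in ranked if shares[lang] > 20]

    major = set(above) | set(official)
    if not above:
        # No language exceeds 20%: take the two most common.
        major.update(ranked[:2])
    elif len(above) == 1 and shares[above[0]] < 50:
        # A single language above 20% but below 50%: add the next most common.
        major.update(ranked[:2])
    return major
```

For instance, shares of 40%, 15% and 5% yield the first two languages, while shares of 80% and 10% yield only the first.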

For the countries used in our analyses, 167 languages qualified as a major language for at least one of the 150 countries in one of the three time periods.  These languages have been grouped into a hierarchy of families, branches, 1st level sub-branches, 2nd level sub-branches, etc. based on the Ethnologue classification of languages. Excel files documenting the hierarchy of language families and the major languages for each country for each time period are attached at the bottom of this page.

Lang f – Differences in Language Factor:

The preceding three indicators have been reduced to a single factor using confirmatory factor analysis (CFA).  This factor score has been estimated using the full set of country pairs (22,350) and time periods (3).  The individual factor loadings and the Cronbach's alpha are reported below.

  Lang f – 3 item factor score for differences in language   0.799
  L1 – Distance between major languages 0.826
  L2 – Incidence of i’s major language in country j 0.955
  L3 – Incidence of j’s major language in country i 0.955
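
For readers unfamiliar with Cronbach's alpha, the standard formula is k / (k − 1) × (1 − sum of item variances / variance of the summed score), where k is the number of items. The sketch below applies it to made-up scores; the function name and the data are hypothetical and do not reproduce the figures reported above.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k equal-length lists, one list of scores per indicator.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Summed score for each observation (here, each country pair).
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Made-up L1, L2 and L3 scores for four country pairs.
toy = [[1, 3, 5, 5], [1, 2, 5, 4], [1, 3, 4, 5]]
alpha = cronbach_alpha(toy)
```

Values close to 1 indicate that the three indicators move together closely across country pairs.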


The primary sources for these estimates were …

      • Grimes, B. F. (ed), Ethnologue: Languages of the World, 13th Edition, 1996
      • Grimes, B. F. (ed), Ethnologue: Languages of the World, 14th Edition, 2000
      • Gordon, R. G. (ed), Ethnologue: Languages of the World, 15th Edition, 2005
      • Simons, G. F. and Fennig, C. D. (eds), Ethnologue: Languages of the World, 21st Edition, 2018 – accessed via

Lang Dist_all years_n22350


Major Languages by Country – 1995

Major Languages by Country – 2005

Major Languages by Country – 2015