The problem of substitution bias in CPI data occurs because there is no perfect way of accounting for the changing pattern of consumers' preferences. In other words, when the relative prices of products change, consumers will naturally switch away from some goods in favor of others, and that leaves us unable to specify exactly by how much their cost of living has changed.
The key here is to understand that a cost of living index is intended to give some idea about how the standard of living is changing over time as the prices of goods and services change. To do that with complete accuracy would require us to calculate each person's indifference curve between the goods he/she purchases.
This concept is best illustrated with a graph, but I would encourage the reader to read up about indifference curves and budget constraints to better understand the graph below.
I have previously used this graph in my article about budget constraints to explain how a consumer's consumption preferences between good x and good y change after a reduction in the price of good x. On this page we can use the reverse logic to explain what happens after good x becomes more expensive.
If we start with the blue indifference curve U', this curve touches the budget line BL' with a preferred consumption x' and y'. Now, after a price increase in good x, the highest possible indifference curve that the new budget line BL can touch is the lower blue curve U. The reduced utility from U' to U illustrates the reduced standard of living after the price increase.
This is quite different to simply calculating the increased cost of the original x' y' consumption bundle, precisely because the consumer has changed his/her preferred bundle to x, y. Only if the preferred consumption had not changed could we calculate the extra cost and conclude that the cost of living had risen by precisely that amount, and that the standard of living had fallen proportionately.
The new preferred bundle x, y is the new utility-maximizing bundle, and if our consumer were to continue consuming goods x and y in the same proportions as previously, he/she would have to settle for an even lower level of utility than U.
Whenever a consumer chooses to substitute one good for another, this is done in order to limit the loss of utility, and therefore it would be incorrect to state that the standard of living had fallen proportionately with the higher cost of the original preferred consumption bundle x' y'. In other words, there is a substitution bias at work: substitution reduces the cost-of-living impact of the higher price of good x, and a measure that ignores it overstates that impact.
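To put some rough numbers on this, here is a minimal Python sketch. It assumes, purely for illustration, that our consumer has Cobb-Douglas preferences U = x^a * y^(1-a) with a = 0.5 and an income of 100, and that the price of good x doubles from 1 to 2 while the price of good y stays at 1. None of these figures come from real data, and the function names are my own. The sketch compares the extra cost of the original bundle x' y' with the extra income actually needed to get back onto the original indifference curve U'.

```python
# Illustrative only: Cobb-Douglas preferences and all numbers are assumptions.

def demand(a, income, px, py):
    """Utility-maximizing bundle for U = x**a * y**(1 - a)."""
    return a * income / px, (1 - a) * income / py

def utility(a, x, y):
    """Cobb-Douglas utility of the bundle (x, y)."""
    return x ** a * y ** (1 - a)

def min_expenditure(a, u, px, py):
    """Smallest outlay that reaches utility u at prices (px, py)."""
    return u * (px / a) ** a * (py / (1 - a)) ** (1 - a)

a, income = 0.5, 100.0
px_old, px_new, py = 1.0, 2.0, 1.0

# Original preferred bundle x', y' and the utility level U' it delivers.
x0, y0 = demand(a, income, px_old, py)
u0 = utility(a, x0, y0)

# Fixed-basket measure: cost of the ORIGINAL bundle at new vs old prices.
fixed_basket_index = (px_new * x0 + py * y0) / (px_old * x0 + py * y0)

# True cost-of-living measure: income needed to stay on U', allowing substitution.
true_col_index = (min_expenditure(a, u0, px_new, py)
                  / min_expenditure(a, u0, px_old, py))

substitution_bias = fixed_basket_index - true_col_index

print(f"fixed-basket index: {fixed_basket_index:.3f}")
print(f"true cost-of-living index: {true_col_index:.3f}")
print(f"substitution bias: {substitution_bias:.3f}")
```

Under these assumptions the fixed-basket measure says the cost of living has risen by 50%, whereas the consumer needs only about 41% more income to regain U' once substitution towards good y is allowed for. The gap between the two is the substitution bias.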
With inflation in 2022 at a historically high level, you may have heard some people remarking that the official Consumer Price Index measure is understating the true level of inflation. This may or may not be true, but substitution bias works in the opposite direction: it makes an unadjusted price index overstate inflation.
It is, of course, quite proper to consider that your own personal cost of living may have increased by more than the official estimate of inflation if you consume a relatively large amount of the goods most affected by price increases. However, following the reasoning above, you should realize that the substitution bias is real and will reduce the impact of higher prices on your standard of living if you can find cheaper alternatives to your previously preferred basket of goods.
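The first point, that a personal inflation rate depends on personal spending weights, is simple arithmetic. The short Python sketch below uses three invented categories, invented price changes and invented weights for an 'average' household and for 'my' household; none of them are official figures, and the function name is hypothetical.

```python
# Illustrative only: categories, weights and price changes are made up.

def weighted_inflation(weights, price_changes):
    """Expenditure-weighted average of category price changes."""
    total = sum(weights.values())
    return sum(weights[k] * price_changes[k] for k in weights) / total

price_changes = {"energy": 0.30, "food": 0.10, "other": 0.04}

average_weights = {"energy": 0.10, "food": 0.15, "other": 0.75}
my_weights = {"energy": 0.20, "food": 0.25, "other": 0.55}

average_rate = weighted_inflation(average_weights, price_changes)
personal_rate = weighted_inflation(my_weights, price_changes)

print(f"average household: {average_rate:.1%}")
print(f"my household: {personal_rate:.1%}")
```

With these numbers the average household faces 7.5% inflation while the household that spends more heavily on energy and food faces 10.7%. The sketch holds both sets of weights fixed, so it deliberately ignores any substitution towards cheaper alternatives.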
You may also have heard some commentators state that the official CPI estimate understates the true inflation rate, and then follow up by stating that if the CPI were measured in the same way that it was measured in the 1970s, it would be higher today than it was then.
The last part may be true, but it would be more accurate to state that it was the inflation estimates of the 1970s that were overstated, rather than today's estimates being understated, and that if the 1970s figures had been measured in the same way that we measure inflation today then they would have been lower than today's estimates. Why is that? Because the US Bureau of Labor Statistics did not account for substitution bias in the 1970s, but it does today, at least partly.
The adjustment is made by what the BLS calls a 'geometric mean formula', which is applied within categories of expenditure. So, expenditures on different cuts of meat might be one category, and any substitution occurring within that category as the relative prices of those cuts change is taken into account when estimating overall inflation. This addresses what is referred to as 'lower-level substitution bias'. However, even today, the headline CPI makes no attempt to account for 'upper-level substitution bias', i.e. where expenditures move between categories as a result of relative price changes (presumably because this would be too difficult to do with any precision in a timely index). The BLS addresses upper-level substitution only in a separately published chained index, the C-CPI-U.
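The difference between the two averaging methods can be seen in a small Python sketch. The three price relatives (new price divided by old price) and the equal expenditure shares are made-up numbers for a hypothetical 'cuts of meat' category, and the function names are mine rather than anything the BLS publishes. An arithmetic average of price relatives implicitly holds the quantities bought fixed, whereas a geometric average is consistent with consumers holding their expenditure shares fixed, which means buying less of whatever has become relatively dearer.

```python
import math

# Illustrative only: price relatives and expenditure shares are made up.

def arithmetic_index(relatives, shares):
    """Share-weighted arithmetic mean of price relatives (fixed quantities)."""
    return sum(s * r for r, s in zip(relatives, shares))

def geometric_index(relatives, shares):
    """Share-weighted geometric mean of price relatives (fixed spending shares)."""
    return math.exp(sum(s * math.log(r) for r, s in zip(relatives, shares)))

# One cut rises 20%, one is unchanged, one falls 5%.
relatives = [1.20, 1.00, 0.95]
shares = [1 / 3, 1 / 3, 1 / 3]

arith = arithmetic_index(relatives, shares)
geo = geometric_index(relatives, shares)

print(f"arithmetic mean index: {arith:.4f}")
print(f"geometric mean index: {geo:.4f}")
```

Here the arithmetic mean gives a 5.0% rise for the category while the geometric mean gives about 4.5%. A geometric mean can never exceed the arithmetic mean of the same numbers, so switching formulas can only lower the measured increase, or leave it unchanged if all prices move together.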
Some items in the CPI basket of goods are known to be more volatile in terms of their price fluctuations than other items. In particular, energy and food prices are known to be volatile, partly because they are essential items, i.e. they have a low price elasticity of demand, meaning that consumers do not simply buy something else when the prices of these goods increase.
The core inflation index is sometimes preferred by economists during volatile economic times because it excludes these goods. This is, of course, highly controversial since these basic necessities tend to account for a larger proportion of expenditure for poorer households.
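As a rough illustration of how excluding food and energy changes the picture, the Python sketch below computes a headline rate and a core rate from the same basket. The four categories, their weights and their price changes are invented for the example and are not official CPI figures; the function name is hypothetical.

```python
# Illustrative only: the basket, weights and price changes are made up.

def weighted_inflation(weights, price_changes, exclude=()):
    """Weighted average price change, renormalized over the included items."""
    kept = [k for k in weights if k not in exclude]
    total = sum(weights[k] for k in kept)
    return sum(weights[k] * price_changes[k] for k in kept) / total

weights = {"food": 0.14, "energy": 0.08, "shelter": 0.35, "other": 0.43}
price_changes = {"food": 0.10, "energy": 0.30, "shelter": 0.06, "other": 0.04}

headline_rate = weighted_inflation(weights, price_changes)
core_rate = weighted_inflation(weights, price_changes, exclude=("food", "energy"))

print(f"headline: {headline_rate:.1%}")
print(f"core: {core_rate:.1%}")
```

With these numbers the headline rate is about 7.6% and the core rate about 4.9%. The larger the share of a household's budget that goes on food and energy, the less the core figure says about that household's own experience.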
For cost of living purposes, the items in the representative basket of goods that typical consumers spend their money on may be reasonable. However, there are two notable points here:
Substitution bias is not the only problem with estimating cost-of-living changes; the products themselves tend to change over time, not just their prices. Cars, and technological products in general, improve in quality as time passes.
If we simply state that the cost of an average family car has risen by some percentage rate over some given timeframe, that would be a misleading estimate of the change in the cost of living because modern cars are far superior to older cars. Performance, comfort, safety, reliability and so on are all so much better that the product itself is not the same.
By the same token, some goods that we buy today simply were not available in the past. Modern computers, smartphones, Netflix subscriptions and so on did not exist a generation ago, and that presents a problem because these are all new categories with no earlier prices to compare against, so substitution into them cannot be accounted for.
Lastly, I should note that there is some significant potential for government manipulation of the figures. I find it highly suspicious that certain items in the representative basket of goods used to calculate the CPI can be weighted differently at different times for unclear purposes. This is especially suspicious given that the results of these weighting changes usually seem to present inflation estimates as being less troublesome than other independent estimates of inflation suggest.