Food analytics


People want to track their food calories. But how accurate is the data they are recording?

Real people:

  • Log "1 cup rice" when it is 1.5.
  • Forget condiments and oils.
  • Use the wrong entry for a food that is "close enough."

In the real world, the error stack looks like this (largest to smallest):

  • User logging error (portion size, omissions): likely the biggest ref.
  • Label and manufacturer inaccuracies: nutrition labels themselves are not perfect.
  • App database issues, including:
    • Variation in which base database was used (USDA vs branded vs regional sources).
    • User-uploaded entries with typos, wrong serving sizes, or "creative" rounding.

Unfortunately, if we sorted these error sources by how easy they are to fix, the order would be reversed. App database issues are the easiest to fix, followed by label inaccuracies, with human misbehavior the hardest to address (though we can think of some ways). This is especially true if we limit ourselves to technical solutions.

We will look at each of these in turn.

1. User logging error

Food logging is, scientifically speaking, "asking a human to do accounting with a brain that evolved to dodge lions."

A feeding-study comparison found that participants reported ~80% of the items they actually consumed. Critically, additions to foods and drinks (think: sugar in coffee, dressing, sauces, oils) were omitted more often than the "main" items ref. Add-ons are cognitively harder to remember and estimate, and thus more likely to be forgotten. You can see a meta-analysis here.

User-level omission + portion misestimation routinely creates ~10-20% energy error (and sometimes far more) even in structured studies, and can be worse in free-living app logging.
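To make the mechanism concrete, here is a minimal sketch of how one under-estimated portion and one forgotten add-on shift a logged meal total. The calorie figures are rough illustrative assumptions rather than values from the studies above, and the names `total_kcal` and `logging_error` are hypothetical.

```python
# Illustrative kcal values per unit (assumptions: rough round numbers, not measured data).
KCAL_PER_UNIT = {
    "cooked rice (cup)": 200,
    "chicken breast (100 g)": 165,
    "olive oil (tbsp)": 120,
}

def total_kcal(portions):
    """Sum energy for a {food: quantity} mapping."""
    return sum(KCAL_PER_UNIT[food] * qty for food, qty in portions.items())

def logging_error(eaten, logged):
    """Return (actual kcal, logged kcal, relative under-reporting)."""
    actual = total_kcal(eaten)
    recorded = total_kcal(logged)
    return actual, recorded, (actual - recorded) / actual

# What was eaten: 1.5 cups of rice, cooked with a tablespoon of oil.
eaten = {"cooked rice (cup)": 1.5, "chicken breast (100 g)": 1.5, "olive oil (tbsp)": 1}
# What was logged: "1 cup rice", and the oil was forgotten.
logged = {"cooked rice (cup)": 1.0, "chicken breast (100 g)": 1.5}

actual, recorded, rel = logging_error(eaten, logged)
print(f"actual {actual} kcal, logged {recorded} kcal, under-reported {rel:.0%}")
```

In this toy meal the two slips together hide about a third of the energy, which is the "sometimes far more" end of the range.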

2. US/Canada Nutrition Facts label

Packaged food labels, regulated by bodies like the U.S. Food and Drug Administration (FDA), provide nutritional information based on the manufacturer's lab analyses. However, these analyses can be affected by various factors, including natural nutrient variability, inconsistent sample collection methods, and reporting inaccuracies. Although the FDA enforces accuracy guidelines, there can still be slight variations or uncertainty in the data presented.

The FDA's policies also allow for deviations in the reported calorie count: a product labeled as 200 calories could, in reality, contain up to 240 calories and still comply with the regulations. In addition, trans fats, notably harmful to health, need not be reported if they fall below 0.5 g per serving, which lets manufacturers claim "0 g trans fat" merely by shrinking the serving size. The same rounding applies to other nutrients declared in grams when a serving contains less than 0.5 g.
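The serving-size effect can be sketched in a few lines. This is a simplified illustration, not the full rounding rules: it only models the "below 0.5 g may be declared as 0 g" threshold from the text. The product (1.5 g trans fat per 100 g) and the function names are hypothetical.

```python
def declared_grams(actual_g_per_serving, threshold=0.5):
    """Simplified rule from the text: amounts under 0.5 g per serving may be declared as 0 g.
    Amounts at or above the threshold are shown to one decimal (a simplification)."""
    if actual_g_per_serving < threshold:
        return 0.0
    return round(actual_g_per_serving, 1)

def trans_fat_on_label(g_per_100g, serving_g):
    """Return (actual grams per serving, grams declared on the label)."""
    actual = g_per_100g * serving_g / 100
    return actual, declared_grams(actual)

# Hypothetical product with 1.5 g trans fat per 100 g, labeled at two serving sizes.
for serving in (60, 30):
    actual, declared = trans_fat_on_label(1.5, serving)
    print(f"{serving} g serving: actual {actual:.2f} g, declared {declared} g")
```

With the 60 g serving the label has to show trans fat. With the 30 g serving the same product can show 0 g, even though someone eating two servings still gets 0.9 g.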

“The calories in a packaged food product can differ from what is stated on the Nutrition Facts label and you may be getting more calories than you bargained for,” by up to 20%, says Catherine Lee, PhD, a food scientist at Procter & Gamble. So a snack bar labeled as having 200 calories could potentially be 240 calories and still be within the government labeling guidelines.

The FDA sorts nutrients into three groups for compliance purposes. Class I nutrients are added nutrients, such as fortified vitamins or minerals; they must be present at 100% or more of the declared value on the label. Class II nutrients occur naturally in the food and include vitamins, minerals, protein, and carbohydrates; they must be present at no less than 80% of the declared value. The third group covers calories, sugars, fats, and sodium, which must not exceed 120% of the declared value. Even with these categorizations, manufacturers are allowed some reasonable excesses or deficiencies within these groups as long as they adhere to good manufacturing practices.
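The three thresholds above can be written as a small check. This is a sketch of the rule as summarized in this section, not a compliance tool: the group labels and the function name `within_tolerance` are hypothetical, and the allowance for reasonable excesses or deficiencies is ignored.

```python
def within_tolerance(group, declared, measured):
    """Check a measured value against the declared label value.

    group: "class_I"  (added nutrients, at least 100% of declared),
           "class_II" (naturally occurring nutrients, at least 80% of declared),
           "third"    (calories, sugars, fats, sodium, at most 120% of declared).
    """
    ratio = measured / declared
    if group == "class_I":
        return ratio >= 1.0
    if group == "class_II":
        return ratio >= 0.8
    if group == "third":
        return ratio <= 1.2
    raise ValueError(f"unknown group: {group}")

# The snack bar example: 200 kcal declared, 240 kcal measured.
print(within_tolerance("third", declared=200, measured=240))
# One more kcal and the label would be out of tolerance.
print(within_tolerance("third", declared=200, measured=241))
```

Note the asymmetry: for calories only the upper bound is checked, so a product that contains far fewer calories than declared passes this check.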

3. Manufacturer Complacency

Packaged food labels from different brands can show different nutritional values for the same type of food, which can cause confusion. Some of these differences may come from the use of different kinds of macronutrients (for example, different fats) in the manufacturing process, others from different analytical methods for determining nutritional content.

I'm working as a student in a quality control lab of a dairy factory. First: they don't really care about what the consumers get served as long as they follow the law and that is explained in the other comments. They do care however if there is money involved. The firms have contracts with shops and wholesalers and those do their own quality control as well. The accuracy of the label depends on the value of the ingredient. For example: lactose is abundant in milk and is fairly cheap, so the concentration of lactose is allowed to vary a lot. Fat by contrast is expensive and the wholesalers are aware of that too. If a label says that some whipped cream contains 35% fat you can be fairly sure that the real concentration won't be 34.3% or something (yes they are that strict). Note: 'fat' is not just normal fat. The triglyceride molecules in milk fat contain a certain mixture of fatty acids and this has to be preserved as well (by law) so we can't just use some garbage fat. The fat has to come from milk (around 20x the quantity of milk is needed to make 1x the quantity of 40% cream). This makes fat so expensive.

I bought 2 packages of frozen blueberries from different brands, explicitly with no added sugars or ingredients, and cannot figure out why their info differs, nor why the macros don't add up to the calories given (per 100g):

46 calories: 0.6g protein, 6.1g carbs (of which 6.1g are sugars), 0.6g fats, fiber not given (isn't counted towards total calorie value on food labels in Germany anyways)

53 calories: 0.4g protein, 10.4g carbs (of which 10.4g are sugars), 0.3g fats, 3.3g fiber

any ideas? minor differences I'd expect based on what database they're using for labeling, but not by this much.

[...]

Nutritional labels in the US can be off by up to 20% legally.

[...]

Those very small differences could easily be explained by different blueberry varieties or even different crops. For example, cider apples have ~40% more sugar than dessert apples.

Many brands calculate their calories simply by multiplying carbohydrates and proteins by 4 and lipids by 9, but the available energy differs between kinds of macronutrients; the 4-4-9 rule is just the average kcal released by oxidation. Bigger brands usually determine the energy value by calorimetry, which is almost as close as it gets to the real kcal value of the food you are ingesting, so it's okay to trust them.

Those noticeable differences between brands might be just due to use of different kinds of macronutrients, especially fats, in the fabrication process.

[...]

Let's say brand A uses flour from supplier X and brand B from supplier Z. The carbohydrate composition (starch %, chain size, etc.) of X and Z might differ due to crop quality, processing, and so on. The difference would not be noticeable when calculating manually (using a standard value for white flour), but when calorimetry is used the "true" energy value of the bread is shown for each brand, and that difference can be meaningful.
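The 4-4-9 rule mentioned in the comments can be applied directly to the two blueberry labels quoted earlier. The sketch below does that. One assumption is labeled in the code: fiber is counted at 2 kcal/g, which to my understanding is the factor used under EU labeling rules, whereas the original poster assumed fiber is not counted. The function name `atwater_kcal` is hypothetical.

```python
# kcal per gram under the 4-4-9 rule.
KCAL_PER_G = {"protein": 4, "carbs": 4, "fat": 9}
# Assumption: fiber counted at 2 kcal/g (the factor I understand EU labeling rules to use).
FIBER_KCAL_PER_G = 2

def atwater_kcal(protein, carbs, fat, fiber=0.0):
    """Estimate kcal per 100 g from macronutrient grams per 100 g."""
    return (protein * KCAL_PER_G["protein"]
            + carbs * KCAL_PER_G["carbs"]
            + fat * KCAL_PER_G["fat"]
            + fiber * FIBER_KCAL_PER_G)

# The two frozen blueberry labels (per 100 g).
brand_a = atwater_kcal(protein=0.6, carbs=6.1, fat=0.6)            # label says 46 kcal, fiber not given
brand_b_no_fiber = atwater_kcal(protein=0.4, carbs=10.4, fat=0.3)  # label says 53 kcal
brand_b_with_fiber = atwater_kcal(protein=0.4, carbs=10.4, fat=0.3, fiber=3.3)

print(f"brand A: {brand_a:.1f} kcal computed vs 46 on the label")
print(f"brand B: {brand_b_no_fiber:.1f} kcal without fiber, "
      f"{brand_b_with_fiber:.1f} kcal with fiber, vs 53 on the label")
```

Under these assumptions the second label reconciles: about 45.9 kcal without fiber and about 52.5 kcal with it, against 53 on the label. The first comes to about 32.2 kcal against a stated 46, a gap that the listed macros alone do not explain.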

4. Studies on Food Label Accuracy

A 2014 study, Food Label Accuracy of Common Snack Foods, measured the true calorie content of 24 popular snack foods.

When accounting for the deviation in serving size, median estimated metabolizable energy was 6.8 kcal (0.5, 23.5, p=0.0003) or 4.3% (0.2, 13.7, p=0.001) higher than the label calories.

A 2010 study titled 'The Accuracy of Stated Energy Contents of Reduced-Energy, Commercially Prepared Foods' tested the energy content of 29 restaurant meals and 10 frozen meals.

On average, restaurant foods contained 18% more energy than stated; however, there was substantial variability in the difference between measured and stated values, and some foods contained twice as much energy as stated. The measured energy content of supermarket-purchased meals was also greater than stated values, by 8%. There was no statistically significant difference between measured and stated values in either restaurant foods (P = 0.12) or supermarket meals (P = 0.12), and no significant difference between the mean accuracy of restaurant vs supermarket foods (P = 0.64).

Using label data to generate nutrient data for foods assumes that the label information is reliable. While there have been no wide-scale assessments of the reliability of nutrition information panel (NIP) data, a small study conducted between 2004 and 2005 analysed 350 foods and compared the analysed values with the values presented on the product label. The study found discrepancies ranging from -13% to +61% for individual nutrient values (Fabiansson, 2006). A small study conducted by FSANZ between 2008 and 2009 analysed 363 foods and found that 60% of analysed sodium values were within 20% of the level reported on the label (FSANZ, 2009).
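The statistics quoted in these studies are a signed percentage difference between analysed and label values, and a share of items within a tolerance band. The sketch below shows how both are computed. The sample pairs are invented for illustration and are not data from any of the studies; the function names are hypothetical.

```python
def relative_deviation(label, analysed):
    """Signed deviation of the analysed value from the label value (0.10 means +10%)."""
    return (analysed - label) / label

def share_within(pairs, tolerance=0.20):
    """Fraction of (label, analysed) pairs whose deviation is within +/- tolerance."""
    hits = sum(
        1 for label, analysed in pairs
        if abs(relative_deviation(label, analysed)) <= tolerance
    )
    return hits / len(pairs)

# Invented (label, analysed) sodium values in mg per 100 g, for illustration only.
samples = [(400, 380), (250, 330), (120, 118), (90, 101)]

for label, analysed in samples:
    print(f"label {label}, analysed {analysed}: {relative_deviation(label, analysed):+.1%}")
print(f"share within 20%: {share_within(samples):.0%}")
```

Keeping the sign matters: a mean deviation near zero can hide large over- and under-declarations that cancel out, which is why the studies also report ranges.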

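Taken together, these studies indicate that declared values tend to understate actual energy content: by a few percent on average for packaged snacks, by more for restaurant meals, and with wide item-to-item variability in both directions for individual nutrients.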

5. Food Tracking Apps

Food tracking apps like MyFitnessPal and Lose It rely on nutritional data uploaded by their users, which can obviously be inaccurate.

5.1. MyFitnessPal

MyFitnessPal's own help center describes its database as follows:

Our food database contains a combination of foods added by MyFitnessPal, depending on the availability of information, as well as foods that are added by our users. We allow users to both add and share new food items with our community to continue to grow our database and enhance the overall experience for all users.

They have a "check mark" system for entries they say they’ve reviewed and believe are accurate ref.

So:

  • There are MFP-created / curated entries.
  • There are user-created entries, widely shared.
  • The company does not publish a ratio of "official" vs user data.

Community users regularly complain that the database is "mostly user entries," and note that there are incorrect ones ref. That is anecdotal, not hard data: for generic items you often see something that clearly comes from a standard database; for brands and random products, you see 17 garbage duplicates.
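Duplicate entries are also what makes database errors comparatively easy to catch: when several entries claim to describe the same food, the odd one out stands out. The sketch below groups entries by a normalized name and flags those far from the group median. It is an illustration under stated assumptions, not how either app works: the entry format, the 20% threshold, and the function names are hypothetical, and the sample entries are invented.

```python
from collections import defaultdict
from statistics import median

def normalize(name):
    """Collapse case and whitespace so near-identical names group together."""
    return " ".join(name.lower().split())

def flag_outliers(entries, tolerance=0.20, min_group=3):
    """Return entries whose kcal per 100 g differ from their group median by more than tolerance."""
    groups = defaultdict(list)
    for entry in entries:
        groups[normalize(entry["name"])].append(entry)

    flagged = []
    for group in groups.values():
        if len(group) < min_group:
            continue  # too few duplicates to form a consensus
        mid = median(e["kcal_per_100g"] for e in group)
        flagged.extend(
            e for e in group if abs(e["kcal_per_100g"] - mid) > tolerance * mid
        )
    return flagged

# Invented user-uploaded entries, for illustration only.
entries = [
    {"name": "Greek Yogurt", "kcal_per_100g": 59},
    {"name": "greek  yogurt", "kcal_per_100g": 61},
    {"name": "GREEK YOGURT", "kcal_per_100g": 590},  # likely a typo
    {"name": "greek yogurt", "kcal_per_100g": 60},
    {"name": "Rye bread", "kcal_per_100g": 250},     # no duplicates, so never flagged
]

for entry in flag_outliers(entries):
    print("suspicious:", entry)
```

The median is used instead of the mean so that a single wild entry does not drag the consensus toward itself.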

5.2. Lose It

Lose It's own support page is much more explicit about its sources ref. They say their food database is built from:

  • The USDA food database for many "generic" foods
  • Member uploaded foods, which can be shared to all users.
  • Verified foods that are marked with a green checkmark, indicating accuracy.

A 2019 study compared 4 tracking apps (MyFitnessPal, Lose It, FatSecret, Cronometer) against the Swiss Food Composition Database and found inconsistent feedback and gaps in nutrients across apps, questioning reliability for precise dietary tracking (arXiv).

A 2020 study on MyFitnessPal found it was generally accurate for total energy and macros, but less so for things like cholesterol and sodium (PMC).

So to sum up: the nutritional information provided by users is not always verified or accurate.

Portion sizes can be subjective and difficult to estimate accurately, leading to further discrepancies in the data. For instance, what one user considers a 'medium' apple, another might view as 'large,' leading to significant differences in recorded caloric and nutrient intake. This creates inconsistencies between users and skews the overall accuracy of the app's data.

Another potential issue arises from the inherent variation in natural foods. Fresh fruits, vegetables, and meats can have varying nutrient profiles depending on factors such as soil quality, feed, or farming practices. As such, even if the user inputs data accurately according to their source, there could still be differences in the actual nutrient content.

Moreover, these apps often rely heavily on packaged food data for their databases. Given the potential inaccuracies of food labels (see above), this reliance can further contribute to the inaccuracy of the data found in these apps.