I'm also a physicist and I would first say that generally (except for some specific sub-fields) we are actually not well trained on experiments (like this) that require significant statistics.
In particular the law of large numbers applies (which you should know about as a physicist). I would argue that despite the noise in the reporting the conclusions from this study are likely much more significant than a tightly controlled randomized trial with a small group (~100 people).
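To put rough numbers on that claim, here is a minimal sketch. Every value in it is an illustrative assumption, not a figure from the study: a true intake of 50 units, a reporting noise of 5 for the small controlled group and 25 for the large self-reported one. It also assumes the reporting noise is zero-mean and independent between people, which is the condition under which the law of large numbers helps.

```python
import math
import random
import statistics

def standard_error_of_mean(n, noise_sd, true_value=50.0, seed=0):
    """Standard error of the mean of n noisy but unbiased reports.

    true_value and noise_sd are hypothetical numbers chosen for illustration.
    """
    rng = random.Random(seed)
    reports = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.stdev(reports) / math.sqrt(n)

# Small, tightly controlled group: little reporting noise, few people.
small_precise = standard_error_of_mean(n=100, noise_sd=5.0)
# Large questionnaire cohort: five times the noise, 800 times the people.
large_noisy = standard_error_of_mean(n=80_000, noise_sd=25.0)

print(f"n=100,    noise sd=5:  standard error = {small_precise:.3f}")
print(f"n=80 000, noise sd=25: standard error = {large_noisy:.3f}")
```

Under these assumptions the large noisy cohort pins down the average several times more tightly than the small precise one, because the standard error shrinks like 1/sqrt(n).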
Self-reported questionnaires are certainly a well-established method in this sort of study, and the shortcomings are well understood; that does not mean their results are meaningless.
In particular, I would argue that many (most) people can give meaningful quantitative answers about their eating habits over several years, as their habits rarely change. Certainly most people here would be able to say approximately how many meals per week they eat meat, how many vegetables they eat on average, how often they drink alcohol, etc. We are only interested in rough averages.
Regarding your argument that there are too many variables that can't be controlled for: OK, let's hear them. Name at least 10 and make a valid statistical argument for why they need to be controlled for in a sample size of ~80 000.
In this case, a "well established method" simply means commonplace methodology, not necessarily a robust scientific protocol.
The problem is the humans. We are not reliable witnesses, even of our own pasts, and unless we are following very strict diets, most of us essentially follow a mostly random walk with our food intake across a narrow range of food groups (even caloric intake can be wildly inconsistent over time). I've met enough people who completely underestimate their consumption of desserts, for example, because they mentally reduce the occurrence of such events in their personal historical narratives. That alone could put many of both the low-carb vegans and the low-carb meat eaters into the high-carb category.
Which is the point: the worst failure of this methodology is that there is no evidence that human memory of food consumption provides an accurate measure over long periods of time. If you cannot establish the trustworthiness of the most basic tool of your methodology, then almost by definition it's meaningless. The error in self-reporting could be high enough that the true classification of 80% of the low-carb group is actually high-carb. Or the vegan group could be nearly half high-carb, etc.
Without a thorough vetting of the accuracy of the instrumentation used in the research (the FFQ in this case), we cannot know the margins of error. But we have a priori reasons for believing the error to be significantly high. Adding more people to the group doesn't increase the accuracy in this case; it only increases the volume of the noise. Is the True Low Carb ratio of each group 10% or 90%? Who knows.
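A minimal sketch of why sample size cannot rescue a biased instrument. It assumes, purely for illustration, that everyone underreports by the same hypothetical amount (15 units on a true intake of 50) on top of random noise. None of these numbers come from the study or from any FFQ validation.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical true average intake

def mean_reported_intake(n, bias, noise_sd=25.0, seed=0):
    """Average of n self-reports that share a systematic bias plus random noise."""
    rng = random.Random(seed)
    return statistics.fmean(
        TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)
    )

# Error of the estimated average at increasing sample sizes,
# with an assumed uniform underreporting of 15 units.
errors = {
    n: abs(mean_reported_intake(n, bias=-15.0) - TRUE_VALUE)
    for n in (100, 10_000, 80_000)
}

for n, err in errors.items():
    print(f"n={n:>6}: |estimate - truth| = {err:.2f}")
```

In this toy model the random part of the error shrinks as the group grows, but the estimate settles on the biased value rather than the true one, so the remaining error stays near the size of the bias however many people are added.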
If we can't validate the accuracy of the tools, it makes the results essentially a random choice between the possible conclusions. It means they haven't rejected the null hypothesis.
Which is why I favor some form of sequestration of smaller populations of similar genetic makeup (multiple groups for each type of diet), etc. But that is mostly impractical.