P values?
Do they account solely for sampling error (and are therefore irrelevant when population data is available) OR do they assess the likelihood of something being due to chance in other ways (and are therefore relevant for studies with population data)?
Any links or literature are welcome :)
@rstats @phdstudents @datascience @socialscience @org_studies
Please keep in mind that p-values (and null-hypothesis testing) suffer from officially recognized intrinsic flaws:
https://doi.org/10.1080/00031305.2016.1154108 https://doi.org/10.1080/00031305.2019.1583913
That is, they have intrinsic problems even when used “correctly” (for examples see e.g. https://doi.org/10.3758/BF03194105). On top of this, they are often misinterpreted and used incorrectly.
A p-value is the probability of observing your particular sample data or other, more extreme “imagined” data (results that could have occurred but did not), assuming your null hypothesis is true, and assuming additional hypotheses (such as underlying Gaussian distributions or the like), which may or may not hold:
p-value = Pr(observed data or imagined data | null hypothesis and additional assumptions)
Because of the presence of imagined data, a p-value is not the likelihood of the null hypothesis (+ assumptions), which is Pr(observed data | null hypothesis and additional assumptions).
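To make the distinction concrete, here is a minimal sketch with a toy binomial example (8 heads in 10 coin flips, null hypothesis of a fair coin; all numbers are invented for illustration). The p-value sums over the observed result and the more extreme “imagined” results, while the likelihood uses only the result actually observed:

```python
# Toy illustration (invented numbers): p-value vs likelihood of the null.
from scipy.stats import binom

n, k = 10, 8       # 10 coin flips, 8 heads observed
p_null = 0.5       # null hypothesis: the coin is fair

# p-value: probability of the observed data OR more extreme "imagined" data
p_value = binom.sf(k - 1, n, p_null)      # P(X >= 8 | fair coin)

# likelihood of the null: probability of exactly the observed data
likelihood = binom.pmf(k, n, p_null)      # P(X = 8 | fair coin)

print(f"one-sided p-value P(X >= 8 | H0) = {p_value:.4f}")     # about 0.055
print(f"likelihood        P(X = 8  | H0) = {likelihood:.4f}")  # about 0.044
```

The two numbers differ precisely because the p-value also counts the 9-heads and 10-heads outcomes that never actually occurred.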
My personal recommendation is that you do your analysis using more modern methods that do not have these intrinsic flaws. Many good books are out there (just one example: https://doi.org/10.1201/9780429029608). Of course any method may be misused, but at least with a self-consistent method we only have to worry about one problem (misuse) rather than two (misuse & intrinsic flaws).
Here is an example of the quirky, unscientific characteristics of p-values. Imagine you design an experiment this way: “I’ll test 10 subjects, and in the meantime I’ll apply for a grant. By the time the 10th subject is tested, I’ll know my application’s outcome. If the outcome is positive, I’ll test 10 more subjects; if it isn’t, I’ll stop.” Not an unrealistic situation.
With this stopping rule, your p-value will depend on the probability that you get the grant. This is not a joke.
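Here is a rough simulation sketch of that dependence (all numbers invented: null effect of 0, standard deviation 1, an observed t of 2.3), assuming the p-value is computed against the full design, grant lottery included:

```python
# Rough sketch: the null reference distribution of the test statistic, and
# hence the p-value of the SAME observed result, shifts with the grant
# probability once the two-stage design is taken into account.
import numpy as np

rng = np.random.default_rng(0)

def simulate_null_t(grant_prob, n_sims=50_000):
    """Simulate the one-sample t statistic under H0 for the two-stage design."""
    t_stats = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.normal(0, 1, 10)                          # first 10 subjects
        if rng.random() < grant_prob:                     # grant awarded?
            x = np.concatenate([x, rng.normal(0, 1, 10)]) # test 10 more subjects
        t_stats[i] = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
    return t_stats

observed_t = 2.3   # one and the same observed result, evaluated under each design
for g in (0.1, 0.5, 0.9):
    null_t = simulate_null_t(g)
    p = np.mean(np.abs(null_t) >= abs(observed_t))        # two-sided p-value
    print(f"grant probability {g}: p = {p:.3f}")
```

The p-value shrinks as the grant probability grows, even though the observed result is identical.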
This is a quote from H. Jeffreys, “Theory of Probability” § VII.7.2 (emphasis in the original) https://doi.org/10.1093/oso/9780198503682.001.0001:
“What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred. This seems a remarkable procedure. On the face of it the fact that such results have not occurred might more reasonably be taken as evidence for the law, not against it.”
It really depends on what you mean by “population data”. If you mean that you have data on every person (or object, or whatever your research is about) in the population you are interested in, then there is no need for p-values. The mean you calculate IS the actual population mean and there is no room for error (assuming each measurement is correct). If you just mean “a big dataset from the population”, then inferential statistics can still make sense.
One thing to consider is that, mathematically, a t- or z-test always assumes that the population is infinitely large (the width of the confidence interval only reaches zero as the sample size goes to infinity), while in reality, as described above, the width of your confidence interval should already be zero when your sample size equals the actual population size.
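For concreteness, here is a small numeric sketch (all numbers invented) using the standard finite population correction, which is what brings the standard error, and hence the width of the confidence interval, down to zero once the sample is the whole population:

```python
# Standard error of the mean, with and without the finite population
# correction (FPC). Population values are invented for illustration.
import numpy as np

N = 1_000                                     # actual population size
rng = np.random.default_rng(1)
population = rng.normal(50, 10, N)

def standard_error(sample, pop_size=None):
    """Plain SE of the mean, optionally with the finite population correction."""
    n = len(sample)
    se = sample.std(ddof=1) / np.sqrt(n)
    if pop_size is not None:
        se *= np.sqrt((pop_size - n) / (pop_size - 1))   # FPC: 0 when n == N
    return se

for n in (100, 500, 1_000):
    sample = rng.choice(population, n, replace=False)
    print(f"n={n:4d}  infinite-population SE: {standard_error(sample):.3f}"
          f"   SE with FPC: {standard_error(sample, N):.3f}")
```

With the plain formula the SE stays positive even for a full census; with the correction it is exactly zero at n = N.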
Hope that helps. ;)
@arandomthought
I read some similar comments online (though there were also contrary positions), but I think this makes sense. And I didn’t know about the infinite-population thing, that is interesting.
If I may ask a follow-up: p-values aside, regression models and correlation tests can still be interesting to apply to census data, to measure effect sizes and such, right?
Look up super-population theory. It is based on the idea that even a perfect census is only a point-in-time estimate of the theoretical “super-population” that the point-in-time population is derived from. In large, real world populations, people are constantly coming and going. If we assume that this coming and going is random and the relevant super-population parameters do not change over time, it is easy to see a census population as a sample instance from a larger super-population. While somewhat theoretical, this is a useful model when estimating relationships between variables in census data and leads to the use of standard frequentist confidence intervals and, yes, even p-values.
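As a toy illustration of that idea (all numbers invented): simulate a super-population process with a fixed true slope, draw a point-in-time “census” from it again and again, and check that the usual regression confidence intervals computed on census data cover the super-population slope at roughly their nominal rate:

```python
# Sketch of the super-population view: each "census" is one draw from a
# process with a fixed true slope, so ordinary regression CIs keep their
# frequentist meaning. All parameters here are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_slope = 0.5                 # super-population parameter
census_size = 2_000
n_replications = 1_000
covered = 0

for _ in range(n_replications):
    # one point-in-time census generated by the super-population process
    x = rng.normal(0, 1, census_size)
    y = true_slope * x + rng.normal(0, 1, census_size)
    fit = stats.linregress(x, y)
    lo, hi = fit.slope - 1.96 * fit.stderr, fit.slope + 1.96 * fit.stderr
    covered += (lo <= true_slope <= hi)

print(f"95% CIs covering the super-population slope: {covered / n_replications:.1%}")
```

Under this model the coverage comes out close to the nominal 95%, which is why the standard frequentist machinery still applies.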
That’s a very cool way to look at it. You’re basically taking “a sample in time” and will never be able to sample across time (assuming we don’t invent time machines… ever), so you will always be looking at a super-population that is technically infinite. =)
Sure, even if you had all the data on your whole population (and therefore p-values “wouldn’t make sense”), a regression could still tell you something useful about that population. It can, for example, let you estimate how strongly variable X influences variable Y (or at least how strongly they are related; causality is a separate issue), or what value of Y we would expect for someone new in the population with a certain value of X.
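A minimal sketch of that purely descriptive use (variables and values invented): fit a regression on data for every member of the population, read the slope as the population-level effect size, and use the fitted line to say what Y to expect for a new member with a given X:

```python
# Descriptive regression on FULL population data (no sampling inference
# needed); all values are invented for illustration.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(40, 10, 5_000)              # e.g. X for every member of the population
y = 2.0 * x + rng.normal(0, 15, 5_000)     # e.g. the outcome Y for every member

slope, intercept = np.polyfit(x, y, 1)
print(f"population-level slope (effect size): {slope:.2f}")
print(f"expected Y for a new member with X = 50: {intercept + slope * 50:.1f}")
```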
@arandomthought that’s very helpful, thank you so much!
@Jey_snow @rstats @phdstudents @datascience @socialscience @org_studies
P-values can be generated from various statistical tests, so a P-value gives no indication of whether the appropriate test was used to analyze the data.
Here are a couple of papers on P-values:
@MarcusMuench @rstats @phdstudents @datascience @socialscience @org_studies
According to the second article:
“…A p value should be interpreted in terms of what would happen if you repeated the measurement multiple times with different samples…”
If I have a census, I would expect zero difference across repeated measurements due to random sampling. Therefore p-values are irrelevant for census data. Thanks for the references!
Careful there. If you had a census of ALL the people in your population, then you would not expect any variation, as you wrote. But that is not because of random sampling; it is because the next time you sample, you would just sample the exact same people (your whole population). And since the sample stays the same, so do the numbers.
If, however, you truly took a random sample of the population, then the next time you take a new random sample you would ask different people and therefore also get slightly different numbers. And that is where p-values are useful, because they are based on exactly this question of “well, what if I took another random sample, and then another, and then another, and so on?”
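A quick simulation of that contrast (population and sample sizes arbitrary): repeated “censuses” return exactly the same mean every time, while repeated random samples fluctuate, and that fluctuation is precisely what standard errors and p-values describe:

```python
# Census vs repeated random samples; all numbers invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(100, 15, 10_000)    # the whole population

census_means = [population.mean() for _ in range(5)]              # no variation
sample_means = [rng.choice(population, 200, replace=False).mean()
                for _ in range(5)]                                # varies

print("census means:", np.round(census_means, 2))
print("sample means:", np.round(sample_means, 2))
```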