Donald Trump can tweet again after 15 million Twitter users voted on the question. However, the poll is not representative.
Elon Musk shared a poll on Twitter over the weekend. A narrow majority voted to reactivate former US President Donald Trump’s Twitter account after it was shut down in January 2021. Besides the result, the poll itself became a topic of discussion. It is not representative, even though more than 15 million users took part in it.
But when are polls representative? It does not depend on the number of respondents, nor on the fact that about six percent of Twitter’s 230 million daily users participated, which is far more, in both absolute and relative terms, than the usual survey of 1,000 respondents. Even so, the poll is often described as a charade. What’s behind it?
From the individual to the whole
The difference between a representative and a non-representative survey lies in the choice of respondents. Polls are representative if inferences can be drawn from the sample (i.e., the respondents) about the population; in the case of the Trump poll, that would be all Twitter users. For this to happen, “all members of the population must have the same chance of being included in this sample,” the Statista portal writes on its website.
This was not the case with Elon Musk’s vote on Trump. Not all Twitter users saw the poll. Some deliberately ignore Twitter’s new owner. Those who did take part were presumably mostly people for whom Trump’s return to the microblogging service is important or who want to prevent it at all costs, perhaps including some bots. In such a case, statisticians speak of self-selection: the sample is skewed.
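The effect of self-selection can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the true level of support, and the two participation rates are hypothetical numbers chosen for illustration, not figures from the Twitter poll.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 users, 40% of whom support reinstatement.
POPULATION_SIZE = 100_000
TRUE_SUPPORT = 0.40
population = [i < POPULATION_SIZE * TRUE_SUPPORT for i in range(POPULATION_SIZE)]

# Assumed participation rates: supporters are twice as likely to vote.
P_VOTE_SUPPORTER = 0.10
P_VOTE_OPPONENT = 0.05


def share(votes):
    """Fraction of True values (supporters) in a list of votes."""
    return sum(votes) / len(votes)


# Self-selected poll: each user decides whether to take part.
self_selected = [
    supports
    for supports in population
    if rng.random() < (P_VOTE_SUPPORTER if supports else P_VOTE_OPPONENT)
]

# Random sample: every user has the same chance of being drawn.
random_sample = rng.sample(population, 1000)

self_selected_share = share(self_selected)
random_sample_share = share(random_sample)

print(f"true support:       {TRUE_SUPPORT:.1%}")
print(f"self-selected poll: {self_selected_share:.1%} of {len(self_selected)} votes")
print(f"random sample:      {random_sample_share:.1%} of {len(random_sample)} votes")
```

Under these assumptions the self-selected poll gathers several thousand votes and still shows a majority that does not exist in the population, while the much smaller random sample lands close to the true value.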
It depends on the sample
For a representative survey, the sample must not be biased. Survey participants must match the population in certain characteristics. So-called random sampling, in which potential participants are selected at random from a population register and contacted, is considered the gold standard. Sabine Häder of the Leibniz Institute for the Social Sciences writes that this is “currently the highest quality design for nationwide population surveys”.
A lower-quality method is to call phone numbers at random to contact potential participants. Some survey institutes also use so-called online panels, that is, a fixed group of potential survey participants. In addition, statisticians must know characteristics of the people in the sample, such as gender, age, or income. They then check whether the distribution in the sample matches the distribution in the population.
If the distribution is skewed, the researchers compensate for this with what is known as weighting. Example: Half of the population is made up of women and half of men. If the sample contains 45 percent women and 55 percent men, women’s responses are given correspondingly more weight and men’s responses correspondingly less.
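The weighting in this example can be sketched in a few lines. The population and sample shares come from the example above; the approval rates per group are hypothetical values added only to show the effect, and the function name is likewise an illustrative choice.

```python
def weights_from_shares(population_shares, sample_shares):
    """Weight per group = share in the population / share in the sample."""
    return {
        group: population_shares[group] / sample_shares[group]
        for group in population_shares
    }


population_shares = {"women": 0.50, "men": 0.50}
sample_shares = {"women": 0.45, "men": 0.55}

weights = weights_from_shares(population_shares, sample_shares)
# women: 0.50 / 0.45 ≈ 1.11 (counted more), men: 0.50 / 0.55 ≈ 0.91 (counted less)

# Hypothetical approval rates within each group, for illustration only.
approval = {"women": 0.40, "men": 0.60}

unweighted = sum(sample_shares[g] * approval[g] for g in sample_shares)
weighted = sum(sample_shares[g] * weights[g] * approval[g] for g in sample_shares)

print(f"weights:    {weights}")
print(f"unweighted: {unweighted:.1%}")  # 51.0%
print(f"weighted:   {weighted:.1%}")    # 50.0%
```

With these assumed approval rates, the surplus of men in the sample pushes the unweighted result to 51 percent; after weighting, it falls back to the 50 percent that an evenly split population would produce.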
And the confidence interval?
Of course, a sample-based survey cannot be as accurate as a survey of every single person. No matter how representatively the sample is chosen, the results will almost inevitably deviate from the results for the population as a whole. But the better the sample, the smaller the deviation is likely to be.
It is only at this point that the number of respondents becomes relevant: a representative sample of 10,000 respondents is better than one of 1,000 respondents, because the potential deviation from the position of the general population is smaller. In serious surveys, not only the result is reported, but also the range in which the answers of the entire population are likely to lie, the so-called confidence interval.
In Elon Musk’s Trump poll, the sample is neither random nor representative, and no confidence interval is reported. When the owner of Twitter writes that the people have spoken, it is not entirely clear who he means. At best, a Twitter poll can reflect the opinion of all Twitter users. But this certainly does not apply to the Trump poll.
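How the sample size narrows this range can be shown with a rough calculation. The sketch below assumes a simple random sample, a 95 percent confidence level, and the common normal approximation for a proportion; the observed share of 50 percent is a hypothetical value. The formula says nothing about a self-selected sample, however large it is.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    z = 1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat, n, z=1.96):
    """Lower and upper bound of the interval around the observed share."""
    margin = margin_of_error(p_hat, n, z)
    return p_hat - margin, p_hat + margin


observed_share = 0.50  # hypothetical survey result

for n in (1_000, 10_000):
    low, high = confidence_interval(observed_share, n)
    print(f"n = {n:>6}: {low:.1%} to {high:.1%}")
```

With 1,000 respondents the range is roughly plus or minus 3.1 percentage points; with 10,000 respondents it shrinks to about 1 percentage point.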