By Gordon Rugg
I really, really, really hate badly designed questionnaires.
That’s an issue, because most questionnaires are badly designed. The bad design makes them worse than useless. At least if something is useless, it isn’t making the situation actively worse. Badly designed questionnaires, however, can make a situation significantly worse, by adding disinformation into the story, so that a problem takes longer to solve.
This is even more of an issue because questionnaires are so widely used. Any idiot can design a bad questionnaire, and many idiots do, with a variety of excuses, such as:
- Every other idiot is doing this, so I want to get in on the act
- Nobody ever got fired for using a questionnaire
- It’ll all come right in the end anyway, even if I do it badly
- Who cares?
None of these arguments inspires much confidence or respect in the person using them.
In this article, I’ll make a start on the issues affecting questionnaires. They’re big issues, with deep roots and broad implications, so discussing them in full will take a number of articles. For now, I’ll focus on a single topic: how Likert scales can be used within questionnaires.
Likert scales, and Likert-style scales, are widely used (and widely misused) in questionnaires. In this article, I’ll look at some of the key concepts involved, and at some of the issues involved in using this approach properly.
Images from Wikipedia; sources at the end of this article
I’ll start by giving some background context about problems with questionnaires.
Questionnaires have been in use since at least the days of the ancient Romans. If someone’s using a questionnaire today, you might expect that they’ll be at least as professional about it as someone working before electricity was discovered or Columbus encountered America. The reality, though, is very different.
Here’s an example of getting it right. It’s from the Domesday Book, compiled in 1086.
There is situated there, in addition, one berewick, as the manor of Heuseda. In the time of king Edward, 1 carucate of land; then and afterwards 7 villains, now 5. At all times 12 bordars, and 3 serfs, and 40 acres of meadow; 1 mill. Woods for 16 swine and 1 salt pond and a half.
A key point about this particular entry is that it records the ownership of one and a half salt ponds.
How can someone own one and a half ponds? When you stop and think about it, you realise that it’s perfectly reasonable; the second pond can be jointly owned by two people.
The team that compiled the Domesday Book got this right, over nine centuries ago. Most present-day questionnaires don’t get it right.
I use this example regularly with my long-suffering students. One common response is along the lines of “Okay, but how often is that going to happen?”
It’s a fair question, and the first part of the answer is that I don’t know. The second, and much more important, part of the answer, is that the person giving that response doesn’t know either. We might guess that it’s a rare issue, but the key point is that we would both be guessing. That’s not a good strategy, especially when the whole point of your questionnaire is to find out how often something actually occurs.
There are a couple of similar questions where I do know the answers, and where the answers show why this issue is really important. One is the question of how many motorbikes you own; the other is the question of whether a new-born baby is male or female. In both cases, the answer “not sure” occurs about 1% of the time.
In the case of motorbike ownership, there are numerous possible reasons for not being sure whether or not you own a motorbike. As with the half-pond, you might own one jointly with someone else; does that count as owning it? Another possible situation is where you are buying it in instalments, but haven’t completely paid it off yet. Another is that you own an engine-supplemented pushbike; does that count as a motorbike or not?
The figure for motorbike ownership comes from a newspaper report that I saw years ago. The precise figure doesn’t particularly matter; the more important issue is that this illustrates how an apparently simple question can have multiple possible answers that are only obvious with hindsight.
The figure for baby gender similarly varies depending on the definition that you use; most of the numbers that I’ve seen are in that general area. A lot of babies are born with ambiguous or unclear gender. One per cent may not sound a lot when expressed as a percentage, but when you apply it to the number of births in the UK each year, you realise that the hospital system will be dealing with about ten thousand cases a year, as a back-of-envelope figure. That’s a lot of cases, and all of them are going to involve major human issues, as the parents grapple with decisions that will have huge implications for the child’s life, and that they have probably never thought about before. Building some compassion and support into the medical system can make an enormous difference to everyone involved.
So, it’s really important to find out the frequency of the answers that aren’t immediately obvious. Even if they’re only 1% of the population, in some situations that can translate into a major issue. Sometimes, conversely, you find that something is much more widespread than you had imagined. The early surveys into human sexual behaviour turned up quite a few answers that hardly anyone had been expecting, and that changed the landscape for debate about which sexual behaviours should be decriminalised.
All the examples above involve things that are solid and tangible; there’s not much doubt about whether or not a motorbike exists when it’s right in front of you, or about whether a salt pond exists if you’re standing in it. However, even with those items, there’s a surprising amount of uncertainty; for instance, is that machine in front of you actually a motorbike as opposed to a scooter, or is the water you’re standing in really a salt pond as opposed to a lagoon?
If the situation is that messy with relation to motorbikes and ponds, then you’d expect things to be even worse with relation to subjective issues such as emotions and opinions. That’s where Likert scales enter the story.
Likert scales are named after Rensis Likert, who invented them in the early 1930s.
They turn up in most questionnaires, usually in the form of response options such as:
- Strongly agree
- Weakly agree
- Neither agree nor disagree
- Weakly disagree
- Strongly disagree
That’s a really useful format for handling responses about subjective, intangible topics.
So far, so good.
Another variant is the visual analogue Likert-style scale. This occurs in various forms. Here’s one form, from Wikipedia:
Here’s another form, where the participant draws a vertical line through the horizontal scale line at the point of their choice.
The initial line has a label at each end, like this.
Image copyleft Hyde & Rugg, 2014
The participant now draws a vertical line (shown below in blue) at the appropriate point on the horizontal line.
Image copyleft Hyde & Rugg, 2014
The researcher measures the distance along the horizontal line to where the vertical line is placed; that gives a score. In the example below, that score would be 78.
Image copyleft Hyde & Rugg, 2014
Usually these scales are 100 mm long, so the researcher can get a value anywhere from 0 to 100. That means the researcher can use much more powerful statistics on the data (and use much smaller sample sizes) than would be the case with a five-point scale like the one in the Wikipedia example above.
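The scoring step just described is simple enough to sketch. Here is a minimal illustration in Python, using made-up measurements; the function name and the example data are my own, not part of any standard scoring tool. The only real content is the arithmetic: on a 100 mm line, the distance in millimetres from the left anchor to the participant’s mark is already a 0–100 score, and other line lengths just rescale.

```python
# Sketch of scoring a visual analogue scale (hypothetical data).
# Each measurement is the distance in mm from the left-hand anchor
# to the participant's vertical mark.

def vas_score(distance_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert a measured distance to a 0-100 score, whatever the line length."""
    if not 0 <= distance_mm <= line_length_mm:
        raise ValueError("mark lies off the scale line")
    return 100.0 * distance_mm / line_length_mm

marks = [78.0, 42.5, 91.0, 15.5]           # measured distances in mm
scores = [vas_score(m) for m in marks]
print(scores)                               # on a 100 mm line, mm and score coincide
```

The point of the near-continuous 0–100 range is that it opens the door to the statistics discussed below, which a five-point scale often doesn’t support.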
I’ve written about this approach in more depth here:
Likert scales and Likert-style scales look straightforward, and at one level they are, right up to the point where you start to analyse the results. At that point, the grim reaper of Statistics winnows out the wheat of virtuous research from the chaff of clueless, uninformed amateurism. It’s not a pretty sight.
If you’re doing your analysis right, you’ll be able to give confident, well-informed answers to questions such as:
- What’s the difference between a Likert scale and a Likert item?
- Are you assuming that your data are on an interval scale or some other scale?
- What allowance have you made for acquiescence bias?
- How have you checked for external validity in your data?
- What allowance have you made for Miller’s findings on cognitive limitations in your choice of number of response options?
- What level of skewness is present in your data?
- What level of kurtosis is present in your data?
- Which statistical test will you use to check for statistical significance in your data?
- Which policies will you change in which direction if you find which results in your data?
There are plenty of other, similar, questions where those came from. If you don’t know the answer to them before you start deploying your questionnaire, then you’re proceeding without any solid underpinnings. A large number of amateur questionnaire-designers simply don’t want to know; they proceed in the hope that these issues don’t really matter, and that the truth will shine through despite the flaws in their methodology.
“I don’t know the answer” or “I don’t care” are not strong positions to take…
A more positive note
One thing that my long-suffering students often don’t immediately realise is that I have no objection to properly used questionnaires, or to properly used Likert scales and Likert-style scales. Used properly, they’re extremely useful.
As an example of how you can get fascinating new insights with very practical implications by using these approaches properly, here’s a short account of what one of my former students did.
Zoe was looking at how strongly university department home pages encouraged or discouraged potential students. The obvious approach was to use a scale running from “discourage” at one end to “encourage” at the other end.
Zoe didn’t do that. Instead, she used two scales. One scale ran from “not at all encouraging” at one end to “completely encouraging” at the other end. The other ran from “not at all discouraging” at one end to “completely discouraging” at the other end.
You might think that the answer on one scale would logically have to be the opposite of the answer on the other scale, give or take some human error. Most of the time, that’s what happened.
Quite often, though, something different happened.
One fairly common response was that a home page was given a low score both for encouragement and for discouragement. When you stop and think about it (or when you ask the research participants about it afterward) there’s an obvious answer: the home page was just boring, with nothing very bad about it, but nothing very good, either. Those pages needed to have something positive added.
Another response that was less common, but still very much present, was that a home page was given a high score on both encouragement and discouragement. Again, that makes sense with hindsight; these were departments that had some very good points, but also had some very bad points. Those pages needed to have something negative removed; they were the opposite of the boring pages that needed to have something positive added.
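The two-scale design above can be summarised as a simple classification. Here is a sketch of that idea; the category names, the 0–100 scoring, and the cut-off value are my own assumptions for illustration, not details from Zoe’s study.

```python
# Sketch of the two-scale idea: treat encouragement and discouragement
# as independent dimensions rather than opposites. Scores assumed to be
# on 0-100 visual analogue scales; the cut-off is an arbitrary midpoint.

def classify(encouraging: float, discouraging: float, cutoff: float = 50.0) -> str:
    if encouraging >= cutoff and discouraging >= cutoff:
        return "polarising"    # strong good points AND strong bad points
    if encouraging < cutoff and discouraging < cutoff:
        return "boring"        # nothing very bad, but nothing very good either
    if encouraging >= cutoff:
        return "encouraging"
    return "discouraging"

print(classify(80, 75))   # polarising: remove the bad points
print(classify(20, 15))   # boring: add something positive
```

A single bipolar scale collapses the “polarising” and “boring” cases into the same middling score, which is exactly the information the two-scale version preserves.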
This has far-reaching implications for any survey of attitudes, preferences and opinions, but very few such surveys use this approach, even though it’s cheap and simple to implement.
When used correctly, these approaches can be powerful, clean and useful.
If you want to be on the side of light and virtue, then you need to read the relevant literature before using these approaches; general knowledge and common sense are nowhere near enough.
As is often the case, Wikipedia is a good place to start, but a bad place to stop. The Wikipedia articles on Likert scales and visual analogue scales provide a reasonable start, but they don’t have as much coverage as I would like of issues such as the limitations of human working memory (e.g. The magical number seven, plus or minus two, and subsequent work).
It’s also highly advisable to think long and hard about what use you will make of the data you get. If you don’t know what you’ll do with the data, then you’ll have a real problem when you have boxfuls of it (or, as is more often the case with questionnaires, when you have a response rate of 9%, and you’re trying desperately to dredge something believable out of the thirteen responses that have come back in the post or via email).
On which encouraging note, I’ll end.
Notes, links and sources:
You’re welcome to use Hyde & Rugg copyleft images for any non-commercial purpose, including lectures, provided that you state that they’re copyleft Hyde & Rugg.
There’s more about the theory behind this article in my latest book: Blind Spot, by Gordon Rugg with Joseph D’Agnese
Sources for images: