The Likert Scale – in its various formats – is widely used in psychology and the social sciences, but also in commercial market research. Respondents may be asked about their attitudes, perceptions or evaluations of organisations, services or brands. The use of Likert Scales, however, has come under scrutiny. It is argued that the traditional 5-point rating scales are boring, repetitive and overly long. The proposed alternative is the Slider Scale. The question then is this: are Slider Scales really better than Likert Scales?
After I finished my PhD, I decided it was time to quit Academia and head for pastures new: The Exciting Universe Of Market Research. As a way of integrating myself into this new universe, I started reading various blogs and (quasi-)scientific articles on market research and online surveying. As CheckMarket specialises in online surveys, this seemed like a fitting starting point. To my initial surprise, it took me only a couple of blogs to conclude that Likert Scales – the same scales that are practically declared holy in my previous universe, the academic one – have come under some scrutiny, even to the point that it is seriously questioned whether they should be retained or replaced by so-called ‘Slider Scales’. In this post, however, I will come to the defence of Likert Scales – if they actually need me defending them. I will argue that the case of Likert Scale vs. Slider Scale is based on the design and the customer experience of online surveys, rather than on sound methodological arguments. Without a shadow of a doubt, Slider Scales have a more inviting design but, as it turns out, there are not too many methodological reasons to favour them over ‘boring’ Likert Scales.
What is the Likert Scale?
First things first. What is a Likert Scale? A Likert Scale, named after its developer, the psychologist Rensis Likert, is a scale that is designed to measure underlying attitudes or opinions. This scale usually consists of several items, so-called Likert Items, on which respondents can express their opinions. Say for instance that we are trying to measure the underlying ‘general happiness’ of market researchers, then we would probably ask them several questions regarding their happiness in their private and professional life. These questions such as ‘How happy are you with your current employment?’ or ‘How happy are you with your current family life?’ are the so-called Likert Items. Based on these Likert items, we would then construct the Likert Scale on ‘general happiness of market researchers’.
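To make the step from Likert Items to a Likert Scale concrete, here is a minimal sketch in Python. It assumes a 5-point response format and simply averages the item responses. The item names, the third item and the response values are made up for illustration, and averaging is only one common way of combining items into a scale score.

```python
from statistics import mean

# Hypothetical responses of one market researcher on a 5-point format
# (1 = very unhappy, 5 = very happy). Item names are illustrative only.
item_responses = {
    "current_employment": 4,
    "current_family_life": 5,
    "work_life_balance": 2,  # assumed extra item, not from the text above
}


def likert_scale_score(responses):
    """Combine several Likert Items into one scale score by averaging them."""
    return mean(responses.values())


general_happiness = likert_scale_score(item_responses)
print(f"General happiness score: {general_happiness:.2f}")
```

Each individual question remains a Likert Item. Only the combined score is the Likert Scale in the strict sense.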
Usually respondents express their opinions by choosing one of the response alternatives from the response scale. This response scale can take various formats. On the one hand there is a distinction between worded and numerical formats, and on the other hand there is a distinction between the most common 5-, 7-, 10- and 11-point formats (Dawes, 2008). The 5-point format is usually worded (e.g. strongly disagree, disagree, neutral, agree, strongly agree), while a 10-point format is most often numerical. After all, the gradation of (dis)agreement on a 10-point rating scale probably becomes too granular to express easily in words.
The Likert Scale – in its various formats – is widely used in psychology and the social sciences, but also in commercial market research (Dawes, 2008). In commercial market research, respondents may be asked about their attitudes, perceptions or evaluations of, amongst others, organisations, services or brands. However, these Likert Scales are often used for measuring single-item issues, rather than for measuring an underlying attitude. Recalling our example of the ‘general happiness of market researchers’, this means that we use the Likert Scale for measuring how happy they are with their salary, rather than measuring how happy they are in general. Even though this distinction is hardly ever made, we should be aware of it and of the fact that when we are talking about ‘Likert Scales vs. Slider Scales’, we are actually talking about the 5-, 7- or 11-point rating scale of Likert Items rather than about Likert Scales. Nevertheless, for the sake of simplicity I will consistently use Likert Scale, even if I am technically talking about the X-point rating scale.
Criticism of Likert Scales in market research
Going back to the original purpose of this post, it appears that the use of Likert Scales has come under some scrutiny in commercial market research. While this is not necessarily a bad thing, the various criticisms levelled at the use of traditional Likert Scales ought to be valid. Let us look at the two most common criticisms that I have come across. First of all, it is argued that the traditional 5-point rating scales are quite boring, repetitive and certainly overly long. Furthermore, a large battery of questions using 5-point rating scales might discourage respondents. Second, it is argued that respondents are forced into expressing an opinion that is not their real opinion because too few response alternatives are offered. In short, it is argued that the 5-point Likert Scale is too blunt a) to detect differences between items and b) to precisely measure specific opinions, as the respondent’s true opinion can lie in between the answer categories. While the first criticism is probably true, especially in commercial market research in which respondents have to be convinced to participate in surveys, there are various reasons to question the validity of the second.
The Slider Scale, a valid alternative?
The proposed alternative to the traditional Likert Scale is the (admittedly more attractive) Slider Scale. These Slider Scales are basically Likert Scales with many more response categories and, instead of selecting one of the response alternatives, respondents use a slider to position themselves on a certain question. It is argued that this Flash-based alternative not only enables more interactivity in online surveys but also enables respondents to indicate their opinions more precisely. Respondents have more response options, ergo the results will be more finely grained. This argument is then strengthened by some experimental findings that show that respondents seem to answer questions slightly differently when they are first confronted with a traditional Likert Scale and subsequently with a Slider Scale.
Nevertheless, I tend to argue that this case of ‘Likert Scales vs. Slider Scales’ is more about the design and interactivity of online surveys, i.e. about making them a more pleasing experience for the respondents, than about sound methodological reasons. After all, there do not seem to be that many sound methodological arguments to favour Slider Scales. Let us look at some evidence.
First of all, quite a lot of scientific research has been done on the effect of the number of response alternatives on the psychometric properties of a scale, i.e. its reliability and validity. The former refers to the extent to which the scale yields consistent results across various situations, and the latter refers to the extent to which the scale actually measures what it set out to measure. The recurring conclusion is that when the number of response alternatives is increased, the reliability and validity of the underlying factor increase as well. For instance, Lozano et al. (2008) have shown that both the reliability and validity of a Likert Scale decrease when the number of response options is reduced. Vice versa, if you increase the number of response options, the reliability and the validity increase. They conclude that a Likert Scale should have a bare minimum of four response categories.
Reliability and validity of Slider Scales
So, at first sight, this seems to favour the use of Slider Scales with more response alternatives. Nevertheless, the positive relationship between the number of response categories and the reliability and validity of the scale is not a linear one. Lozano et al. (2008) also show that the increase in reliability and validity is very rapid at first but tends to level off at about 7 response alternatives. Beyond 11 response alternatives there is hardly any gain in reliability and validity from increasing the number of response categories. In short, from a psychometric point of view, little is gained by using scales that are more finely graded than an 11-point scale, i.e. scales with more than 11 response categories. They hardly improve either the reliability or the validity of the scale (Dawes, 2008).
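The levelling-off can be illustrated with a small toy simulation. The sketch below is not the design used by Lozano et al. (2008). It makes its own assumptions: every respondent has a normally distributed latent attitude, each of five items measures that attitude with equally large random noise, and the continuous answers are cut into equal-width response categories. Reliability is estimated with Cronbach's alpha, and all function names and parameter values are illustrative.

```python
import random
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def discretize(x, n_categories, lo=-3.0, hi=3.0):
    """Map a continuous answer onto 1..n_categories with equal-width bins."""
    x = min(max(x, lo), hi)  # clip extreme answers to the ends of the scale
    index = int((x - lo) / (hi - lo) * n_categories)
    return min(index, n_categories - 1) + 1


def simulate_alpha(n_categories, n_respondents=2000, n_items=5, seed=42):
    """Simulate one survey and return its alpha (assumed toy model)."""
    rng = random.Random(seed)  # same seed: same respondents for every format
    rows = []
    for _ in range(n_respondents):
        attitude = rng.gauss(0, 1)  # latent attitude of this respondent
        rows.append(
            [discretize(attitude + rng.gauss(0, 1), n_categories)
             for _ in range(n_items)]
        )
    return cronbach_alpha(rows)


alphas = {k: simulate_alpha(k) for k in (2, 3, 5, 7, 11, 101)}
for k, alpha in alphas.items():
    print(f"{k:>3} categories: alpha = {alpha:.3f}")
```

Under these assumptions, the 101-category format stands in for a slider. Comparing its alpha with that of the 11-point format shows how little extra reliability the additional categories buy in this toy model.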
Second, plenty of scientific research has shown that respondents use the meaning of the labels attached to some response categories when mapping judgments to response scales (Rohrmann, 2003; Wegner, Faulbaum, & Maag, 1982; Wildt & Mazis, 1978). So, a larger number of response categories with few labelled points makes it harder for respondents to orientate themselves. Intuitively, it makes sense that labelled response categories are less ambiguous to respondents than when only the end labels are provided as “respondents need to figure out the meaning of the intermediate response categories to determine the option that comes closest to expressing their opinion” (Weijters, Cabooter & Schillewaert, 2010).
If we project these findings on the Slider Scale, i.e. most often a scale with many response categories and few labelled points, then it is rather easy to see that the often lauded strong point of this scale type might not be so strong after all. It is commonly argued that Slider Scales offer respondents more response categories and as a result generate more precise data. Nevertheless, selecting the response option that corresponds with their real opinion will be more challenging when respondents need to make up the right meaning for each response category (De Leeuw, 1992; Krosnick, 1991), and this will only become more challenging as the number of response categories increases.
In short, while a Slider Scale is most definitely more attractive, more interactive and more consumer-friendly than the archetypical Likert Scale, the case of ‘Likert Scale vs. Slider Scale’ is almost certainly a discussion about design rather than about methodology. Personally, I would judge the case to be inadmissible.
 See for instance Lozano, L., Garcia-Cueto, E. & Muniz, J. (2008), Effect of the number of response categories on the reliability and validity of rating scales, Methodology, 4 (2), 73-79.