By Gordon Rugg
This article follows on from a series about the problem of identifying and clarifying client requirements. In this article, we’ll look at the issues involved in measuring and evaluating a product, both qualitatively and quantitatively.
A key point to establish at the outset is whether a well-defined schema is involved. This can make a dramatic difference in terms of the desirable attributes and values. A schema is a mental template of key features for something, which can be an activity such as buying a train ticket, or a physical entity such as a type of house or car.
For example, the schema of mansion (large, imposing building, made with costly materials) is very different from the schema of traditional cottage (small, modest building, made with low-cost materials). Similarly, the schema of sports car is very different from the schema of SUV. Knowing the schema won’t give you all the information that you need, but it’s a very good start. Once you know the schema, you can make a good guess at what the client will want, and after that, you’ll probably just have to fine-tune the details.
So far, so good. However, there are cases that don’t involve a well-defined schema, such as designing a one-off building.
Also, “fine-tuning the details” can involve non-trivial issues such as translating subjective terms like “easy to use” into specific, tangible design features.
So how do you set about this?
Form and function
The form and the function of an artefact are closely intertwined. The usual perception is that function is the goal, and that the form should follow it. In reality, the form often nudges the ways in which an artefact functions, by making some actions easier than others.
You can find out the intended function of an artefact by verbal methods such as think-aloud technique, or content analysis, or upward laddering. For example, the client might give you feedback on a draft design, and say that it needs to let the user do X, or you might read comments in a thread about design where people mention things that they want to be able to do, or you can ask the client what they would like to be able to do. Showing the client draft designs can be useful for getting at functions so obvious to the client that the client completely fails to mention them. Upward laddering can be useful for getting directly to the higher-level goals that are really important, rather than the lower-level goals that are often a misleading distraction, as discussed in our previous article about why clients often appear to change their minds radically about requirements.
Once you know what the intended function of the artefact is, you can then start identifying ways of measuring how well a particular design meets the requirements that go with the function or functions.
Requirements for functions
A key step is finding out what the functions of an artefact actually are. These can usefully be divided into two main categories:
- Typical functions
- Boundary cases
The typical functions are what they sound like: typical, everyday functions. In a bathroom, for instance, typical functions are taking a bath and brushing your teeth.
Because typical functions are so common, it’s important to get them right. The more common the function, the more you can gain in the long term by making the artefact as fit as possible for its function; all those little gains in usability will add up over time. A useful concept here is the Pareto principle, also known as the 80:20 rule: roughly 80% of one thing typically comes from roughly 20% of another thing. So, for about 80% of the time you’ll be using only about 20% of the available functions of an artefact. A classic example is software, where most users most of the time use only a small subset of the features of the software. Because those common cases are common, they’re pretty easy to identify. We’ll look in a later section at just how to make an artefact as easy as possible to use for those functions.
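If you have usage counts for each function, a rough way of finding the typical functions is to rank them and keep the most-used ones until they cover about 80% of all uses. The sketch below is one way of doing that in Python; the function names and counts are invented for illustration, and the 80% threshold is a rule of thumb rather than a fixed law.

```python
def core_features(usage_counts, coverage=0.8):
    """Return the most-used features that together reach `coverage` of all uses."""
    total = sum(usage_counts.values())
    if total == 0:
        return []
    selected, running = [], 0
    # Walk through the features from most used to least used.
    ranked = sorted(usage_counts.items(), key=lambda item: item[1], reverse=True)
    for feature, count in ranked:
        selected.append(feature)
        running += count
        if running / total >= coverage:
            break
    return selected


# Hypothetical counts of how often each function was used (invented data).
usage = {
    "open": 400, "save": 300, "print": 120, "export": 60, "macros": 50,
    "mail_merge": 30, "themes": 20, "plugins": 10, "scripting": 6, "compare": 4,
}

typical = core_features(usage)
print(typical, f"= {len(typical)} of {len(usage)} functions")
```

The functions that this returns are the ones where small gains in usability will add up fastest; the remainder still need checking for boundary cases.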
Boundary cases are important, even though rare, because they involve the extreme cases that the artefact needs to handle. Safety requirements are a classic case. Most products are designed to handle specified boundary cases involving safety. Modern cars, for instance, are designed to protect their occupants from head-on collisions at a specified speed, and from side collisions at another specified speed. Boundary cases don’t always involve safety. Often, they involve the most adverse circumstances in which the product is likely to be used, such as in bad weather, or poor light.
Two common and effective ways of tackling typical functions and boundary cases are scenarios and use cases. A scenario is a “what-if?” situation; a use case in the technical sense is a more tightly constrained method where you work out systematically which actions the user performs and which actions the product performs. As usual, it’s highly advisable to use observation and/or think-aloud as reality checks, to catch Taken For Granted knowledge and Not Worth Mentioning knowledge. People often fail to mention key points for one of those two reasons, and the omissions can make a huge difference. For brevity, we won’t go into this in detail here; we’ve covered all these topics in other articles on this site.
In a scenario, you ask the user to perform a specified task, such as finding a particular piece of information on a newly-designed website. You then observe what happens, and measure how easy the user finds the task.
A classic way of measuring ease of use with regard to software is to count the number of mouse clicks and/or keystrokes and/or scrolls that are needed in order to do a given task. Usually, user-centred software designers concentrate on making the most common and the most important tasks as easy as possible (i.e. they require an absolute minimum of clicks and keystrokes and scrolls).
For our Search Visualizer software, for instance, the default setting (for the most common typical case) means that the user only has to type in their keywords and then click the Search button or press the Return key. That’s just one click or key press in addition to typing in the keywords.
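Counting clicks, keystrokes and scrolls is easy to automate if you have a log of what the user did. The sketch below assumes a simple list of event names recorded during a task; the event names and the two designs being compared are hypothetical, and the typing of the search terms themselves is left out of the log, so that only the extra actions are counted.

```python
from collections import Counter

# Event types that count towards the cost of a task (an assumption of this sketch).
COUNTED = ("click", "keystroke", "scroll")


def action_cost(events):
    """Count the clicks, keystrokes and scrolls in a list of event names."""
    counts = Counter(event for event in events if event in COUNTED)
    cost = {kind: counts[kind] for kind in COUNTED}
    cost["total"] = sum(cost.values())
    return cost


# Hypothetical logs for the same task on two draft designs.
design_a = ["click", "scroll", "click", "click", "keystroke"]
design_b = ["click"]

for name, log in (("A", design_a), ("B", design_b)):
    print(name, action_cost(log))
```

The design with the lower total for the most common tasks is usually the better starting point, though a low count doesn't guarantee that users will find the right thing to click.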
There are other ways of measuring ease of use. Some common measures are:
- Time taken to perform a given task
- Number of separate actions required to perform a task
- Number of times a user makes a mistake when performing a given task
- Number of times a user hesitates or goes to the help pages
- Number of times a user mutters or swears
- Number of times a user smiles or says something positive
It can be useful to have a tally sheet for recording how often each of these things happens. Other useful aids include a timeline sheet and Therblig-style notations.
A tally sheet can take various forms; we advise using whatever works best for you.
Here’s an example.
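A tally sheet can also be kept electronically. The sketch below is a minimal version in Python, assuming one row for each of the countable measures listed above; the category names and the sample observations are our own invention for illustration.

```python
from collections import Counter

# Row labels for the sheet (hypothetical names based on the measures listed above).
CATEGORIES = ("mistake", "hesitation", "help", "swear", "positive")


class TallySheet:
    """Record how often each kind of event happens during a session."""

    def __init__(self, categories=CATEGORIES):
        self.categories = tuple(categories)
        self.counts = Counter()

    def mark(self, category):
        # Reject anything that isn't a row on the sheet, to catch typing slips.
        if category not in self.categories:
            raise ValueError(f"unknown category: {category!r}")
        self.counts[category] += 1

    def render(self):
        width = max(len(c) for c in self.categories)
        lines = []
        for category in self.categories:
            n = self.counts[category]
            lines.append(f"{category:<{width}} | {'#' * n} ({n})")
        return "\n".join(lines)


sheet = TallySheet()
for event in ["hesitation", "swear", "help", "swear", "mistake"]:
    sheet.mark(event)
print(sheet.render())
```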
A timeline sheet is different in that it records the sequence in which actions occur. There are various forms; for instance, each column might just represent the next activity, regardless of how long each activity lasts, or each column might instead represent a specified length of time, such as five seconds. Timeline sheets can be useful for spotting patterns and sequences of activities.
Here’s an example.
Image copyleft Hyde & Rugg
This timeline tells a story that would be missed by a tally sheet.
The user has begun by reading the screen, but they’ve then hesitated, which tells you that the on-screen text isn’t easy enough to understand.
Next, they click on an option and input text, followed by swearing, which suggests that the software didn’t do what they were expecting. They hesitate, read what’s on the screen, and then go to the Help option. This tells you that the text on screen didn’t give them the information they needed; you’ll need to fix that. After reading the Help option, they swear again. This is telling you a clear, vivid story about what’s going wrong, and what needs to be fixed in the next version.
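The same kind of timeline can be recorded as a list of timed observations, which makes it easy to lay the session out in fixed-width columns and to check what happened just before each problem. The sketch below follows the sequence described above; the timings are invented, and the activity labels are our own shorthand rather than a standard notation.

```python
from collections import Counter


def to_columns(observations, width=5):
    """Group (seconds, activity) pairs into columns of `width` seconds each."""
    columns = {}
    for seconds, activity in observations:
        columns.setdefault(int(seconds // width), []).append(activity)
    if not columns:
        return []
    # Include empty columns so that gaps in activity stay visible.
    return [columns.get(i, []) for i in range(max(columns) + 1)]


def preceded_by(observations, target):
    """Count which activity came immediately before each occurrence of `target`."""
    ordered = [activity for _, activity in sorted(observations)]
    pairs = zip(ordered, ordered[1:])
    return Counter(before for before, after in pairs if after == target)


# The session described above, with hypothetical timings in seconds.
session = [
    (0, "read screen"), (4, "hesitate"), (9, "click option"),
    (11, "input text"), (14, "swear"), (16, "hesitate"),
    (19, "read screen"), (26, "open help"), (28, "read help"), (37, "swear"),
]

for index, column in enumerate(to_columns(session)):
    print(f"{index * 5:>3}s", ", ".join(column) or "-")
print(preceded_by(session, "swear"))
```

Here the two activities that came just before the swearing are inputting text and reading the Help, which points to the same two places in the design that need fixing.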
Therbligs were originally a pictographic notation for recording sequences of physical actions in manual tasks, such as assembly-line work in factories. There’s a core set of commonly used notations for activities such as “pick up” and “visually inspect”. There’s an explicit expectation that you’ll need to add some new notations to fit whatever topic you’re working on, such as a notation for “press Return key” or “press back arrow” if you’re using this approach to record what people do when using some new software.
They date back to the early twentieth century, when Frank and Lillian Gilbreth devised them for time-and-motion studies, and they’re still useful today.
The measures above are useful for objective, observable activities. They aren’t designed to handle subjective, internal opinions about an artefact. A common and effective way of investigating those subjective opinions is by using Likert-style scales, named after the American researcher Rensis Likert, who invented them in the 1930s. Purist note: The original forms are known as Likert scales; later developments of the original forms are usually described by purists as Likert-style scales.
Likert-style scales are useful, but often used badly, because they look a lot simpler physically than they are conceptually. The full story is too long to cover in this article, but in brief, we prefer to use visual analogue Likert-style scales. These give finer-grained measurements than the usual numeric scales, and they aren’t constrained by the limitations of human working memory, which effectively limit numeric scales to a maximum of about nine points.
A visual analogue Likert-style scale looks like this.
Image copyleft Hyde & Rugg
It’s a line, typically 100mm long, with a question above it and an anchor term at each end. The question is usually open-ended, such as “how easy did you find it to use this software?” and the anchor terms are usually along the lines of “not at all” at one end and “completely” at the other. There’s a lot of debate in the literature about when to use a negative value at the end and when to use a zero value (e.g. “very difficult” as a negative on one end, paired with “very easy” at the other, as opposed to “not at all easy” at one end and “completely easy” at the other). If you’re going to use Likert-style scales as part of a university project, then you really need to know about this literature.
With visual analogue Likert-style scales, the user answers each question by making a vertical mark through the horizontal scale at whichever point they want. For instance, if they think that the software is very easy to use, but not perfect, then they make a vertical mark near the right-hand end of the scale.
Image copyleft Hyde & Rugg
It’s very quick and easy to use. Once the form is completed and the participant has left, you measure how far along the line the mark is for each question, and write that figure beside the question. This is where using a 100mm line is useful, since you can measure the distance in millimetres, and the line is long enough to give good resolution on the measurements.
Image copyleft Hyde & Rugg
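Once you have the measurements in millimetres, turning them into scores and summarising them is straightforward. The sketch below assumes a 100mm line measured from the left-hand end, and converts each mark to a score from 0 to 100; the function names and the sample measurements are invented for illustration, and the choice of summary statistics is ours rather than a recommendation from the literature.

```python
from statistics import mean, median

LINE_LENGTH_MM = 100  # length of the printed line, as described above


def vas_score(mark_mm, line_length_mm=LINE_LENGTH_MM):
    """Convert the distance of a mark from the left-hand end into a 0-100 score."""
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError(f"mark at {mark_mm}mm is off the {line_length_mm}mm line")
    return 100 * mark_mm / line_length_mm


def summarise(marks_mm, line_length_mm=LINE_LENGTH_MM):
    """Summarise the scores for one question across all participants."""
    scores = [vas_score(mark, line_length_mm) for mark in marks_mm]
    return {
        "n": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "lowest": min(scores),
        "highest": max(scores),
    }


# Hypothetical measurements for one question from five participants.
ease_of_use_mm = [82, 74, 91, 65, 88]
print(summarise(ease_of_use_mm))
```

The `line_length_mm` parameter is there in case the printed line comes out at a slightly different length, which is worth checking with a ruler before the session.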
Where do you get the questions from?
The questions on the Likert-style scales should derive from the requirements acquisition stage of product design. In that stage, as described in some of our previous articles on this blog, you identify what the key attributes are for the product, using methods such as think-aloud technique and downward laddering, where you ask the user questions along the lines of “How can you tell that something is a good X?”
If you’re using an iterative design approach, then you’ll use the results from this evaluation to refine your design, and you’ll then evaluate the new design in exactly the same way, to check whether it’s more usable than the previous version, measured against the same criteria. What usually happens is that there’s a sharp improvement in the second version compared to the first, followed by smaller improvements in subsequent versions.
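Because each version is measured against the same criteria, the comparison between versions can be done mechanically. The sketch below reports the gain on each criterion from one version to the next, treating some measures (such as time and mistakes) as better when lower. The criterion names and the figures are invented purely to illustrate the calculation; they are not results from a real evaluation.

```python
def improvements(versions, lower_is_better=("time_s", "mistakes")):
    """Return the gain on each criterion between consecutive versions.

    A positive gain always means the later version did better.
    """
    report = []
    for before, after in zip(versions, versions[1:]):
        gains = {}
        for criterion in before:
            change = after[criterion] - before[criterion]
            # Flip the sign for measures where a smaller value is the better one.
            gains[criterion] = -change if criterion in lower_is_better else change
        report.append(gains)
    return report


# Hypothetical results for three successive versions of a design (invented data).
versions = [
    {"time_s": 120, "mistakes": 6, "ease_vas": 40},
    {"time_s": 70, "mistakes": 2, "ease_vas": 72},
    {"time_s": 62, "mistakes": 1, "ease_vas": 78},
]

for number, gains in enumerate(improvements(versions), start=2):
    print(f"version {number} vs version {number - 1}: {gains}")
```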
This is where using fast, cheap mockups can be very effective; they allow you to catch any major changes in the requirements and the design very early on, with minimal cost in time, effort and money.
That concludes this series on requirements. There will be more articles about various aspects of requirements and design in later posts.
Links and notes
As usual, I’ve used bold italic for technical terms where it’s easy to find further reading. I’ve listed some specialist links below.
There’s a Wikipedia article about Therbligs here:
There’s a Wikipedia article about Likert here:
There’s a Wikipedia article about Likert scales here:
There’s a tutorial article about laddering on the main Hyde & Rugg site here:
The Search Visualizer site is here: