By Gordon Rugg
I have very mixed feelings about content analysis. At its best, it gives you a new understanding of the world around you. At its worst, which I see all too often, it’s little more than an attempt to salvage mangled fragments of something useful from the wreckage of a questionnaire perpetrated by some sinner who deserves to be locked in a cell for a while with the assorted works of Barbara Cartland being read aloud over the intercom. Accompanied by accordion music.
So what is content analysis, and why do I have such strongly mixed feelings about it? In essence, it’s about analysing the content of texts. The texts may be questionnaire answers or interview answers, or magazine articles, or books, or online forum debates, or just about anything else that’s spoken or written.
Content analysis is usually something that you “grapple with” rather than “do” because it’s a messy, nasty problem. The core dilemma is that the further you get from the original words in the text, the more you risk distorting their meaning; however, the nearer you get to the original words, the less sense you can make of what those words are telling you.
There are various ways of tackling this problem, but none of them provide a perfect solution. The result is that there are numerous types of content analysis, which vary widely in their assumptions, methods, strengths and weaknesses. This article describes a “vanilla flavour” type of content analysis, which is enough for the needs of many students. I’ll look at other types in future articles.
Vanilla flavour: Counting and categorising significant terms
One widely used form of content analysis, which I’ve described for convenience as vanilla flavour, involves counting how often particular terms occur in the text.
The text may take various forms.
Questionnaire and interview answers
Content analysis is often used on the answers to open questions in questionnaires and interviews, where the participants are asked to answer in their own words, rather than choosing from a specified set of possible answers.
This is one of the many points where the perpetrators of badly designed questionnaires discover that questionnaire design isn’t as easy as it looks, and that questionnaires really do have significant limitations. If you’re designing a questionnaire, it’s highly advisable to do some trial runs with it, including analysis of the data from the trial runs, before starting the main data collection. Analysing the trial data will probably identify a lot of places where re-wording the original questions will produce richer, clearer results that are much less likely to be mangled in the analysis.
Transcripts and recordings
Sometimes the text is a transcript, such as a transcript of a think-aloud session or of an interview or of evidence given in court. Transcripts are invaluable because they provide machine-readable text which is easier to analyse, but they also have various limitations. A low-level problem is that the original wording may be inaudible in places. Another is that if you get someone else to do the transcribing for you, they often “tidy it up” by changing the original wording to make it “more grammatical” or to remove swearwords. Those are often the most useful parts, which is why “tidying them up” by removing them or changing them is a very bad idea. It’s a good policy to do your own transcribing, but that’s a slow process; the usual rule of thumb is that it takes ten hours of transcription per hour of audio recording. If you’re going to deal with substantial amounts of material, it’s a good idea to teach yourself to do touch typing (i.e. ten finger rather than two finger typing).
It’s also possible to do the content analysis directly off the recording, without transcribing it. This is faster, but has various limitations – for instance, there are types of analysis that you can do on a written transcript that you can’t do on the original audio recording, such as the Search Visualizer analyses described below.
Existing documents
A lot of research in media studies and sociology involves analysis of existing documents such as books, magazine articles, film scripts and Internet sources. Often, this analysis looks for significant absences, such as the non-occurrence of minority social groups as protagonists in action movies. The issue of absences is an important one, to which I return later in this article.
The usual core process within vanilla flavour content analysis is to identify key phrases or text fragments, and then to count how often each of them occurs.
For instance, if you’re researching people’s perceptions of some products, you might list the adjectives and adjectival phrases that people use to describe them, and record how often each descriptive term is mentioned. The result is a list looking something like this. I’ve kept it short for clarity. In practice, the list is usually much longer, and the sheer length of the list can cause a lot of practical problems.
Alphabetical list of terms mentioned
Listing the terms alphabetically has the advantage that it’s easy to do in a spreadsheet such as Excel. Also, you can then easily add together the number of times each term is mentioned.
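If you’re keeping the counts in code rather than a spreadsheet, the alphabetical listing takes a few lines of Python. This is a minimal sketch; the terms and totals are hypothetical, chosen to match the shade example discussed below (the “dark”-side terms are my own illustrative assumption).

```python
from collections import Counter

# Hypothetical raw mentions, recorded verbatim, one entry per mention:
# 5 x "pale", 1 each of "pale-ish" and "lightish", 2 x "white",
# plus an assumed breakdown of the "dark"-side terms.
mentions = (["pale"] * 5 + ["pale-ish", "lightish"] + ["white"] * 2
            + ["dark"] * 4 + ["darkish", "black"])

counts = Counter(mentions)

# Print the terms alphabetically with their totals, mirroring the
# alphabetical spreadsheet listing described above.
for term in sorted(counts):
    print(term, counts[term])
```

Note that the counting here is strictly verbatim: “pale” and “pale-ish” stay separate, just as they would in the spreadsheet.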
However, this doesn’t give you much idea of what’s going on within the data in terms of broader categories. To make more sense of it, you need to do further analysis.
One thing you can do, by way of further analysis, is to clump those terms into bigger groups. For instance, “pale” and “pale-ish” could be combined together in some way. You can do this yourself, or you can ask someone else to do it, to reduce the risk of subconsciously biasing the analysis in the direction that you would like. This person is usually known as an independent judge. In student projects, this is often a friend who does this as a favour, often in exchange for you helping with their project in some way.
The problem with combining terms together is that it can end up as a very messy process; just how do you decide which terms should be combined together, and how many levels of grouping should you use?
Here’s one way of doing the grouping more systematically. It’s the way that Marian Petre and I describe in our book on research methods; it’s based on a standard approach within the card sorts literature.
First step: You start by listing and counting the exact words that the participants used. This is the verbatim (i.e. precise words) column in the table below.
It’s important to be strict about only counting identical phrasings together. Note that, for instance, “pale” and “pale-ish” are listed separately from each other in the list above. This is important because in many fields there are technical terms that look very similar, but that have very different meanings (e.g. “sulphide” and “sulphite” in chemistry). For this first stage, you shouldn’t make any assumptions about which words might mean the same thing as each other. In fact, it’s perfectly possible that different participants mean different things by the same word, so even the verbatim lumping needs to be viewed with caution.
Second step: You can now lump some of those words together into broader categories, if they mean roughly the same thing. For instance, “pale” and “lightish” usually mean pretty much the same thing. This is the gist (i.e. same meaning) column below. For the gist column, you total together the numbers from the verbatim words that are being aggregated together within the category (so 5 mentions of “pale” plus 1 mention each of “pale-ish” and “lightish” plus 2 of “white” make a total of 9).
We now have 9 gist mentions of “light” and 6 gist mentions of “dark”.
The terms “light” and “dark” have opposite meanings, but they’re related, in that they both relate to shade. What we can now do is to aggregate these two gist terms together in the superordinate category of “shade” and sum their numbers, giving us a total of 15.
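As a sketch of the bookkeeping, here are the verbatim-to-gist-to-superordinate steps in Python. The verbatim counts for the “light” terms come from the worked example above; the breakdown of the six “dark” mentions into individual terms is my own assumption, and the mappings are the sort of judgment calls you or your independent judge would make.

```python
from collections import Counter

# Verbatim counts (the "light" terms follow the worked example in the
# text; the "dark" breakdown is an illustrative assumption).
verbatim = Counter({"pale": 5, "pale-ish": 1, "lightish": 1, "white": 2,
                    "dark": 4, "darkish": 1, "black": 1})

# Mapping from verbatim terms to gist categories -- a judgment call,
# not a given.
gist_of = {"pale": "light", "pale-ish": "light", "lightish": "light",
           "white": "light",
           "dark": "dark", "darkish": "dark", "black": "dark"}

gist = Counter()
for term, n in verbatim.items():
    gist[gist_of[term]] += n

# Gist categories are in turn lumped into superordinate categories.
superordinate_of = {"light": "shade", "dark": "shade"}

superordinate = Counter()
for category, n in gist.items():
    superordinate[superordinate_of[category]] += n

print(gist["light"], gist["dark"])   # 9 6
print(superordinate["shade"])        # 15
```

The point of keeping all three levels, rather than just the final totals, is traceability: anyone can check exactly which verbatim terms went into each gist and superordinate number.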
We can now go on to do the same for other things that were mentioned, such as cost.
How do you decide which gist and superordinate categories to use? There are various ways, including:
- What you think they should be
- What an independent judge thinks they should be
- What a thesaurus says they should be
- Categories used in the previous literature
That’s a swift overview of one version of vanilla flavour content analysis.
At the end of all this, you may be wondering what the point was. You have a batch of numbers, but what do they actually tell you?
It’s a fair question. For a lot of badly-designed questionnaires, the answer is that the numbers tell you nothing that’s of any use to anyone. This is why my heart sinks whenever a student tells me that they’re planning to do a questionnaire. Good questionnaires are invaluable, but they’re rare; most questionnaires are hacked together by people who are either unaware of the complexities involved in good questionnaire design, or who don’t care about getting it right.
However, when the data collection is done reasonably well, you can do some interesting, useful things with the numbers that you get from content analysis.
For instance, various researchers have pointed out that many of the heroes in adventure stories across various media don’t have mothers. Here’s an example.
Content analysis lets you see whether this is a real effect or just someone cherry-picking a handful of unrepresentative examples.
One common recipe for making use of the data is to compare the numbers from the published literature with the numbers from your own data collection. Here’s a hypothetical example, involving perceptions of the key features of a good website.
The first ten columns are papers from the literature, and the second ten columns are your ten research participants; each row represents one of the categories that you used in your content analysis. (I’ve kept the numbers low for clarity and brevity; in reality, you’d be using more papers and/or more participants.) I’ve shown participant numbers in italics, for clarity of explanation.
A black circle in a cell shows that the category in question is mentioned by the relevant paper or participant. This is the simplest form of notation, a binary “mentioned/not mentioned” division; you can show other things in the cells instead, such as how often each category is mentioned. I’ve used the binary version for simplicity of explanation.
This format lets you see at a glance how well the literature corresponds with what you found.
For instance, all the papers from the literature in this sample mentioned content, but only two of the participants mentioned content. That’s an interesting difference. Conversely, most of the participants mentioned interestingness, novelty and trustworthiness, but only a few of the papers from the literature mentioned these categories. Overall, this table suggests that there’s a considerable mis-match between what the literature is mentioning and what the participants are mentioning.
We can see other patterns within this table. For instance, among the human participants, every participant who mentioned interestingness also mentioned novelty. Does this mean that the participants are treating the two concepts as synonyms, or do they think that one of those concepts always involves the other? That would be a clear candidate for some follow-up research. Another striking feature is participant 9, who only mentions trustworthiness; was this someone who just didn’t give many answers, or was this someone who had a very different (and potentially very interesting) way of thinking about this topic? Again, follow-up research would be needed to find the answer.
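Once the data is in machine-readable form, the mentioned/not-mentioned comparisons and the co-occurrence checks can be done programmatically. Here’s a minimal sketch in Python, using small made-up data rather than the full twenty-column table; the source names and categories are purely hypothetical.

```python
# Hypothetical mention data: for each source, the set of categories
# it mentioned. A real analysis would have one entry per paper and
# one per participant.
papers = {
    "paper 1": {"content", "usability"},
    "paper 2": {"content", "navigation"},
    "paper 3": {"content", "usability", "navigation"},
}
participants = {
    "participant 1": {"interestingness", "novelty"},
    "participant 2": {"interestingness", "novelty", "trustworthiness"},
    "participant 3": {"trustworthiness"},
}

def mention_rate(sources, category):
    """Proportion of sources that mention the given category."""
    return sum(category in cats for cats in sources.values()) / len(sources)

# Compare literature and participants on one category at a glance.
print(mention_rate(papers, "content"))        # 1.0 in this toy data
print(mention_rate(participants, "content"))  # 0.0 in this toy data

# Co-occurrence check: does everyone who mentions one category
# also mention another?
always_together = all("novelty" in cats
                      for cats in participants.values()
                      if "interestingness" in cats)
print(always_together)
```

The same set-based representation also makes it easy to spot outliers such as a participant who mentions only one category.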
This is a particularly powerful approach when your participants are from a group that hasn’t featured much in the published literature. The vast majority of studies published in journal articles use Western university students as participants, so if you gather data from participants who aren’t Western university students, then you can do a neat, simple compare-and-contrast between your findings and the findings from previous studies.
That still leaves the question of what any differences and similarities mean, which you’d need to explore in a further study, probably using a different method from questionnaires (e.g. observation to find out what’s actually happening, or laddering to find out about values and beliefs). However, the content analysis has provided a solid batch of evidence that can be used as a foundation for that further research, which is a good start.
This way of doing content analysis is clear, tidy, simple and completely traceable; the reader can see exactly how you arrived at the numbers you did, at each step of the way. If you’re doing content analysis for your student project, it’s a solid, sensible approach.
Content analysis has assorted limitations, which have been the subject of numerous heated debates in the research communities involved. I’ve given a brief summary of various limitations below. Most of these apply to content analysis in general, with different types of content analysis being better in terms of some limitations, but usually at the price of being worse in terms of other limitations.
If you’re working from transcripts, then the sheer time taken to do the transcription can be a problem. Also, getting accurate transcripts is far from easy. If you get someone else to do the transcription, there’s a real risk that they’ll mis-hear or mis-spell crucial words (e.g. specialist or slang terms that they don’t know), and a real risk that they’ll reword the text to make it more grammatical, or to remove swearwords.
Another practical problem is that spoken interactions often involve two or more people talking at once. For some research questions, this can be a key part of the data – for instance, if you’re looking at power relations between the speakers, and at who interrupts whom. Transcripts are normally linear, like the written dialogue in a novel, and representing multiple simultaneous speakers is a non-trivial problem. It can be done, but it means that you need to read up on best practice for handling this issue.
A major issue in content analysis is the things that aren’t mentioned in the text. You’ll need to decide which things to treat as significant absences and which to treat as non-significant absences.
Some absences are because of Taken For Granted (TFG) or Not Worth Mentioning (NWM) assumptions; the person has assumed that those points were so obvious that there was no need to mention them (for instance, the assumption that a product shouldn’t kill a user).
Other absences are because of taboos; for instance, if you did a content analysis of classic 1950s Westerns, you’d find few or no mentions of people going to the toilet.
Some absences are just because of chance; for instance, you happen to select a batch of texts which happen not to include mentions of a topic, although that topic is mentioned in other texts outside your sample. This is particularly likely to happen with topics that are relatively rare; you need to be sure that your sampling process will be able to detect whether mentions of that topic are at about the level that would be expected by chance.
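The “absent purely by chance” risk above can be quantified with a quick back-of-the-envelope calculation. This sketch assumes that mentions of the topic are independent across texts and occur in a known proportion of them; both are assumptions, and the 2% figure is purely illustrative.

```python
import math

# Suppose a topic appears in a proportion p of all texts in the
# population (an assumption you would have to estimate), and you
# sample n texts.
p = 0.02           # topic occurs in 2% of texts (illustrative)
n = 50             # sample size

# Probability that the topic is completely absent from your sample,
# purely by chance: (1 - p) ** n.
p_absent = (1 - p) ** n
print(round(p_absent, 3))   # about 0.364

# Sample size needed for at least a 95% chance of seeing the topic
# at least once.
needed = math.ceil(math.log(0.05) / math.log(1 - p))
print(needed)               # 149
```

In other words, with a topic this rare, a fifty-text sample misses it entirely more than a third of the time; that’s the kind of figure worth checking before treating an absence as meaningful.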
This leads into the deeper, important issues of framing and choice of topics for research. One of the points made by sociologists of science and some of the more constructive postmodernists is that research tends to focus on some areas and to keep clear of others, usually for reasons that involve social norms, politics and religion.
Interpretation and implications
What do the numbers actually mean anyway, and what will you do as a result of knowing them?
This is a key question for any type of research, and it’s one where a proper research design will think about the implications of possible findings right from the start, before the data collection has even begun. There’s not much point in spending a lot of time and effort collecting data if you’re not going to make any use of the answers. For instance, if you’ve used a questionnaire which asks participants whether they’re male or female, are you going to compare and contrast the male and the female results? What will you do if you find that the male results are different from the female results? What will you do if you find that the male results aren’t different from the female results? How much difference counts as “different” anyway? If you haven’t worked this out before you start, then there’s a pretty good chance that you’ll end up with a set of findings and with no idea what to do with them. That’s not a good place to be.
In a good research design, you work out in advance what the possible findings could be, and then work out what you will do in response to each of those findings. This is a well established part of standard good practice in research design, but it’s something that amateur questionnaire designers tend to overlook until too late. (Bottom line: If you’re seriously planning to use a questionnaire to collect data, then read up thoroughly on how to do data analysis and research design before you start reading up on questionnaire design. It could save you a lot of time and grief.)
Other types of content analysis
That was a short overview of a widely used, vanilla flavour type of content analysis. It’s far from the only type.
There’s a methodological debate about most things relating to content analysis, which has been running for the best part of a century, and which doesn’t look likely to end any time soon. There are, in consequence, numerous different types of content analysis, and many approaches to content analysis. The following sections give a very brief description of some of those other types. I’m planning to write in more detail about them at some point, when there’s nothing more exciting to do…
Grounded theory is an approach which involves trying to have full traceability from each level of categorisation back to the original data.
This has similarities to laddering, which I prefer for a variety of reasons. If there’s a choice, I prefer to gather information directly via laddering, rather than indirectly via grounded theory analysis of texts.
Laddering is cleaner than grounded theory, with a simple but powerful underlying structure that maps well onto graph theory. Laddering can be used to find out people’s mental categorisation, including how a person’s subjective definitions map on to physical reality.
Cognitive causal maps involve using a subset of graph theory to produce diagrams of the networks of reasoning and evidence used within a text. The classic book by Axelrod and his colleagues contains some excellent examples of how this can be applied to significant areas such as reasoning about international politics by politicians, and of how the results from this approach can be used to predict the future actions of individual decision-makers.
Search Visualizer lets you see thematic structures within a text, such as where and how often women are mentioned as opposed to men, or where hesitation words on an aircraft flight recorder are an indication of a problem arising.
There are more examples on the Search Visualizer blog.
Discourse analysis involves looking at the dynamic aspects of two or more people verbally interacting (whether in a conversation, or a group meeting, or in an email exchange, or in some other medium). This approach can give useful insights into e.g. power structures among the people involved, such as who interrupts whom, and who is able to change the subject being discussed.
Graphs, maps, trees is the title of a fascinating book by Moretti. This brings together a variety of ways of analysing texts, such as showing the spatial distribution of the places mentioned within a novel, or the number of books published within a particular genre over time.
Statistical approaches, including lexicostatistics are useful for fine-grained analysis, and for answering questions such as who might be the author of a particular anonymous work.
Story grammars are powerful formalisms for analysing the plot structures of stories, including books and film scripts. The classic early work was done by Vladimir Propp; his approach is still highly relevant today.
Some common criticisms of content analysis are that it:
- Involves a lot of effort without much to show for it
- Often doesn’t produce anything new or unexpected
- Is open to accusations of being subjective
- Doesn’t stand up well to “so what?” questions
All these criticisms can be true, particularly when the content analysis is being performed on a bad set of data (e.g. the woeful outputs from a poorly designed questionnaire).
Done well, though, content analysis can be very useful. A result that you were expecting may be completely unexpected to other people. For instance, the finding about protagonists who don’t have mothers wouldn’t be much of a surprise to gender theorists, but would probably come as a surprise to most people, and is the sort of unexpected pattern that can make people start thinking about their implicit assumptions about the world.
On which note, I’ll end.
Notes and links
You’re welcome to use Hyde & Rugg copyleft images for any non-commercial purpose, including lectures, provided that you state that they’re copyleft Hyde & Rugg.
There’s more about content analysis and related concepts in my book with Marian Petre on research methods:
Rugg & Petre, A Gentle Guide to Research Methods:
Overviews of the articles on this blog: