What is Wizard of Oz testing?

Nov 2, 2020

The Wizard of Oz testing method is a way to fool users into believing they're using the real thing, so that you get accurate feedback on your service or product.

The Oz method has gained traction in usability testing and prototyping in recent years. Its major benefit is that the service doesn't need to be fully built and functional: instead, a human operates it behind the scenes, generating the responses the user requests. This saves you time and resources while still giving you useful feedback and insight.

Why is it called the Wizard of Oz?

The name comes from the scene in which the Wizard (played by Frank Morgan in the 1939 film) appears to Dorothy as a massive floating head. When the trick is later uncovered, Dorothy asks him about it:

“But I don’t understand,” said Dorothy, in bewilderment.
“How was it you appeared to me as a great head?”
“That was one of my tricks,” answered Oz. “Step this way, please, and I will tell you all about it.”
-L. Frank Baum, The Wonderful Wizard of Oz

What was actually happening is that the Wizard was hiding behind a curtain, faking the whole experience, yet it felt very real to Dorothy.

Where did the Wizard of Oz experiment come from?

John F. Kelley introduced the Wizard of Oz research method into the human-computer interaction discipline in 1984, as part of his dissertation work at Johns Hopkins University.

The clear benefit of this research method is that you can gather more accurate data whilst avoiding long, costly development time and expenditure.

This research method can be used when you're still in the early stages of discovery and concept validation, or when usability testing specific features.

The trick of this research method is to make it as believable as possible, so make sure to rehearse it a few times. If users start noticing that the prototype is fake, it could skew the results.

How do you conduct a Wizard of Oz experiment?

There are a few ways to fake the experience. You could use hard-coded data and give the user a fake set of details before testing begins. Alternatively, you can generate responses on the fly, with someone close by controlling them. The right approach really depends on what you're trying to test.

Example: a chatbot on social media

The Wizard of Oz approach can be a great way to test your chatbot experience. Instead of spending time programming every response and accounting for all the ways people might phrase their questions, you can have someone respond to whatever the user types. This lets you start understanding what your users would ask the chatbot and how they would interact with the system.
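To make the chatbot example more concrete, here is a minimal Python sketch of the plumbing behind such a test, assuming a text-only chat in which a human operator (the "wizard") writes each reply. The names `run_session`, `human_wizard` and `read_user`, the `max_turns` limit and the "bye" exit word are hypothetical choices for illustration, not part of any particular tool.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types for this sketch: read_user returns the participant's next
# message (or None when they leave); a wizard turns a message into a reply.
ReadUser = Callable[[], Optional[str]]
Wizard = Callable[[str], str]


def run_session(read_user: ReadUser, wizard: Wizard,
                max_turns: int = 20) -> List[Tuple[str, str]]:
    """Relay participant messages to the wizard and log every exchange."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        message = read_user()
        # Stop when the participant leaves or says goodbye (assumed exit word).
        if message is None or message.strip().lower() == "bye":
            break
        reply = wizard(message)  # in a live test, a human writes this reply
        transcript.append((message, reply))
    return transcript


def human_wizard(message: str) -> str:
    """Show the participant's message on the operator's console, return their typed reply."""
    return input(f"[participant] {message}\n[wizard] > ")
```

During a rehearsal, you might feed `run_session` a few scripted participant messages and pass `human_wizard` so the operator practises replying live. The returned transcript is the raw material for analysis afterwards: what people asked, and how they phrased it.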

Benefits of the Oz method and ways to use it

By using this method of user research, you'll understand more about how users will interact with the service or product and the problems they may experience.

For fairness and accurate data, you'll want to make sure that all of the "wizard's" responses are consistent. The key is to be well prepared, with a good set of instructions that accounts for as many situations as possible.

The way you respond also needs to be thought through, so make sure the content is designed to be consistent and matches the tone of voice of the company or organisation.
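One way to keep the wizard consistent is to give them a script of pre-approved replies, written in the organisation's tone of voice, plus a helper that suggests the matching reply for each message. The sketch below assumes simple keyword matching; the `SCRIPT` entries, the fallback text, and the `suggest_reply` and `find_gaps` names are hypothetical placeholders. It also flags messages the script doesn't cover, so those gaps can inform the next version of the instructions.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-approved replies, keyed by a keyword to look for.
SCRIPT: Dict[str, str] = {
    "opening hours": "We're open 9am to 5pm, Monday to Friday.",
    "delivery": "Standard delivery usually takes 3 to 5 working days.",
    "refund": "You can request a refund within 30 days of purchase.",
}

# Suggested when no keyword matches; the wizard may still write a reply by hand.
FALLBACK = "Sorry, I didn't quite catch that. Could you rephrase your question?"


def suggest_reply(message: str, script: Dict[str, str] = SCRIPT,
                  fallback: str = FALLBACK) -> Tuple[str, bool]:
    """Return (reply, matched) using the first keyword found in the message."""
    text = message.lower()
    for keyword, reply in script.items():  # dicts keep insertion order
        if keyword in text:
            return reply, True
    return fallback, False


def find_gaps(messages: List[str], script: Dict[str, str] = SCRIPT) -> List[str]:
    """List the messages the script doesn't cover, for review after a session."""
    return [m for m in messages if not suggest_reply(m, script)[1]]
```

Keyword matching is deliberately crude here. The helper only suggests a reply, and the human wizard decides what is actually sent, which keeps the experience believable while anchoring the wording to the agreed script.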

One of the best things about Oz testing is that you can iterate rapidly, from small tweaks to the content or interaction to a complete rework of the flow.

The greatest strength of Wizard of Oz testing can also be its undoing. If you show your users something that looks and responds like the real deal, they may assume that this new product or service is about to go live. If you have loyal customers or users, it might be best to set their expectations straight after testing: it may be a while before the feature is released, or it may never be released at all.

Wizard of Oz testing is a great way to find out whether people would be willing to use, and pay for, a new feature or product. As we all know, what people say is often quite different from what they do; by showing them something that looks real at face value, you may get closer to the truth.

Researchers can then move away from questions like "How useful would 'x' be to you?" or "Do you think you would use feature 'x'?", which don't provide much insight. Instead, you'll be able to look at meaningful data: is this feature usable, will people use it, and if they do, how do they use it?