Testing film and television prototypes
There’s nothing quite like seeing your own film or TV show in the cold light of day: experiencing it for the first time with a real audience. It’s a sobering moment. The moment when all of the struggles of financing, of casting, of production and editing pale into insignificance before the question of whether your audience cares about what you’ve made.
Of course, films are frequently tested with cinema audiences at the end of the process to see how they react, sometimes resulting in significant re-shooting and editing. In TV, piloting is common, enabling a complete episode to be seen by real audiences. The good thing about tests of fully-realised film or TV output is that, as long as test design is good, the results should be trustworthy and genuinely useful. The bad thing is that testing takes place at the end of the process. If it were possible, wouldn’t it be cleverer to test earlier? In fact, wouldn’t it be cleverer to test all the way through the development and production process? This is how testing works in many other industries (drug manufacture, space travel, software design, for example); why not in film and TV?
To make continuous testing possible and useful in the film and television industry, there are a few things that need to be true:
Prototyping is possible: A film or TV show can be prototyped more quickly and more cost-effectively than the finished, polished article, yet with sufficient fidelity to gather genuinely useful feedback from audiences.
Testing of prototypes is reliable and useful: Test audiences are of sufficient sample size and representative of the film or TV show’s eventual audience, and the director (and team) can use the results to check whether their objectives are being met, to find out what is and what is not working, and to measure against objective benchmarks.
Testing of prototypes is quick and easy: Testing is so easy that it can become another tool in a director’s toolbox, offering the ability to painlessly experiment with alternative approaches, whilst delivering results frequently and quickly enough for the director to make improvements across all stages of development and production.
At New Forest Film Co, we’ve developed a testing process that meets all three requirements. We have already outlined how we prototype in a previous blog post, and now we’ll look at how we have developed a quick and easy prototype testing process that is both reliable and useful.
First things first: the test audience. Ensuring that the test audience is of sufficient scale and breadth is crucial to delivering reliable results.
At the risk of getting too statistical too early: for scale, if you want your answers to represent the views of a million or more people, you need to test on a minimum of 385 people (at a 95% confidence level with a 5% margin of error). The more you’d like to segment your results, the more people you need, but 385 people is where it starts.
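The 385 figure can be reproduced with the standard sample-size formula for estimating a proportion. The sketch below is a minimal illustration, assuming the most conservative case (an expected proportion of 0.5) and, by default, a population large enough to be treated as unlimited. The function name and parameters are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def sample_size(confidence=0.95, margin=0.05, proportion=0.5, population=None):
    """Minimum number of testers needed to estimate a proportion.

    confidence: confidence level, e.g. 0.95
    margin:     margin of error, e.g. 0.05 for plus or minus 5 points
    proportion: expected proportion; 0.5 is the most conservative choice
    population: optional finite population size; None means "very large"
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = z ** 2 * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction, only relevant for small audiences
        n = n / (1 + (n - 1) / population)
    return ceil(n)


print(sample_size())  # 385
```

If results are to be segmented (say, by age group), the same calculation applies to each segment you want to report on at that precision, which is why the total grows quickly with the number of segments.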
For breadth, you need a test audience sufficiently varied in age, gender, and interest in film and TV to be able to analyse who likes or dislikes aspects of your prototype. Breadth also matters because it lets you compare results fairly across multiple prototypes and multiple films and TV shows in order to develop benchmarks.
Sourcing a test audience of sufficient scale and breadth isn’t difficult. Audiences can be recruited directly from social media (testing for free or for incentives) or via test panel providers (at a cost per completed test). If more secrecy is required, which is probably more relevant for larger productions at larger film and TV companies, a test audience can be gathered from NDA-signing employees.
By far the most convenient way to test is online, directing test audiences to a website or third-party testing platform to experience prototypes and provide their reactions.
The test experience must be well designed: close attention must be paid to the ordering and wording of the questions, the UX (user experience) must be slick, and the test must be relatively short, otherwise completion rates will suffer. We aim to keep test completion times to 30 minutes or less.
We measure audience reaction in three ways:
Standard questions with standard scores: Questions we always ask across prototypes and films, to which test audiences answer with a score from 0 to 10. The most important of these, asked after a tester has experienced an element of the prototype, is “How likely are you to recommend this film/TV show to a friend?”
Specific and open questions: Questions that are specific to the prototype, and that are open in the sense that testers are invited to write answers (of whatever length they wish) as a reply. Depending on the number of testers and responses, such questions may require the help of NLP (Natural Language Processing) to analyse quickly. Such written feedback not only brings colour to the standardised scores, but provides a significant batch of ideas as to where and how prototypes need to be improved.
Behavioural responses and emotion detection: Measures of audience response which do not rely on asking testers direct questions. For example, if testing a poster directly in the Facebook timelines of testers, the comparative CTR (click through rate) or RS (relevance score) are behavioural measures. The most sophisticated technique we use is AI-driven emotion detection. When testers use our testing platform, we ask their permission to use their webcams to watch them as they watch our audio-visual prototypes. This technology enables us to detect in testers six core emotions (happiness, sadness, intrigue/confusion, fear, disgust and surprise) and three emotion aggregates (including engagement) second-by-second as they watch, providing a detailed and objective measure of audience response to the moments, scenes and acts constructing the story.
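For the standard 0-to-10 recommendation question described above, one widely used way to summarise answers is the Net Promoter Score (NPS): the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6). Using NPS here is an assumption made for illustration, and the function name and sample scores are hypothetical.

```python
from statistics import mean


def net_promoter_score(scores):
    """Return the NPS (from -100 to 100) for a list of 0-10 scores."""
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)   # scores of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # scores of 0 to 6
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers from five testers
scores = [10, 10, 9, 7, 5]
print(net_promoter_score(scores))  # 40.0
print(mean(scores))                # 8.2
```

Reporting the plain mean alongside the NPS is useful because the two can move differently: the mean reflects every answer, while the NPS only reacts when testers cross the promoter or detractor thresholds.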
Analysing test results must be fast (in order to enable continuous improvement as the film or TV show is developed) and genuinely useful to directors and writers.
Online testing enables fast data acquisition, which - using a spreadsheet or database plus a standardised reporting and visualisation template - can be set up to deliver results to the director and team within 24 hours of test completion.
Being able to compare results between prototype variants (e.g., endings), successive prototypes and across films and TV shows is essential in order to acquire both relative and benchmarked results.
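As a rough illustration of comparing two prototype variants, the sketch below estimates the difference in mean 0-to-10 scores between two groups of testers, with an approximate confidence interval. It assumes each tester saw only one variant and uses a normal approximation, which is reasonable for samples in the hundreds but not for the tiny example lists shown. The function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_variants(scores_a, scores_b, confidence=0.95):
    """Difference in mean score (B minus A) and an approximate interval."""
    diff = mean(scores_b) - mean(scores_a)
    # Standard error of the difference between two independent means
    se = sqrt(variance(scores_a) / len(scores_a)
              + variance(scores_b) / len(scores_b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical scores for two alternative endings
ending_a = [6, 7, 5, 6, 7, 5]
ending_b = [8, 9, 7, 8, 9, 7]
diff, (low, high) = compare_variants(ending_a, ending_b)
print(f"B - A = {diff:.2f}, interval {low:.2f} to {high:.2f}")
```

If the interval excludes zero, the difference between variants is unlikely to be noise at the chosen confidence level. The same function can compare a prototype against stored benchmark scores.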
All data should be retained to enable refinement of benchmarks over time, not least to eventually correlate the performance of the finished film or TV show with early test results, to discover the level of predictive power in questions asked early in the development process.
Crucially, the analysis of test results must enable judgements to be made at critical junctures, particularly at “the gate” - the moment when it is decided whether or not a film or TV show has proved itself enough to move from the “discovery” phase (development) to the “delivery” phase (production).
You can’t succeed without failure
The psychological reality of a testing process for film and television is that testing is likely to identify failures (failures in script, failures in audio-visual execution, failures in marketing), bringing the cold-light-of-day experience far earlier in the development process.
What this means is that both the expectation of failure and the ability to respond humbly and creatively to test audience responses are important to the psychology of a director - a way of being that is contrary to the classic caricature of a dictatorial director with a megaphone. This is a very good thing. You can’t succeed without failure, and you don’t know you’ve failed until you test.