Rotoscoping and motion capture are two methods of producing naturalistic animated motion. Rotoscoping is the process of tracing live-action footage; motion capture is 3D rotoscoping, with unwanted defects generated by computer instead of by hand.
Use of these methods is generally contraindicated. The range of effective dosage is very small, and overdose will make you Ralph.
Recognizing Common Symptoms
Ralph Bakshi, King of the Tin Ear, leaned heavily on rotoscoping as a way to save time and money in his films, as seen in the unbelievably corny clip above, from American Pop in 1981.
It’s an excellent example of roto gone rancid. Control of the scene has essentially been ceded to the rotoscoping. Any subtlety in the acting is drowned by the artifacts of the process: the jitter of the actors’ outlines, especially in the long shot, is larger than their motions; glitches are instantly obvious; holds, no doubt intentional frame-saving choices, look like accidents.
Rotoscoping isn’t to blame here. The source material was the worst it could have been. Terrible writing, acting, direction, and editing ruined the scene long before rotoscoping had the chance to be useful.
How do we know it wasn’t just the roto? Because this is rotoscoped, too:
It’s old-fashioned and far from technically perfect, but it works. All the human roles in Snow White (and a good many in Cinderella and Sleeping Beauty) were first filmed with human actors. The footage was not always strictly rotoscoped; often it was used simply as “reference footage,” but the line there is blurry, and Disney doesn’t bring it up.
Motion capture is an equally risky proposition; maybe more so, as its dangers are veiled by the misty glow of technological superiority. It’s used more and more often in video games and feature films, by everyone but Pixar. High tech it may be, but the philosophy is the same as rotoscoping: motion from life, applied to a form of our choosing. If there’s a benefit to be gained through the use of these methods, how can we get it without gagging?
Bugs in the System
Roto and mocap both sound good in theory. Animating is tedious and difficult. Wouldn’t it be great to just have an actor do his acting thing, and then zap him into cartoon form?
The troubles with this hypothesis are first seen in the practical realm. The results are only as good as the methods used to achieve them, and those methods are far from perfect.
Classical rotoscoping is a painstaking, cramp-inducing process. Rotoscoping artists must make informed decisions about which lines to draw and which to omit. Motion capture needs just as much artistic finesse, and a lot more technical savvy. Optical tracking systems and wireless inertial sensors are tetchy, and no faster than animating by hand due to the setup and processing required. Higher-quality productions have the added difficulty of rigging and lighting skin and clothing, a daunting task even for big-name production houses.
Then there’s the problem of source material. The same budget and time considerations which may lead a director to consider roto or mocap also make it harder to find an actor worth watching. However, there are a few notable exceptions.
I’m Not a Real Doctor
The characters of Gollum in Peter Jackson’s Lord of the Rings trilogy and the ’05 vintage King Kong are two famous examples of motion capture applied to 3D characters. They occasionally move in a way that doesn’t ring true, but overall they are excellent specimens, and perfectly appropriate for their roles. Coincidentally, both characters were based on data captured from performances of actor Andy Serkis, who went to Rwanda and studied gorilla vocalizations for the King Kong role. One assumes that he likewise went to Middle-earth and studied Gollumses.
The performances are not entirely Serkis; his motion capture data was edited and augmented by animators. Likewise, the facial data for Kong was not direct motion capture, but keyframed by animators based on footage of Serkis’ face, and tweaked to compensate for differences in physiology. This was essentially the same method used for Gollum, with perhaps a bit more artistic license on the part of the animators:
Contrast the Serkis performances with Final Fantasy: The Spirits Within. Granted, Square’s task was harder in a number of ways: mocap was responsible for something like 90% of the human motion in the movie. Facial and hand motion was not captured, and had to be added manually by animators. Additionally, their goal was photo-realistic humans, something we’re quite familiar with, whereas we can’t say exactly what a Gollum ought to look like.
But the kicker is in this paraphrased quote from the film’s animation director:
According to animation director Andy Jones, animators debated on what made a character more “human” – the way it moved or the way it looked – and finally decided to focus on its look, particularly the face and fingers.
If you need a reminder of how wrong they were, there are plenty available:
The script is atrocious, the acting is poor, and the mocap data doesn’t look finessed at all; but it’s the facial expressions that really kill it. It’s difficult to say which step went most wrong with this movie, but in the end it doesn’t matter; the acting is the data is the acting, and it’s all awful.
But if there’s any film that illustrates the problems inherent with motion capture, it’s the creepy Tom Hanks vehicle The Polar Express, memorably dismantled by Ward Jenkins in his post “Virtual Train Wreck” (part 1 and part 2). If you don’t get the faces right, you’ve got a horrorshow. No amount of tech can compensate for poor artistry.
Finally, there’s the philosophical difficulty. We expect cartoony figures to move in cartoony ways. Cartoon motion is a caricature of motion, an exaggeration based on what we as humans find significant in movement.
With rotoscoped and mocapped movement, the source is high-grade first-class Reality, as uncartoony as it gets. This world of natural motion is the world we live in when we’re not watching cartoons, and we’re experts at identifying it. When it’s distorted, or applied to cartoony figures, it can create the same sort of dissonance that leads to the uncanny valley, and can be jarring.
This disconnect can work in certain stylized ways, the more obvious the better. The 1985 music video for “Take On Me” explicitly explores the boundary between rotoscoped and source footage, with a heavily treated rotoscoping style. Waking Life’s trippy subject matter was enhanced by the frisson of natural motion clearly visible on the Picassoesque faces.
The problem can be dodged to some degree: 3D video games with human characters (including the notorious Grand Theft Auto 3, as well as every modern combat or sports game) use motion capture more and more, occasionally with professional actors. It looks pretty good if you stick to long shots, but up close the illusion is destroyed, and the characters look like low-poly Muppets. New high-powered game platforms don’t have it any easier: in games as in film, as the visual quality approaches photorealism, our standards for realistic motion increase, particularly for facial animation and lip-sync.
The most successful uses of roto and mocap thus seem to be in stylized contexts where realism isn’t the goal, and situations where subtlety isn’t a priority. The real danger comes when, as in American Pop and Snow White, naturalistic motion is used in literal contexts, expressly for its subtleties. Of these examples, one looks unintentional and out of control and the other is a classic. The difference is skilled animators.
Take a Deep Breath
Roto and mocap are not replacements for animation; they’re tricky methods in their own right. Standards for acting, art direction, and animation are higher with natural motion than with traditionally animated motion, where the rules are more obviously in flux, and adherence to the laws of physics is not expected. The more realistic a figure is, the more any flaws in motion or appearance will stand out. On the other hand, if imprecise means are used to apply the subtleties and nuances of realistic human motion to a stylized, simplified figure, the combination of noise in the system and stylistic dissonance can overwhelm the content.
There’s a time and place for both of these processes, but if subtlety is important, rotoscoping and motion capture are no cheaper or easier than animating by hand, and require capable animators to get good results. If photorealism is necessary, and you’re Peter Jackson, you’re probably capable of assembling the team of skilled animators and technical directors necessary to pull it off. Otherwise your work will end up looking bad, bad, bad.
J Park, “Final Fantasy: The Spirits Within: A Case Study,” 2002