The character performance capture and facial animation process in Marvel's Guardians of the Galaxy

At GDC 2022, Simon Habib, Lead Technical Animator at Eidos Montreal, presented a talk on the full cycle of creating facial animation for characters in Marvel's Guardians of the Galaxy. He detailed the process, from actor scanning to integrating animated content into the game. Skillbox Media's Gamedev editorial team shares key points from the talk. When viewing the videos in this article, be sure to turn on the sound to fully understand the information.

In this article, you will learn about:

the basis for the vision of the Eidos Montreal animators;
details of the photogrammetry and mocap processes;
how the scanned data is transferred to animation;
how the team found a balance between quantity and quality by developing the criteria for bronze, silver, and gold animation levels.

Brief information about the speaker and his contribution to the game

During the first ten years of his career, Simon Habib specialized in rigging. His responsibilities included creating rigs for facial animation, bipedal characters, animals, and vehicles. He also worked on skinning, physics simulation, and developing pipelines for riggers. Simon's experience in these areas has allowed him to significantly improve the quality of animation and optimize workflows in studios.

In 2015, Simon began working on facial animation, striving to create unique gaming experiences. His interest in narrative-driven games inspired him to develop animations that convey genuine character emotions and create an emotional connection with players. Marvel's Guardians of the Galaxy became the perfect platform for realizing this ambition and for drawing attention to the importance of high-quality animation in video games.

Screenshot: Marvel's Guardians of the Galaxy / Eidos Montreal

The game received high marks from both critics and players. Rock, Paper, Shotgun noted that the characters' facial animations were on par with the original Marvel films. VG247 even called the facial animation among the best it had ever seen in a game of this genre. These reviews highlight the quality of the graphics and the developers' attention to detail.

What guided the Eidos Montreal team during animation creation?

Work on Marvel's Guardians of the Galaxy began with the realization that this is a game with an emphasis on storytelling and cinematography. The focus is on five main characters fighting together. Although the heroes are stylized to match their comic book counterparts, it was important to make them visually compelling, including the anthropomorphic Rocket and Groot. This was achieved through the masterful performance of lively emotions by the actors, which adds depth and believability to their interactions. Ultimately, the game combines a gripping story with vibrant animation, making it appealing to fans of both games and comics.

The team analyzed existing mocap solutions in the industry and identified four key technical innovations. These innovations became the basis for creating animations for future characters.

Photogrammetry. As a rule, during development, character artists create new images based on references. Photogrammetry makes it possible to obtain a finished scanned appearance of a real person, which is quite suitable as a basis for a future hero.
Animation based on filmed video footage. Many games use audio tracks to create facial animation. Audio makes it possible to achieve precise lip synchronization, but the question arises of how to drive the rest of the facial muscles. Typically, procedural generation is involved in this process, which does not always produce satisfactory results. When animation is based on video, the character's facial expressions are more lively and refined.
Having your own mocap studio and hiring local actors. Renting a studio always slows down production, since you need to calculate the time for booking sessions and processing the captured data.
Advantages of full performance capture. Typically, many game developers record mocap and voiceover separately, then combine all the data during cutscene assembly. However, this approach doesn't always guarantee the integrity of the shot. During the production of Marvel's Guardians of the Galaxy, the actors' voices, faces, and movements were recorded simultaneously, ensuring maximum synchronicity and naturalness.

Screenshot: Marvel's Guardians of the Galaxy game / Eidos Montreal

Developers often face a dilemma between quantity and quality. In the process of creating facial animations for characters in Marvel’s Guardians of the Galaxy, the team demonstrated that it is possible to achieve a compromise by dividing the content into different quality levels. This allows for high animation standards to be maintained without sacrificing the volume of content. This approach not only improves the user experience but also optimizes production processes in the gaming industry.

Bronze level is fully automated animation that requires virtually no manual refinement. It is created using machine learning, with subsequent selection of emotions for each line. Such animations were used during in-game dialogues.
Silver level is closest to the original acting, but requires technical and artistic refinement by specialists to achieve it. It is used in most cutscenes.
Gold level is achieved by refining silver animations directly in the engine. Used in key moments of the game and for promotional materials.

With the basic principles of the Eidos Montreal animators covered, it is worth taking a closer look at each stage of their workflow: photogrammetry, performance capture, batch processing, and cinematic refinement. Each stage shows which methods the team used and how they affected the final product.

The Photogrammetry Process

Photogrammetry is a method of photographing an object from many angles to create a 3D model. Marvel's Guardians of the Galaxy uses stylized graphics, so there was no need to transfer the actors' appearances entirely to the character models. The development team had to decide: either choose a suitable actor and create concept art for the character based on their appearance, or start with a design and then select a model that matches the drawn image. Ultimately, the animators chose the latter approach. This solution allowed the team to create unique characters that fit seamlessly into the game's style.

The process of preparing a model for scanning. Frame: Simon Habib's talk at GDC / YouTube

The actors were scanned in a specialized booth built by Pixel Light Effects. The booth fired 40 DSLR cameras simultaneously, while five softboxes provided diffused lighting that eliminated harsh shadows. The team completed 13 scanning sessions, one for each character, which gave them high-quality 3D models to work from.

Before filming, approximately 100 markers were applied to the models' faces with black eyeliner. This allowed the character artists to more accurately track changes in facial muscles during the process. The models demonstrated 25 different emotions, which served as the basis for creating 138 blendshapes required for subsequent rigging. Using such technologies significantly improves the quality of animation and the realism of characters in virtual projects.

Technically, photogrammetry relies on photographs taken from many angles, which specialized software analyzes to reconstruct the subject's geometry as a three-dimensional model. The result depends on the quality of the source images and the equipment, which is why pre-shoot preparation matters. The team's processing consisted of the following steps:

The .cr2 images received from the cameras were color- and tone-corrected in Lightroom Classic.
The edited images were then transferred to RealityCapture, a program that generates a point cloud of objects in the image and creates highly detailed meshes from them. The specialists subsequently automated this process.
Following this, the team imported the resulting meshes into Wrap3, transferred them onto the base character mesh, and then finalized the stylization and added accents in ZBrush.

The animation below, created using blendshapes in Maya, demonstrates the initial testing phase. The main goal was to reproduce the digital model as accurately as possible, closely matching the appearance of a real person, before beginning the stylization process. Starting from an accurate likeness gave the team a reliable baseline for the stylized character animation.

In the upper right corner, you can see a variety of emotions displayed by the actor. Just below is a collection of emotions created based on blendshapes. These visual elements help better understand the range of character expressiveness and their emotional state, which contributes to a deeper perception of the content.
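For readers unfamiliar with blendshapes, the underlying idea is a weighted sum: each shape stores per-vertex offsets from the neutral face, and a pose is the neutral mesh plus each offset scaled by its weight. The sketch below illustrates that standard formula in plain Python. The function name, the two-vertex "mesh", and the shape names are hypothetical and chosen only for illustration; this is not the studio's rig or Maya's API.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def apply_blendshapes(base: List[Vec3],
                      deltas: Dict[str, List[Vec3]],
                      weights: Dict[str, float]) -> List[Vec3]:
    """Return base + sum(weight * delta) for every vertex."""
    result = [list(vertex) for vertex in base]
    for name, weight in weights.items():
        for i, (dx, dy, dz) in enumerate(deltas[name]):
            result[i][0] += weight * dx
            result[i][1] += weight * dy
            result[i][2] += weight * dz
    return [tuple(vertex) for vertex in result]


# Hypothetical two-vertex mesh with two illustrative shapes.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "smile": [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    "jaw_open": [(0.0, -4.0, 0.0), (0.0, 0.0, 0.0)],
}
posed = apply_blendshapes(base, deltas, {"smile": 0.5, "jaw_open": 0.25})
```

A production rig would hold one such offset list per blendshape (138 in the case described above) and evaluate the sum on the GPU or inside the animation package rather than in a Python loop.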

Performance Capture

In Marvel’s Guardians of the Galaxy, the developers used three different motion capture methods. Let's take a closer look at each of them. These technologies allow for the creation of realistic character animations, improving interaction and immersion in the gameplay. Using different approaches to motion capture helps to more deeply reveal the characters’ personalities and gives the game a unique style.

The image shows the room where all the mocap sessions for cutscenes took place. It was specially prepared for recording the actors' movements, which made it possible to create realistic animations for the project's cutscenes.

The monitors broadcast recordings of the actors' faces in vertical video format in real time. Still: Simon Habib's talk at GDC / YouTube

The Eidos Montreal space, although smaller than the soundstages of other AAA studios, proved ideal for recording dialogue, simple movements, and stunts in mocap sessions. Up to seven people could participate in these sessions simultaneously. Almost all participants wore head mounts with a camera positioned in front of the face, allowing their facial expressions to be captured on video. The studio had only six of these devices, so at most six faces could be recorded at once.

Still: Simon Habib's talk at GDC / YouTube

Motion capture was recorded using the OptiTrack system, while facial capture relied on Faceware technology. In the attached photo, you can see that all the actors wearing headsets had small microphones attached to their devices. This allowed the team to record audio tracks directly during filming, which significantly simplified the subsequent merging of audio files and their synchronization with the video.

During filming, a key aspect was synchronizing the motion capture data and facial animation of the actors, as well as the audio tracks and three reference cameras installed around the perimeter of the room. To achieve this, common timecodes were used. It was also necessary to ensure simultaneous recording on all devices and arrange the files in a specified sequence. To address these challenges, the team developed a custom Lumiere plugin, which significantly simplified the process and increased its efficiency.
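Shared timecode makes alignment a matter of arithmetic: convert each recording's start timecode to a frame count, then trim every recording to the latest start. The sketch below shows that calculation under stated assumptions: non-drop-frame `HH:MM:SS:FF` timecode, a single 60 fps rate for all devices, and hypothetical device names and start values. It does not reproduce the team's custom plugin, whose internals the talk summary does not describe.

```python
from typing import Dict

FPS = 60  # assumed common timecode rate (the head cameras recorded at 60 fps)


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert non-drop-frame 'HH:MM:SS:FF' timecode to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def trim_offsets(start_timecodes: Dict[str, str], fps: int = FPS) -> Dict[str, int]:
    """Frames to skip at the head of each recording so that all start together."""
    starts = {name: timecode_to_frames(tc, fps) for name, tc in start_timecodes.items()}
    latest = max(starts.values())
    return {name: latest - start for name, start in starts.items()}


# Hypothetical start timecodes for three devices in one take.
offsets = trim_offsets({
    "body_mocap": "10:15:30:00",
    "headcam_1": "10:15:30:12",
    "audio_1": "10:15:29:48",
})
```

Here the head camera started last, so it needs no trimming, while the body data and the audio are trimmed by 12 and 24 frames respectively.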

During the mocap recording session, approximately five key personnel were present: the producer, cutscene director, mocap specialist, sound engineer, and Simon himself. Assistants were also on set to help the actors don the mocap suits. This ensured high-quality footage and accurate character movement, which is critical to creating a realistic gaming experience. Each participant played a unique role in achieving the final result, highlighting the importance of teamwork in video game production.

Despite the comfortable filming conditions, the last year of production proved to be stressful. The pandemic necessitated a number of restrictions and strict social distancing guidelines, including wearing masks and protective screens, as well as regular use of hand sanitizer. A camera was installed in the studio to record everything that happened on set. Recordings were streamed via Zoom when necessary, allowing staff to remotely manage the filming process from home and provide immediate feedback via Slack. This ensured effective communication and continued workflow during the pandemic.

The collage shown here depicts the Faceware Mark III camera mounted on a helmet. Although the Mark IV was available during the development of Marvel's Guardians of the Galaxy, the animators at Eidos Montreal chose the Mark III camera for their work. This decision reflects a high degree of confidence in proven technology.

Mark III helmet design. Still: Simon Habib's talk at GDC / YouTube

The helmet-mounted RGB camera provided 720p footage at 60 frames per second. An LED backlight, positioned beneath the camera and with adjustable brightness, effectively softened the harsh light from background light sources, minimizing the appearance of shadows. To achieve precise calibration, 27 key points were marked on the actor's face, allowing for facial tracking using specialized software, similar to the photogrammetry process. This ensured a high level of detail and accuracy in conveying emotion, which is especially important in computer graphics and animation.

Specialists regularly activated the grid overlay mode on the monitors to ensure the headset did not move and the actor's face was precisely centered in the frame. Between takes, the team adjusted the focus of the lenses to ensure image clarity, which facilitated more effective tracking. This approach guaranteed high-quality video and accurate capture of the actors' movements.

During filming, special attention was paid to the actors' comfort, and the Mark III camera met this requirement successfully. Its light weight made it significantly more comfortable to use than similar devices. The low position of the front camera and light sources did not obstruct the actors' view, allowing them to perform their roles more freely and naturally. This created a favorable atmosphere on the set and contributed to the high-quality execution of tasks.

A selection of footage for one of the cutscenes. Still: Simon Habib's talk at GDC / YouTube

The image on the right shows six vertical videos shot with the front-facing camera. In the center is a selection of frames captured by three reference cameras. At the bottom are the title of the video and the timecode, which is necessary for synchronizing all media files, including separately recorded audio tracks. This approach ensures high quality of the final material and simplifies the editing process.

Video fragments are combined by an automated script that runs FFmpeg commands. A cutscene director reviews the combined footage and selects the best takes to create a first rough cut.
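As an illustration of what such a script might do, the sketch below builds an FFmpeg command that places several face videos side by side with the `hstack` filter. The function name and file names are hypothetical, and the real script's layout and options are not described in the talk summary. Note that `hstack` requires all inputs to share the same height.

```python
from typing import List


def build_side_by_side_command(inputs: List[str], output: str) -> List[str]:
    """Build an FFmpeg argument list that tiles the input videos horizontally."""
    if len(inputs) < 2:
        raise ValueError("hstack needs at least two inputs")
    command = ["ffmpeg", "-y"]  # -y overwrites the output file if it exists
    for path in inputs:
        command += ["-i", path]
    # Label the video stream of every input, e.g. [0:v][1:v][2:v].
    labels = "".join(f"[{index}:v]" for index in range(len(inputs)))
    filter_graph = f"{labels}hstack=inputs={len(inputs)}[out]"
    command += ["-filter_complex", filter_graph, "-map", "[out]", output]
    return command


# Hypothetical file names for three head-camera recordings.
command = build_side_by_side_command(
    ["face_1.mp4", "face_2.mp4", "face_3.mp4"], "review.mp4"
)
```

With FFmpeg installed, the list can be passed to `subprocess.run(command, check=True)`. Returning a list rather than a shell string avoids quoting problems with file names that contain spaces.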

Marvel's Guardians of the Galaxy used an unusual method of recording lines called "talk sessions." Unlike the traditional approach, where actors record their lines separately in isolated booths, here they worked in the same room. This allowed the actors to interact with each other, significantly strengthening the chemistry between the characters and making the conversations more sincere and dynamic.

23,000 lines of dialogue were recorded, forming the basis for the creation of high-quality game content. During filming, the actors wore special helmets with a frontal camera, which allowed for the recording of their facial expressions and emotions. The captured video footage was used to develop facial animation for characters in various gameplay situations, including movement through locations, combat scenes, and in-game dialogue. To accurately match the audio track with the corresponding video fragments, specialists used timecode metadata embedded in the audio files. This process ensured high-quality synchronization and realistic character interactions.

While the video recording captured all the nuances of facial expressions, the animators primarily relied on the lower part of the facial muscles for lip syncing. The complete data obtained from the actors' faces was loaded into a machine learning system, which analyzed and tracked a variety of emotions. This approach allows for more realistic animation, improving the quality of interaction between characters and the audience. The use of modern technologies in animation contributes to a more accurate reproduction of emotions, which significantly increases the level of audience engagement.

Footage taken during a talk session, where the actors stand still and read lines from a tablet. Still: Simon Habib's talk at GDC / YouTube

In-game dialogue is typically recorded using fixed or boom microphones, which limits the actors' freedom of movement. However, the voice acting for Marvel's Guardians of the Galaxy took an innovative approach: helmet-mounted microphones were used. This allowed the actors to move freely during recording, significantly improving the naturalness of their performances without affecting the sound quality. This method of recording dialogue was an important step in creating a more dynamic and immersive gaming experience.

Despite the length of filming, a friendly atmosphere reigned on set. Simon emphasizes that the actors often laughed and joked with each other during the process. It was clearly evident that they enjoyed their roles and fully immersed themselves in them. This created a special atmosphere conducive to the creative process and improved the quality of the final product.

The third and final type of motion capture used by the team is emotion capture. During these sessions, the animators asked the actors to convey a wide range of emotions with varying degrees of intensity. The resulting content was classified into four main groups of character emotional states: happy, angry, nervous, and sad. This gave the animators a library of expressive reference material for each state.

During communication, people express their emotions not only with words, but also with non-verbal gestures. This applies to both speakers and listeners, and gestures can depend on body position. With this in mind, actors were paired to record conversations in various poses, both standing and sitting. As a result, animators obtained a vast collection of gestures captured in a variety of positions, which helps convey the emotional tone of interactions. Facial expressions captured during conversations can be used for emotion tracking with machine learning, which determines the intensity of a character's emotional state. The program automatically selects appropriate animations from the library and applies them to the upper face and gestures. For more information on this process, we recommend the talk by Simon's colleague, Romain Trachel.
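The selection step can be pictured as a lookup: given an emotion, an intensity, and a body pose, pick the library clip whose recorded intensity is closest. The sketch below is one simple way to do that. The `Clip` fields, the 0 to 1 intensity scale, and the clip names are assumptions made for illustration; the actual selection logic is covered in Romain Trachel's talk and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Clip:
    name: str
    emotion: str      # one of: happy, angry, nervous, sad
    intensity: float  # assumed scale: 0.0 (subtle) to 1.0 (extreme)
    pose: str         # "standing" or "sitting"


def pick_clip(library: List[Clip], emotion: str, intensity: float, pose: str) -> Clip:
    """Return the clip with matching emotion and pose and the closest intensity."""
    candidates = [c for c in library if c.emotion == emotion and c.pose == pose]
    if not candidates:
        raise LookupError(f"no clip for emotion={emotion!r}, pose={pose!r}")
    return min(candidates, key=lambda c: abs(c.intensity - intensity))


# Hypothetical three-clip library.
library = [
    Clip("happy_low_standing", "happy", 0.2, "standing"),
    Clip("happy_high_standing", "happy", 0.9, "standing"),
    Clip("sad_mid_sitting", "sad", 0.5, "sitting"),
]
chosen = pick_clip(library, "happy", 0.7, "standing")
```

For a tracked intensity of 0.7, the high-intensity standing clip is the nearest match and is selected.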

The video shows that the lip movements, including those of the anthropomorphic character, are synchronized with the spoken lines. The character's gaze, neck, and torso automatically turn toward the speaker. This process is carried out automatically, and Simon rates it as a bronze quality level.

Batch Processing

The three types of sessions were processed into animation using batch processing. This mode analyzes thousands of videos, automatically identifying suitable facial animations for the actor and applying them to the corresponding character rig. Training is required for the program to recognize facial expressions to work effectively. To ensure the animation accurately matches a specific digital character, a unique profile is created for each actor.
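A batch run of this kind needs to pair every video with the profile of the actor who appears in it. The sketch below plans such a run: it derives the actor from the file name, looks up the profile, and reports videos for which no profile exists yet. The `<actor>_<take>.mp4` naming convention, the profile identifiers, and the function name are all hypothetical; the studio's real tool is shown only as a screenshot in the talk.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def plan_batch(videos: List[str],
               profiles: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each video with its actor's profile and list videos with no profile."""
    jobs: List[Tuple[str, str]] = []
    missing: List[str] = []
    for video in videos:
        # Assumed naming convention: "<actor>_<take>.mp4".
        actor = PurePosixPath(video).stem.split("_")[0]
        profile = profiles.get(actor)
        if profile is None:
            missing.append(video)
        else:
            jobs.append((video, profile))
    return jobs, missing


# Hypothetical inputs: two actors, only one of whom has a trained profile.
jobs, missing = plan_batch(
    ["actorA_take01.mp4", "actorA_take02.mp4", "actorB_take01.mp4"],
    {"actorA": "profile_actorA"},
)
```

Reporting missing profiles up front, rather than failing midway, matters when a run covers thousands of videos.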

Machine learning was used to develop a tracking model, which is a set of facial expressions and lip shapes. This model captures data collected by the camera throughout the recording. Optimizing facial expression and lip tracking improves the accuracy of analysis and user interaction, which is important for virtual reality and animation applications.

Screenshots from Faceware Analyzer Studio, the facial tracking program Eidos Montreal used for machine learning. Frame: GDC / YouTube

Simon emphasizes that in machine learning, it is necessary to follow the "less is more" principle. This means selecting only those training frames in which the facial expression reaches its peak. By excluding unnecessary frames, you can avoid overloading the process, which will simplify the interpolation between different facial expressions for the program. The key is to maintain consistency. If discrepancies are detected in similar expressions, the algorithm may not be able to process this data, which will lead to unreliable tracking results. The correct approach to choosing training frames and attention to detail play a crucial role in the successful application of machine learning technologies to facial expression analysis.
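One way to picture the "less is more" rule is as peak picking: from a per-frame measure of how strongly an expression is held, keep only the frames where that measure reaches a local maximum above some threshold. The sketch below implements that heuristic. It assumes a hypothetical per-frame intensity curve is available. In practice the team may have chosen training frames by eye, and the threshold value here is arbitrary.

```python
from typing import List


def peak_frames(intensity: List[float], threshold: float = 0.5) -> List[int]:
    """Return indices of local maxima at or above the threshold.

    On a plateau of equal values, only the first frame is kept.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        rises = intensity[i] > intensity[i - 1]
        does_not_rise_further = intensity[i] >= intensity[i + 1]
        if intensity[i] >= threshold and rises and does_not_rise_further:
            peaks.append(i)
    return peaks


# Hypothetical expression-intensity curve for ten frames.
curve = [0.0, 0.2, 0.8, 0.6, 0.1, 0.3, 0.4, 0.9, 0.9, 0.2]
selected = peak_frames(curve)
```

The small bump at frame 6 falls below the threshold and is ignored, and the two-frame plateau at frames 7 and 8 contributes a single training frame, which keeps the training set small and consistent.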

After processing the video, the animators used the Faceware Retargeter plugin for Maya to create a library of poses with corresponding facial expressions for the digital characters. The library includes a variety of highly pronounced emotions, as well as more moderate options to expand the range. The team also took into account slight asymmetries in facial expressions, which added to the naturalness. As a result, 20 unique character profiles were created, developed using Faceware technology, which significantly improves the realism and expressiveness of animation.

Eidos Montreal animation tools interface for batch processing. Still: Simon Habib's talk at GDC / YouTube

Recording a Range of Facial Expressions

To create each profile in Faceware, the team recorded a facial ROM (range of motion), a take that covers the range of an actor's facial expressions. This process involved meticulous facial motion capture, which allowed for accurate reproduction of expressions and emotions. Recording a facial ROM allowed the team to create unique profiles that match the individual characteristics of each character.

The group's actors completed tasks depicting a variety of emotions. The recording session began with simple eye, eyebrow, and lip movements, gradually moving to the full engagement of all facial muscles to create more complex emotions. The entire recording process took about five minutes.

Recordings conducted to create character profiles also revealed additional benefits. Firstly, they provide an excellent vocal warm-up for the actors at the beginning of the workday, which helps improve their preparedness. Secondly, such sessions allow for efficient sound adjustments and testing of the simultaneous recording trigger on multiple devices using the built-in timecode. This ensures high-quality recording and synchronization, which is especially important for subsequent post-production.

The Eidos Montreal team paid special attention to recording facial expressions before each session, even if no new character footage was planned. Some characters used the same facial animation profile developed early in the project. New profiles were created only when new actors joined, when it was necessary to adapt their facial expressions to the digital characters. This ensured high-quality animation and realistic interactions between characters in the game.

Cinematic Refinement

At this stage, refinement of the existing animations was carried out in both technical and artistic aspects. The goal of these refinements was to achieve maximum accuracy and believability of the result.

Simon and three other specialists from the cutscene department performed various tasks related to motion capture. They assisted the actors in using mocap rigs and set up profiles in MotionBuilder. After motion capture was complete, the animations were processed and transferred to the appropriate rig for further integration into the project.

Facial animations used Faceware profiles based on a range of facial expressions. This method allowed for batch processing of Bronze-level animations, which significantly increased the efficiency and quality of animated content creation. Using Faceware allows for the creation of realistic and expressive animations, which is especially important for projects in the gaming and film industries.

Creating silver-level facial animations involved two key stages. During the technical stage, the team aimed to achieve maximum fidelity to the actors' facial expressions. Since the Faceware system does not capture tongue movements, this element was animated using keyframes in scenes with close-ups of the characters. During the artistic stage, the team focused on facial expressions in comical situations and added asymmetrical elements obtained through batch processing. This approach allowed for more lively and expressive animations, significantly improving the quality of the final product.

In the video, you can see two versions of the same facial animation. On the left is the bronze animation, created automatically without further manual processing. On the right is the silver animation, which underwent both technical and artistic processing. These two approaches to creating animation highlight the differences in quality and detail, which allows for a better understanding of the process of improving visual effects.

The specialists optimized the facial rig of each character individually in an empty scene without props. This technique allowed them to achieve maximum frame rate and focus solely on character animation, eliminating distracting environmental elements. This approach ensures high animation quality and improves the overall performance of the project.

Refinement of Star-Lord and Raker's facial animation for one of the cutscenes. Still: Simon Habib's talk at GDC / YouTube

A colored circle is located in the lower left corner of vertical videos, serving as an important indicator for specialists. It helps identify and adjust the rig when an actor's face extends beyond the medium close-up. This visual element allows for more precise adjustments and improved video quality, helping to achieve a professional result.

After completing the mocap data and silver-level animations, the team handed over the finished assets to the cutscene animators. These specialists refined the animation directly in the game engine, bringing it to gold-level quality. They assembled scenes, populating them with characters, objects, and vehicles, and positioned cameras at the desired angles to achieve maximum visual impact. The final content was used not only in cutscenes but also in marketing materials, which contributed to increased interest in the project and its successful promotion.

The cutscene specialists also worked on animating anthropomorphic creatures and animals, for which human mocap was not possible. For example, the team did not capture a dog's face for the character Cosmo. Instead, for several scenes a four-legged assistant named Diego was fitted with special motion capture gear. This allowed the team to achieve realistic animation and preserve the character's unique features.

Cosmo in the game and his real-life prototype Diego. Frames: WeRateDogs / Twitter

The proportions of anthropomorphic characters differ significantly from those of a real person. That's why animators manually refine the rigs of characters like Rocket. This handcrafted work allows for more natural movement and expression, which is crucial for animation. Proper rig setup helps create a unique style and personality for a character, making them more engaging for viewers.

Results

Simon and his team accomplished a significant amount of work while balancing quantity and quality. The figures below summarize the results.

As a result of 13 scanning sessions, 13 unique game character models were developed, each equipped with anatomically correct blendshapes. These models provide a high degree of realism and detail, enhancing player interaction with the game world.

Using Faceware technology, 20 unique profiles were created, which were then used for batch processing. These profiles significantly improve the quality of facial rendering and animation, ensuring high accuracy and realism in the final results.

Facial animation for 99.9% of the 23,000 lines of in-game dialogue was produced without manual work, including character expressions in combat scenes, during dialogue, and while moving around game locations. The remaining 0.1% of animations were manually refined to achieve a perfect result.

As a result, over five hours of cutscenes were created, including alternative storylines. Animations are presented in a ratio of 90% silver level and 10% gold level quality.

Eidos Montreal animators are striving to improve the quality of mocap-based animations. They plan to implement the ability to refine content that hasn't reached gold level directly in the scene. This innovation will significantly speed up the work process thanks to the ability to preview animations in the game engine with installed lighting and shaders, which is more efficient compared to pre-rendering in Maya.
