Developing Social Engagement in VR with “Werewolves Within”
A deep dive into Werewolves Within
In Werewolves Within, players are transported to a “town meeting” of the residents of Gallowston. Gallowston is plagued by Werewolves, and only by voting to eliminate one of the residents can the Townsfolk save their village. The Townsfolk try to identify the Werewolves while the Werewolves try to trick the Townsfolk into eliminating their own. It’s a classic social deduction challenge, and in VR, the game brings players from around the world into each other’s presence for the town meeting.
That presence is at the center of Werewolves Within; social presence was the foundation on which we started building the game. At its core, Werewolves Within is a virtual destination that has a game layered on top of it, as opposed to a VR game that has social components interleaved. In fact, let’s skip the game design component of the development altogether and focus instead on the idea of the shared virtual space.
One of the most remarkable aspects of social VR is how little it takes to communicate to the user that one of the avatars in the world is another real person. Many developers take advantage of this by showing ghosts of users’ hands and/or floating disembodied HMDs, and even with these minimalist avatars, it’s easy to see movement that immediately identifies that avatar as another human (as opposed to an AI). In a presentation he gave at Carnegie Mellon in 2014, Oculus Chief Scientist Michael Abrash put it similarly simply when describing an early networked Oculus prototype: “All of a sudden, it was a person.”
When the Werewolves Within team first started prototyping the game in Unity, they assembled a proof-of-concept build in just a few weeks. As you might expect, the early prototypes were fairly limited in their functionality, and the avatars weren’t as expressive as they are in the shipping game. But even in prototype form, it was obvious that they belonged to people. That immediate recognition was the first step toward developing movements that could be built into game mechanics.
Since the “town meeting” assumed that the player would be seated, we built several of the key communication systems around natural, seated conversational movements that would feel familiar to the players performing them. For example, one of the most popular features has proven to be the Whisper. To whisper with another player, simply lean toward that player; if they lean toward you, the game creates a private VOIP channel in which only you and the other participant can hear what’s said. Not only does leaning in to whisper feel natural, networked virtual reality gives that movement meaning across any physical distance. Leaning in to whisper with a person 3,000 miles away feels just like leaning in to whisper with the person beside you at an actual table.
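If you’re curious how a trigger like this might be wired up, here’s a minimal Unity sketch. Everything in it (the class name, the thresholds, the stubbed channel calls) is an illustrative assumption rather than the shipping code, and the real system also waits for the other player to lean in before opening the channel.

```csharp
using UnityEngine;

// Hypothetical sketch: detect when the local player leans toward a
// neighboring seat and open/close a private voice channel. All names
// and thresholds here are illustrative, not from the shipping game.
public class WhisperDetector : MonoBehaviour
{
    public Transform headAnchor;        // tracked HMD transform
    public Transform seatCenter;        // the player's assigned seat
    public Transform neighborSeat;      // the seat being leaned toward
    public float leanThreshold = 0.25f; // meters of lean before triggering

    bool whispering;

    void Update()
    {
        // Project the head's offset from the seat onto the direction
        // of the neighbor, ignoring vertical motion.
        Vector3 toNeighbor = neighborSeat.position - seatCenter.position;
        toNeighbor.y = 0f;
        Vector3 headOffset = headAnchor.position - seatCenter.position;
        headOffset.y = 0f;

        float lean = Vector3.Dot(headOffset, toNeighbor.normalized);

        bool shouldWhisper = lean > leanThreshold;
        if (shouldWhisper != whispering)
        {
            whispering = shouldWhisper;
            // In the real game the channel only opens if the *other*
            // player is leaning in too; that networked handshake is
            // stubbed out here.
            if (whispering) OpenPrivateVoipChannel();
            else ClosePrivateVoipChannel();
        }
    }

    void OpenPrivateVoipChannel()  { /* notify voice layer (stub) */ }
    void ClosePrivateVoipChannel() { /* notify voice layer (stub) */ }
}
```

Measuring the lean as a dot product against the neighbor’s direction means the trigger fires only for a deliberate lean toward that specific player, not for ordinary fidgeting in the seat.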
Other movements followed suit in development. Standing to perform a monologue emerged as a way to silence the table and force everyone to listen to what the speaker had to say. (The timer was originally set to 20 seconds for this, which we quickly realized could be abused, so the final monologue is limited to 10 seconds.) One of the roles gains information by looking up to “gaze at the stars” and divine information. Another bows their head in prayer to receive a “holy vision” of a Werewolf’s identity. All of these actions map 1:1 to the physical movements the player performs, preventing the dissonance that can occur in VR when what a player perceives doesn’t match what their body is doing. Moreover, players can see each other performing these actions, and recognizing the movements became part of the non-verbal vocabulary of the game. Knowing who is praying for a holy vision helps the Werewolves identify the Saint… but then pretending to pray for a holy vision becomes a Townsfolk tactic to help protect the Saint. This adds a layer of misdirection and counter-misdirection to the experience and another dimension of challenge to the core social deduction game.
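To make the pose triggers concrete, here’s a rough sketch of how actions like these might be detected from the tracked HMD in Unity. The thresholds, calibration values, and method names are assumptions for illustration, not the shipping implementation.

```csharp
using UnityEngine;

// Hypothetical sketch of pose-triggered actions: stand to monologue,
// look up to stargaze, bow your head to pray. All thresholds and
// names are illustrative assumptions.
public class PoseActions : MonoBehaviour
{
    public Transform headAnchor;          // tracked HMD transform
    public float seatedHeadHeight = 1.1f; // calibrated seated baseline (m)
    public float standDelta = 0.35f;      // rise required to count as standing
    public float gazeUpPitch = -35f;      // look-up angle (negative = up after normalization)
    public float bowPitch = 40f;          // look-down angle

    void Update()
    {
        float pitch = NormalizePitch(headAnchor.eulerAngles.x);
        bool standing = headAnchor.position.y > seatedHeadHeight + standDelta;

        // Edge-triggering and cooldowns omitted for brevity.
        if (standing)                 BeginMonologue(); // 10-second speaking floor
        else if (pitch < gazeUpPitch) GazeAtStars();
        else if (pitch > bowPitch)    PrayForVision();
    }

    // Map Unity's 0..360 pitch into -180..180 so "up" is negative.
    float NormalizePitch(float x) => x > 180f ? x - 360f : x;

    void BeginMonologue() { /* mute table, start 10 s timer (stub) */ }
    void GazeAtStars()    { /* role-specific info reveal (stub) */ }
    void PrayForVision()  { /* role-specific info reveal (stub) */ }
}
```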
As our understanding of the significance of the social component grew, so did our refinement of the systems that highlight social presence. In particular, we devoted much of our development effort to strengthening communication cues:
  • Avatars’ eyes are magnetic. When Player A and Player B look at each other, their eyes are drawn to each other’s eyes. This helps communicate that the players are focusing their attention on each other and closely approximates real-world eye contact (sketched below).
  • Real-time lip-sync. The game detects phonemes by analyzing player speech input and translates them to the avatars’ faces.
  • Non-focal sound ducking. When Player A and Player B are talking to one another, all other VOIP volume is ducked, reducing the distraction posed by conversations that aren’t their own (sketched below).
  • Characters perform conversation gestures based on their voice input, at two levels of intensity (sketched below). Speaking at a normal volume and intensity causes the character to perform a visible but non-emphatic gesture that heightens the impression of actual body language. Speaking more loudly and intensely makes these conversation gestures more exaggerated, sweeping through a wider range of motion.
  • The gestures are confined to a general area below the player’s field of vision. If you look for your own gestures, you’ll see them, but in most cases, they don’t obstruct the speaker’s view. (Also, all gestures are suppressed in first-person view while the user has their magical book UI displayed.)
  • Players can opt to perform more expressive emotes. These are pre-recorded, but because they are actively initiated by the player, they are often in the player’s field of vision. Many of these are designed to be communicative as opposed to simply entertaining. For example, one of the emotes causes the player to point at what they’re looking at, which can be used to nonverbally accuse or indicate a voting target. The player can perform a thumbs-up or thumbs-down, indicating agreement or disagreement. Other emotes include “I’ve got my eyes on you,” a sarcastic slow clap, or a pearl-clutching “who, me?” All of these have their obvious uses, but we’re looking forward to a variety of emergent uses as players explore them.
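A few of these cues are simple enough to sketch. Here’s one way the “magnetic eyes” behavior might look in Unity; the anchor transforms, capture angle, and blend speed are all illustrative assumptions:

```csharp
using UnityEngine;

// Hypothetical sketch of "magnetic" eye contact: when two avatars look
// roughly at each other, each eye rig's look target snaps onto the other
// avatar's eyes. All fields and angles here are illustrative assumptions.
public class MagneticEyes : MonoBehaviour
{
    public Transform myEyes;          // local avatar's eye anchor
    public Transform otherEyes;       // remote avatar's eye anchor
    public Transform eyeLookTarget;   // target the eye rig aims at
    public float captureAngle = 15f;  // how aligned gazes must be to "snap"
    public float blendSpeed = 8f;

    void LateUpdate()
    {
        Vector3 toOther = (otherEyes.position - myEyes.position).normalized;
        bool iAmLooking = Vector3.Angle(myEyes.forward, toOther) < captureAngle;
        bool theyAreLooking = Vector3.Angle(otherEyes.forward, -toOther) < captureAngle;

        // Mutual gaze pulls the look target onto the other avatar's eyes,
        // producing eye contact; otherwise the eyes rest straight ahead.
        Vector3 desired = (iAmLooking && theyAreLooking)
            ? otherEyes.position
            : myEyes.position + myEyes.forward * 2f;

        eyeLookTarget.position = Vector3.Lerp(
            eyeLookTarget.position, desired, blendSpeed * Time.deltaTime);
    }
}
```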
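Non-focal sound ducking can be sketched just as compactly. The field names and volume levels below are again assumptions for illustration:

```csharp
using UnityEngine;

// Hypothetical sketch of non-focal sound ducking: while the local player
// is in a conversation, every other speaker's VOIP volume fades down.
public class VoipDucker : MonoBehaviour
{
    public AudioSource[] speakers;   // one VOIP AudioSource per remote player
    public AudioSource focusSpeaker; // the player we're conversing with
    public float duckedVolume = 0.3f;
    public float fadeSpeed = 4f;     // volume units per second

    void Update()
    {
        foreach (AudioSource s in speakers)
        {
            // Fade each source toward full volume or the ducked level.
            float target = (s == focusSpeaker) ? 1f : duckedVolume;
            s.volume = Mathf.MoveTowards(s.volume, target, fadeSpeed * Time.deltaTime);
        }
    }
}
```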
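And here’s a rough sketch of the two-tier conversation gestures, assuming a hypothetical GestureTier animator parameter and a microphone RMS level computed elsewhere:

```csharp
using UnityEngine;

// Hypothetical sketch of voice-driven conversation gestures at two
// intensity tiers. The animator parameter and thresholds are
// illustrative assumptions, not the shipping implementation.
public class VoiceGestures : MonoBehaviour
{
    public Animator avatarAnimator;
    public float speakThreshold = 0.02f;    // RMS level that counts as speaking
    public float emphaticThreshold = 0.08f; // louder speech triggers big gestures

    // Called each frame with the current RMS amplitude of the
    // player's microphone input (computed elsewhere).
    public void OnVoiceLevel(float rms)
    {
        if (rms > emphaticThreshold)
            avatarAnimator.SetInteger("GestureTier", 2); // wide, exaggerated
        else if (rms > speakThreshold)
            avatarAnimator.SetInteger("GestureTier", 1); // subtle body language
        else
            avatarAnimator.SetInteger("GestureTier", 0); // idle
    }
}
```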
The result is a social space in which the game occurs. If you watch any of the Twitch streams, you’ll see that the result is not only effective for gameplay; in many cases, players are socializing above and beyond the social gameplay itself. Players casually chat in the game’s lobby; they readily send one another friend invitations because they want to play together again once they’ve made each other’s acquaintance. Internally, we even ran a build with the game systems turned off, just because it was enjoyable to hang out in the game world with other people and use the social features.
I was initially a VR skeptic. But over the course of prototyping, I quickly came to realize how much social VR could satisfy the same psychological needs I seek to meet through analog gaming. When friends and I get together to play board games, the first step is getting together, which addresses a basic human desire for social interaction. And with Werewolves Within, that human desire for relatedness is at the center of the game.
Even if the game is about lying to one another….
CeeCee Smith