Overanalyzing and Synthesizing
Greetings surveyors!
I was delighted to be invited to speak virtually at GAconf USA on October 28, 2024. The conference was filled with a diversity of perspectives covering topics such as inclusive design, assistive technologies, and behind-the-scenes looks at accessibility in games. It was an incredibly informative experience which reminded me that people are awesome—and we have so much to learn from each other. I’m deeply grateful for the organizers who produced this must-see event.
Continue reading to find the video—and a full transcript—of my presentation about my journey over the last 2.5 years. It’s a half-hour deep dive which explores the inspiration, tools and techniques, challenges, and lessons learned from developing Periphery Synthetic.
I’m excited to share more details about the upcoming free content expansion with you soon. There is a lot more to it than I’ve revealed. Enjoy!
Watch the video
Read the transcript
This is a transcript of my remarks, as delivered in my prerecorded video, to attendees of GAconf USA 2024.
Introduction
Hey! Let’s talk about overanalyzing and synthesizing, or: how I crafted the accessible metroidvania experience of Periphery Synthetic.
I’m shiftBacktick
I’m shiftBacktick and I use they/them pronouns. For full disclosure: I’m an independent developer, and this talk is just about my own experience in developing an unconventional project that challenges our reliance on sight, and not necessarily a one-size-fits-all blueprint for better accessibility in games.
So, let’s see. I’m a human from planet earth, who is sitting in my home office. I’m pale, against a Barbie-pink wall in the background. I’m wearing all black, with sunglasses, my favorite studio headphones on my ears, a concept of a beard, and very long hair—or exactly how you might expect I look.
I’m also a web developer by trade, a computer musician by night, and I advocate for accessibility from both perspectives. I create elaborate synthesizers that are played like games in the web browser—mostly smaller ones created for game jams, and four of which are fully polished and released on Steam—including most recently: Periphery Synthetic.
Periphery Synthetic
It’s a chill and nonviolent metroidvania game where you collect materials, unlock new abilities, and reveal its story across: a desert planet with shifting sands and hidden structures, a water world with waves to surf and caves to explore, and its icy moon with vertical terrain and mountains to scale; but, it’s also an interactive generative ambient music album, where the game is an instrument itself!
That’s a lot to unpack, so here are the key features of Periphery Synthetic:
Everything you hear is purely synthesized on-the-fly—in real-time—from its ambient music systems and environmental sounds, to your own footsteps and a playable instrument. The musical worlds that you explore are procedurally generated to provide a uniquely personal journey for every player through ever-changing never-repeating liminal spaces. To traverse them, you’re given full freedom of movement in three dimensions, including: jumping and climbing, driving and drifting, swimming and diving, and flight capabilities.
Importantly, all of this is fully accessible without seeing the screen, including: full screen reader support for menus, notifications, and special in-game hotkeys to access additional information; navigational audio cues, echolocation, and terrain sonification at the press of a button; haptic feedback punctuating every physical interaction; and even a toggle for the graphics itself, which invites everyone to try it as an audio-only experience.
So coming up next we’ll discuss: a brief background of the project, including the tools and techniques used; then, a deep dive into the approaches toward audio, movement, navigation, and iterative development; and finally, we’ll close with what’s next for me, some key findings, and takeaways for you.
Genesis
Let’s get started with the question: why make games from synths?
My journey
As a lifelong musician, programmer, and video game enjoyer, I may randomly find myself writing a song, prototyping a wild idea, or deep into a 100-hour role-playing game. I started early—on a Windows 95 computer—and I fell in love with how empowering it was to have this versatile tool that can write MIDI music, build HTML sites, and play MechWarrior 2. That love followed me through my study of music technology in university and an unconventional career path.
My experiences as a web developer in both the private and public sectors have shaped my philosophies toward technology and its accessibility—to the point that it should never be an afterthought, but the foundation we must lay to raise up and enrich the experiences of as many folks as possible. It’s also a moving target which evolves as our understanding of our capabilities as developers—and the needs of our players—become clearer with feedback and experience.
For a long time, I’ve had this unrealized fascination with exploring the intersection of music, code, and accessibility. It began as a thought experiment: what if I made a gigantic synthesizer with hundreds—or even thousands—of parameters that you explore with just an Xbox controller?
In 2019 I finally built the first prototype, documenting the results on my personal website. It was far better as a learning experience than as a game, but it laid the foundation for what would later become my first Steam release. Yet—as I began Googling where I might find my first-ever playtesters for this bizarre, psychedelic, audio-based exploration game that I was building—I quickly learned that this concept of an audio game was not entirely unique.
Audio games
They were first pioneered in the 1970s with handheld games like Simon—which was obviously perfected with Bop It in the 90s. In 1984, the Macintosh brought synthesized text-to-speech into the home, which instantly made a vast catalog of text-based games accessible to folks who are blind. As the power of home computers increased over the decades, hobbyists—many of whom were blind themselves—experimented with audio to create more accessible audio-only game experiences.
And now, they span virtually every genre—from arcade and role-playing games, to racing games and shooters—all made accessible with text-to-speech and immersive binaural sound design. Many of these games were cultivated by vibrant online communities, like AudioGames.net, whose members actively discuss, build, and teach others to build games that are accessible to them. These communities are fueled by dedicated developers who create full retail-quality audio games and host and join accessibility-focused game jams, and by community streamers who play and review their creations. Yet, in a happy accident, joining this community myself was instrumental in shaping the scope and complexity of the synthesizers that I now build almost five years later.
Tools and techniques
So, where do we even get started with building games from synths?
The web browser
As a web developer, it was most comfortable for me to stick with technologies I already knew, and thankfully the web browser and its underlying web technologies are extremely powerful, capable of beautiful 3D graphics and rich synthesized sounds without any extensions. They even give us accessible user interfaces—for free—as long as we follow the specifications and best practices.
The Web Content Accessibility Guidelines (or WCAG) provide a framework of criteria that ensure perceivability, operability, understandability, and robustness for as many folks as possible, such as being able to navigate a page with a keyboard and screen reader. For more complex user interfaces, they recommend using Accessible Rich Internet Applications (or ARIA) attributes, which provide the tools for meeting the technical requirements of WCAG, such as communicating the role and state of dynamic user interface elements (like buttons, toggles, and sliders) to assistive technologies such as screen readers, along with whether they are clickable and helpful descriptions about them.
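To make that a little more concrete, here is a minimal sketch (not the game’s actual code) of how a custom settings slider might announce its role and state to a screen reader; the element and labels are hypothetical:

```js
// Hypothetical custom slider for a game settings menu.
const slider = document.createElement('div');
slider.setAttribute('role', 'slider');              // what it is
slider.setAttribute('aria-label', 'Music volume');  // a helpful description
slider.setAttribute('aria-valuemin', '0');
slider.setAttribute('aria-valuemax', '100');
slider.setAttribute('aria-valuenow', '80');         // its current state
slider.tabIndex = 0;                                 // reachable with the keyboard

// When the player adjusts it, keep assistive technologies in sync.
function setVolume(value) {
  slider.setAttribute('aria-valuenow', String(value));
  slider.setAttribute('aria-valuetext', `${value} percent`);
}

document.body.appendChild(slider);
setVolume(65);
```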
Web audio
So by building a web application that implements WCAG with ARIA, we’re building something that’s externally consistent and familiar to all players—no matter how they experience the web. But good accessibility is just the foundation of the more interesting tools and techniques available in the web browser, such as the Web Audio API: a modular synthesis environment for creating rich and complex audio circuits in real-time with code, from recording or synthesizing streams of audio, to manipulating them with effects and playing them back to the user.
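For a feel of how modular that is, here is about the smallest useful circuit: one oscillator, routed through a gain node to your speakers, with its loudness scheduled over time (a generic Web Audio sketch, not code from the game):

```js
const ctx = new AudioContext();

// A single sine oscillator routed through a gain node to the speakers.
const osc = new OscillatorNode(ctx, { type: 'sine', frequency: 440 });
const amp = new GainNode(ctx, { gain: 0 });
osc.connect(amp).connect(ctx.destination);

// Fade in, hold, then fade out, scheduled sample-accurately.
const now = ctx.currentTime;
amp.gain.setValueAtTime(0, now);
amp.gain.linearRampToValueAtTime(0.5, now + 0.1);
amp.gain.linearRampToValueAtTime(0, now + 2);

osc.start(now);
osc.stop(now + 2);
```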
With oscillators and buffers containing random noise we can use pure synthesis techniques, such as amplitude and frequency modulation (or AM and FM for short), additive, subtractive, and granular synthesis—which are all just fancy ways of combining basic sound waves into increasingly more complex sounds to the point we’re making Pokémon sounds or emulating a glockenspiel.
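As a small illustration of one of those techniques, a minimal FM patch connects a modulator oscillator to the frequency parameter of a carrier; this is a generic sketch rather than the game’s own synth factories:

```js
const ctx = new AudioContext();

// Carrier: the tone you actually hear.
const carrier = new OscillatorNode(ctx, { type: 'sine', frequency: 220 });

// Modulator: wiggles the carrier's frequency to create sidebands.
const modulator = new OscillatorNode(ctx, { type: 'sine', frequency: 110 });
const modDepth = new GainNode(ctx, { gain: 150 }); // modulation depth, in Hz

modulator.connect(modDepth).connect(carrier.frequency);

const amp = new GainNode(ctx, { gain: 0.3 });
carrier.connect(amp).connect(ctx.destination);

modulator.start();
carrier.start();
```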
However, some advanced knowledge is necessary to get the most from the powerful toolset—and empty canvas—provided by the Web Audio API. For example: music theory is essential for crafting consonant sounds that sound beautiful together by understanding the very subjective relationships of their root frequencies and harmonics. Similarly: the process of routing the various individual sounds into a cohesive soundscape—that is both loud and clear through mixing and effects processing—is one that would benefit from experience with audio engineering.
Unfortunately, Web Audio also has its limitations compared to more low-level languages or commercial audio middleware—such as a relatively low performance ceiling that limits the overall complexity and number of simultaneous sounds—which leads to sparser and more abstract soundscapes due to the sheer amount of work it puts on a single core of your device’s processor.
Custom tooling
With that, as a learning exercise, I wanted to deeply understand these technologies—as well as all of the underlying tools and techniques for audio synthesis, procedural generation, and game development—all in the web browser. This led to the creation of a framework of custom tools—and a project template—that I use and continuously improve for all of my synthesis projects. Most of these projects, along with the framework and template, are open-source on GitHub for you to learn from—as long as you ignore all the ugly code.
Beyond the basic event loop which everything hooks into, it provides all of the necessary recipes for success. For example: it has helpful data structures like vectors and quaternions for representing points of data and their rotations, or quadtrees and octrees for efficient storage and retrieval of nearby data all at once.
Of course it has your algorithms, like OpenSimplex noise, which enables us to procedurally generate endless worlds from smoothed random numbers. Let’s say we have a field of two-dimensional noise, which represents a heightmap. If we sample and store its values in a quadtree, and pass them into a WebGL program each frame, then we might have a basic terrain renderer.
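To make the sampling step concrete, here is a rough sketch that assumes a hypothetical openSimplex2D(x, y) function returning smooth noise in [-1, 1]; the quadtree storage and WebGL upload from the real framework are omitted:

```js
// Hypothetical noise source; any 2D OpenSimplex implementation would do.
// openSimplex2D(x, y) is assumed to return a smooth value in [-1, 1].

const CHUNK_SIZE = 64;   // samples per side of a terrain chunk
const RESOLUTION = 1;    // world units between samples
const AMPLITUDE = 10;    // maximum terrain height

function sampleHeightmap(originX, originY) {
  const heights = new Float32Array(CHUNK_SIZE * CHUNK_SIZE);

  for (let y = 0; y < CHUNK_SIZE; y++) {
    for (let x = 0; x < CHUNK_SIZE; x++) {
      const worldX = originX + x * RESOLUTION;
      const worldY = originY + y * RESOLUTION;
      // Scale coordinates down so the noise varies gently across the chunk.
      heights[y * CHUNK_SIZE + x] =
        AMPLITUDE * openSimplex2D(worldX / 100, worldY / 100);
    }
  }

  return heights; // e.g. insert into a quadtree, then upload to a WebGL buffer
}
```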
The same could be said for all of the many ways that sound can be synthesized and processed. The framework has factories for building all the basic synthesizers, like AM and FM synths, which can be routed into pre-built effects like ping-pong delay, overdrive, or a talkbox, with some final compression applied before the sound reaches your ears.
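In Web Audio terms that routing might look roughly like this, with a ping-pong delay built from two cross-fed delay lines panned to opposite sides and a compressor at the end of the chain; a generic sketch, not the framework’s actual effect factories:

```js
const ctx = new AudioContext();

// Any source would do here; a simple triangle wave stands in for a synth voice.
const voice = new OscillatorNode(ctx, { type: 'triangle', frequency: 330 });

// Ping-pong delay: two delay lines feeding each other, panned to opposite sides.
const delayL = new DelayNode(ctx, { delayTime: 0.3 });
const delayR = new DelayNode(ctx, { delayTime: 0.3 });
const feedback = new GainNode(ctx, { gain: 0.4 });
const panL = new StereoPannerNode(ctx, { pan: -1 });
const panR = new StereoPannerNode(ctx, { pan: 1 });

voice.connect(delayL);
delayL.connect(panL);
delayR.connect(panR);
delayL.connect(feedback).connect(delayR); // left echoes into the right...
delayR.connect(delayL);                   // ...and right echoes back into the left

// Final compression before it reaches your ears.
const compressor = new DynamicsCompressorNode(ctx, { threshold: -24, ratio: 4 });
const master = new GainNode(ctx, { gain: 0.3 });

voice.connect(compressor); // dry path
panL.connect(compressor);
panR.connect(compressor);
compressor.connect(master).connect(ctx.destination);

voice.start();
```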
Altogether, with the project template, I can more easily get started on projects both big and small with the scaffolding of solutions it provides for building the user interface, managing focus, handling input, save games, user preferences, and more. Ultimately, developing these tools—and using them for three years to create a dozen smaller projects—gave me the deep technical insight and confidence to continue pushing the boundaries of what is possible in the web browser to finally develop Periphery Synthetic.
Periphery Synthetic
The overall objective of Periphery Synthetic is to deliver a deeply personalized exploration of ambient music, couched within an undemanding metroidvania experience, that can be enjoyed by as many folks as possible.
Goals
To achieve this, I strove to meet these goals:
- To compose an ambient music album where each of its tracks is a detailed sonic environment that evolves over time and space, reacts to players’ input, and invites them to play along with a built-in instrument—from which a uniquely intimate experience emerges for each player.
- To provide just enough movement mechanics and gameplay systems to actively engage players and their natural curiosities—without any handholding tutorials or nagging quests which could distract from their own nonlinear explorations of the music and its interactivity.
- To ensure that all aspects of the experience are reasonably accessible to as many folks as possible by using a combination of audio, video, and haptic cues for each important interaction—and wrapping it up in an accessible user interface.
- And finally: to build upon the knowledge, challenges, and successes of past projects to further my own understanding of synthesis, game design, and accessibility—as a stepping stone to even bigger projects—while continuing to push the boundaries of what can be done in the web browser.
General approaches
So let’s discuss the various approaches to getting Periphery Synthetic over the finish line, starting with its basic principles before diving into more specific topics:
A key principle of WCAG is making user interfaces more understandable by actively engaging more than one way of cognition—and this is directly applicable to game design as well. To reach parity between the senses, I sought to represent each element with at least two senses. For example: you may see sand blowing in the wind or snow falling from the sky. Those are complemented with grains of subtractive synthesis, which imply the wind speed and severity of the weather around you.
It’s at its best when experienced with a gamepad—where nearly everything has a detailed, parameterized haptic cue, which finds your controller vibrating with every footstep, driving surface, collision, and more. These vibrations—of course—can be reduced or toggled off from a settings screen.
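The browser side of a haptic cue like that is fairly small; in browsers that support the Gamepad haptics extension, a dual-rumble effect can be triggered roughly like this (the parameter values are invented for illustration, not the game’s tuning):

```js
// Play a short, parameterized rumble on the first connected gamepad.
// vibrationActuator is a Gamepad extension, so feature-detect it first.
function playFootstepHaptic(intensity = 0.5) {
  const [gamepad] = navigator.getGamepads().filter(Boolean);

  if (!gamepad || !gamepad.vibrationActuator) {
    return; // no gamepad connected, or no haptics support in this browser
  }

  gamepad.vibrationActuator.playEffect('dual-rumble', {
    duration: 60,                   // milliseconds
    strongMagnitude: intensity,     // low-frequency motor
    weakMagnitude: intensity * 0.5, // high-frequency motor
  });
}

// e.g. scale the intensity by how hard the character lands
playFootstepHaptic(0.8);
```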
Another key principle of WCAG is expanding the operability of a user interface by supporting multiple input methods. So beyond the gamepad, the keyboard has extensive mappings—as easy to reach with just one hand as possible, including a full mapping on the numpad to also support left-handed play. Beyond the first-person look controls, the mouse also has mappings for common actions like jumping, drifting, changing movement modes, and scanning the environment—to the point that it can be almost entirely played with just a mouse. Something I’m also very excited about is Steam Input, which gives folks the tools to create their own mappings, and support other peripherals, like a steering wheel, throttle, or other accessible controllers that the browser doesn’t support on its own.
Lastly, I’ve found success in testing these approaches early and often. For example: I tend to close my eyes—or turn off my headphones—to verify the parity between the audio and visual cues of a new feature. And whenever I add a new screen: I always check in with my friend Microsoft David in my favorite screen reader to verify that the interface will be fully read aloud, to align with WCAG’s key principle of robustness.
Audio
From this universal baseline of accessibility, we can create immersive experiences with audio. To do that, I chose to leverage strictly diegetic music and sounds, for the worlds and their soundscapes to feel entirely within your control.
The musical elements of each world are evolving manifestations of the terrain and nearby celestial bodies. For example: the sun on the desert planet emits all of its musical textures, which evolve in loudness and brightness throughout its day/night cycle. However, it’s not just an object in the sky that circles the horizon every half hour. It’s so bright at noon that it can obscure the sound of objects between you and it—encouraging you to face away to muffle its droning chords. Perhaps, as you jump, you’ll hear its deep sub bass fade in and out.
Gluing everything together is an immersive environmental sound design. For example: the dynamic wind system is a subtractive synthesizer, which takes pink noise and filters and pans it relative to the current wind speed, plus your velocity, times the atmospheric pressure, etc. It covers everything from the normal winds you might hear while standing still to the yo-yo of cacophony and stillness as you exit and reenter the atmosphere at orbital speeds.
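A stripped-down version of that idea, with a looping noise buffer, a bandpass filter, and a stereo panner driven by the simulated wind, might look like the sketch below; it uses plain white noise for brevity where the game uses pink noise, and the ranges are invented:

```js
const ctx = new AudioContext();

// Two seconds of looping noise as the raw material for the wind.
const length = ctx.sampleRate * 2;
const buffer = ctx.createBuffer(1, length, ctx.sampleRate);
const data = buffer.getChannelData(0);
for (let i = 0; i < length; i++) {
  data[i] = Math.random() * 2 - 1;
}

const noise = new AudioBufferSourceNode(ctx, { buffer, loop: true });
const filter = new BiquadFilterNode(ctx, { type: 'bandpass', frequency: 400, Q: 0.7 });
const panner = new StereoPannerNode(ctx, { pan: 0 });
const amp = new GainNode(ctx, { gain: 0 });

noise.connect(filter).connect(panner).connect(amp).connect(ctx.destination);
noise.start();

// Called every frame with the simulated wind: faster wind sounds louder,
// brighter, and leans toward the side it is blowing from.
function updateWind(windSpeed, direction) {
  const now = ctx.currentTime;
  amp.gain.setTargetAtTime(Math.min(windSpeed / 50, 1) * 0.4, now, 0.1);
  filter.frequency.setTargetAtTime(200 + windSpeed * 40, now, 0.1);
  panner.pan.setTargetAtTime(direction, now, 0.1); // -1 (left) to 1 (right)
}
```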
A key consideration was giving each environment—and the objects it contains—a distinctly recognizable sound by grouping them into sonic families. The most obvious example is the musical worlds themselves, and the specific choices that were made to make their music systems uniquely interactive based on the setting. On the water world, the underwater music evolves slowly as it is explored in three dimensions to evoke that sense of being a submarine. However, its surface is based more in chance, using granular synthesis to mimic the chaotic motion of the water.
Many synthesis techniques are reserved for specific characters or themes to help preserve their families as well. For example: square waves are only used when representing more advanced civilizations or their technologies for their very rigid and unnatural sonic characteristics. Similarly: the player character, and all user interface sounds by extension, are the only sources of pulse-width-modulation. This is a special kind of square wave with a lot more expressive qualities, for giving it that unique voice.
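The Web Audio API has no pulse oscillator built in; one common workaround, sketched below and not necessarily how Periphery Synthetic implements it, is to subtract a delayed copy of a sawtooth from itself and then modulate the delay time to vary the pulse width:

```js
const ctx = new AudioContext();
const f = 220; // fundamental frequency in Hz

// Two phase-locked sawtooths; subtracting a delayed copy yields a pulse wave
// whose duty cycle equals delayTime * f (ignoring a small DC offset).
const sawA = new OscillatorNode(ctx, { type: 'sawtooth', frequency: f });
const sawB = new OscillatorNode(ctx, { type: 'sawtooth', frequency: f });
const delay = new DelayNode(ctx, { delayTime: 0.5 / f }); // 50% duty cycle
const invert = new GainNode(ctx, { gain: -1 });

// A slow LFO modulates the delay time, i.e. the pulse width.
const lfo = new OscillatorNode(ctx, { type: 'sine', frequency: 0.5 });
const lfoDepth = new GainNode(ctx, { gain: 0.4 / f });
lfo.connect(lfoDepth).connect(delay.delayTime);

const amp = new GainNode(ctx, { gain: 0.2 });
sawA.connect(amp);
sawB.connect(delay).connect(invert).connect(amp);
amp.connect(ctx.destination);

const now = ctx.currentTime;
[sawA, sawB, lfo].forEach(node => node.start(now));
```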
The biggest challenge with pure synthesis was giving each sound a clear and understandable place in the mix. I found that using a Head-Related Transfer Function (or HRTF) was not enough when dealing with sounds that rarely go beyond 2kHz. Sounds in this frequency range can easily fall within our cone of confusion, where it’s difficult to localize where they’re coming from. More nuanced directional filtering was found to accurately simulate the acoustic shadow cast by your head—by sweeping through a musical frequency range specific to each sound as it is panned.
For example: a triangle wave directly at your right side may have a few harmonics in your right ear—sounding brighter—but have those filtered out in your left ear—sounding darker. In general, this harmonic manipulation was the secret sauce that was applied to enhance the panning of every directional sound in Periphery Synthetic.
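In Web Audio terms, that roughly means splitting a mono voice into two per-ear paths and lowering a lowpass cutoff on whichever ear faces away from the source; the curve and frequency ranges below are placeholders rather than the game’s tuned values:

```js
const ctx = new AudioContext();

function createDirectionalVoice(frequency) {
  const source = new OscillatorNode(ctx, { type: 'triangle', frequency });

  // One lowpass path per ear, merged back into a stereo signal.
  const filterL = new BiquadFilterNode(ctx, { type: 'lowpass', frequency: 20000 });
  const filterR = new BiquadFilterNode(ctx, { type: 'lowpass', frequency: 20000 });
  const merger = new ChannelMergerNode(ctx, { numberOfInputs: 2 });

  source.connect(filterL).connect(merger, 0, 0); // left ear
  source.connect(filterR).connect(merger, 0, 1); // right ear
  merger.connect(ctx.destination);
  source.start();

  // azimuth: -1 = fully left, 0 = straight ahead, 1 = fully right.
  // The far ear gets a darker (lower) cutoff to fake the head's acoustic shadow.
  return function pan(azimuth) {
    const bright = 8000;
    const dark = bright * (1 - 0.75 * Math.abs(azimuth));
    const now = ctx.currentTime;
    filterL.frequency.setTargetAtTime(azimuth > 0 ? dark : bright, now, 0.05);
    filterR.frequency.setTargetAtTime(azimuth < 0 ? dark : bright, now, 0.05);
  };
}

const panVoice = createDirectionalVoice(440);
panVoice(1); // hard right: bright in the right ear, darker in the left
```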
Movement
Given that these immersive audio environments are driven by how you move through them, finding the right balance between fun and physical realism with the movement system was extremely important. When you take your first steps on that desert planet, you are quite average: walking, running, and jumping at very slow familiar rates. However, through a straightforward system of horizontal progression—which you advance by simply collecting materials, and applying them toward your favorite abilities—you gradually acquire and hone a diversity of movement abilities, which mirror the unique challenges of each environment.
On this desert planet, with its vast stretches of barren dunes, you gain the ability to drive and drift to reduce your commute time between distant destinations. Similarly: on the ice moon you gain flight abilities which help you navigate its unforgivingly rough and vertical features. You can even bring these abilities to the other worlds to keep exploring with fresh perspective and approach. With all of the abilities fully leveled up, the intention is for you to finally feel like a superhero empowered to overcome every obstacle that you encounter.
The biggest challenge to finding this empowerment through movement was limiting common frustrations with collision detection, through leniency in how collisions are detected and corrected. Since this is a purely relaxing experience, it made sense to make the player character invincible, with no fall damage or major repercussions when falling from great heights. In my opinion, it’s more fun to get up and try again when I miss a jump, rather than losing my stuff and sitting through a loading screen.
Except for the underwater caves, where you’re navigating enclosed three-dimensional spaces, there are also no invisible walls. Instead, when you meet a steep slope, your maximum speed is reduced, allowing you to climb over any surface—just at an inefficient pace compared to jumping, flying, or driving around the obstacle.
In that sense, the terrain is very sticky. In fact, the main difference between on-foot and driving movement modes is that the player always sticks to the ground while on-foot unless deliberately jumping. The driving mode is a little less forgiving, letting you recklessly drive off a cliff or use a boulder as a makeshift ramp. With this approach, I’ve effectively created a musical playground where you have absolute freedom to experiment and push its limits with as little friction as possible.
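A toy version of that slope rule might scale the player’s maximum speed by how level the ground is underfoot; the threshold and falloff here are invented for illustration:

```js
// surfaceNormal is the unit normal of the terrain under the player,
// up is the world up vector; their dot product is 1 on flat ground
// and approaches 0 on a vertical wall.
function maxSpeedOnSlope(baseSpeed, surfaceNormal, up = { x: 0, y: 0, z: 1 }) {
  const levelness =
    surfaceNormal.x * up.x + surfaceNormal.y * up.y + surfaceNormal.z * up.z;

  // Full speed on gentle slopes, then taper toward a slow crawl so any
  // surface remains climbable, just inefficiently.
  const threshold = 0.85; // cosine of roughly a 32-degree slope
  if (levelness >= threshold) {
    return baseSpeed;
  }

  const crawl = 0.15;
  const t = Math.max(levelness, 0) / threshold;
  return baseSpeed * (crawl + (1 - crawl) * t);
}

// Example: flat ground vs. a steep 60-degree slope.
maxSpeedOnSlope(8, { x: 0, y: 0, z: 1 });        // full speed
maxSpeedOnSlope(8, { x: 0.866, y: 0, z: 0.5 });  // slower, but never zero
```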
Navigation
With full freedom of movement in these procedurally-generated worlds, there also needs to be a way to help guide folks to resources and points of interest. In Periphery Synthetic, a scanning system was implemented to find further parity between its audio and visual elements—and, second only to movement, it was expanded into your other main interaction with the environment: to reveal fast-travel destinations and pieces of narrative.
Its core design principle is being unintrusive by being something you actively engage, rather than something that’s always passively happening around you. With one button, you activate a scanning sequence which, over the course of two seconds, gives you full situational awareness of the area ahead of you. It has three components:
- Upon activating the scanner, you’ll hear an echolocation sound which calls out from you, and responds with a sound that indicates how far away your eyeline intersects with a solid surface. This is especially helpful for gauging big jumps and navigating dark caves.
- Throughout the scan, you’ll hear the terrain sonified around you. This could be a thirty-minute talk by itself, so I’ll summarize by saying it’s achieved by tracing seven rays from the ground beneath you, converting the relative heights of the terrain into specific musical frequencies, and synthesizing them as streams of stereo tones to help you visualize the terrain with sound (see the sketch after this list).
- Simultaneously, you’ll hear the blips of collectables, positioned on top of this terrain, much like a metal detector, to help you locate more resources for leveling up.
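To give a flavor of that summary, here is a heavily simplified sonification sketch: sample a handful of heights fanned out ahead of the player, map each to a pitch, and pan the resulting tones from left to right; the mapping and ranges are invented, not the game’s:

```js
const ctx = new AudioContext();

// heights: relative terrain heights sampled along rays fanned out ahead of the
// player, ordered left to right. Each becomes a short tone: higher terrain
// sounds higher-pitched, and its position in the fan sets its stereo position.
function sonifyTerrain(heights) {
  const now = ctx.currentTime;

  heights.forEach((height, i) => {
    // Map a height of roughly [-20, 20] meters onto two octaves above 220 Hz.
    const normalized = Math.min(Math.max((height + 20) / 40, 0), 1);
    const frequency = 220 * Math.pow(2, normalized * 2);

    const osc = new OscillatorNode(ctx, { type: 'sine', frequency });
    const pan = new StereoPannerNode(ctx, {
      pan: (i / (heights.length - 1)) * 2 - 1, // leftmost ray -1, rightmost +1
    });
    const amp = new GainNode(ctx, { gain: 0 });

    osc.connect(pan).connect(amp).connect(ctx.destination);

    // Stagger the tones slightly so the scan sweeps across the stereo field.
    const start = now + i * 0.15;
    amp.gain.setValueAtTime(0, start);
    amp.gain.linearRampToValueAtTime(0.2, start + 0.02);
    amp.gain.linearRampToValueAtTime(0, start + 0.3);

    osc.start(start);
    osc.stop(start + 0.3);
  });
}

sonifyTerrain([-2, 0, 1.5, 4, 9, 12, 7]); // seven rays, left to right
```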
After the scan, a persistent sound is emitted from the nearest collectable it reveals, if any. The parameters of this sound change as you move farther or closer, above or below it, or turn away from or toward it, to guide you like a game of hot and cold. A similar yet more subtle cue, paired with a particle animation on the screen, guides you toward the nearest point of interest. This could be a new destination to explore, an important object to collect, or even a portal between the worlds.
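That hot-and-cold guidance boils down to a similar recipe: keep one looping tone per tracked collectable and, every frame, nudge its loudness, pitch, and pan based on the player’s distance and facing; again, the ranges below are hypothetical:

```js
const ctx = new AudioContext();

// A persistent beacon for the nearest collectable.
const osc = new OscillatorNode(ctx, { type: 'sine', frequency: 880 });
const pan = new StereoPannerNode(ctx, { pan: 0 });
const amp = new GainNode(ctx, { gain: 0 });
osc.connect(pan).connect(amp).connect(ctx.destination);
osc.start();

// Called every frame:
//   distance  - meters to the collectable
//   azimuth   - where it sits relative to your facing, -1 (left) to 1 (right)
//   elevation - height difference in meters, positive if it is above you
function updateBeacon(distance, azimuth, elevation) {
  const now = ctx.currentTime;

  // Closer means louder, like a game of hot and cold.
  const proximity = Math.max(0, 1 - distance / 100);
  amp.gain.setTargetAtTime(proximity * 0.3, now, 0.1);

  // Above you sounds higher, below you sounds lower.
  osc.frequency.setTargetAtTime(880 * Math.pow(2, elevation / 50), now, 0.1);

  // And it leans toward the side it is actually on.
  pan.pan.setTargetAtTime(azimuth, now, 0.1);
}

updateBeacon(30, -0.5, 10); // fairly close, slightly left, a little above you
```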
Iterative development
Importantly, the navigation systems did not materialize overnight, nor did they stay fixed throughout development. Instead, they were mindfully crafted and refined from feedback collected throughout early access. Periphery Synthetic was developed in the open, on my personal website, with its first public alpha available for download on itch.io in August 2022, followed by two years of regular milestone releases, smaller patches, and hotfixes before its full Steam release in August 2024.
Throughout development, a small, yet dedicated, group of folks regularly followed along, playing many of the builds as they came out, sharing their experiences with me via Discord, itch.io, and the AudioGames.net forums. It’s important to note: feedback comes from a diversity of experiences, and not all of it is constructive or actionable, such as folks wanting me to add guns, or use real sampled sounds instead, which are contrary to the overall vision of the project.
Beyond sharing its emotional impact on them, their feedback helped me identify areas for improvement, which were iterated on early and often and rolled out with the regular updates to create a cycle of feedback. Often these were small changes that built up over time, like the dozens of slight tweaks to the audio mix for loudness, spatialization, and clarity that make the first alpha release sound like an entirely different experience compared to the full release. Yet the feedback also led to new features or massive overhauls of existing ones, like the ability to charge your jumps in any direction, or how the scanner locks on to collectables and points you to the next destination.
Without this feedback, I’m certain that, even if the project had made it over the finish line at all, it would be far less polished and complete than it is today. I’m so incredibly grateful for everyone who encouraged and supported me through this journey.
Conclusions
Well, now that we’ve covered the background, tools and techniques, and various approaches to get Periphery Synthetic over the finish line, let’s wrap up. After two and a half years of development, from the initial concept, to early access, to full release, now what?
Next steps
Currently I’m working on two free content expansions, which will each add a new musical world to explore, while also adding more depth and refining existing gameplay systems. For example: the first expansion will add the lichen sphere: an ethereal moss world of shifting vertical landscapes, like scaling the canopy of a massive forest. It also adds new mechanics for breaking down and transmuting the collectable materials, and new aerodynamics and gliding abilities, to create new avenues for how you explore and level up. I’m also working on a second expansion which adds a lava planet—where the magma is constantly hiding and revealing the landscape as it rises and falls—but this is much earlier in development.
Ideally, future updates will continue to make performance optimizations and implement commonly-requested features, like input remapping or audio balance sliders, which would help further its accessibility goals by helping folks with motor or hearing difficulties.
One day, I’d even like to move on from Periphery Synthetic by exploring more genres in a game jam, like unique puzzle or strategy games built with synthesizers, or even something entirely new. Imagine taking everything I learned from Periphery Synthetic, such as the movement mechanics and navigational audio cues, and applying it to a game with giant robots that fight to control territory, and upgrade their equipment on a procedurally-generated planet.
Key findings
Overall, I found that developing Periphery Synthetic gave me unique insight into making larger open-world experiences filled with exploration, crafting, and discovery that are accessible to as many folks as possible, from finding parity between our senses, to getting it right through iteration. Specifically, the cycle of iterative development and collecting player feedback was integral to achieving its accessibility goals, because I am just one person, with one perspective, from a body that is abled in many ways, and I cannot possibly understand the diverse life experiences of the folks my work touches.
The biggest lesson that I learned is that music and environmental audio cues can be immersive—not just functional—in the pursuit of accessibility. Every element of the soundscape has the potential to convey important information about the player’s position, the game state, or the environment itself, in purely diegetic ways. In that sense, my main takeaway is that generative audio absolutely has untapped potential in making both in-game and real-world environments more accessible to all, with a multitude of applications in augmented and virtual realities alike.
Calls to action
And finally, some things for you to consider:
Turn off your screen, or blindfold yourself, to experience the world in a different light. You may try browsing the web with a screen reader to understand how the mental model of accessing a website shifts when you can’t see it, or what frustrations you encounter when a website ignores the Web Content Accessibility Guidelines. Or you may try playing your favorite video game—if you can even get past the main menu—and try to complete the first objective or level to get a sense of what, if anything, is missing to make it more accessible.
Furthermore, there exists a vast database of audio games on AudioGames.net which you may play to understand the history, challenges, and solutions they reach. Compare and contrast them with video games, such as the design choices in their audio and gameplay which make them an accessible alternative to folks who can’t see the screen. Are there any improvements that could be made to your favorite video game that would make it more inclusive? And if you like any of the audio games—or more generally appreciate the dedication that went into it—then please consider supporting their developers.
Finally, you may take what you learn from these exercises and try building your own accessible experiences. Are you feeling ambitious? Join one of the upcoming game jams and make an audio game for yourself. Or do you have an existing video game, whether it’s one you’re making or one you can modify? Add screen reader support or more immersive audio to support unsighted play.
I’m shiftBacktick, that was Periphery Synthetic, and you are awesome. Cheers!