Radar / Metaverse

The Metaverse Is Not a Place

It’s a communications medium.

By Tim O’Reilly

August 2, 2022

What color am I, series 03, scott richard (source: torbakhopper on Flickr)

The metaphors we use to describe new technology constrain how we think about it, and, like an out-of-date map, often lead us astray. So it is with the metaverse. Some people seem to think of it as a kind of real estate, complete with land grabs and the attempt to bring traffic to whatever bit of virtual property they’ve created.

Seen through the lens of the real estate metaphor, the metaverse becomes a natural successor not just to Second Life but to the World Wide Web and to social media feeds, which can be thought of as a set of places (sites) to visit. Virtual Reality headsets will make these places more immersive, we imagine.

Learn faster. Dig deeper. See farther.

Join the O'Reilly online learning platform. Get a free trial today and find answers on the fly, or master something new and useful.

Learn more

But what if, instead of thinking of the metaverse as a set of interconnected virtual places, we think of it as a communications medium? Using this metaphor, we see the metaverse as a continuation of a line that passes through messaging and email to “rendezvous”-type social apps like Zoom, Google Meet, Microsoft Teams, and, for wide broadcast, Twitch + Discord. This is a progression from text to images to video, and from store-and-forward networks to real time (and, for broadcast, “stored time,” which is a useful way of thinking about recorded video), but in each case, the interactions are not place based but happening in the ether between two or more connected people. The occasion is more the point than the place.

In an interview with Lex Fridman, Mark Zuckerberg disclaimed the notion of the metaverse as a place, but in the same sentence described its future in a very place-based way:

A lot of people think that the Metaverse is about a place, but one definition of this is it’s about a time when basically immersive digital worlds become the primary way that we live our lives and spend our time.

Think how much more plausible this statement might be if it read:

A lot of people think that the Metaverse is about a place, but one definition of this is it’s about a time when immersive digital worlds become the primary way that we communicate and share digital experiences.

My personal metaverse prototype moment does not involve VR at all, but Zoom. My wife Jen and I join our friend Sabrina over Zoom each weekday morning to exercise together. Sabrina leads the sessions by sharing her Peloton app, which includes live and recorded exercise videos. Our favorites are the strength training videos with Rad Lopez and the 15-minute abs videos with Robin Arzón. We usually start with Rad and end with Robin, for a vigorous 45-minute workout.

Think about this for a moment: Jen and I are in our home. Sabrina is in hers. Rad and Robin recorded their video tracks from their studios on the other side of the county. Jen and Sabrina and I are there in real time. Rad and Robin are there in stored time. We have joined five people in four different places and three different times into one connected moment and one connected place, “the place between” the participants.

Sabrina also works out on her own on her Peloton bike, and that too has this shared quality, with multiple participants at various “thicknesses” of connection. While Jen and Sabrina and I are “enhancing” the sharing using real-time Zoom video, Sabrina’s “solo” bike workouts use the intrinsic sharing in the Peloton app, which lets participants see real-time stats from others doing the same ride.

This is the true internet—the network of networks, with dynamic interconnections. If the metaverse is to inherit that mantle, it has to have that same quality. Connection.

Hacker News user kibwen put it beautifully when they wrote:

A metaverse involves some kind of shared space and shared experience across a networked medium. Not only is it more than just doing things in VR, a metaverse doesn’t even require VR.

The metaverse as a vector

It’s useful to look at technology trends (lines of technology progression toward the future, and inheritance from the past) as vectors—quantities that can only be fully described by both a magnitude and a direction and that can be summed or multiplied to get a sense of how they might cancel, amplify, or redirect possible pathways to the future.

I wrote about this idea back in 2020, in a piece called “Welcome to the 21st Century,” in the context of using scenario planning to imagine the post-COVID future. It’s worth recapping here:

Once you’ve let loose your imagination, observe the world around you and watch for what scenario planners sometimes call “news from the future”—data points that tell you that the world is trending in the direction of one or another of your imagined scenarios. As with any scatter plot, data points are all over the map, but when you gather enough of them, you can start to see the trend line emerge.…

If you think of trends as vectors, new data points can be seen as extending and thickening the trend lines and showing whether they are accelerating or decelerating. And as you see how trend lines affect each other, or that new ones need to be added, you can continually update your scenarios (or as those familiar with Bayesian statistics might put it, you can revise your priors). This can be a relatively unconscious process. Once you’ve built mental models of the world as it might be, the news that you read will slot into place and either reinforce or dismantle your imagined future.

Here’s how my thinking about the metaverse was formed by “news from the future” accreting around a technology-development vector:

I had a prior belief, going back decades, that the internet is a tool for connection and communication, and that advances along that vector will be important. I’m always looking with soft focus for evidence that the tools for connection and communication are getting richer, trying to understand how they are getting richer and how they are changing society.
I’ve been looking at VR for years, trying various headsets and experiences, but they are mostly solo and feel more like stand-alone games or if shared, awkward and cartoonish. Then I read a thoughtful piece by my friend Craig Mod in which he noted that while he lives his physical life in a small town in Japan or walking its ancient footpaths, he also has a work life in which he spends time daily with people all over the world. I believe he made the explicit connection to the metaverse, but neither he nor I can find the piece that planted this thought to confirm that. In any case, I think of Craig’s newsletter as where the notion that the metaverse is a continuation of the communications technologies of the internet took hold for me.
I began to see the connection to Zoom when friends started using interesting backgrounds, some of which make them appear other than where they are and others that make clear just where they are. (For example, my friend Hermann uses as a background the beach behind his home in New Zealand, which is more vividly place based than his home office, which could be anywhere.) That then brought my exercise sessions with Sabrina and Jen into focus as part of this evolving story.
I talked to Phil Libin about his brilliant service mmhmm, which makes it easy to create and deliver richer, more interactive presentations over Zoom and similar apps. The speaker literally gets to occupy the space of the presentation. Phil’s presentation on “The Out of Office World” was where it all clicked. He talks about the hierarchy of communication and the tools for modulating it. (IMO this is a must-watch piece for anyone thinking about the future of internet apps. I’m surprised how few people seem to have watched it.)

Trying Supernatural using the Meta Quest 2 headset completed the connection between my experience using Zoom and Peloton for fitness with friends and the VR-dominant framing of the metaverse. Here I was, standing on the edge of one of the lava lakes at Erta Ale in Ethiopia, an astonishing volcano right out of central casting for Mount Doom in The Lord of the Rings, working through warm-up exercises with a video of a fitness instructor green-screened into the scene, before launching into a boxing training game. Coach Susie was present in stored time, just like Robin and Rad. All that was missing was Jen and Sabrina. I’m sure that such shared experiences in remarkable places are very much part of the VR future.

That kind of shared experience is central to Mark Zuckerberg’s vision of socializing in the metaverse.

In that video, Zuck shows off lavishly decorated personal spaces, photorealistic and cartoon avatars, and an online meeting interrupted by a live video call. He says:

It’s a ways off but you can start to see some of the fundamental building blocks take shape. First the feeling of presence. This is the defining quality of the metaverse. You’re going to really feel like you’re there with other people. You’ll see their facial expressions, you’ll see their body language, maybe figure out if they’re actually holding a winning hand—all the subtle ways that we communicate that today’s technology can’t quite deliver.

I totally buy the idea that presence is central. But Meta’s vision seems to miss the mark in its focus on avatars. Embedded video delivers more of that feeling of presence with far less effort on the part of the user than learning to create avatars that mimic our gestures and expressions.

Chris Milk, the CEO of Within, the company that created Supernatural, both agreed and disagreed about avatars when explaining the company’s origin story to me in a phone conversation a few months ago:

What we learned early on was that photorealism matters a lot in terms of establishing presence and human connection. Humans, captured using photorealistic methods like immersive video, allow for a deeper connection between the audience and the people recorded in the immersive VR experience. The audience feels present in the story with them. But it’s super hard to do from a technical standpoint and you give up a bunch of other things. The trade-off is that you can have photorealism but sacrifice interactivity, as the photorealistic humans need to be prerecorded. Alternatively, you can have lots of interactivity and human-to-human communication, but you give up on anyone looking real. In the latter, the humans need to be real-time-rendered avatars, and those, for the moment, don’t look remotely like real humans.

At the same time, Milk pointed out that humans are able to read a lot into even crude avatars, especially when they’re accompanied by real-time communication using voice.

Especially if it’s someone you already know, then the human connection can overcome a lot of missing visual realism. We did an experiment back in 2014 or 2015, probably. Aaron [Koblin, the cofounder of Within] was living in San Francisco, and I was in Los Angeles. We had built a VR prototype where we each had a block for the head and two blocks for our hands. I got into my headset in LA, and Aaron’s blocks were sitting over on the floor across from me as his headset and hand controllers were sitting on his floor in San Francisco. All of a sudden the three blocks jumped up off the ground into the air as he picked up his headset and put it on. The levitating cubes “walked” up to me, waved, and said, “Hey.” Immediately, before I even heard the voice, I recognized the person in those blocks as Aaron. I recognized through the posture and gait the spirit of Aaron in these three cubes moving through space. The resolution, or any shred of photorealism, was completely absent, but the humanity still showed through. And when his voice came out of them, my brain just totally accepted that the soul of Aaron now resides in these three floating cubes. Nothing was awkward about communicating back and forth. My brain just accepted it instantly.

And that’s where we get back to vectors. Understanding the future of photorealism in the metaverse depends on the speed and direction of progress in AI. In many ways, a photorealistic avatar is a kind of deepfake, and we know how computationally expensive their creation is today. How long will it be before the creation of deepfakes is cheap enough and fast enough that hundreds of millions of people can be creating and using them in real time? I suspect it will be a while.

Mmhmm’s blending of video and virtual works really well, using today’s technology. It’s ironic that in Meta’s video about the future, video is only shown on a screen in the virtual space rather than as an integral part of it. Meta could learn a lot from mmhmm.

On the other hand, creating a vast library of immersive 3D still images of amazing places into which either avatars or green-screened video images can be inserted seems much closer to realization. It’s still hard, but the problem is orders of magnitude smaller. The virtual spaces offered by Supernatural and other VR developers give an amazing taste of what’s possible here.

In this regard, an interesting sidenote came from a virtual session that we held earlier this year at the Social Science Foo Camp (an event put together annually by O’Reilly, Meta, and Sage) using the ENGAGE virtual media conferencing app. The group began their discussion in one of the default meeting spaces, but one of the attendees, Adam Flaherty, proposed that they have it in a more appropriate place. They moved to a beautifully rendered version of Oxford’s Bodleian Library, and attendees reported that the entire tenor of the conversation changed.

Two other areas worth thinking about:

Social media evolved from a platform for real-time interaction (real-time status updates, forums, conversations, and groups) to one that’s often dominated by stored-time interaction (posts, stories, reels, et al). Innovation in formats for stored-time communications is at the heart of future social media competition, as TikTok has so forcefully reminded Facebook. There’s a real opportunity for developers and influencers to pioneer new formats as the metaverse unfolds.
Bots are likely to play a big role in the metaverse, just as they do in today’s gaming environments. Will we be able to distinguish bots from humans? Chris Hecker’s indie game SpyParty, prototyped in 2009, made this a central feature of its game play, requiring two human players (one spy and one sniper) to find or evade each other among a party crowded with bots (what game developers call non-player characters or NPCs). Bots and deepfakes are already transforming our social experiences on the internet; expect this to happen on steroids in the metaverse. Some bots will be helpful, but others will be malevolent and disruptive. We will need to tell the difference.

The need for interoperability

There’s one thing that a focus on communications as the heart of the metaverse story reminds us: communication, above all, depends on interoperability. A balkanized metaverse in which a few big providers engage in a winner-takes-all competition to create the Meta- or Apple- or whatever-owned metaverse will take far longer to develop than one that allows developers to create great environments and experiences and connect them bit by bit with the innovations of others. It would be far better if the metaverse were an extension of the internet (“the network of networks”) rather than an attempt to replace it with a walled garden.

Some things that it would be great to have be interoperable:

Identity. We should be able to use the digital assets that represent who we are across platforms, apps, and places offered by different companies.
Sensors. Smartwatches, rings, and so forth are increasingly being used to collect physiological signals. This technology can be built into VR-specific headsets, but we would do better if it were easily shared between devices from different providers.
Places. (Yes, places are part of this after all.) Rather than having a single provider (say Meta) become the ur-repository of photorealistic 360-degree immersive spaces, it would be great to have an interoperability layer that allows their reuse.
Bot identification. Might NFTs end up becoming the basis for a nonrepudiable form of identity that must be produced by both humans and bots? (I suspect we can only force bots to identify themselves as such if we also require humans to do so.)

Foundations of the metaverse

You can continue this exercise by thinking about the metaverse as the combination of multiple technology trend vectors progressing at different speeds and coming from different directions, and pushing the overall vector forward (or backward) accordingly. No new technology is the product of a single vector.

So rather than settling on just “the metaverse is a communications medium,” think about the various technology vectors besides real-time communications that are coming together in the current moment. What news from the future might we be looking for?

Virtual Reality/Augmented Reality. Lighter and less obtrusive headsets. Advances in 3D video recording. Advances in sensors, including eye-tracking, expression recognition, physiological monitoring, even brain-control interfaces. Entrepreneurial innovations in the balance between AR and VR. (Why do we think of them as mutually exclusive rather than on a continuum?)
Social media. Innovations in connections between influencers and fans. How does stored time become more real time?
Gaming. Richer integration between games and communications. What’s the next Twitch + Discord?
AI. Not just deepfakes but the proliferation of AIs and bots as participants in social media and other communications. NPCs becoming a routine part of our online experience outside of gaming. Standards for identification of bots versus humans in online communities.
Cryptocurrencies and “Web3.” Does crypto/Web3 provide new business models for the metaverse? (BTW, I enjoyed the way that Neal Stephenson, in Reamde, had his character design the business model and money flows for his online game before he designed anything else. Many startups just try to get users and assume the business model will follow, but that has led us down the dead end of advertising and surveillance capitalism.)
Identity. Most of today’s identity systems are centralized in one way or another, with identity supplied by a trusted provider or verifier. Web3 proponents, however, are exploring a variety of systems for decentralized “self-sovereign identity,” including Vitalik Buterin’s “soulbound tokens.” The vulnerability of crypto systems to Sybil attacks in the absence of verifiable identity is driving a lot of innovation in the identity space. Molly White’s skeptical survey of these various initiatives is a great overview of the problem and the difficulties in overcoming it. Gordon Brander’s “Soulbinding Like A State,” a riff on Molly White’s post and James C. Scott’s Seeing Like A State, provides a further warning: “Scott’s framework reveals…that the dangers of legibility are not related to the sovereignty of an ID. There are many reasons self-sovereignty is valuable, but the function of a self-sovereign identity is still to make the bearer legible. What’s measured gets managed. What’s legible gets controlled.” As is often the case, no perfect solution will be found, but society will adopt an imperfect solution by making trade-offs that are odious to some, very profitable to others, and that the great mass of users will passively accept.

There’s a lot more we ought to be watching. I’d love your thoughts in the comments.

Post topics: Metaverse

Post tags: Research