Infusing Intelligence — AI, The Integral Building Block of the Metaverse

In Summary
- Transformer-based models are a recent development in deep learning that has advanced both Natural Language Processing (NLP), including language modeling, and image processing (with the Vision Transformer).
- Semantic Segmentation, Instance Segmentation, and Sequence Modeling provide the intelligence behind the Metaverse.

Machine Learning and Deep Learning

With AI, we try to mimic how humans make predictions by building a model trained on a particular dataset.

Machine Learning

Traditional Machine Learning (ML) builds a model by finding the relationship between the input data and the target output (function approximation). When the dataset pairs each input with a known output (the target label), this is Supervised Learning. Typical applications are regression (predicting continuous variables) and classification (predicting discrete variables).
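To make the regression versus classification distinction concrete, here is a minimal sketch in plain Python. The data, the labels, and the helper names (`fit_line`, `fit_centroids`, `classify`) are made up for illustration; a least-squares line and a nearest-centroid rule are just two of the simplest supervised methods, not the only ones.

```python
# Toy supervised learning: learn a mapping from inputs to known targets.

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_centroids(xs, labels):
    """Classification: remember the mean input value of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression: the target is continuous (made-up data following y = 2x + 1).
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
predicted_y = slope * 4 + intercept

# Classification: the target is discrete (made-up temperatures and labels).
centroids = fit_centroids([2, 4, 6, 28, 30, 32], ["cold"] * 3 + ["hot"] * 3)
predicted_label = classify(centroids, 25)

print(predicted_y, predicted_label)
```

In both cases the model is fitted from input and target pairs; only the type of the target (a number or a category) changes.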

Deep Learning

With the technology advances of the early 2010s (better algorithms, the ImageNet dataset, big data, and Graphics Processing Units, or GPUs), Deep Learning gained popularity. A deep learning model learns directly from the dataset and, after training, stores the relationship between the input data and the output data in the weights of its deep neural network.
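The idea that the learned relationship ends up stored in the network can be sketched with the smallest possible stand-in: a single linear unit trained by gradient descent. This is an illustrative assumption, not a deep network; a real model stacks many such units with nonlinearities between them, but the training loop has the same shape. The data and hyperparameters are made up.

```python
# One linear unit trained by gradient descent on mean squared error.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up data following y = 2x + 1

weight, bias = 0.0, 0.0  # before training the model encodes nothing
learning_rate = 0.05     # assumed value, small enough to be stable here
n = len(inputs)

for step in range(2000):
    grad_w = 0.0
    grad_b = 0.0
    for x, y in zip(inputs, targets):
        error = (weight * x + bias) - y
        grad_w += 2 * error * x / n  # d(MSE)/d(weight)
        grad_b += 2 * error / n      # d(MSE)/d(bias)
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b


def predict(x):
    """After training, prediction needs only the stored parameters."""
    return weight * x + bias


print(weight, bias, predict(10.0))
```

Once the loop finishes, the training data can be discarded: everything the model has learned lives in `weight` and `bias`.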

Image Processing in Metaverse

Understanding the semantic meaning in an image (or stream of images, e.g., video) is one of the advancements in deep learning that contributes to Metaverse development.

A snapshot shows an avatar at the Samsung 837X building in the Decentraland Metaverse.
Semantic segmentation classifies each pixel into a specific label, e.g., people, building, land, or door (Meta, 2022).
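The per-pixel labeling step can be sketched as follows. This assumes a model has already produced one score per class for every pixel; the label set, the scores, and the `segment` helper are hypothetical, and the sketch says nothing about how the cited system computes its scores.

```python
LABELS = ["people", "building", "land", "door"]  # labels taken from the caption


def segment(scores):
    """scores[row][col] holds one score per class for that pixel.
    Returns a grid of the same shape with the winning label per pixel."""
    return [
        [LABELS[max(range(len(LABELS)), key=lambda c: pixel[c])] for pixel in row]
        for row in scores
    ]


# A 2x2 "image" with made-up class scores for each pixel.
scores = [
    [[0.1, 0.7, 0.1, 0.1], [0.8, 0.1, 0.05, 0.05]],
    [[0.05, 0.15, 0.7, 0.1], [0.1, 0.3, 0.1, 0.5]],
]
label_map = segment(scores)
for row in label_map:
    print(row)
```

The output has the same height and width as the input, which is what separates segmentation from whole-image classification.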
First Example: Our image is given as input to the image similarity algorithm.
Given the input image, do an image similarity search (Meta, 2022).
Second Example: Our image is given as input to the image similarity algorithm.
Do an image similarity search (Meta, 2022).
Given the input image, select part of the image, then do a patch similarity search for only that part of the image (Meta, 2022).
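A common way to implement this kind of search is to turn every image (or image patch) into an embedding vector and rank candidates by cosine similarity. The sketch below assumes that approach; the source does not say which method the cited demo uses. The file names and three-dimensional vectors are made up, and in practice the embeddings would come from a vision model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_k=2):
    """Return the names of the top_k indexed items closest to the query."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical index: image name -> embedding vector.
index = {
    "tower.png": [0.9, 0.1, 0.0],
    "plaza.png": [0.1, 0.9, 0.2],
    "avatar.png": [0.0, 0.2, 0.9],
}

# Whole-image search: embed the full input image and rank the index.
whole_image_query = [0.8, 0.3, 0.1]
print(most_similar(whole_image_query, index))

# Patch search: embed only the selected region, then rank the same index.
patch_query = [0.1, 0.1, 0.8]
print(most_similar(patch_query, index, top_k=1))
```

Whole-image and patch search differ only in what gets embedded as the query; the ranking step is identical.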

Sequence Modeling in Metaverse

Imagine speaking in your own language while the people you talk to understand you in theirs, in real time. This universal translator is an advancement in Natural Language Processing (NLP) that already exists to some extent today (not just in the Metaverse) and is expected to improve in the coming years (Meta AI, 2022).

A “Universal Language Translator” enables cross-nation communication. Everyone speaks their own language, and the other party hears and responds in their local language, including Bahasa Indonesia. In the future, smart AR glasses or smart contact lenses may serve as the communication medium (Meta AI, 2022).
Two people discuss project progress in the field. The reviewer uses smart AR glasses to inspect a document written in a foreign language he does not understand, and the glasses translate the document in real time (Meta AI, 2022).
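Sequence models for language, including the Transformer mentioned in the summary, are built around an attention step in which every token looks at every other token. The source does not describe the translator's architecture, so the following is only a sketch of that one building block, scaled dot-product attention, in plain Python with made-up two-dimensional token vectors.

```python
import math


def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two made-up token vectors attending to each other (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
for row in mixed:
    print(row)
```

Each output row is a weighted mix of all value vectors, with more weight on the tokens whose keys best match the query. A full translation model adds learned projections, multiple heads, and many stacked layers on top of this step.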

AIoT — AI and IoT

We live in a connected world. The invention of the smartphone has enabled better communication and improved everyday life.

Various AR/VR Headsets, including a wrist tracker. Image source: Google Image Search.
Personal VR Omni-Directional Treadmill (KATVR, 2022).
bHaptics: Haptic Glove (will be available in Q4 2022).

AI and Metaverse

Many technologies support the Metaverse ecosystem, and each of them opens business and investment opportunities. Artificial Intelligence is one of them.

  • evojax (February 2022): a “library for hardware-accelerated neuroevolution.”
  • Gradients without Backpropagation (February 2022): a somewhat surprising idea, given how long backpropagation has been the standard way to train neural networks. However, human brains seem to communicate only in forward mode.
  • Hierarchical Perceiver (February 2022): a “new version of the Perceiver, which was a Transformer-based approach that could be applied to arbitrary modalities as long sequences (up to 100k!) of tokens: vision, language, audio-visual tasks.”
  • MuZero (February 2022, DeepMind): applied to “video compression.”
  • TorchRec (February 2022, Meta): a PyTorch “domain library built to provide common sparsity & parallelism primitives needed for large-scale recommender systems.”
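To give a feel for the "gradients without backpropagation" item in the list above, here is a minimal sketch of a forward-gradient estimate: compute the directional derivative of a function along a random direction in a single forward pass, then scale that direction by it. The `Dual` class, the test function `f`, and all helper names are illustrative assumptions, not code from the paper.

```python
import random


class Dual:
    """A number that carries its value and its derivative along one direction."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        return Dual(
            self.value * other.value,
            self.tangent * other.value + self.value * other.tangent,  # product rule
        )

    __rmul__ = __mul__


def f(x, y):
    """Made-up test function; its true gradient is (2x + 3y, 3x)."""
    return x * x + 3 * x * y


def directional_derivative(func, point, direction):
    """Derivative of func at point along direction, from one forward pass."""
    duals = [Dual(p, d) for p, d in zip(point, direction)]
    return func(*duals).tangent


def forward_gradient(func, point, direction):
    """Gradient estimate: (directional derivative) times the direction."""
    slope = directional_derivative(func, point, direction)
    return [slope * d for d in direction]


# Averaging estimates over random Gaussian directions approaches the true
# gradient, which is (8, 3) at the point (1, 2).
random.seed(0)
point = [1.0, 2.0]
samples = 20000
total = [0.0, 0.0]
for _ in range(samples):
    direction = [random.gauss(0.0, 1.0) for _ in point]
    estimate = forward_gradient(f, point, direction)
    total = [t + e for t, e in zip(total, estimate)]
print([t / samples for t in total])
```

No backward pass is ever run: each estimate costs one forward evaluation, at the price of noise that has to be averaged out over many directions.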
