Generative AI — The Exciting Technology Leapfrog in 2023 and Promising Years Ahead

Andi Sama
6 min read · Dec 29, 2023

Is 2023 the Year of Emerging Artificial General Intelligence? If so, are we on the right track to AGI and ASI?

Motion generation with Leonardo.ai, based on the DreamShaper v7 fine-tuned model. Prompt: “colorful smoke moving around, camera zoom in,” by LordGarmadon.

Andi Sama — CIO, Sinergi Wahana Gemilang

In 2023, the term “Generative AI” dominated technology news around the world. For the first time, high-value Artificial Intelligence (AI) research that used to sit behind the closed doors of a few giant, billion-dollar high-tech labs became accessible to the general public. The trigger was OpenAI, a Microsoft-backed research organization, which released ChatGPT for free in November 2022: a text-based chat application in which we, as human beings, can communicate with a machine using natural language, just as we communicate with one another.

Opening Pandora’s Box — ChatGPT

ChatGPT is built on a foundation model, GPT (Generative Pre-trained Transformer). At launch in November 2022, ChatGPT was based on GPT-3.5 (the free version) and was later upgraded to GPT-4 (the paid version). A foundation model is a neural network (primarily based on the transformer architecture introduced in 2017) trained on a vast amount of data, usually all accessible data from the Internet, as a starting point. GPT is a Natural Language Processing (NLP) deep learning architecture trained on vast amounts of text, hence the generic name: Large Language Model (LLM).

The release of ChatGPT by OpenAI triggered others to release similar foundation models and applications built on them. The list below keeps growing. These LLM foundation models have been trained on different datasets, but they share the same principle: all are trained on vast amounts of data.

  • Google (foundation model: Gemini). The smallest variants (Gemini Nano) range from about 1.8 to 3.25 billion parameters and target the next Google Pixel phones. Larger variants are also available, e.g., through Google Vertex AI.
  • IBM (foundation model: Granite). The model size is 20 billion parameters. One of the applications that uses Granite is watsonx.ai.
  • Meta/Facebook (foundation model: Llama 2). 7 to 70 billion parameters. Unlike the others, these models are openly released, with weights available for download. We can run llama-2-13b (the 13-billion-parameter version) on a Windows laptop with an Intel i7 processor and 16 GB of RAM (no GPU), although performance is slow at only about 0.5 to 2 tokens per second (see the sketch after this list).
  • Microsoft (foundation model: GPT-4). Microsoft’s offerings are built on OpenAI’s GPT-4, as Microsoft has invested heavily in OpenAI.
  • OpenAI (foundation model: GPT-4). It is rumored to be a mixture of eight 220-billion-parameter expert models, roughly 1.76 trillion parameters in total.
  • Samsung (foundation model: Gauss). Samsung’s next flagship smartphone, the Galaxy S24, is expected to utilize this foundation model in Q1 2024.
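To make the local-inference point above concrete, here is a minimal sketch of running a quantized Llama 2 13B on a CPU-only laptop using the open-source llama-cpp-python bindings. It is an illustration under assumptions, not the author’s exact setup: the GGUF file name, thread count, and prompt are placeholders.

```python
# Minimal CPU-only inference sketch with llama-cpp-python (pip install llama-cpp-python).
# Assumes a 4-bit quantized GGUF build of Llama 2 13B downloaded beforehand;
# the file name below is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.Q4_K_M.gguf",  # ~8 GB quantized, fits in 16 GB RAM
    n_ctx=2048,     # context window in tokens
    n_threads=8,    # match the laptop's physical cores
)

output = llm(
    "Q: What is a foundation model? A:",
    max_tokens=128,
    stop=["Q:"],    # stop before the model invents the next question
    echo=False,     # do not repeat the prompt in the output
)
print(output["choices"][0]["text"])
```

Quantization is what makes this feasible: a 4-bit build of a 13-billion-parameter model fits in 16 GB of RAM, and the 0.5 to 2 tokens per second observed above is typical for CPU-only inference at this scale.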

The Rise of Generative AI

The essential ingredients for building LLMs are massive datasets, a large neural network architecture, and vast computing power. We saw 2023 as the year of fine-tuned LLMs and prompt engineering (more specialized LLMs built from a foundation model with different approaches), yet because of the enormous resource requirements, only a handful of companies in the world can build the foundation models themselves.
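As a sketch of what “building a more specialized LLM from a foundation model” can look like in practice, the snippet below applies LoRA (Low-Rank Adaptation), a popular parameter-efficient fine-tuning technique, using the Hugging Face transformers and peft libraries. The base checkpoint and hyperparameters are illustrative assumptions; a real fine-tune would add a dataset and a training loop.

```python
# Sketch: attach LoRA adapters to a pretrained foundation model
# (pip install transformers peft). Checkpoint name is illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor applied to the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

The appeal is that only the small adapter matrices are trained, so a specialized model can be produced on a single GPU, while pre-training the foundation model itself remains out of reach for all but a handful of companies.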

While various neural network architectures had been discussed and experimented with long before 2010, it was the availability of big data (the ImageNet dataset, which enabled the 2012 breakthrough in image classification) and vast computation powered by GPUs (Graphics Processing Units), especially NVIDIA’s, that really kicked off the advancements in AI research.

Acquiring the datasets is one thing; storing and processing them to train a foundation model is another. The neural network architecture is typically derived from the transformer architecture. The real challenge is training the billions or even trillions of neural network parameters. This is where computing power comes in: thousands of high-end GPUs running in parallel to do the training, not to mention the power, CPU (Central Processing Unit), I/O (Input/Output), and storage requirements.
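A back-of-envelope estimate shows why. The scaling-law literature commonly approximates training cost as roughly 6 × N × D floating-point operations for N parameters and D training tokens. The numbers below (a 70-billion-parameter model, 2 trillion tokens, 40% GPU utilization) are illustrative assumptions.

```python
# Back-of-envelope training cost using the common ~6*N*D FLOPs approximation.
n_params = 70e9   # assumed model size: 70 billion parameters
n_tokens = 2e12   # assumed training data: 2 trillion tokens

total_flops = 6 * n_params * n_tokens   # ~8.4e23 FLOPs
effective_flops = 312e12 * 0.40         # A100 BF16 peak ~312 TFLOP/s at ~40% utilization
gpu_seconds = total_flops / effective_flops
gpu_years = gpu_seconds / (3600 * 24 * 365)

print(f"~{total_flops:.1e} FLOPs, about {gpu_years:.0f} A100-years")
# Roughly 200+ GPU-years: about a month of wall-clock time on ~2,500 GPUs.
```

And that is before counting the storage, networking, and electricity needed to keep thousands of GPUs busy for weeks.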

Multi-Modal Foundation Models

In addition to LLMs, which “just” generate text, foundation models with transformer-based architectures can also generate images, voice, code, and video. These are multi-modal foundation models.

Multi-modal means the model can generate more than text: in addition to text, it can create images, voice, code, and video (and, who knows, other modalities in the future).

Take ChatGPT, for example. The free version only generates text (including code). However, ChatGPT Plus and the APIs can do more. OpenAI’s DALL-E 3, which generates images from a text prompt, is available as a paid service, and the OpenAI Whisper API generates text from voice.
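For readers who want to try these paid services, here is a minimal sketch using the OpenAI Python SDK (v1.x), assuming an OPENAI_API_KEY environment variable; the prompts and the audio file name are placeholders.

```python
# Sketch: image generation (DALL-E 3) and speech-to-text (Whisper)
# via the OpenAI Python SDK (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Text prompt in, image out (returned as a URL).
image = client.images.generate(
    model="dall-e-3",
    prompt="A huge golden dragon with the text 'AI' in 3D style",
    size="1024x1024",
    n=1,
)
print(image.data[0].url)

# Voice in, text out: transcribe a local audio file with Whisper.
with open("speech.mp3", "rb") as audio_file:  # illustrative file name
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```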

Microsoft Bing Chat behaves much like ChatGPT. Bing Image Creator, Midjourney, and Leonardo.ai are examples of applications that generate images from a text prompt.

Image generation with Leonardo.ai, generated on December 27, 2023, based on the AlbedoBase XL fine-tuned model. Prompt: “(Ultra Long Exposure Photography)) high quality, highly detailed, Colorful beautiful young woman like Jennifer Lopez silhouette neon dots, beautiful silhouette, Electronic devices such as a very light gray PC in the background, by asama inspired by yukisakura, high detailed,”
The image was generated with the DALL-E 3 API on Dec 24, 2023. Prompt: A huge golden dragon with the text “AI” in 3D style with white, blue, and orange colors. The text has a reflection on splashing water — a photo-realistic image.

Like the others, Samsung Gauss is a multi-modal foundation model capable of generating text, images, and code.

What’s Next

2024 and beyond will be exciting years as more multi-modal foundation models are built and released. The wave started in 2023, and we shall see more advanced models soon. Imagine the possible use cases across different types of business applications.

The Path from ANI to AGI and ASI

While we saw a lot of improvement in 2023, we may still need some time to reach Artificial General Intelligence (AGI), and there are several competing definitions of AGI. We are now at Artificial Narrow Intelligence (ANI), aiming for AGI.

DeepMind (Meredith Ringel Morris et al., 2023) proposes six levels (0 to 5) of AGI performance, summarized below from the paper: from level 0 (No AI), where humans do everything, all the way to level 5 (ASI, Artificial Super Intelligence), which outperforms all humans. The paper places today’s frontier chatbots, including ChatGPT, at level 1 (Emerging AGI, Emerging Artificial General Intelligence).

  • Level 0 (No AI): humans do everything.
  • Level 1 (Emerging): equal to or somewhat better than an unskilled human; today’s frontier LLMs sit here.
  • Level 2 (Competent): at least the 50th percentile of skilled adults.
  • Level 3 (Expert): at least the 90th percentile of skilled adults.
  • Level 4 (Virtuoso): at least the 99th percentile of skilled adults.
  • Level 5 (Superhuman, ASI): outperforms 100% of humans.

Mike Wooldridge of the Royal Institution (Mike Wooldridge, 2023) argues that we may only reach the lowest levels of general intelligence in the list below: “Augmented LLMs,” and possibly the third item, “Machines that can do any language-based task a human can do.” Machines are far from developing human-like consciousness. The significant improvements so far are still in language-based tasks; AI-powered robotics still cannot replicate everyday human capabilities, e.g., loading a dishwasher, something almost any human can do.

Wooldridge gives four levels of AGI under the title “Varieties of General Intelligence,” ordered from the most to the least ambitious:

  • Machines that can do anything a human can do.
  • Machines that can do any cognitive task a human can do.
  • Machines that can do any language-based task a human can do.
  • Augmented LLMs.

Happy New Year 2024.

References

Meredith Ringel Morris et al., 2023, “Levels of AGI: Operationalizing Progress on the Path to AGI,” Google DeepMind, arXiv:2311.02462.

Mike Wooldridge, 2023, “The Truth About AI,” The Royal Institution Christmas Lectures.