[Generative-AI] Experimenting with GPT-4 and Dall-E 3

Andi Sama
10 min readNov 19, 2023

--

Is it even possible for humanity to create Artificial General Intelligence (AGI), and if so, are we on the right track?

Andrew WidjajaSinergi Wahana Gemilang with Andi Sama

Image generated by DALL-E 3, Nov 16, 2023. Prompt: “A stunning splash of a water drop in high resolution. Put the text “SWG” blended in the image. The image is composed of water drops with a mix of blue, white and orange colors”.

By now, most of us would have been exposed to chatGPT somehow, especially since the public release of chatGPT version 3.5 in November 2022, which has been provided for free by OpenAI. The simple user interface has accelerated advancements in this area of Generative Pre-Trained (GPT) Large-Language Model (LLM) in which we, as humans, can ‘communicate’ with the machine (the AI — Artificial Intelligence) in a natural way as if we are communicating with another human.

The following screenshot shows the typical web-based user interface for the chatGPT-3.5 application based on GPT-3.5.

A typical chatGPT-3.5 User Interface. It was accessed on Nov 7, 2023.

The conversation below shows chatGPT-3.5’s response when using the following prompt: “You are an expert international chef. Please create a creative recipe for a fine dining dessert for the recently married couple spending a weekend in a 5-star resort hotel. Limit the response to no more than 100 words. Present in table format if suitable.

“Chatting” with the free chatGPT-3.5. It was accessed on Nov 7, 2023.

Sounds promising as a “human” partner to replace Google search? Well, that’s something that it can do. Most of the time, it can reply with a compelling response, and we tend to believe the response, although, to some extent, it can hallucinate. Yes, hallucinate. The response can sound convincing; however, it may not be the correct answer when we do the fact check.

We can make follow-up conversations with chatGPT to ask it to elaborate further on the previous responses, creating the chained questions and answers — the chained requests and responses.

As a large language model with more than 175 billion hyper-parameters in its neural network model, chatGPT-3.5 has been trained with a huge dataset (until January 2022) to “complete” the sentence we send to the model. The free version of chatGPT-3.5 has a limitation of 4096 tokens.

In contrast, the current paid version of GPT-4 (accessed through its APIs — Application Programming Interface) extends this to 128 thousand tokens along with faster response. GPT-4 was said (Matthias Bastian, 2023) to have 1.76 trillion hyper-parameters consisting of 8 sub-models with 220 billions each.

chatGPT is the application with a provided user interface utilizing the GPT LLM model. It can be based on the free version of GPT 3.5 (GPT-3.5), GPT-3.5-turbo, or the paid version of GPT-4 (GPT-4 and the latest GPT-4-turbo).

A quick prompt to chatGPT asking for its limitations reveals the following information.

I’m a computer program, so I don’t have personal limitations like a human. However, there are some things to keep in mind when interacting with me:

- Lack of Personal Experience: I don’t have feelings, consciousness, or personal experiences. My responses are generated based on patterns in data I’ve been trained on.

- No Real-Time Information: My knowledge is based on information available up until my last training cut-off in January 2022. I’m not aware of events or information that have come out after that date.

- Text-Based Interaction: I can’t interact with the physical world, see, hear, or understand audio or images. I respond exclusively through text.

- Accuracy and Verification: While I strive for accuracy, I can still generate incorrect or misleading information. Always double-check critical information from trusted sources.

- Security and Privacy: Don’t share sensitive personal information with me, as I don’t have the capability to guarantee confidentiality.

- Limit on Output Length: I can generate responses of varying lengths, but there are practical limits to the length of a single response.

- Not a Replacement for Professional Advice: While I can provide information and suggestions, I am not a substitute for professional advice in fields like medicine, law, finance, etc. Always consult a qualified expert for such matters.

Remember, my responses are based on patterns in data and should be used with discernment and common sense. If in doubt, it’s always best to seek advice from a qualified human expert.

Being able to master building the prompts to suit various vertical industry use cases, hence “communicating with the LLM in the optimized way to produce the best possible result,” may land you an excellent job in the USA with about USD 300K annually (gross). Initially, you can start with the documentation from OpenAI (OpenAI, 2023c). For those who want to pursue further, the prompt marketplace for LLM may be an excellent site as a reference (PromptBase, 2023).

A Quick Look at Artificial Intelligence

LLM, as we know, is the advancement from Deep Learning (DL), which is a subset of Machine Learning (ML) and also a subset of AI within the Computer Science (CS) discipline.

Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL).

The AI model is a trained machine learning or deep learning model. An AI model is typically trained using a dataset from a specific domain. Therefore, given new input data, the AI model can predict the output based on its trained dataset. Doing the prediction is doing the inference, meaning running the AI model in the production environment.

Training and Inferencing an AI model

The LLM training facility is only reachable for a select few researchers. GPT-3.5 was said to be trained on the massive infrastructure with thousands of high-end GPU (Graphics Processing Units) hardware in a distributed and scalable infrastructure (including CPU — Central Processing Unit, distributed storage, and software frameworks), usually a cloud-based platform. Once the model is created, doing the inference is also a big challenge, although it does not require the infrastructure as huge as when it was trained.

Machine Learning model development and deployment.

Luckily, we do not always need to train the base AI model from scratch. Transfer learning and fine-tuning from a base AI model can help. We can download the base AI model from a repository such as Hugging Face (Hugging Face, 2023) and then retrain the model by introducing our specific dataset based on the downloaded base AI model.

The LLM model is the trained AI model using datasets from multiple domains. As it has been trained with massive multi-domain datasets, it can “complete” the sentences given to it up to a certain number of tokens, now thousands of tokens even for the free version of chatGPT-3.5 utilizing the GPT-3.5 model.

Experimenting with GPT-4 and Dall-E 3

The latest OpenAI model for LLM is GPT-4 (for text-to-text natural language completion). The model for text-to-image generation is DALL-E 3. Similar two conversations mentioned before with chatGPT-3.5 are illustrated below. This time, we are using the paid version, GPT-4.

GPT-4 in action. “Chatting” with the paid version of chatGPT. It was accessed on Nov 9, 2023.
GPT-4 in action. “Chatting” with the paid version of chatGPT. It was accessed on Nov 9, 2023.

Designing an Interactive App with GPT-4 and DALL-E 3

GPT-4 and DALL-E 3 models run on the OpenAI cloud. Registered users with authorized access can access the APIs directly using REST-API tools like Postman.

We will now discuss building an app to access GPT-4 and DALL-E 3 through OpenAI’s API. The first example of text-to-image generation with DALL-E 3 is shown below. GPT-4 and DALL-E 3 models run on the OpenAI platform and are accessed as APIs.

Dall-E 3 in action — now using Bahasa Indonesia (Indonesian language). Prompt: “seorang guru wanita menjelaskan perkembangan teknologi modern kepada dua orang siswa pria di bawah pohon rindang dan hujan salju”. Generated on Nov 7, 2023.

We leverage Telegram to use GPT-4 and DALL-E 3 models in this experiment. The workflow illustrates how Telegram communicates to the OpenAI Platform through a Telegram bot and a custom service (written in Python) running on Google Cloud Run, a managed compute platform that can run stateless containers. The user sends a message to a bot through the Telegram client app.

The flow of requests and responses to generate responses from GPT-4 or DALL-E 3.

Using the webhook mechanism, the custom service will get updates from the Telegram platform. The custom service will invoke the appropriate model on the OpenAI platform based on the retrieved message. Invoking GPT if it is just a text, or DALL-E instead if the retrieved message started with “generate” or “/generate.”

The custom service runs on Google Cloud Run and is configured using CPU only and allocated during request processing. This means the custom service will be temporarily activated only when Telegram calls it and starts processing the retrieved data. If no applications call, it will shut down automatically — releasing the consumed resources.

DALL-E 3, More Examples

More examples of text-to-image generation with different prompts are shown below.

Dall-E 3 in action. Prompt: “Artificial intelligence in metaverse world.” It was generated on Nov 7, 2023.
Dall-E 3 in action. Prompt: “Students learn artificial intelligence and metaverse on campus wearing funny hats during bright and clear weather.” It was generated on Nov 7, 2023.
swgemilang’s bot running on the Telegram client generates the image by interacting with GPT-4 and Dall-E APIs. It was generated on Nov 7, 2023.
Dall-E 3 in action. Prompt: “A flying Wonder Woman superhero is fighting a creature with a robot face and a green-hulk body holding a thor hammer among ancient castles in the afternoon with a lot of lighting and small funny creatures.” It was generated on Nov 7, 2023.
Dall-E 3 in action. Prompt: “A flying car moves towards a mountain on the multiverse with several very beautiful flying female angels with white wings and colorful high-class gowns.” It was generated on Nov 7, 2023.

Code Generation

GPT-3.5 and GPT-4 can generate code (Mukund Kapoor, 2023). By the following prompt “create a [C] language script to parse [csv] and extract [generative, ai, is, an, exciting, development] with the following requirement [create in a function callable from main()],” it can generate the code as follows.

Code generations with GPT-3.5 and GPT-4.

This generated code can serve as a template for us to work on. Before using the code, we should verify whether it generates the correct logic as intended.

Towards the Future

Current advancements are the cumulation of years of R&D. The research evolution started about a decade ago when Alexnet was introduced as the promising and functioning deep-learning-based neural network supported by the ImageNet dataset.

Generative AI models have been made possible since the invention of the Generative Adversary Network (GAN) in 2014 and a breakthrough paper released in 2017 called Transformer the “Attention is All You Need.” Since then, various generative models have been introduced, including the first known democratized models by OpenAI: LLM-based GPT-4 and DALL-E 3.

The free version of GPT-3.5 has been trained with the dataset up to January 2022, while the dataset used for training the paid version of GPT-4 was as recent as April 2023.

There is another open-source LLM by Meta (formerly Facebook) called LLAMA. Llama2 model (e.g., with its 13 billion hyper-parameters with a file size of about 7.4GB) can be downloaded from a repository and run locally (Adam Conway, 2023) on the local Mac or Windows machine. The model will run faster if the local computer is equipped with GPUs.

The illustration of an LLM called llama2 by Meta (formerly Facebook) running locally on a laptop with an Intel i7 processor and 16GB RAM. With the loaded 13 billion llama2 model, the output generation was slow, about 0.5–2 tokens per second.

The previous example on code generation with the same prompt, if executed using the llama2–13b model, produced the following output. Again, we should verify the correctness of the generated code.

Code generations with llama2–13b.

The recent advancements in quantum computing (Andi Sama, 2023a) may also open up a new way to train the AI model. Quantum Machine Learning (QML) has been initially explored for transfer learning (Andi Sama, 2020), in which the AI model (deep-learning based) was first trained classically. Some parts of the neural network layers (usually the last few layers) were modified using the quantum approach, combining classical and real quantum computers (classical-quantum hybrid approach). We may need to wait for ten more years or so until a universal quantum computer has enough quantum bits (qubits) to perform useful processing (Andi Sama, 2022).

The Last Decade in AI History, The Advancements and Beyond.
The history of Generative-AI (the model names) from 2019 until 2023 (Gilles Legoux, 2023).

Further information on the advancements of technology can be found in SWG Insight digital magazines (SWG, 2023). Released in 2011, SWG Insight is Sinergi Wahana Gemilang (SWG)’s quarterly magazine discussing various technology advancements such as Bigdata, Security, the Internet of Things (IoT), Artificial Intelligence (AI), Blockchain, Metaverse, and Quantum Computing.

Quarterly SWG Insight Digital Magazine, published since 2011.

References

--

--