Can AI Think?

Andi Sama
Jul 17, 2024


A thought on the recent advancements in Generative AI

Andi Sama, CIO, Sinergi Wahana Gemilang; with Dr. Frederikus Fios, Associate Professor in Philosophy of Science, Binus University; and Dr. Hendi Sama, Assistant Professor in Information Systems, Universitas Internasional Batam.

Microsoft Co-Pilot generated the image in July 2024 (DALL-E 3 foundation model) with the prompt "Generate an image of a robot sitting on a park bench full of butterflies, thinking about various things on a beautiful evening. Make the robot from colorful, shiny, precious, large natural gemstones in flowing blue water."

We have been witnessing many Artificial Intelligence (AI)-powered advancements: classifying objects and generating descriptions of objects in an image, Netflix recommending the next film to watch, Grammarly paraphrasing sentences, driverless cars from Waymo and Tesla, drone deliveries by Amazon Prime, the new shopping experience of Amazon Go, Know Your Customer (e-KYC) onboarding in banking through face recognition by digital banking providers, and many more. AI is getting smarter. Of course, the latest publicly accessible breakthrough was ChatGPT, which opened up AI capabilities by letting people talk to AI in natural language, as they would with humans.

Whether AI can think has been questioned for decades. The question was raised as early as the mid-20th century, when Alan Turing, the father of modern computer science, posed the question "Can machines think?" in his 1950 paper (Epstein R., 1992).

It has been nearly 75 years now.

Since the early 2020s, the latest advancements in Generative AI, in which humans can converse with AI in natural language, have given the perception that AI is capable of thinking. Generative AI can generate new content that it has never seen during training, whereas a traditional AI model can only predict within the bounds of its training dataset.

Humans can view the thinking process philosophically. Nagwa (2024) summarizes the key points: philosophical thinking involves pursuing knowledge and searching for truth; philosophy allows us to understand our values; and philosophical thinking involves evaluating, criticizing, and defending a particular society's values. That is what humans can do.

Based on recent technological advancements, do we observe that AI can think? This big question has led to several debates, especially in the scientific community.

Let's ask five well-known generative AI applications (Google Gemini, IBM Granite-code through Ollama, Meta.ai Llama, Microsoft Co-Pilot, and OpenAI ChatGPT) whether they can think.

Google Gemini's response when asked, "Can you think?" The foundation model is Google Gemini.
Ollama's response when asked, "Can you think?" The foundation model is IBM Granite-code:20b-instruct, which runs locally on a laptop with an i7 Intel Processor and 16GB RAM without a GPU.
Meta.ai's response when asked, "Can you think?" The foundation model is Meta Llama3.
Microsoft Co-Pilot's response when asked, "Can you think?" The foundation model is OpenAI GPT.
OpenAI ChatGPT's response when asked, "Can you think?" The foundation model is OpenAI GPT-4o, released in May 2024.

ChatGPT — the First Application Widely Considered to Use a Large Language Model

The public release of the initial ChatGPT in November 2022 gave the public early access to technology usually available only to a select few, typically people working in, or with access to, high-tech and heavily funded R&D facilities.

The technology started as a tool to predict the next words to assist humans in regular activities. It began as an intelligent assistant, such as the suggestions we see when typing search keywords into the Google Search Engine or composing a document with a word processor in Office 365.

Researchers have been pushing the technology to its limits, taking it to the next level at scale. Instead of predicting just the next word, the technology has moved on to predicting the next "many" words, even multiple sentences, and further to generating patterns it has never seen before. These advancements mark the era of Generative AI, differentiating it from the previous era of Traditional AI.

Traditional Programming. Given the input and the Process (the program), the output is generated.
Traditional AI. The program (the "Process") is created through function approximation, given input and output datasets.
Generative AI. The program (the "Process") is created through function approximation, as in Traditional AI. As the datasets grow significantly, the model can generate new content it has never seen before.
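To make the three paradigms concrete, here is a minimal Python sketch (my illustration, not from the article): in traditional programming we write the process by hand, while in traditional AI we approximate the process from input and output pairs. Generative AI relies on the same function approximation, only at a vastly larger scale.

```python
import numpy as np

# Traditional Programming: the "Process" is written by hand.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Traditional AI: the "Process" is approximated from input/output datasets.
celsius = np.array([0, 10, 20, 30, 40], dtype=float)       # input dataset
fahrenheit = np.array([32, 50, 68, 86, 104], dtype=float)  # output dataset

# Fit a linear function f(x) = a*x + b to the data (function approximation).
a, b = np.polyfit(celsius, fahrenheit, deg=1)

print(celsius_to_fahrenheit(25))  # 77.0, from the explicit rule
print(a * 25 + b)                 # ~77.0, from the learned rule
```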

Transformer Architecture in the Generative AI Era

It all started with the 2017 paper "Attention Is All You Need." The architecture the paper introduced, popularly known as the Transformer, triggered all the recent Generative AI advancements.

Many Generative AI systems are multi-modal, generating text, images, voice, and even video. These advancements open new opportunities one may not have imagined before, such as producing an entire Hollywood movie from a written script with predefined scenes, flows, actors, and actresses.

Multi-modal LLMs

Many applications are emerging, from practical or just-for-fun applications to industry-grade ones. LLMs extend beyond text-to-text generation: the large vision/video model (LVM), the large code model (LCM), and more.

The core feature of LLMs is the capability to hold natural conversations like humans, often supporting multiple modalities and multiple languages.

  • AI assistants with their foundation models. Some published AI assistants and their foundation models are OpenAI ChatGPT on GPT-4o, Microsoft Co-Pilot on OpenAI GPT-4 and DALL-E 3, Google Gemini on Gemini, IBM Watsonx on Granite and others, AWS Bedrock on Titan, and Meta.AI on Llama3.
  • Combining text prompts with a still photograph generates a short talking video (Akool.ai).
  • Generate ideas for product designs by giving some options while considering the constrained variables (Autodesk Product Design Suite).
  • Through a text prompt, you can ask Adobe Photoshop to remove and replace specific areas, such as an image background.
  • Code assistance (as well as code generation) helps programmers develop applications in multiple programming languages and migrate code from one programming language to another, supporting up to 115 programming languages (IBM Granite-code).
  • Translate one language to another with voice support (Samsung Gauss and Google Gemini nano on Samsung S24 smartphone).
  • Text-to-video generation (OpenAI Sora), which may be included in the next release of Adobe Premiere, alongside the recently released Dream Machine by Luma.AI and Kling AI by Kuaishou Technology.
  • And many more to come.

Introduced by Alan Turing in 1950, the Turing test determines whether a human can tell if the counterpart they are talking to is a human or a computer. A computer is said to pass the Turing test if the human cannot tell whether she is talking to another human or to a computer.

A few AIs are claimed to have passed, or to be close to passing, the Turing test; ChatGPT running GPT-4, for example (Matthew Jackson, 2024).

Considerations in Generative AI Development and Deployment

Generative AI is based on the LLM, the Large Language Model, also called the foundation model. A foundation model is a pre-trained model: a neural network trained on a vast dataset, most commonly accessible data from the Internet.

At its most fundamental, a neural network such as an LLM works with numbers and only with numbers. When a Generative AI application such as ChatGPT appears to communicate through text, it first translates our input text into numbers, then translates the predicted numbers back into text when generating the next words based on the given input.

This mechanism applies to other modalities, such as images, voice, and video. Text, images, voice, and video are first converted to numbers through a specific encoding mechanism before entering the model, and the generated numbers are converted back to the required modality by a decoding mechanism. The mechanism applies both when training (creating the model) and when inferencing (operating the model).
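As a minimal sketch of this text-to-numbers round trip, here is an example using OpenAI's open-source tiktoken tokenizer (the library choice is mine for illustration; any tokenizer demonstrates the same encode/decode idea):

```python
import tiktoken  # pip install tiktoken

# Load a byte-pair encoding used by recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Can AI think?")  # text -> numbers (token IDs)
print(tokens)                         # a short list of integers

text = enc.decode(tokens)             # numbers -> text
print(text)                           # "Can AI think?"
```

The model itself never sees letters, only these integer IDs and, internally, the vectors they map to.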

AI Development — The Training

The training process often involves thousands of graphics processing units (GPUs) and takes several weeks to several months. The massive infrastructure dedicated to training translates into tens to hundreds of millions of dollars in training costs. The infrastructure cost to operate the trained model is usually just a fraction of the cost to train it, depending on the model's complexity, typically its size and quantization.

For example, a neural-network-based model with 70 billion parameters and 4 bytes per parameter will have a file size of about 280 billion bytes, meaning 280 GB. The actual size varies, as the model can contain extra information in its headers, and pruning (dropping some of its parameters) may be done before finalizing the model. The 4-byte (32-bit) quantization can often be reduced to 16 or 8 bits for optimization.
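The arithmetic is simple enough to sketch in a few lines of Python (my illustration), showing how quantization shrinks the file size:

```python
# Rough model file size: number of parameters x bytes per parameter.
def model_size_gb(num_params, bits_per_param):
    return num_params * (bits_per_param / 8) / 1e9

for bits in (32, 16, 8, 4):
    print(f"70B parameters at {bits}-bit: ~{model_size_gb(70e9, bits):,.0f} GB")
# 32-bit: ~280 GB, 16-bit: ~140 GB, 8-bit: ~70 GB, 4-bit: ~35 GB
```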

Augment and Retrain the Pre-trained Models with External Data

Training a foundation model from scratch is very expensive; even an enterprise that plans to adopt Generative AI may be unable to afford it. Fortunately, there are workarounds for this challenging situation. One is augmenting the pre-trained model with an external dataset, given some context, to add new knowledge to the model. The standard approaches are Retrieval-Augmented Generation (RAG) and Fine-Tuning.

With RAG, starting from the foundation model (the pre-trained model), we can add new knowledge without any changes to the pre-trained model itself. Ideally, RAG is the common approach when we have just a few hundred to a few thousand pairs of augmented data, at least in 2024, as model retraining with fine-tuning still requires access to many GPUs.
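Here is a toy sketch of the RAG idea in Python (my illustration; production systems use neural embedding models and vector databases rather than word counts): retrieve the most relevant external text, then prepend it to the prompt sent to the unmodified LLM.

```python
import re
import numpy as np

# The "external data" that RAG retrieves from; the foundation model is untouched.
docs = [
    "Our refund policy allows returns within 30 days.",
    "Support is available 24/7 via chat and email.",
    "Premium plans include priority onboarding.",
]

def embed(text, vocab):
    # Toy bag-of-words vector; real RAG uses neural embeddings.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return np.array([words.count(w) for w in vocab], dtype=float)

vocab = sorted({w for d in docs for w in re.findall(r"[a-z0-9]+", d.lower())})
doc_vecs = np.array([embed(d, vocab) for d in docs])

query = "What is the refund policy?"
q = embed(query, vocab)

# Retrieve the most similar document by cosine similarity.
sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
context = docs[int(np.argmax(sims))]

# Augment the prompt; the frozen LLM then answers from the retrieved context.
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```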

Another approach, fine-tuning, modifies specific trainable parameters in the pre-trained model while leaving the other parameters untouched. Ideally, this is done when we have about 10k to 100k pairs (text prompt and text response) of new data with which to retrain the trainable parameters.
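A popular realization of this idea is LoRA (Low-Rank Adaptation): the pre-trained weight matrix W is frozen, and only a small low-rank update BA is trained. A numpy sketch of the core mechanism (my illustration, not the article's method):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 512  # layer width in the pre-trained model
r = 8    # adapter rank, r << d

W = rng.normal(size=(d, d))         # frozen pre-trained weights (untouched)
A = rng.normal(size=(r, d)) * 0.01  # trainable
B = np.zeros((d, r))                # trainable, zero-initialized so we start at W

x = rng.normal(size=d)

# Effective forward pass: W x + B (A x); only A and B are updated during training.
y = W @ x + B @ (A @ x)

print(2 * d * r, "trainable parameters vs", d * d, "for full fine-tuning")
```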

AI Deployment — The Inferencing

We can choose among several options for AI deployment, that is, inferencing with the pre-trained models or, specifically for Generative AI, the LLMs. We can deploy to the cloud, locally in our data center, or, for initial development, even run the LLM on a laptop.

Smaller LLMs, with up to about 20 billion parameters, are generally called Small Language Models (SLMs). An SLM is a more compact version of an LLM, offering a balance between capability and control for focused business applications. It is lightweight, easier to build, and computationally less intensive; it can be deployed on smartphones and edge devices. In addition to offering significant cost savings for deployment and reducing bias and hallucination, an SLM can be tailored for specific business tasks such as chat, analytics, and content generation.

The cloud is the first choice for most deployments, as it provides convenience and a faster time to market. We choose the type of hardware infrastructure we need (CPU, RAM, GPU), while the serving services that run our models, with their scalable and secure application infrastructure, are mostly chosen automatically.

For on-premise deployment to our data center, we must provide at least a few high-end servers equipped with high-end GPUs for inferencing, that is, deploying the pre-trained models to production to serve a certain number of concurrent sessions. To start, 4 to 8 GPUs and a few hundred GB of RAM are not uncommon, so the initial investment may reach a few hundred thousand dollars. We also need to set up the application infrastructure to serve the models.

Alternatively, a high-end laptop with 32 to 64 GB of RAM and a GPU may be sufficient, especially for initial exploration and testing.

One available model that has been optimized and quantized down to 4 bits is the newly introduced Meta Llama3-8b (released as open source in early 2024). Llama3-8b is the 8-billion-parameter foundation model; with 4-bit quantization, its file size is 4.7 GB. We can run the model locally (do inferencing) on a high-end laptop, e.g., one with an i7 processor and 16 GB of RAM, even without a GPU. With Ollama as the serving application, the performance, a few tokens per second in response to a prompt, is still acceptable for initial testing.

Ollama supports Windows or Linux operating systems.

Another example is IBM Granite-code, recently announced (May 2024). These pre-trained foundation models are available from as low as 3b up to 34b (billion parameters). For example, the 20b and 34b variants have 11 GB and 19 GB file sizes, respectively, and can run locally on a high-end GPU-equipped laptop (such as a Zyrex Maveric Ultra X with an i9 Intel CPU, 64 GB of CPU RAM, and an NVidia RTX 4090 with 16 GB of GPU RAM). Performance is about a few tokens per second using Ollama as the serving application, both interactively (through its text-based user interface) and as a serving API application, through which external applications can invoke the models via REST API.

If we run the 20b or 34b Granite-code foundation models locally on a laptop without GPU support, performance degrades significantly, to more than 10 seconds per token.

To get acceptable performance of a few tokens per second on a laptop with an Intel i7 CPU and 16 GB of RAM, we can instead run the smaller Granite-code 8b version rather than the larger 20b or 34b.

We can run the following command to serve the Granite-code 8b foundation model on the Windows operating system with Ollama. The first time, if the foundation model is not available locally, it will be downloaded automatically.

D:\ollama run granite-code:8b

Here is a sample conversation with the IBM Granite-code 8b foundation model running locally on top of Ollama on a Windows laptop with an Intel i7 processor and 16 GB of RAM, without a GPU. The performance, a few tokens per second, is still acceptable for a single user.

D:\Andisama>ollama run granite-code:8b
>>> Hi, this is Andi Sama. Please introduce yourself.
Hi! I am a language model for code created by IBM to accelerate code
generative tasks and increase developer productivity.

>>> do you have a name instead of just "a language model"?
Yes, my name is Granite Code, a member of IBM's enterprise-focused family
of foundation models for code.

>>> what are you capabilities, and what are the area that you are the
... most capable of?
As a code language model, I am trained by IBM researchers for generative
artificial intelligence (AI) tasks that are ready for enterprise use.
My current capabilities are limited to the programming languages and
development tools supported by IBM watsonx Code Assistant. I am
specifically designed to accelerate development while maintaining
the principles of trust, security and compliance at their core.

I can help on a variety of fronts inclduing:
- Generate code from natural language requests such as "create an
object store bucket"
- Assist with difficult-to-write sections of code such as handling
errors or implementing logic
- Enable CI/CD processes by suggesting changes to infrastructure
definitions or generating deployment scripts
- Help with application modernization initiatives by recommending
microservices architectures
- And much more

>>> let's start with something. please show me a node.js code as
...a FaaS code to call an external REST API serving application to get
...a response from an LLM
Here is a sample Node.js code that demonstrates how to call an external
REST API using IBM Cloud Functions (FaaS) and retrieve a response
from the API:

```javascript
const request = require('request');

exports.main = function(params) {
return new Promise((resolve, reject) => {
const options = {
method: 'GET',
url: 'https://api.example.com/data'
};

request(options, (error, response, body) => {
if (!error && response.statusCode == 200) {
resolve({ message: 'Success', data: JSON.parse(body) });
} else {
reject({ error: 'Failed to retrieve data from API' });
}
});
});
};
```

In this code, we are using the `request` module to make a GET request
to an external REST API (`https://api.example.com/data`).
The API call is wrapped inside a Promise, which allows us to handle the
response asynchronously. If the API call is successful (status code 200),
we resolve the Promise with a success message and the parsed JSON data
from the response body. Otherwise, we reject the Promise with an
error message indicating that the API call failed.

This code can be deployed as a FaaS action in IBM Cloud Functions
using the IBM Cloud CLI or IBM Cloud dashboard. When you invoke
this function, it will make a GET request to the specified REST API
and return the response data as the output of the function.

>>> thanks for the code and the explanation, it will surely help me
... in my journey as a developer in the modern world
Of course! You're welcome. As you continue your journey as a developer,
I hope this example helps you understand how to use Node.js and
IBM Cloud Functions to call external APIs and retrieve data.
Keep up the good work and happy coding!

>>> Send a message (/? for help)

Similarly, if we have sufficient hardware infrastructure (a laptop or server with a 16 GB or 32 GB GPU), we can run the following command to serve the Granite-code 34b foundation model on the Linux operating system with Ollama.

$ollama run granite-code:34b
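Ollama also serves the model through a local REST API, as mentioned earlier. A minimal Python sketch calling it (the endpoint and payload follow Ollama's /api/generate interface; adjust the model name to whatever you have pulled):

```python
import json
import urllib.request

payload = {
    "model": "granite-code:8b",                      # any locally pulled model
    "prompt": "Write a one-line Python hello world.",
    "stream": False,  # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",           # Ollama's default port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["response"])  # the generated text
```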

Alternatively, the Hugging Face transformers serving library can be tested using Python programming in Anaconda's Jupyter Notebook environment. The library supports many LLMs, including Meta Llama and IBM Granite, as sketched below.
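A minimal sketch with the transformers library (the model identifier below is an example of mine; substitute any causal language model you have access to on the Hugging Face Hub):

```python
# pip install transformers torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-3b-code-base",  # example model id, an assumption
)

output = generator("def fibonacci(n):", max_new_tokens=64)
print(output[0]["generated_text"])
```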

Conclusion and The Future to Come

The early 2020s have been exciting years for the development of Generative AI. At the same time, 2024 is an exciting year to start adopting Generative AI capabilities and integrating them into enterprise business workflows.

Yes, integrating with enterprise business applications is possible through two approaches: RAG and Fine-Tuning.

  • RAG (Retrieval-Augmented Generation) leverages the LLM's capability with external data without modifying the foundation model. It typically does not need a GPU. RAG is a shorter-term approach.
  • Fine-tuning permanently augments the foundation model by retraining it (thus modifying the pre-trained model) with external data; it is significantly more expensive, as the training process requires GPUs. Fine-tuning is a longer-term approach.

We will see more exciting advancements in 2025 and beyond: better, more advanced, more intelligent multi-modal foundation models. There will still be large general-purpose models with hundreds of billions of parameters. However, there will also be more specific models focusing on particular tasks and domains, resulting in smaller file sizes and more cost-effective inferencing.

The MoE (Mixture of Experts) approach, as in the Mistral 8x7b foundation model (8 sub-models, each with 7 billion parameters), uses a mix of sub-models, each focusing on a specific purpose, rather than a single model with trillions of parameters that tries to cover everything, like OpenAI GPT-4o and Google Gemini Ultra.
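A toy numpy sketch of the MoE routing idea (my illustration of top-k gating; real MoE layers, such as Mixtral's, are trained end to end):

```python
import numpy as np

rng = np.random.default_rng(1)

d, n_experts, top_k = 16, 8, 2  # 8 experts; route each token to the top 2

# Each "expert" is a small layer; a gate scores the experts per token.
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate = rng.normal(size=(d, n_experts))

def moe_forward(x):
    scores = x @ gate                  # gating scores, one per expert
    top = np.argsort(scores)[-top_k:]  # indices of the top-k experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                       # softmax over the selected experts
    # Only the selected experts run, so compute stays low despite many experts.
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

token = rng.normal(size=d)
print(moe_forward(token).shape)  # (16,), same output shape as a dense layer
```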

Google DeepMind (Meredith Ringel Morris et al., November 2023) proposed six levels (levels 0 to 5) of AGI, ranging from level 0 (No AI), where humans do everything, all the way to level 5 (ASI, Artificial Super Intelligence), fully autonomous AI. We are now at level 1 (Emerging AGI, Emerging Artificial General Intelligence).

Six levels of AGI.

To attempt to answer our original question, "Can AI think?", we see that current Generative AI is, at its most basic, just a text-prediction machine that predicts the next words at scale, which it can do because it has been trained on a vast dataset, commonly the whole of accessible data on the Internet.

The AI seems to think while only predicting the next "words" at scale. The "words" in LLMs can be in any modality: text, image, voice, video, etc. That is why most LLMs are now multi-modal.
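To ground the "prediction machine" point, here is the simplest possible next-word predictor, a bigram counter of my own making (an LLM does this same job with billions of learned parameters instead of raw counts):

```python
from collections import Counter, defaultdict

corpus = "can ai think can ai learn can machines think".split()

# Count which word follows which word (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

# Predict the most likely next word after "can".
print(following["can"].most_common(1))  # [('ai', 2)]
```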

Mistral's Mixture of Experts (MoE) foundation model, when asked, "Can you think?". Mistral is the 7-billion-parameter model that is said to surpass the capability of Meta's Llama 2 13b on all benchmarks.

Yann LeCun, a Professor at New York University and the Chief AI Scientist at Meta, recently posted on the Facebook social media platform (May 2024) that "existing AI has some levels of understanding but is far from being able to think like humans."

"Can AI Think?" Prof. Yann LeCun, May 2024.

Professor Fei-Fei Li of Stanford University envisions the path to Spatial Intelligence, where AI can understand the environment three-dimensionally (Li Fei Fei, 2024). Integrating a well-known LLM such as ChatGPT into a human-like robot, such as Ameca (Bloomberg Live, 2024), may open up the physical realization of an AI as a human companion or assistant.

In the early years of modern AI, about ten years ago, Prof. Fei-Fei Li asked Andrej Karpathy, one of her former doctoral students, when he presented his research on generating text descriptions of images: "Can you do the reverse?", meaning generate an image from a text description. Andrej, who later became a co-founder of OpenAI and the senior director of AI at Tesla, said at the time, "It is impossible." Now we see that LLMs can do that and even more. The LLM does more than just generate text; it generates images, voices, videos, and much more.

Furthermore, recent advancements in neuromorphic computing (Charles Q. Choi, 2023; Fabrizio Ottati, 2023) and biological organoid computing (Skye Jacobs, 2024) may illuminate an alternative future of computers that can think. This may also be combined with advancements in quantum research: computing with quantum bits (qubits) instead of bits (Andi Sama, 2023b). Quantum Machine Learning (QML) is an example of a hybrid approach to Machine Learning (Andi Sama, Cahyati S. Sangaji, Agung Trisetyarso, 2020), combining the classical world (bit-based) and the quantum world (qubit-based).

To conclude, Dr. Fios, an Associate Professor in Philosophy of Science at Binus University, states: "AI is incapable of thinking independently like humans. Humans have consciousness, freedom, autonomy, empathy, love, moral responsibility, and ethics. AI does not have those human traits. Through models, AI creates narratives based on predefined data and patterns" (Rm. Siprianus Send, 2024).

Well, you may have a different opinion. Do you think AI can think?

References
