ChatGPT: The journey to today's Artificial Intelligence

Artificial Intelligence has seen an incredible boom in recent years. From simple decision-making algorithms to complex, human-like behaviour, AI has evolved to become an integral part of our lives. In this essay, we will trace the evolution of AI from neural networks, to deep learning, to practical and accessible use cases such as ChatGPT.

If we trace back the journey that has brought us to the current state of AI, a logical place to start is the concept of machine learning with neural networks. Neural networks are computing systems inspired by the biological structure of the human brain. They consist of layers of interconnected nodes (often referred to as neurons or perceptrons) that process information and generate outputs. Neural networks are designed to learn from examples and improve their performance with training, through a reinforcing mechanism that adjusts the strength of the connections, or weights, between the nodes. The first mathematical model of a neural network was proposed in the 1940s and was later refined with the idea of the perceptron: a simple model that could perform basic logical operations. It was not until the 1980s, however, that neural networks became widely used for real-world applications; they proved well-suited for certain classification problems, such as handwritten text recognition and image classification.
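To make this concrete, the classic perceptron learning rule can be sketched in a few lines of plain Python. This is a toy example learning the logical AND function; the update rule shown is the textbook perceptron rule, not any specific historical implementation:

```python
# A single perceptron learning the logical AND function.

def step(x):
    """Threshold activation: fire (1) only if the weighted input is positive."""
    return 1 if x > 0 else 0

def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]  # one weight per input connection
    b = 0.0         # bias term
    for _ in range(epochs):
        for inputs, target in samples:
            pred = step(w[0] * inputs[0] + w[1] * inputs[1] + b)
            error = target - pred
            # Reinforce the connections (weights) in proportion to the error
            w[0] += lr * error * inputs[0]
            w[1] += lr * error * inputs[1]
            b += lr * error
    return w, b

# Truth table for AND: the output is 1 only when both inputs are 1
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
for inputs, target in data:
    assert step(w[0] * inputs[0] + w[1] * inputs[1] + b) == target
```

The key limitation, famously, is that a single perceptron can only learn linearly separable functions (AND works, XOR does not), which is part of why layered networks became necessary.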

In time, the concept evolved along three main directions: we developed more efficient ways to train the networks; we learned how to assemble more complex architectures of nodes, understanding that certain architectures are better suited to certain types of tasks; and all of this was supported by advances in hardware technology. This brought us to the concept of deep learning. Deep learning is a subfield of machine learning that uses neural networks with multiple hidden layers (of nodes) to model complex patterns in data. Deep learning algorithms can learn rich representations of data, which makes them ideal for tasks like image recognition, natural language processing, and speech recognition. We could argue that deep learning is just a complicated neural network but, as already mentioned in the previous article of this series, a big part of the advancement in this field is exactly that: we now understand much better how to assemble and train these more complicated networks.
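To make "multiple hidden layers" concrete, here is a minimal numpy sketch of a forward pass through a small deep network. The layer sizes and random weights are arbitrary choices for illustration, and no training is shown:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """A common nonlinearity: pass positives through, clamp negatives to zero."""
    return np.maximum(0.0, x)

def forward(x, layers):
    """Pass the input through each (weights, bias) layer in turn."""
    for W, b in layers[:-1]:
        x = relu(x @ W + b)   # hidden layers: linear map + nonlinearity
    W, b = layers[-1]
    return x @ W + b          # output layer: linear map only

# Three hidden layers of 16 nodes each, mapping 4 inputs to 2 outputs
sizes = [4, 16, 16, 16, 2]
layers = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal((1, 4))   # one example with 4 input features
y = forward(x, layers)
print(y.shape)                    # (1, 2)
```

The nonlinearity between layers is what lets the stacked layers model patterns a single linear map never could; without it, the whole network would collapse into one matrix multiplication.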

Deep learning evolved further with the introduction of transformers. Transformers are a type of deep learning architecture introduced in 2017, and they have since revolutionised natural language processing (NLP). They differ from traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in that they use a self-attention mechanism to process input data, which allows them to capture long-range dependencies and improves performance on tasks like language modeling and machine translation.

The self-attention mechanism in transformers allows the model to weigh the importance of each word in the input sequence when generating the output. This is achieved by computing an attention score for each word with respect to every other word in the sequence. These scores are then used to compute a weighted sum of the input representations, from which the output is generated. In this way, the model can focus on the most relevant parts of the input and produce more accurate and coherent output.
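The mechanism described above can be sketched in a few lines of numpy. This is a single attention head with random weights and no masking or multi-head machinery, a deliberate simplification of what full transformer implementations do:

```python
import numpy as np

def softmax(x, axis=-1):
    """Turn raw scores into weights that are positive and sum to 1."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X (seq_len x d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Attention score of every position with respect to every other position
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.standard_normal((seq_len, d))                    # 5 token vectors
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)   # (5, 8): one updated vector per input position
```

Each row of `weights` is exactly the "importance of each word with respect to all others" described above, and `weights @ V` is the weighted sum that produces the output.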

Pretraining is another important technique that is now widely used. It involves training a model on a large amount of generic data before fine-tuning it for a specific task. Pretraining helps models learn general representations of the data that can then be transferred to other tasks.
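A toy numpy sketch of the idea, where simple linear models and gradient descent stand in for real networks, and the "large generic dataset" and "small task dataset" are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr=0.01, steps=200):
    """Plain gradient descent on mean squared error for a linear model."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

true_w = rng.standard_normal(10)

# "Pretraining": plenty of data from a broad, related task
X_big = rng.standard_normal((500, 10))
w_pretrained = train(np.zeros(10), X_big, X_big @ true_w)

# "Fine-tuning": only a handful of examples from the target task,
# whose underlying weights differ slightly from the pretraining task
target_w = true_w + 0.1 * rng.standard_normal(10)
X_small = rng.standard_normal((8, 10))
w_finetuned = train(w_pretrained, X_small, X_small @ target_w, steps=50)

# Training from scratch on the same handful of examples, for comparison
w_scratch = train(np.zeros(10), X_small, X_small @ target_w, steps=50)

err_finetuned = np.linalg.norm(w_finetuned - target_w)
err_scratch = np.linalg.norm(w_scratch - target_w)
```

With the pretrained starting point, the fine-tuned weights land much closer to the target task's true weights than training from scratch on the same small dataset, which is the whole point of the technique: the general representation does most of the work, and the small dataset only has to supply the task-specific adjustment.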

Finally, this brings us to the much-publicised and publicly accessible use case that is ChatGPT, where the focus is on language models. Language models are deep learning models trained on large amounts of text data to learn the statistical structure of language. They are used for tasks like text generation, machine translation, and sentiment analysis. The most popular language models are based on the transformer architecture and have achieved impressive performance on a variety of language tasks.
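At its core, "learning the statistical structure of language" can be illustrated with a toy bigram model in plain Python: count which word follows which, then sample from those counts. Real language models are vastly more sophisticated, but the spirit is the same:

```python
import random
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog .").split()

# Learn the statistics: how often each word follows each other word
counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    counts[prev][word] += 1

def generate(start, n=8, seed=0):
    """Generate text by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        nxt = counts[out[-1]]
        # Sample in proportion to how often each word followed the last one
        out.append(rng.choices(list(nxt), weights=nxt.values())[0])
    return " ".join(out)

print(generate("the"))
```

The output is a plausible-looking recombination of the training text. Transformer language models replace these simple counts with learned, context-wide representations, but generation still works the same way: predict a distribution over the next token, sample, repeat.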

ChatGPT is a large language model (LLM) developed by OpenAI that uses the GPT architecture and a large amount of pretraining data to generate human-like responses to text-based conversational inputs. ChatGPT is capable of generating coherent and contextually relevant responses to a wide range of conversational topics.

The advancements in computing power and data availability have enabled the development of increasingly complex models that can perform a wide range of tasks. It is also interesting to notice the pace at which new models and improvements are arriving: it took decades to go from the first neural networks to something useful outside academia, and now it seems that every few months we reach interesting new results.


While there are still many challenges to overcome, such as bias and data privacy, which we will discuss in an upcoming article in this AI series, the future of AI is surely exciting. AI is on track to play a defining and permanent part in our lives; it will be on us to understand how to make the best ethical use of it.