ChatGPT: The magic inside

Recently, ChatGPT has grabbed the media’s attention, renewing the general public’s interest in AI. Incidentally, it is interesting to note that AI never fails to make headlines whenever there is an exciting breakthrough, yet irrespective of the media attention, experts keep quietly pushing the boundaries of the technology.

So let’s take a step back and really try to understand what lies under the bonnet of ChatGPT: what are the lesser-known breakthroughs that led to the underlying GPT-3 model, and more generally, what is inside the large language models (LLMs) of which GPT-3 is an example.

We know that ChatGPT can answer questions by producing text in different formats: it is able to produce short sentences as well as long essays. But more technically, what problem does ChatGPT, and more generally an LLM, attempt to solve?

An LLM fundamentally attempts to produce a good continuation of the text it has been fed as input, where by “good continuation” we essentially mean filling in the blanks the way a human being would, with varying degrees of expertise in the field. The context is given by the text so far.

At the risk of oversimplifying, we can draw a parallel with the following example: let’s say we have the text “what a wonderful ”. As human beings, having read countless texts and song lyrics, we might instinctively think that “day” would be a good continuation: “what a wonderful day”. Clearly this oversimplifies what an LLM does, but the idea is there…

Now take this basic idea and extend it to a very large scale: the experience (or training) of the LLM comes from having ingested a huge corpus of human-written text (think of all of Wikipedia and all the text available on the internet). The task is then, starting from the text at hand, to calculate a list of the most probable words that could “fill in the blanks”, so to speak, based on the training examples.
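To make this concrete, here is a minimal sketch (not ChatGPT’s actual code) that asks GPT-2, a small and openly available ancestor of GPT-3, for the most probable continuations of our example prompt, using the Hugging Face transformers library:

```python
# A minimal sketch: ask GPT-2 (an open ancestor of GPT-3) for the most
# probable next tokens after a prompt. Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("What a wonderful", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Turn the scores for the *next* position into probabilities
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")
```

Running this prints the model’s five favourite candidates to fill in the blank, each with its probability.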

To add some detail, but again without going too deep: LLMs do something like this, except that they don’t look at whole words. They work with tokens, chunks of text that carry meaning; a token can be a whole word, a part of a word, or punctuation.
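As an illustration, the snippet below uses the tiktoken library to split a sentence with the same kind of byte-pair encoding used by the GPT family; notice that tokens do not always align with words:

```python
# Illustrative only: split text into tokens with the GPT-2 byte-pair
# encoding. Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("What a wonderful day, unbelievably so!")
print(ids)                              # a list of integer token ids
print([enc.decode([i]) for i in ids])   # the text chunk behind each id
                                        # rare words split into several tokens
```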

At each step, the LLM gets a list of tokens with associated probabilities and picks one from the list. The chosen token is not necessarily the one with the highest probability: it has been found empirically that adding a random factor when picking from the shortlisted tokens produces a more interesting output. The degree of randomness is usually controlled by a parameter called the temperature, which can be adapted to the desired output. This randomness also explains why the same input produces different outputs each time the same request is run.
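A toy sketch of that sampling step (with made-up scores, not real model output) might look like this; the temperature parameter controls how much randomness is injected:

```python
# Toy sketch of temperature sampling; the scores below are invented.
import numpy as np

def sample_next_token(logits, temperature=0.8):
    """Pick one token id from the model's raw scores.

    temperature < 1.0 sharpens the distribution (safer, more repetitive);
    temperature > 1.0 flattens it (more surprising, riskier).
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                          # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()   # softmax
    return np.random.choice(len(probs), p=probs)

# Four candidate continuations of "what a wonderful ..."
vocab  = ["day", "world", "life", "idea"]
logits = [3.0, 2.5, 1.0, 0.2]
print(vocab[sample_next_token(logits)])  # usually "day", but not always
```

With the temperature set near zero this degenerates into always picking the top candidate; raising it makes the rarer continuations more likely.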

There are many interesting considerations to be made at this point. For example, does this injection of randomness resemble, in some way, the concept of creativity?

Ultimately, LLMs are nothing but giant neural networks (more on this in an upcoming article). As it stands today, GPT-3 is a network with around 175 billion weights. It is a neural network architecturally set up for dealing with language, and this architecture is called a “transformer”.
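To give a flavour of what “transformer” means, here is a deliberately tiny sketch of its core operation, a single self-attention head, in which every token computes a weighted mix of all the tokens in the sequence. The sizes and matrices below are toy values chosen at random; in a real model like GPT-3, the projection matrices are learned during training and many such heads and layers are stacked:

```python
# Tiny sketch of one self-attention head, the core operation of a
# transformer. Toy sizes and random weights, for illustration only.
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model) token embeddings
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # relevance of token j to token i
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # weighted mix of all tokens

rng = np.random.default_rng(0)
d = 8                                   # toy embedding size
x = rng.normal(size=(4, d))             # a sequence of 4 tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)                        # (4, 8): one updated vector per token
```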

As is often the case in the AI field, the breakthrough is mostly in the details: in the exact architecture, how deep and how wide, and with which types of layers; in how to train the network and how to tune its parameters. Interestingly, at the moment there is no scientific understanding of how to assemble an optimal neural network, nor of the best way to train it. There are only empirical best practices: certain architectures work best for some tasks, other architectures for other tasks. Nevertheless, the field of AI is progressing very quickly. What ChatGPT has achieved, and what is yet to come, is full of opportunities.