How Modern AI Works, from NLP to ML and LLMs

Blog — Sat 2 May 2026

This is the final part of what has now become a small series. It bothers me that there are still so many misunderstandings surrounding AI. So here is the last piece of the puzzle: the "black box" that modern AI seems to be for many people.

Have you not read parts one and two yet? Start there first:

  1. How media outlets like NOS and others portray an exaggerated view of AI
  2. How AI works in practice: from human language to software logic

And now, part three, the final part. For now.

Now we arrive at the category where today's best-known AI systems live. Keep in mind: most AI used in practice today is not this kind of AI at all, and there is a good reason for that.

Yet this is also the part that many people find the hardest to understand: ML (Machine Learning) and LLMs (Large Language Models). This also includes transformers, the neural network architecture that lets these models process enormous amounts of text in context. Well-known examples include ChatGPT, DeepSeek and many other systems.

What makes this generation of AI feel so elusive is that it uses enormous amounts of data, combined with massive amounts of computation, to model human language and generate answers on its own.
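To make that less abstract: at its core, an LLM does one thing in a loop. It predicts a probability distribution over the next token, picks one, appends it and repeats. The deliberately tiny sketch below shows that loop, with a hand-made bigram table standing in for the billions of learned parameters of a real model; everything in it is invented purely for illustration.

    import random

    # A toy "model": for each token, the possible next tokens with weights.
    # A real LLM learns these probabilities from enormous amounts of text;
    # here they are hard-coded purely for illustration.
    BIGRAMS = {
        "<start>": {"the": 0.6, "a": 0.4},
        "the": {"cat": 0.5, "dog": 0.5},
        "a": {"cat": 0.5, "dog": 0.5},
        "cat": {"sleeps": 0.7, "<end>": 0.3},
        "dog": {"barks": 0.7, "<end>": 0.3},
        "sleeps": {"<end>": 1.0},
        "barks": {"<end>": 1.0},
    }

    def generate(max_tokens=10):
        token, output = "<start>", []
        for _ in range(max_tokens):
            options = BIGRAMS[token]
            # Sample the next token according to the model's probabilities.
            token = random.choices(list(options), weights=list(options.values()))[0]
            if token == "<end>":
                break
            output.append(token)
        return " ".join(output)

    print(generate())  # e.g. "the cat sleeps"

A real model does exactly this, only with a vocabulary of tens of thousands of tokens and probabilities computed by a deep network instead of looked up in a table.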

Once again, not uncontrolled

But let there be no doubt that these systems are still kept firmly under control. By that I mainly mean the wording and behavior of the AI itself. The training data is a more complicated story. Modern models learn from enormous collections of text, often gathered from books, websites, documentation and other public sources. That is where things can become problematic if people are not careful.

You could almost say that systems like ChatGPT were designed to overwhelm people with a huge wow moment. And they succeeded. That effect was achieved by consuming enormous amounts of data and allowing the AI to generate as much language as possible on its own. The result is something like a giant knowledge base that speaks your language.

But there is also a downside to that. Control becomes much harder as the scale increases. Neither the training data nor the generated answers can be fully reviewed by humans anymore. There is simply too much information to oversee completely. That is why the role of programmers is increasingly shifting toward guiding and supervising models, instead of manually programming everything as we used to do.

That makes these systems impressive conversational partners, but not yet specialists you can blindly trust. It was an enormous wow moment when this generation of AI appeared, but by now the first cracks are starting to show in terms of reliability and practical use within specialized fields. An LLM is usually not a specialist, but rather a "jack of all trades, master of none". In a sense, the opposite of a tightly scoped NLP system designed for one specific task.

Even this is still just mathematics

And that is where the "black box" feeling comes from. You do not need to be a technician to notice that some answers are questionable, or that the help provided does not quite match what you meant. Still, many users continue to experience the Eliza effect, the feeling that a system understands far more than it actually does. Programmers, on the other hand, often experience the Weizenbaum effect instead, where you start seeing the mechanics behind the illusion.
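That "just mathematics" is meant quite literally. Under the hood, every generated word starts as a list of raw scores (logits), one per candidate token, which a softmax turns into probabilities. A minimal illustration; the tokens and scores below are invented for the example:

    import math

    # Hypothetical raw scores ("logits") a model might assign to candidates
    # for the next token after "The capital of France is".
    logits = {"Paris": 4.1, "London": 2.3, "Berlin": 1.9}

    # Softmax: exponentiate each score and normalize, so they sum to 1.
    total = sum(math.exp(v) for v in logits.values())
    probs = {token: math.exp(v) / total for token, v in logits.items()}

    for token, p in probs.items():
        print(f"{token}: {p:.2f}")  # Paris receives most of the probability

No understanding is involved at any point: only scores, exponents and a division.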

On top of that come practical problems such as hallucinations, limited context windows and quality degradation during longer conversations. These are real technical limitations that are still being actively researched. Some of them will likely improve over time, while others may always remain a challenge.
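The limited context window in particular is easy to picture. Once a conversation exceeds the model's maximum number of tokens, the oldest messages simply no longer fit and are dropped. Below is a rough sketch of the kind of truncation a chat application might apply; counting one word as one token and the tiny limit are simplifications for the example:

    def truncate_history(messages, max_tokens=8):
        """Keep only the most recent messages that fit in the context window.

        A real system counts subword tokens with the model's own tokenizer;
        here each word counts as one token to keep the sketch simple.
        """
        kept, used = [], 0
        for message in reversed(messages):  # walk from newest to oldest
            cost = len(message.split())
            if used + cost > max_tokens:
                break  # everything older is dropped
            kept.append(message)
            used += cost
        return list(reversed(kept))

    history = ["my name is Anna", "I live in Utrecht", "what is my name"]
    print(truncate_history(history))
    # With this tiny limit the first message is dropped, so the model
    # can no longer "remember" the name it was told.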

That does not make this technology bad. On the contrary, it is still a huge step forward. And do not forget that modern ML and LLM systems still rely on classic NLP techniques for certain components, such as tokenization, classification, embeddings, entity recognition and tool routing.
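Tokenization is a good example of such a classic component. Before a model sees any text, the text is split into subword pieces and mapped to numbers. Real tokenizers learn their vocabulary from data (for example with byte-pair encoding); the hand-made vocabulary below only sketches the idea:

    # A hand-made vocabulary mapping subword pieces to IDs.
    # Real systems learn such vocabularies from large text corpora.
    VOCAB = {"un": 0, "believ": 1, "able": 2, "cat": 3, "s": 4}

    def tokenize(word):
        """Greedily split a word into the longest known subword pieces."""
        ids = []
        while word:
            for piece in sorted(VOCAB, key=len, reverse=True):
                if word.startswith(piece):
                    ids.append(VOCAB[piece])
                    word = word[len(piece):]
                    break
            else:
                raise ValueError(f"cannot tokenize remainder: {word!r}")
        return ids

    print(tokenize("unbelievable"))  # [0, 1, 2] -> "un" + "believ" + "able"
    print(tokenize("cats"))          # [3, 4]    -> "cat" + "s"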

But it is simply not quite the "holy grail" some people wanted it to be. It is not AGI (Artificial General Intelligence) in the sense of an all-knowing artificial superintelligence. That newer term is increasingly used precisely because the current generation of systems has largely claimed the original term "AI", while most experts agree that true AGI has not yet been achieved.

How it is built and used makes all the difference

So should we use ML and LLM systems as general practitioners for medical complaints? No, absolutely not. That would be genuinely dangerous. Can NLP systems help in such situations? To some extent, yes. Both traditional NLP and tightly controlled ML and LLM systems can communicate pleasantly with users. A traditional NLP system is also much easier to constrain, making it simpler to return only verified information instead of hallucinations or unreliable facts.
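That difference in constrainability is easy to sketch. A tightly scoped system only matches a question against a curated, human-reviewed list of answers and refuses everything else, so it cannot invent facts. A minimal sketch; the questions and answers below are placeholders, not real medical guidance:

    # A curated set of verified answers, written and reviewed by humans.
    VERIFIED_ANSWERS = {
        "opening hours": "The practice is open weekdays from 8:00 to 17:00.",
        "repeat prescription": "Request repeat prescriptions via the patient portal.",
    }

    def answer(question):
        """Return a verified answer, or refuse; never generate free text."""
        q = question.lower()
        for keywords, reply in VERIFIED_ANSWERS.items():
            if all(word in q for word in keywords.split()):
                return reply
        return "I cannot answer that. Please contact the practice directly."

    print(answer("What are your opening hours?"))
    print(answer("Do I have cancer?"))  # out of scope -> safe refusal

An LLM can be wrapped in similar guardrails, but by default it will generate an answer to almost anything, which is exactly where hallucinations slip in.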

But in the end, they remain tools. Not doctors who measure your blood pressure, physically examine you and form a diagnosis based on education and experience. NLP, ML and LLM systems cannot truly see you, examine you or fully weigh what is relevant in your specific situation.

The problem, then, is not AI itself. The problem is how people choose to apply and use AI. We do not need to fear AI, any more than people once needed to fear the steam train. But we do need to keep using common sense. As users, as businesses purchasing AI solutions, and above all as the creators and providers of AI systems. Because with expertise comes responsibility.