Latin America off the AI map: Why we urgently need our own language model

ChatGPT has become the fastest-growing platform in internet history: it reached one million users in five days and 100 million in just two months. Its success is due to the novelty of the system, its ease of use, and its free access. Today, people use it to work, study, or solve everyday tasks, turning it—mistakenly—into a new search engine. And despite its many virtues, we continue to fear artificial intelligence, especially regarding the future of work. However, there are other equally worrying challenges in the Ibero-American context that should already be part of the public agenda.

Models such as ChatGPT, Gemini, BERT, or Claude are not a single technology in themselves, but rather integrate different systems. One of them is the Large Language Model (LLM), which is trained on enormous volumes of data so that machines can process and generate text with surprising naturalness.

ChatGPT is the interface (the face) of the LLM, and according to OpenAI—the company that developed ChatGPT—it was trained on “public and free” information available on the internet, such as websites, blogs, forums, Wikipedia, articles, and academic documents. At first glance, this gives us a sense of informational diversity, but in practice it means that more than 70% of the data used for training is in English. And that is precisely where our problems begin: in the disparity of data sources.

Linguistic bias

When ChatGPT generates responses in Spanish, they are not the product of data in that language, but rather the result of automatic translations. The outcome is a strong Anglo-Saxon cultural influence that can distort nuances and expressions inherent to the Spanish language and Hispanic thought. Using ChatGPT—or any other language model—is, in a way, like watching an English-language film with Spanish subtitles.

What is truly unsettling is that, although Spanish is the second most-used language on the internet, its digital content represents only 6% of the web, compared to 49% for English. On platforms like Netflix, barely one third of films are non-Anglophone, and that third is spread across about thirty languages. So even though the digital world may seem diverse—because we are sold the idea that everyone can generate content—the reality is that most of what we see, read, and hear carries a North American accent.

Another problem, one of many we have with technology, is epistemological. Technology has gradually appropriated terms traditionally used to describe human attributes, such as intelligence, reasoning, and analysis, and because we lack technical knowledge, we have equated these with their human meanings. Thus, when we say an LLM uses “natural language,” the phrase does not refer to language as humans speak it. It means that, thanks to mathematical and statistical models, the system can decipher how we use words.

Therefore, the ethical question is: does ChatGPT truly understand the world’s cultural diversity, or does it simply reflect the cultural limitations of its training data? I think we all know the answer.

Algorithmic colonialism

The problem becomes more serious when those limitations translate, quite literally, into invisibility, because Ibero-American cultural representation in these datasets is minimal. That is alarming, considering that Spanish is spoken in 21 countries across three continents and has more than 635 million speakers. This is not simply a matter of including vocabulary: it is a matter of identity. People in Colombia and Spain do not speak the same way: we share a language, but we differ in history, mestizaje, geography, the tropics, accents, and even the mosquitoes. If these nuances do not exist in the data used to train LLMs, our voices will be ignored, and so will those of historically marginalized groups such as women, Indigenous peoples, and Afro-descendants.

Today, when colonialism is discussed so often, perhaps we should turn our attention to a new, subtler, and more perverse form: algorithmic colonialism, where Anglo-Saxon values and ways of thinking dominate the digital discourse. Meanwhile, we continue using ChatGPT to “improve” our texts… without noticing that, little by little, the algorithm is redefining even the way we communicate.

And this new form of cultural invisibility is already taking its toll. A study from the Complutense University notes that our communication through email, social networks, and WhatsApp is changing: we now use shorter phrases and a more artificial tone. Texts generated by ChatGPT in Spanish tend to be literal translations from English, which eliminates nuances and characteristic linguistic turns, simplifies expressive richness, and fragments the paragraph structure typical of Spanish. Not to mention how we have stopped using proper punctuation, or how many now place a comma after a greeting—following English norms—when Spanish rules require a colon.

The need for an LLM in Spanish

We Latin Americans—who are always searching for the differences among ourselves—should start thinking about the urgency of having an LLM in Spanish. A project like this would not only allow for greater inclusion in the digital world, but would also give us a common-purpose initiative that would generate jobs, knowledge, resources, partnerships (among universities, governments, and companies), and the possibility of appearing on the global artificial intelligence map.

Our region needs to find spaces that enable integration—not of a single identity, but of a collective that speaks the second most-used language on the internet. We have the moral obligation to build local datasets with linguistic and cultural diversity, because if we as a region do not do it, who will?

This is not about “imitating” Silicon Valley, but about developing an ethics grounded in Latin America—one that responds to our contexts, understands who we are, and generates cultural, social, and technological added value for the region. It’s about moving from having our programmers work for first-world countries as cheap technological labor to programming and producing technology by and for ourselves—technology that can also be exported globally.

Because joining the artificial intelligence wave does not mean that all Spanish speakers use ChatGPT; it means creating the local conditions to build our own technology. The real breakthrough is not speaking to a machine in Spanish, but teaching it to think from Spanish—with our values, our voices, and our way of understanding the world.

Only then will artificial intelligence stop translating us and finally begin to recognize us.

This text is part of the collaboration between the Organization of Ibero-American States for Education, Science and Culture (OEI) and Latin America21 for the dissemination of the platform Voices of Ibero-American Women.

Author

Liliana Acosta

Philosopher specializing in ethics applied to technology. Founder of Thinker Soul, a consultancy dedicated to business digitalization and innovation. She specializes in reflecting on and disseminating information about artificial intelligence (AI).