Problems with AI text translation
If there are few parallel texts between two languages, translation is performed in two stages, through an intermediary language.
For example, when translating from Russian into Malay, the chain is: first from Russian into English, then from English into Malay.
The result is close to natural, but even in such a short chain, errors can creep in because of ambiguous words.
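The ambiguity problem in two-stage translation can be shown with a minimal sketch. The dictionaries below are toy word lookups invented for illustration; real systems translate with full MT models, not tables:

```python
# Toy pivot (two-stage) translation: Russian -> English -> Malay.
# "коса" is ambiguous in Russian: it can mean a braid of hair, a scythe,
# or a sand spit. Once the English leg commits to one sense, the other
# senses are lost for the second leg.
ru_to_en = {"коса": "braid"}      # the first leg picks one meaning
en_to_ms = {"braid": "tocang"}    # the second leg only sees that meaning

def pivot_translate(word, first_leg, second_leg):
    """Translate via an intermediary language; each leg can lose meaning."""
    intermediate = first_leg[word]
    return second_leg[intermediate]

print(pivot_translate("коса", ru_to_en, en_to_ms))  # prints "tocang"
```

If the source sentence actually meant "scythe", the error made in the first leg is baked in and cannot be recovered in the second.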
Neural networks have made translation much better; sometimes it is hard to tell the result from a human translation. Neural networks also analyze a corpus of parallel texts, so in that sense nothing has changed. But instead of simple identifiers, the neural approach uses vector representations: each word is encoded as a vector of numbers that characterize its lexical and semantic features.
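The idea of vector representations can be sketched briefly. The 4-dimensional vectors below are made up for illustration; production embeddings have hundreds of dimensions and are learned from data:

```python
# Words as vectors: semantically related words end up close together.
import math

embeddings = {
    "king":  [0.9, 0.8, 0.1, 0.3],  # made-up values for illustration
    "queen": [0.9, 0.7, 0.2, 0.9],
    "apple": [0.1, 0.0, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(cosine(embeddings["king"], embeddings["queen"]))  # high similarity
print(cosine(embeddings["king"], embeddings["apple"]))  # much lower
```

Because similarity is computed over these features rather than over exact string matches, the model can relate words it has never seen paired in training data.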
Statistical machine translation breaks the source sentence into words and phrases, and the system then searches for matches in the other language. Neural translation processes the sentence as a whole: it is mapped into a vector space, where each word is represented by a vector of several hundred numbers. The neural network can determine the relationship between words even when they sit at opposite ends of the sentence, which is why the translation sounds more natural.
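The mechanism that relates every word to every other word can be sketched as scaled dot-product attention, the building block of modern translation models. The two-dimensional word vectors below are invented for illustration:

```python
# A minimal sketch of attention: one word's vector (the query) is scored
# against every word vector in the sentence, no matter how far apart the
# words are. Higher weight = stronger relationship.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores of a query vector against all key vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Toy sentence with made-up 2-d vectors per word.
sentence = {
    "the":   [0.1, 0.2],
    "bank":  [0.9, 0.1],
    "river": [0.8, 0.2],
}
words = list(sentence)
weights = attention_weights(sentence["bank"], list(sentence.values()))
print(dict(zip(words, weights)))  # "bank" attends more to "river" than "the"
```

Because the score is computed for every pair of positions, distance within the sentence is irrelevant, which is what lets the model connect words at opposite ends.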
Despite the rise of the neural approach, statistical machine translation has not been abandoned entirely. For example, Yandex.Translator uses a hybrid model that combines the statistical and neural approaches: after both models have processed the text, an algorithm selects the better of the two outputs.
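The hybrid scheme can be sketched in a few lines. Everything here is a stand-in: the two model functions return placeholder strings, and the scorer is a hypothetical fluency measure, whereas the real selector is a trained ranking model:

```python
# A sketch of hybrid translation: run two systems, let a selector
# pick the better candidate. All components are illustrative stubs.

def statistical_model(text):
    return "statistical candidate"            # placeholder output

def neural_model(text):
    return "neural candidate translation"     # placeholder output

def fluency_score(candidate):
    # Hypothetical scorer; in reality this is a trained model,
    # not a word count.
    return len(candidate.split())

def hybrid_translate(text):
    candidates = [statistical_model(text), neural_model(text)]
    return max(candidates, key=fluency_score)

print(hybrid_translate("пример"))
```

The design point is that the selector sees complete candidate translations, so each subsystem can play to its strengths on different kinds of input.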
Translation has improved, but there are still plenty of mistakes. Do neural networks fail? The number of translation errors depends on many factors, among them how closely the languages are related and the amount of data on which the neural network was trained.
For example, Google Translate's algorithms were trained on the English-Spanish and English-French language pairs. In a study, professional translators rated the quality of output in these pairs as close to human translation.
The closer two languages are in structure, the higher the translation accuracy. But take languages from different families, such as Russian and Japanese, and universal translators begin to falter.
Neural network translation also relies on a corpus of parallel texts, so the problem of insufficient data remains. When there are not enough parallel texts for a pair, an intermediary language, English, is used, and this introduces inaccuracies. You can easily check this yourself by translating a sentence sequentially through several languages.
For example, here is one paragraph of this article run through the chain Russian - English - Mongolian - Hungarian - Russian. The result:
“The number of translation errors depends on many factors. Among them - the relationship of languages and the amount of data on which the neural network was trained.”
The translation came out clumsy. Granted, this is a contrived experiment; hardly anyone needs such a chain in real life. But the result shows exactly what happens to a translation when there are not enough parallel texts between languages.
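The round-trip check described above can be sketched as a small helper. The `translate(text, src, dst)` function is a hypothetical hook where any MT API could be plugged in; a stub translator is used here so the sketch runs on its own:

```python
# Round-trip translation: pass a text through a chain of languages
# and compare the result with the original.

def round_trip(text, chain, translate):
    """Translate text along a chain, e.g. ru -> en -> mn -> hu -> ru."""
    for src, dst in zip(chain, chain[1:]):
        text = translate(text, src, dst)
    return text

# Stub standing in for a real translation service, so the sketch
# runs without network access. It just tags each hop.
def fake_translate(text, src, dst):
    return f"[{src}->{dst}] {text}"

result = round_trip("пример", ["ru", "en", "mn", "hu", "ru"], fake_translate)
print(result)  # prints "[hu->ru] [mn->hu] [en->mn] [ru->en] пример"
```

With a real service plugged in instead of the stub, comparing `result` to the original sentence makes the accumulated loss from each leg of the chain visible.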