Issues with Text Translation Using Special Characters in Deepl (API) #28

Open
Abbis-1 opened this issue Oct 2, 2024 · 5 comments

Comments

@Abbis-1

Abbis-1 commented Oct 2, 2024

Hello, I have a problem with the DeepL API in ABAP.

The problem:
I have a text in table format. Each table row can contain a maximum of 72 characters. The text can have a structure, as shown in the example, where the items are separated by bullet points. This structure should be preserved during translation.

This is what a text can look like.

Screenshot 2024-10-02 111545

This is the text in table format I have to work with.

image

The solution I tried, which doesn't work:

I take the text from the table and convert it into a string, inserting the placeholder {@} between the lines. I also use the ignored_tags option with the tag <x></x>, wrapping the tag around my placeholder so that it looks like this: <x>{@}</x>. I expected the tags and placeholders to be ignored during translation, but that doesn't seem to work. Finally, I split the string at the <x>{@}</x> placeholder to get the translated text back into table form. That's why I need this placeholder.
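For readers following along, here is a minimal Python sketch of the round trip described above (the thread itself uses ABAP). The helper names are hypothetical, and the request fields are an assumption based on the DeepL REST API's `tag_handling` and `ignore_tags` parameters; no request is actually sent.

```python
# Separator described in the post: a placeholder wrapped in an ignored tag.
PLACEHOLDER = "<x>{@}</x>"


def table_to_string(rows):
    """Join table rows into one string, marking each row boundary."""
    return PLACEHOLDER.join(rows)


def string_to_table(text):
    """Split a (translated) string back into rows at each placeholder."""
    return text.split(PLACEHOLDER)


rows = [
    "Bigger screen, makes work easier",
    "- Enhanced service program options.",
]
payload = table_to_string(rows)

# Assumed JSON body for a translate request; field names are not
# taken from the thread and should be checked against the API docs.
request_body = {
    "text": [payload],
    "target_lang": "PT-BR",
    "tag_handling": "xml",
    "ignore_tags": ["x"],
}
```

The split only recovers the original rows if every placeholder comes back intact, which is exactly what fails in this report.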

This is the text that I am sending to the DeepL API, to be translated from English to Portuguese (Brazil).

The English text I translate:
image

The Portuguese (Brazil) text I get back from the DeepL API:

image

As you can see, the complete sentence "e.g. <x>{@}</x> DataScope) is applied, the client needs to adjust the link <x>{@}</x>." is duplicated in the Portuguese (Brazil) translation.

It looks like DeepL has trouble handling such special characters. Words or whole sentences get duplicated.

I get the same duplicates when I use the DeepL web translator instead of the DeepL API.

English -> Portuguese (Brazil)
image

English -> French
Screenshot 2024-10-02 115521

English -> German
Screenshot 2024-10-02 115709

In French I get the same problem.
In German the sentence is not duplicated, but "e.g." is doubled to "z z.B.".

Is there a way to use a special character as a placeholder that is completely ignored by the DeepL API during translation? Or is there another solution you could recommend?

@JanEbbing
Member

Hi, thanks for your report!

  • What you're trying won't work at all in the web translator as only the API supports tag handling.
  • I think what you're trying to do is still confusing the models as it still introduces the {@} character sequence in the middle of sentences
  • I think a better approach is to instead wrap each line in tags, and use that on your end to delimit.

As an example,

Bigger screen, makes work easier and increases the overview of machine
performance or operational metrics
- Enhanced service program options.
- Remote troubleshooting and analysis enabled

would become

<line>Bigger screen, makes work easier and increases the overview of machine </line>
<line>performance or operational metrics</line>
<line>- Enhanced service program options.</line>
<line>- Remote troubleshooting and analysis enabled</line>

This gives me

<line>Tela maior, facilita o trabalho e aumenta a visão geral da máquina </line>
<line>desempenho da máquina ou métricas operacionais</line>
<line>- Opções aprimoradas do programa de serviço.</line>
<line>- Possibilidade de análise e solução de problemas remotos</line>

and should be more robust to these kinds of issues.
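The wrapping and unwrapping steps suggested above could look roughly like this in Python. This is only a sketch: `wrap_lines` and `unwrap_lines` are hypothetical helper names, the translation call itself is left out, and it assumes the `<line>` tags come back intact and in order.

```python
import re
from xml.sax.saxutils import escape, unescape

LINE_RE = re.compile(r"<line>(.*?)</line>", re.DOTALL)


def wrap_lines(rows):
    """Wrap every table row in <line> tags, escaping &, < and > in the text."""
    return "\n".join(f"<line>{escape(row)}</line>" for row in rows)


def unwrap_lines(translated):
    """Recover one row per <line> element from the translated string."""
    return [unescape(match) for match in LINE_RE.findall(translated)]


# Translated output quoted in the comment above.
translated = (
    "<line>Tela maior, facilita o trabalho e aumenta a visão geral da máquina </line>\n"
    "<line>desempenho da máquina ou métricas operacionais</line>\n"
    "<line>- Opções aprimoradas do programa de serviço.</line>\n"
    "<line>- Possibilidade de análise e solução de problemas remotos</line>"
)
rows = unwrap_lines(translated)
```

Because each row has its own element, empty rows survive as empty strings and no separator sequence sits in the middle of a sentence.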

@Abbis-1
Author

Abbis-1 commented Oct 8, 2024

Thanks JanEbbing,

that solution doesn't work either. Do you have any other suggestions? I can't be the only one with this issue.

Some of the tags disappear during translation. For example, in the original text the tags <x> and </x> occur 60 times, but in the translation I only get 55 of them back. They just disappear.
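A check like the following could flag lost tags before the text is split back into rows. It is a sketch in Python with hypothetical helper names, and the sample strings are made up for illustration rather than taken from a real API response.

```python
import re


def tag_counts(text, tag):
    """Return (number of opening tags, number of closing tags) for `tag`."""
    opening = len(re.findall(rf"<{re.escape(tag)}>", text))
    closing = len(re.findall(rf"</{re.escape(tag)}>", text))
    return opening, closing


def tags_preserved(source, translated, tag):
    """True if the translation contains as many tags as the source."""
    return tag_counts(source, tag) == tag_counts(translated, tag)


source = "<x>first row</x> <x>second row</x>"
translated = "<x>erste Zeile</x> <x>zweite Zeile"  # last closing tag lost

ok = tags_preserved(source, translated, "x")
```

When the counts differ, the row boundaries can no longer be trusted and the text should not be written back as is.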

@Abbis-1
Author

Abbis-1 commented Oct 10, 2024

Do you have any other suggestions for the problem?

@JanEbbing
Member

Hi, could you please provide an example call that fails in the way you described, if the contents are not sensitive? I can then forward this to the responsible team.

@Abbis-1
Author

Abbis-1 commented Oct 14, 2024

Hello Jan Ebbing,

I have texts that need to be translated. They can be very short marketing texts with little content, but also long texts of up to 6000-8000 characters. The texts are in German and should be translated into English using the DeepL API. The English text produced by DeepL will then be translated into French, Portuguese (Brazil) and Russian. The structure of the text should be maintained as much as possible.

Here is a small example:
I generated a short German text.
First, I have the German text in the system:

image

Now my program starts:
I get the text as a table from the system. I then have the text in this format

image

The table can be seen as a list/ArrayList, where each row is a record (in this case, a line of text). The table can contain 1 to n rows. I have to convert the table to a string. Then, as you suggested, I added a tag before and after each line. The tag I use is <b>. I loop through the entire text, and each line is wrapped in <b> </b>.

image

<b>Entdecken Sie unsere innovativen Lösungen, </b> <b>die Ihr Unternehmen auf das nächste Level heben. </b> <b></b> <b>Optimieren Sie Ihre Prozesse, steigern Sie Ihre</b> <b>Effizienz und erleben Sie, wie einfach es sein</b> <b>kann, Ihre Ziele zu erreichen. Jetzt informieren</b> <b>und durchstarten! </b>

After that, I send the text to the Deepl API as a string.

image

and I receive this back:

image

<?xml version="1.0" encoding="UTF-8"?>#<vertriebst
exte><element><text>&lt;b&gt;Discover our innovati
ve solutions&lt;/b&gt; &lt;b&gt;that will take you
r company to the next level.&lt;/b&gt; &lt;b&gt;&l
t;/b&gt; &lt;b&gt;Optimize your processes, increas
e your&lt;/b&gt; &lt;b&gt;efficiency and experienc
e how easy it&lt;/b&gt; &lt;b&gt;can be to achieve
 your goals. Find out now&lt;/b&gt; &lt;b&gt;and g
et started!</text><language>en</language></element
></vertriebstexte>

I then convert the characters back, which looks like this:

image

<b>Discover our innovative solutions</b> <b>that w
ill take your company to the next level.</b> <b></
b> <b>Optimize your processes, increase your</b> <
b>efficiency and experience how easy it</b> <b>can
 be to achieve your goals. Find out now</b> <b>and
 get started!
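The conversion of `&lt;` and `&gt;` back to `<` and `>` happens automatically when the response is read with an XML parser. Below is a Python sketch using a shortened copy of the response above; the element names `vertriebstexte`, `element`, `text` and `language` are taken from that dump, and it is an assumption that this wrapper is well-formed XML in the real system.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response shown above, passed as bytes so the
# encoding declaration in the XML header is honoured by the parser.
response = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<vertriebstexte><element>"
    "<text>&lt;b&gt;Discover our innovative solutions&lt;/b&gt; "
    "&lt;b&gt;and get started!</text>"
    "<language>en</language>"
    "</element></vertriebstexte>"
).encode("utf-8")

root = ET.fromstring(response)

# findtext returns the element content with entities already decoded.
text = root.findtext("element/text")
language = root.findtext("element/language")
```

Here `text` holds the string with literal `<b>` tags, still missing its final `</b>` just as in the dump.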

Then I delete every </b>, leaving only the <b> tags. As you can see, the last </b> is missing; it was lost in translation.

image

Then I use the split function and split by <b> to convert the string back into a table. The system only accepts the text in a table form.
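The delete-and-split step can be sketched in Python as follows. `split_rows` is a hypothetical helper; it tolerates a missing final closing tag as in the example above, and it also reports rows longer than the 72-character limit mentioned at the start of the thread.

```python
MAX_ROW_LEN = 72  # row limit stated in the original post


def split_rows(translated, tag="b"):
    """Drop closing tags, split at opening tags, and trim each row."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    parts = translated.replace(close_tag, "").split(open_tag)
    # Anything before the first opening tag is normally empty; skip it.
    if parts and parts[0].strip() == "":
        parts = parts[1:]
    return [part.strip() for part in parts]


translated = (
    "<b>Discover our innovative solutions</b> "
    "<b>that will take your company to the next level.</b> "
    "<b></b> "
    "<b>Optimize your processes, increase your</b> "
    "<b>efficiency and experience how easy it</b> "
    "<b>can be to achieve your goals. Find out now</b> "
    "<b>and get started!"
)
rows = split_rows(translated)

# Rows that would not fit into the 72-character table column.
too_long = [row for row in rows if len(row) > MAX_ROW_LEN]
```

Note that this only works as long as every opening tag survives; a lost `<b>` would silently merge two rows.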

image

This is how it looks in the system:

image

The same process then from English to French

image

I checked the translation with the DeepL Translator, and it looked good.

image

For simple texts, it works well. However, as I mentioned at the beginning, I have marketing texts that are more than 6000 characters long and are stored in a table with more than 100 rows. These texts can contain brand names and various special characters like – ( ) / & : . Since the texts are sent as a single string via the DeepL API, there seem to be problems.

I generated another English text and sent it via the DeepL API. With this text, I'm already having issues with the translation, even without placeholders.

1. Detailed options for performance analysis:
a. Communication between training sessions takes place via the modern
fitness tracking system FitTrack. This also enables a comprehensive
analysis of the performance of all athletes integrated into the new
program.
b. Heart rate monitoring with precise information on the heart health of
each athlete (only available if the option of the new health module has
been booked!).
c. Display of the best time in case of a new record.
d. Runtime monitoring of training sessions (only in conjunction with the
training upgrade option!).
e. Monitoring of muscle fatigue during training.
f. Display of best performance in case of injuries during training.
g. More information and alerts on the training app for improved
performance optimization.
h. Extended support for numerous training goals; the athlete clicks on
them and receives comprehensive tips to improve performance.
i. There are additional options in the coaching program to create and
optimize individual training plans. The benefits summarized:
- Improved and comprehensive analysis options for athletic performance -
Accelerated identification of strengths and weaknesses
- Optimum efficiency in training
- Increased protection through regular health checks
- Significant cost reduction for training equipment thanks to innovative
technologies
- Larger screen in the app, which facilitates training planning and
improves the overview of progress
- Extended functionality of the coaching system

When I send this text to the DeepL API, it is sent as a single string. The result from the API looks the same as in the web translator (picture below). In this example, points f. and g. are duplicated.

image

What I actually want is to translate the text from English to French while keeping the given structure (in this example, the list from a. to i. and all the dashes). The biggest problem for me is the duplications, as I can't figure out a logical reason for what causes them. They seem to occur randomly.
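Until the cause is found, duplications like the ones in points f. and g. could at least be detected automatically. The following Python sketch uses a hypothetical helper and a made-up sample, and it assumes that each list item ends up on its own line in the translated text.

```python
from collections import Counter


def duplicated_lines(text):
    """Return the non-empty lines that occur more than once, in first-seen order."""
    counts = Counter(
        line.strip() for line in text.splitlines() if line.strip()
    )
    return [line for line, n in counts.items() if n > 1]


# Made-up sample mimicking the reported behaviour, not real API output.
translated = "\n".join(
    [
        "e. Monitoring of muscle fatigue during training.",
        "f. Display of best performance in case of injuries.",
        "g. More information and alerts on the training app.",
        "f. Display of best performance in case of injuries.",
        "g. More information and alerts on the training app.",
        "h. Extended support for numerous training goals.",
    ]
)
duplicates = duplicated_lines(translated)
```

A non-empty result would be a signal to retry or to review the translation before writing it back to the system.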
