Introduction
In this tutorial, you will see how to translate the text in CSV file columns into other languages using the DeepL API in the Python programming language.
DeepL is one of the most popular and accurate text translation platforms. DeepL, as the name suggests, incorporates advanced deep learning algorithms for training text translation models.
In addition to raw text strings, DeepL supports translating documents in PDF, MS Word, and PowerPoint formats. However, I wanted to translate text in CSV file columns, which DeepL does not support.
In this tutorial, I will explain how I achieved translating text in CSV columns using the DeepL API. The resultant CSV will have new columns containing translated text.
Translating Text with DeepL in Python
I chose Python for translating text in CSV files because DeepL has an official Python client that you can use for text translation in your code.
The official GitHub repository explains how to install the DeepL Python library and includes sample scripts.
Here I will provide a simple example for your reference. The following are the steps:
Create an object of the Translator class and pass it your DeepL authorization key.
Pass the text you want to translate to the translate_text() method of the Translator class. You must also pass the target language code, such as the ISO 639-1 code FR for French, to the target_lang parameter of the translate_text() method.
To get the translated text, access the text attribute of the object returned by the translate_text() method.
Here is an example:
import os
from deepl import Translator
auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)
result = dl_translator.translate_text("Python is awesome", target_lang="FR")
print(result.text)
Output:
Python est génial
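If the environment variable is missing, os.environ raises a bare KeyError, which can be confusing. The sketch below is a minimal illustration of a friendlier check. read_auth_key() is a hypothetical helper of my own, not part of the DeepL client, and it assumes the key is stored under the deepL-key name used in this tutorial.

```python
import os


def read_auth_key(var_name="deepL-key", environ=os.environ):
    """Return the DeepL authorization key stored in an environment variable.

    Hypothetical helper: raises a descriptive error instead of a bare
    KeyError when the variable is missing or blank.
    """
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your DeepL authorization key."
        )
    return key
```

You could then create the translator with Translator(read_auth_key()).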
Translating Text in a CSV File using a Pandas Dataframe
Let’s now see how to translate a CSV file. As an example, I will translate a CSV file containing Yelp reviews. The file looks like the one in the following screenshot. The text column contains reviews in the English language.
A Pandas dataframe is ideal for reading, manipulating, and saving CSV files in Python. The following script imports the yelp.csv file into a Pandas dataframe.
import pandas as pd
dataset = pd.read_csv(r"D:\Datasets\yelp.csv")  # raw string keeps the backslashes literal
dataset.head()
Output:
I will remove all the columns except the text column. You can only translate 500,000 characters per month with the free DeepL API. Therefore, for the sake of demonstration, I will keep only the first 20 rows of the dataset.
import pandas as pd
dataset = pd.read_csv(r"D:\Datasets\yelp.csv")  # raw string keeps the backslashes literal
dataset = dataset.filter(["text"], axis = 1).head(20)
dataset.head()
Output:
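Because the free plan is metered by characters, it can help to estimate how many characters a run will consume before calling the API. The sketch below is only an illustration: estimate_characters() is a hypothetical helper, not part of the DeepL client, and it assumes that every source character is counted once per target language.

```python
def estimate_characters(texts, n_target_languages=1):
    """Return a rough count of source characters that would be sent to the API.

    Assumption: each source character is counted once per target language.
    Non-string cells (for example NaN from empty CSV fields) are ignored.
    """
    total = sum(len(t) for t in texts if isinstance(t, str))
    return total * n_target_languages


# Hypothetical sample rows standing in for dataset["text"]
sample_reviews = ["Great food!", "Slow service.", None]
print(estimate_characters(sample_reviews, n_target_languages=2))
```

With the dataframe from this tutorial, the call would be estimate_characters(dataset["text"]). The Translator class also has a get_usage() method that reports how much of the quota is already used.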
Let’s now translate the reviews in the text column. I want to add a new column that contains the translation of the review in the corresponding row of the text column.
To do so, I will define a translate_df() function that accepts a single row of a Pandas dataframe. The function calls the translate_text() method to translate the value in that row's text column.
Next, I can pass translate_df to the apply() method of the Pandas dataframe. With axis = 1, the apply() method calls the translate_df() function once for each row of the input dataframe. I will store the result in a new column named text_fr, which contains the translated text.
The following script translates text into the French language.
import os
import pandas as pd
from deepl import Translator
auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)
def translate_df(df):
    return dl_translator.translate_text(df["text"], target_lang="FR").text
dataset["text_fr"] = dataset.apply(translate_df, axis = 1)
dataset.head()
Output:
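Calling the API from apply() means one network request per review. As far as I know, translate_text() also accepts a list of strings and returns one result per string in the same order, so a column can be sent in batches instead. The following is a sketch under that assumption. translate_column() and the batch size of 50 are illustrative choices of mine, not part of the DeepL client, and cells that are not non-empty strings are passed through unchanged.

```python
def translate_column(translator, texts, target_lang, batch_size=50):
    """Translate a sequence of cell values in batches.

    `translator` is any object whose translate_text(list_of_str, target_lang=...)
    returns one result with a .text attribute per input string.
    batch_size=50 is an assumed value; check the current API limits.
    """
    values = list(texts)
    translated = list(values)  # start from a copy so skipped cells stay as they are

    # Positions of the cells that actually contain text to translate
    positions = [i for i, v in enumerate(values) if isinstance(v, str) and v.strip()]

    for start in range(0, len(positions), batch_size):
        chunk = positions[start:start + batch_size]
        results = translator.translate_text(
            [values[i] for i in chunk], target_lang=target_lang
        )
        # Write each translation back to the position it came from
        for i, result in zip(chunk, results):
            translated[i] = result.text

    return translated
```

With the objects from this tutorial, the call would look like dataset["text_fr"] = translate_column(dl_translator, dataset["text"], "FR").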
Instead of hardcoding the target language in the translate_df() function, you can pass the target language as a parameter to the function. Here is an example that translates the input text into the Spanish language.
import os
import pandas as pd
from deepl import Translator
auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)
def translate_df(df, lang):
    return dl_translator.translate_text(df["text"], target_lang=lang).text
language = "es"
dataset["text_es"] = dataset.apply(translate_df,
                                   lang = language,
                                   axis = 1)
dataset.head()
Output:
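Once the target language is a parameter, a mistyped code only fails when the first row is translated. The sketch below checks the code up front. is_supported_target() is a hypothetical helper, and it assumes that the client's get_target_languages() method returns objects with a code attribute, which is my understanding of the DeepL Python client.

```python
def is_supported_target(translator, lang):
    """Return True if `lang` matches a target language code reported by the client.

    Assumes translator.get_target_languages() returns objects that have a
    .code attribute. Codes are compared case-insensitively.
    """
    supported = {language.code.upper() for language in translator.get_target_languages()}
    return lang.upper() in supported
```

For example, you could call is_supported_target(dl_translator, language) before running apply() and stop early if it returns False. Note that the check itself sends a request to the API.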
You might want to translate text into multiple languages. To do so, you can modify the translate_df() function and pass it a list of target languages. The function then iterates through the list and translates the input text into each language.
The translated text for each language is appended to a list, which the function converts to a Pandas series and returns to the caller. The following script modifies the translate_df() function accordingly.
import os
import pandas as pd
from deepl import Translator
auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)
def translate_df(df, lang):
    translate_list = []
    for i in lang:
        translate_list.append(dl_translator.translate_text(df["text"], target_lang=i).text)
    return pd.Series(translate_list)
The script below translates the text in the text column of the input Pandas dataframe into Spanish and French.
import numpy as np
languages = ["es", "fr"]
col_names = ["text_es", "text_fr"]
for newcol in col_names:
    dataset[newcol] = np.nan
dataset[col_names] = dataset.apply(translate_df,
                                   lang = languages,
                                   axis = 1)
dataset.head()
In the output below, you can see that the input text is converted to Spanish and French.
Output:
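The script above keeps the language codes and the column names in two separate lists, which must stay in the same order. As an alternative sketch, the column names can be derived from the codes. translate_row() and the text_ prefix are my own illustrative choices, and the function works with any object that offers a translate_text(text, target_lang=...) method returning a result with a text attribute.

```python
def translate_row(translator, text, languages, prefix="text_"):
    """Translate one text into several languages.

    Returns a dict that maps a column name such as "text_es" to the
    translation, so the names cannot drift out of sync with the codes.
    """
    return {
        f"{prefix}{lang.lower()}": translator.translate_text(text, target_lang=lang).text
        for lang in languages
    }
```

One way to attach the result, assuming the new columns do not exist in the dataframe yet, is dataset.join(dataset["text"].apply(lambda t: pd.Series(translate_row(dl_translator, t, languages)))).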
Once you translate text in a Pandas dataframe, you can easily store the dataframe as a CSV file using the to_csv() method. Here is an example:
dataset.to_csv(r"D:\Datasets\yelp_translated.csv", index = False)
The output below shows the CSV file containing the input text and corresponding translations in Spanish and French.
Output:
The DeepL library is a convenient tool for text translation. It translates raw text as well as PDF, MS Word, and PowerPoint documents. However, it does not translate text in CSV file columns out of the box. In this tutorial, I explained how you can achieve that using the DeepL Python client and Pandas dataframes. I hope you find it helpful. If you have any suggestions or improvements, feel free to comment.