Translating CSV Files using DeepL and Pandas Dataframes in Python

usmanmalik57

Introduction

In this tutorial, you will see how to translate the text in CSV file columns into other languages using the DeepL API in the Python programming language.

DeepL is one of the most popular and accurate text translation platforms. DeepL, as the name suggests, incorporates advanced deep learning algorithms for training text translation models.

In addition to raw text strings, DeepL supports translating documents in PDF, MS Word, and PowerPoint formats. However, I wanted to translate text in CSV file columns, which DeepL does not support.

In this tutorial, I will explain how I achieved translating text in CSV columns using the DeepL API. The resultant CSV will have new columns containing translated text.

Translating Text with DeepL in Python

I chose Python for translating text in CSV files since DeepL has an official Python client that you can use for text translation in your code.

The official GitHub repository explains how to install the DeepL Python library (typically with pip install deepl) and includes sample scripts.

Here I will provide a simple example for your reference. The following are the steps:

Create an object of the Translator class and pass it your DeepL authorization key.
Pass the text you want to translate to the translate_text() method of the Translator object. You must also pass a language code, generally the ISO 639-1 code such as FR for French, to the target_lang parameter of translate_text().
To get the translated text, access the text attribute of the object returned by the translate_text() method.

Here is an example:

import os
from deepl import Translator

auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)

result = dl_translator.translate_text("Python is awesome", target_lang="FR")
print(result.text) 

Output:

Python est génial
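The example above raises a bare KeyError if the environment variable is missing. Here is a minimal sketch of a more explicit lookup, assuming the key is stored under the same variable name, deepL-key, used in this tutorial. The helper name get_auth_key is hypothetical and not part of the DeepL client.

```python
import os


def get_auth_key(env_var="deepL-key"):
    """Return the DeepL authorization key stored in an environment variable.

    get_auth_key is a hypothetical helper, not part of the deepl package.
    """
    auth_key = os.environ.get(env_var, "").strip()
    if not auth_key:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(
            f"Environment variable {env_var!r} is not set; "
            "store your DeepL authorization key in it first."
        )
    return auth_key
```

You could then write Translator(get_auth_key()) in place of the direct os.environ lookup.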

Translating Text in a CSV File using a Pandas Dataframe

Let’s now see how to translate a CSV file. For example, I will translate a CSV file containing Yelp Reviews. The file looks like the one in the following screenshot. The text column contains reviews in the English language.

image_1.JPG

A Pandas Dataframe is ideal for reading, manipulating, and saving CSV files in Python. The following script imports the yelp.csv file into a Pandas dataframe.

import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes
dataset = pd.read_csv(r"D:\Datasets\yelp.csv")

dataset.head()

Output:

image_2.JPG

I will remove all the columns except the text column. The free DeepL API plan only allows 500,000 characters per month. Therefore, for the sake of demonstration, I will keep only the first 20 rows of the dataset.

import pandas as pd

dataset = pd.read_csv(r"D:\Datasets\yelp.csv")

dataset = dataset.filter(["text"], axis = 1).head(20)
dataset.head()

Output:

image_3.JPG
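Since the free plan is limited by character count, it can help to estimate how many characters a job will consume before calling the API. The sketch below assumes that usage is counted on the source text, once per target language. The helper name count_characters and the sample strings are made up for illustration.

```python
def count_characters(texts):
    """Return the total number of characters across an iterable of strings.

    Hypothetical helper for estimating how much of the DeepL character
    quota a translation job would consume per target language.
    """
    return sum(len(str(text)) for text in texts)


reviews = ["Great food!", "Slow service."]  # made-up sample strings
total = count_characters(reviews)
print(total)  # 11 + 13 = 24
```

With the dataframe from this tutorial, you could call count_characters(dataset["text"]) and multiply the result by the number of target languages.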

Let’s now translate reviews in the text column. I want to add a new column that contains a translation of the reviews in the corresponding row in the text column.

To do so, I will define a translate_df() function. Despite its name, it receives a single row of the dataframe at a time and calls the translate_text() method on the text value of that row.

Next, I pass translate_df to the apply() method of the Pandas dataframe with axis = 1, which calls translate_df() once for each row. The returned translations are stored in a new column named text_fr.

The following script translates text into the French language.

import os
import pandas as pd
from deepl import Translator

auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)

def translate_df(df):
    # With axis=1, apply() passes each row as a Series, so df["text"] is one review
    return dl_translator.translate_text(df["text"], target_lang="FR").text


dataset["text_fr"] = dataset.apply(translate_df, axis = 1)
dataset.head()

Output:

image_4.JPG
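Calling apply() row by row sends one request per review. The translate_text() method also accepts a list of strings and returns one result per string, so a whole column can be sent in a single call. The following is a minimal sketch of that alternative, not the approach used in the rest of this tutorial. The helper name translate_texts is hypothetical.

```python
def translate_texts(translator, texts, target_lang):
    """Translate a sequence of strings with one translate_text() call.

    translator is expected to be a deepl.Translator (or any object with a
    compatible translate_text() method); translate_texts is a hypothetical
    helper, not part of the deepl package.
    """
    results = translator.translate_text(list(texts), target_lang=target_lang)
    # For list input, translate_text() returns one result object per string.
    return [result.text for result in results]
```

For the 20 rows used here, you could write dataset["text_fr"] = translate_texts(dl_translator, dataset["text"], "FR"). Larger columns may need to be split into smaller chunks, since the API limits the size of a single request.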

Instead of hardcoding the target language in the translate_df() function, you can pass the target language as a parameter value to the translate_df() function. Here is an example that translates the input text into the Spanish language.

import os
import pandas as pd
from deepl import Translator

auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)

def translate_df(df, lang):
    return dl_translator.translate_text(df["text"], target_lang=lang).text

language = "es"
dataset["text_es"] = dataset.apply(translate_df, 
                                   lang = language, 
                                   axis = 1)
dataset.head()

Output:

image_5.JPG

You might want to translate text into multiple languages. To do so, you can modify the translate_df() function to accept a list of target languages and iterate through that list to translate the input text.

The translated text for each language is appended to a list, which is converted to a Pandas series and returned to the caller. The following script modifies the translate_df() function accordingly.

import os
import pandas as pd
from deepl import Translator

auth_key = os.environ['deepL-key']
dl_translator = Translator(auth_key)

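def translate_df(df, lang):
    # lang is a list of target language codes, e.g. ["es", "fr"]
    translate_list = []
    for i in lang:
        translate_list.append(dl_translator.translate_text(df["text"], target_lang=i).text)

    # One Series element per target language, in the same order as lang
    return pd.Series(translate_list)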

The script below converts the text in the text column of the input Pandas dataframe to Spanish and French.

import numpy as np
languages = ["es", "fr"]
col_names = ["text_es", "text_fr"]

for newcol in col_names:
    dataset[newcol]=np.nan

dataset[col_names] = dataset.apply(translate_df, 
                                   lang = languages, 
                                   axis = 1)

dataset.head()

In the output below, you can see that the input text is converted to Spanish and French.

Output:

image_6.JPG
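The languages and col_names lists above have to be kept in the same order by hand. As a small sketch, the column names can instead be derived from the language codes. The helper name make_column_names is hypothetical.

```python
def make_column_names(languages, source_col="text"):
    """Build one output column name per target language code.

    Hypothetical helper: "es" -> "text_es", "FR" -> "text_fr".
    """
    return [f"{source_col}_{lang.lower()}" for lang in languages]


languages = ["es", "fr"]
col_names = make_column_names(languages)
print(col_names)  # ['text_es', 'text_fr']
```

This keeps each new column aligned with the language it holds when you add or reorder target languages.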

Once you have translated the text in a Pandas dataframe, you can store the dataframe as a CSV file using the to_csv() method. Here is an example:

dataset.to_csv(r"D:\Datasets\yelp_translated.csv", index = False)

The output below shows the CSV file containing the input text and corresponding translations in Spanish and French.

Output:

image_7.JPG

The DeepL Python library is a convenient tool for text translation. It translates raw text as well as PDF, MS Word, and PowerPoint documents. However, it does not translate text in CSV file columns out of the box. In this tutorial, I explained how you can achieve that using the DeepL Python client and Pandas dataframes. I hope you find it helpful. If you have any suggestions or improvements, feel free to comment.
