On April 14, 2025, OpenAI released GPT-4.1, a model touted as the new state of the art that outperforms GPT-4o on all major benchmarks.
As always, I like to evaluate new LLMs on simple tasks like text classification and summarization to see how they compare with current leading models.
In this article, I will share the results I obtained for multi-class and multi-label text classification and text summarization using the OpenAI GPT-4.1 model. So, without further ado, let's begin.
The script below installs the Python libraries you need to run the code in this article.
!pip install openai
!pip install rouge-score
!pip install pandas
!pip install --upgrade openpyxl
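Before moving on, it can be worth confirming that everything is importable. The imports in this article also use scikit-learn, matplotlib, and seaborn, which the install script does not cover (they are typically preinstalled on Colab). The snippet below is a minimal sketch of such a check; the helper name missing_modules and the REQUIRED_MODULES list are my own illustration, not part of any library.

```python
from importlib.util import find_spec

# Import names (not pip names) of the packages this article relies on.
# This list is an assumption based on the install and import scripts.
REQUIRED_MODULES = [
    "openai", "rouge_score", "openpyxl", "pandas",
    "sklearn", "matplotlib", "seaborn",
]


def missing_modules(names):
    """Return the names that cannot be located on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If the check reports a missing name, install the corresponding pip package (for example, scikit-learn for sklearn) and rerun it.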
The following script imports the required libraries and modules into our Python application.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations
from collections import Counter
from sklearn.metrics import hamming_loss, accuracy_score
from rouge_score import rouge_scorer
from openai import OpenAI
# Read the API key from Colab's secrets manager (available only inside Google Colab)
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
Finally, we create the OpenAI client object that we will use to call the OpenAI API. To access the API, you need an OpenAI API key, which the script above reads from the Colab secrets.
client = OpenAI(api_key=OPENAI_API_KEY)
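The google.colab import only works inside Colab. If you run the code elsewhere, one option is to fall back to an environment variable. The snippet below is a minimal sketch under that assumption; the helper name load_api_key is hypothetical, and it assumes the key is stored under the name OPENAI_API_KEY in either place.

```python
import os


def load_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from Colab secrets if possible, else from the environment."""
    key = None
    try:
        # Only importable inside Google Colab; the lookup fails if the secret is absent.
        from google.colab import userdata
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or inaccessible.
        key = None

    # Fall back to an environment variable with the same name (an assumption).
    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(f"No API key found under the name {name!r}.")
    return key


# Usage (requires the openai package):
# client = OpenAI(api_key=load_api_key())
```

Raising an error when no key is found makes a configuration problem show up immediately, instead of as an authentication failure on the first API call.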
We will first summarize articles in the News Article Summary dataset.
The following script imports the dataset into your application and displays its first five rows.
# Dataset source: https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx
# This line expects the file at /content/summary_dataset.xlsx
dataset = pd.read_excel("/content/summary_dataset.xlsx")
print(dataset.shape)
dataset.head()
Output:
The content column contains the article content, whereas the human_summary column contains …