Custom Loss Functions in PyTorch: A Comprehensive Guide

usmanmalik57

Introduction

Loss functions drive the training of most machine learning models. They quantify how well a model is performing by measuring the difference between the predicted and actual outcomes. The goal of training is to minimize this loss, which in turn improves the model’s predictions.

Various libraries, such as PyTorch, TensorFlow, and Keras, provide a plethora of built-in loss functions like Mean Squared Error (MSE), Cross-Entropy, and many more. These built-in functions cover a wide range of tasks and are sufficient for many standard machine learning problems.

However, there are scenarios where these built-in loss functions may not suffice. This could be due to the unique nature of the problem at hand, or the need for a specific optimization strategy. In such cases, we need to design our own custom loss functions.

This article will guide you through the process of creating custom loss functions in PyTorch. So, let’s get started!

Understanding Loss Functions

A loss function, alternatively referred to as a cost function, measures the degree of deviation between predicted outcomes and actual results. It serves as a metric to assess the effectiveness of an algorithm in modeling a given dataset. When predictions significantly diverge from actual values, the loss function yields a higher value. Conversely, a lower value is produced when predictions are relatively accurate.

In machine learning, the ultimate goal is to minimize this loss function. This process is known as optimization. By minimizing the loss, we are essentially fine-tuning our model to make more accurate predictions.

In PyTorch, there are several common types of loss functions, such as Mean Squared Error (MSE) for regression problems, and Cross Entropy Loss for classification problems. These loss functions work by comparing the model’s predictions with the true values and then outputting a single value that represents the model’s total error.
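To make the "single value" point concrete, here is a minimal sketch that calls two built-in PyTorch losses on hand-made tensors. The tensor values are made up for illustration only.

```python
import torch
from torch import nn

# Regression: mean squared error between predictions and targets
preds = torch.tensor([2.0, 4.0])
targets = torch.tensor([1.0, 2.0])
mse = nn.MSELoss()(preds, targets)  # ((2-1)^2 + (4-2)^2) / 2 = 2.5

# Classification: cross-entropy takes raw logits and integer class labels
logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]])
labels = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(logits, labels)

print(mse.item())         # 2.5
print(mse.ndim, ce.ndim)  # both losses are 0-dimensional (scalar) tensors
```

Both calls reduce a whole batch to one scalar, which is what the optimizer needs in order to compute gradients.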

Custom Loss Functions

But what if your problem doesn’t fit neatly into a box? What if you need to consider multiple factors and can’t use a standard loss function? That’s where custom loss functions come into play.

Creating a custom loss function in PyTorch is not as daunting as it might seem. It involves defining a new function that calculates and returns the loss between the predicted and actual values. Here’s a step-by-step guide on how you can do this:

  1. Define the Function: The first step is to define your custom loss function. This function should take in the predicted and actual values as inputs and return a single value representing the loss. The function can be defined using standard Python syntax.

  2. Calculate the Loss: Inside your function, you’ll need to write the code that calculates the loss. This could involve operations like subtraction, squaring, or taking the logarithm, depending on what kind of loss you want to calculate.

  3. Return the Loss: Finally, your function should return the calculated loss. This value will be used by PyTorch’s optimization algorithms to update the model’s parameters.

  4. Use the Function: Once your custom loss function is defined, you can use it just like any other loss function in PyTorch. Simply pass your predicted and actual values to the function to calculate the loss.

The following is a simple example where we define a custom loss function that calculates the sum of the squares of the differences between the real and predicted values.


import torch

# Define your custom loss function
def custom_loss(y_real, y_pred):
    # Calculate loss
    loss = torch.sum((y_real - y_pred)**2)
    return loss
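As a quick sanity check, the function above can be called on small hand-made tensors. The sketch below repeats the definition so it runs on its own; the input values are illustrative assumptions. It also compares the result against PyTorch's built-in F.mse_loss with reduction="sum", which should compute the same quantity.

```python
import torch
import torch.nn.functional as F

def custom_loss(y_real, y_pred):
    # Sum of squared differences, as defined in the article
    return torch.sum((y_real - y_pred) ** 2)

y_real = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 2.0, 2.0], requires_grad=True)

loss = custom_loss(y_real, y_pred)
loss.backward()  # autograd differentiates the custom loss like any built-in one

print(loss.item())  # 0.25 + 0 + 1 = 1.25
print(y_pred.grad)  # equals 2 * (y_pred - y_real)

# Built-in equivalent, used here only as a cross-check
reference = F.mse_loss(y_pred, y_real, reduction="sum")
print(reference.item())
```

Because the loss is built from differentiable tensor operations, no extra work is needed to make backpropagation work.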

Use Case: Applying the Custom Loss Function

Let’s consider a problem where we’re predicting housing prices using a dataset with features like the number of rooms, location, size of the house, etc. We aim to write a custom loss function that adds more weight to the loss if the predicted value is off by more than 20% from the target value.
Let's see how to do this in code.

Importing Necessary Libraries

The following script imports the required libraries for this article.


# Import necessary libraries
import pandas as pd
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from torch import nn, optim

Importing and Preprocessing the Data

We will use a sample dataset from Kaggle that contains house prices along with various other features.

The following script imports the dataset.


# Load the data

data = pd.read_csv("/content/house_price_dataset.csv")

Next, we will define the preprocess_data() function that performs the following preprocessing steps on the dataset.

  1. Removes the street column as it might not be useful for the model.
  2. Breaks the date column into year, month, and day columns to better capture any temporal patterns in the data.
  3. Performs one-hot encoding on city, statezip, and country columns to convert these categorical variables into a form that could be provided to the model.

def preprocess_data(df):
    # Step 1: Remove the 'street' column
    df = df.drop(columns=['street'])

    # Step 2: Break the 'date' column into 'year', 'month', and 'day'
    df['date'] = pd.to_datetime(df['date'])
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    df['day'] = df['date'].dt.day
    df = df.drop(columns=['date'])

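    # Step 3: Perform one-hot encoding on 'city', 'statezip', and 'country' columns using pd.get_dummies
    # dtype=float keeps the dummy columns numeric so the frame converts cleanly to a float tensor later
    df = pd.concat([df, pd.get_dummies(df[['city', 'statezip', 'country']], drop_first=True, dtype=float)], axis=1)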
    df = df.drop(columns=['city', 'statezip', 'country'])

    return df

data = preprocess_data(data)
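To see what these preprocessing steps produce, here is a small sketch that applies the same operations to a three-row toy frame. The rows are invented for illustration and are not taken from the Kaggle dataset; dtype=float is an assumption added so the dummy columns are numeric.

```python
import pandas as pd

# Hypothetical rows that mimic the columns used in the article
toy = pd.DataFrame({
    "date": ["2014-05-02 00:00:00", "2014-06-15 00:00:00", "2014-07-01 00:00:00"],
    "price": [313000.0, 450000.0, 520000.0],
    "street": ["1 A St", "2 B Ave", "3 C Rd"],
    "city": ["Seattle", "Kent", "Seattle"],
    "statezip": ["WA 98101", "WA 98030", "WA 98101"],
    "country": ["USA", "USA", "USA"],
})

# Step 1: drop the free-text street column
toy = toy.drop(columns=["street"])

# Step 2: split the date into numeric parts
toy["date"] = pd.to_datetime(toy["date"])
toy["year"] = toy["date"].dt.year
toy["month"] = toy["date"].dt.month
toy["day"] = toy["date"].dt.day
toy = toy.drop(columns=["date"])

# Step 3: one-hot encode the categorical columns
cat_cols = ["city", "statezip", "country"]
dummies = pd.get_dummies(toy[cat_cols], drop_first=True, dtype=float)
toy = pd.concat([toy.drop(columns=cat_cols), dummies], axis=1)

print(list(toy.columns))
# ['price', 'year', 'month', 'day', 'city_Seattle', 'statezip_WA 98101']
```

Note that drop_first=True removes the first category of each column, so a column with a single value (country here) contributes no dummy columns at all.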

We define another function, get_train_test_split(), that divides the dataset into features and labels and then splits them into train and test sets.


def get_train_test_split(df):
    # Preprocessing steps as before
    # ...

    # Separate the features and the label
    X = df.drop(columns=['price'])  # Features
    y = df['price']                 # Label

    # Split the dataset into a training set and a test set (80% train, 20% test)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = get_train_test_split(data)

Creating PyTorch Dataset, Model, and Custom Loss Function

The next step is to create a PyTorch dataset for the training and test sets. We will also create corresponding data loaders for processing the training and test sets in batches.


# Define the dataset

import torch
from torch.utils.data import Dataset

class HousingDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X.values, dtype=torch.float32)
        self.y = torch.tensor(y.values, dtype=torch.float32)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        data_val = self.X[idx]
        target = self.y[idx]
        return data_val, target


train_dataset = HousingDataset(X_train, y_train)
test_dataset = HousingDataset(X_test, y_test)

batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
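Before training, it can help to confirm what a data loader actually yields. The sketch below uses TensorDataset with random tensors as a stand-in for HousingDataset; the sizes (100 rows, 5 features) are assumptions chosen only to show the batch shapes.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 100 examples with 5 features each, plus one target per example
X = torch.randn(100, 5)
y = torch.randn(100)

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=False)

# Collect the shape of every batch
shapes = [(xb.shape, yb.shape) for xb, yb in loader]

print(len(loader))  # 4 batches: 32 + 32 + 32 + 4
for x_shape, y_shape in shapes:
    print(x_shape, y_shape)
```

The targets arrive with shape (batch,), while a model whose last layer is nn.Linear(..., 1) returns shape (batch, 1), which is worth keeping in mind when computing a loss or metric.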

Next, we will define the RegressionModel class for house price prediction, which is a simple feed-forward neural network. It has several fully connected (nn.Linear) layers with ReLU activation functions; the output layer has no activation function because this is a regression task.


# Define the regression model
class RegressionModel(nn.Module):
    def __init__(self, input_size):
        super(RegressionModel, self).__init__()
        self.fc1 = nn.Linear(input_size, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, 256)
        self.fc4 = nn.Linear(256, 64)
        self.fc5 = nn.Linear(64, 32)
        self.fc6 = nn.Linear(32, 1)  # Output layer for regression

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = self.fc6(x)  # No activation function for the output layer in regression
        return x

input_size = len(X_train.columns)
model = RegressionModel(input_size)

Finally, we will define our custom loss function custom_mse_loss. This loss function calculates the mean squared error (MSE) between the output (predicted values) and the target (actual values).

If the relative error (the absolute difference between the output and the target, divided by the target) is greater than a certain threshold (20% in this case), the squared error for that example is multiplied by a penalty weight (2.0 in this case). This gives more weight to examples where the model’s prediction is off by more than 20%, encouraging the model to pay more attention to such examples and learn to predict them more accurately. The function then returns the mean of the weighted squared errors, hence the name custom_mse_loss.


def custom_mse_loss(output, target, threshold=0.20, penalty_weight=2.0):

    # Calculate squared error
    squared_error = (output - target) ** 2

    # Calculate the relative error
    relative_error = torch.abs((output - target) / target)

    # Apply additional weight where the error exceeds the threshold
    weighted_error = torch.where(relative_error > threshold, penalty_weight * squared_error, squared_error)

    # Calculate the mean of the weighted errors
    loss = torch.mean(weighted_error)
    return loss

This custom_mse_loss function can be particularly useful in cases where being off by more than a certain amount is particularly undesirable, such as predicting housing prices. If a house is listed for $500,000, a prediction of $600,000 might be acceptable, but a prediction of $1,000,000 would be way off. By applying a penalty to such examples, the model is encouraged to make its predictions as close as possible to the actual values.
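The effect of the penalty is easiest to see on two hand-picked predictions for a $500,000 house. The sketch below repeats the loss definition so it runs on its own; the prediction values are illustrative assumptions ($550,000 is used so the first example sits clearly below the 20% threshold).

```python
import torch

def custom_mse_loss(output, target, threshold=0.20, penalty_weight=2.0):
    squared_error = (output - target) ** 2
    relative_error = torch.abs((output - target) / target)
    # Double the squared error only where the relative error exceeds the threshold
    weighted_error = torch.where(relative_error > threshold,
                                 penalty_weight * squared_error,
                                 squared_error)
    return torch.mean(weighted_error)

target = torch.tensor([500_000.0, 500_000.0])
output = torch.tensor([550_000.0, 1_000_000.0])  # 10% off and 100% off

plain = torch.mean((output - target) ** 2)  # ordinary MSE for comparison
weighted = custom_mse_loss(output, target)

print(plain.item())     # about 1.2625e11 = (2.5e9 + 2.5e11) / 2
print(weighted.item())  # about 2.5125e11 = (2.5e9 + 2 * 2.5e11) / 2
```

Only the second prediction is penalized, so its contribution doubles while the first is unchanged. The size of these numbers also explains why the training loss is in the hundreds of billions when prices are left unscaled.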

Training and Evaluating the Model

Next, we can train the model we defined. The training process remains the same, except that we do not use a built-in PyTorch loss function; instead, we pass the predicted and target values to the custom_mse_loss function in the training loop. The average loss over all batches is printed after every epoch. The output shows the loss values for the last 5 epochs.


# Initialize the model
input_size = len(X_train.columns)
model = RegressionModel(input_size)

# Define the optimizer (the loss is our custom_mse_loss function)
optimizer = optim.Adam(model.parameters(), lr=0.0001)

# Training loop
num_epochs = 100  # You can change the number of epochs

for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch in train_loader:
        x, y = batch
        optimizer.zero_grad()
        y_pred = model(x)
        # Call the custom loss function
        loss = custom_mse_loss(y_pred.squeeze(), y)  # squeeze to match y's dimensions
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {total_loss/len(train_loader)}")


Output:


Epoch 96, Loss: 141574589395.47827
Epoch 97, Loss: 141916568317.77393
Epoch 98, Loss: 142382569596.66086
Epoch 99, Loss: 141684189023.72174
Epoch 100, Loss: 141654507626.85217

Finally, we can evaluate our model on the test set, as shown in the following script. Here we print the mean absolute error (MAE), which is the average absolute difference between the predicted and target values over all the examples in the test set.


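# Evaluate the model
model.eval()
total_abs_error = 0.0
with torch.no_grad():
    for batch in test_loader:
        x, y = batch
        # Squeeze (batch, 1) to (batch,) so the predictions line up with y
        y_pred = model(x).squeeze(-1)
        # Accumulate the summed absolute error so every example counts equally
        total_abs_error += torch.abs(y - y_pred).sum().item()

# Average over the number of test examples
mae = total_abs_error / len(test_dataset)

print(f'Mean Absolute Error on the test set: {mae}')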
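One detail worth checking in any evaluation loop of this kind is tensor shape. The model returns predictions of shape (batch, 1), while the targets have shape (batch,). Subtracting them without squeezing silently broadcasts to a (batch, batch) matrix and distorts the metric. The sketch below uses made-up values to show the difference.

```python
import torch

y = torch.tensor([100.0, 200.0, 300.0])             # targets, shape (3,)
y_pred = torch.tensor([[110.0], [190.0], [330.0]])  # model output, shape (3, 1)

# Without squeezing, broadcasting compares every prediction with every target
wrong = torch.abs(y - y_pred)

# Squeezing the last dimension pairs each prediction with its own target
right = torch.abs(y - y_pred.squeeze(-1))

print(wrong.shape, right.shape)  # a 3x3 matrix versus a length-3 vector
print(wrong.mean().item())       # inflated by the mismatched pairs
print(right.mean().item())       # (10 + 10 + 30) / 3
```

PyTorch raises no error in the first case, so checking the shape of the difference is the only way to catch it.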

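Finally, we express the MAE as a percentage of the mean house price. In the run shown below, the MAE comes to 1.98% of the mean price. This ratio is only meaningful when the absolute errors are averaged over individual test examples and the predictions have the same shape as the targets, so verify both before reading much into the figure.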


mean_price = data["price"].mean()

percentage = (mae / mean_price) * 100
print(f"MAE is {percentage:.2f}% of the mean price.")

Output:


MAE is 1.98% of the mean price.

Tips and Best Practices

When creating custom loss functions, keep these tips in mind:

  • Ensure your loss function is differentiable (at least almost everywhere) with respect to the model’s output, since PyTorch’s optimizers rely on gradients computed by autograd.
  • Test your loss function with a small dataset to ensure it’s working as expected.
  • Be mindful of overfitting. A complex loss function might fit the training data well but perform poorly on unseen data.
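The first two tips can be combined into a quick check on a handful of values. The sketch below repeats the custom_mse_loss definition and uses torch.autograd.gradcheck, which compares autograd's gradients against finite differences. The inputs are illustrative assumptions: they are in double precision, as gradcheck expects, and deliberately kept away from the 20% threshold, where the loss has a jump.

```python
import torch

def custom_mse_loss(output, target, threshold=0.20, penalty_weight=2.0):
    squared_error = (output - target) ** 2
    relative_error = torch.abs((output - target) / target)
    weighted_error = torch.where(relative_error > threshold,
                                 penalty_weight * squared_error,
                                 squared_error)
    return torch.mean(weighted_error)

target = torch.tensor([1.0, 1.0, 1.0], dtype=torch.float64)
# Relative errors of 10%, 50% and 10%: none sits near the 20% threshold
output = torch.tensor([1.1, 1.5, 0.9], dtype=torch.float64, requires_grad=True)

# Check 1: a perfect prediction should give zero loss
zero_loss = custom_mse_loss(target.clone(), target)

# Check 2: analytical gradients should match numerical ones
grads_ok = torch.autograd.gradcheck(lambda o: custom_mse_loss(o, target), (output,))

print(zero_loss.item())  # 0.0
print(grads_ok)          # True when the gradients agree
```

A check like this only covers the points it is given, so it is a sanity test rather than a proof that the loss behaves well everywhere.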

Conclusion

While libraries like Keras and PyTorch provide a variety of built-in loss functions, there are scenarios where these may not suffice. This is where the power of custom loss functions comes into play, allowing us to tailor the learning process to our specific needs.

In this comprehensive guide, we delved into the implementation of a custom loss function in PyTorch. Through the practical example of predicting housing prices, we demonstrated how to design a loss function that adds more weight to predictions that deviate significantly from the target values.
I hope you liked this article. Feel free to leave feedback or suggestions.

AndreRet 526 Senior Poster

As always, precise and in detail, great tutorial!
