Upvoted Posts by usmanmalik57

usmanmalik57 12 Junior Poster in Training

1 Month Ago

Evaluating OpenAI GPT 4.1 for Text Summarization and Classification Tasks

On April 14, 2025, OpenAI released GPT-4.1 — a model touted as the new state-of-the-art, outperforming GPT-4o on all major benchmarks.
As always, I like to evaluate new LLMs on simple tasks like text classification and summarization to see how they compare with current leading models.

In this article, I will share the results I obtained for multi-class and multi-label text classification and text summarization using the OpenAI GPT-4.1 model. So, without further ado, let's begin.

Importing and Installing Required Libraries

The script below installs the Python libraries you need to run codes in this article.

!pip install openai
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The following script imports the required libraries and modules into our Python application.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations
from collections import Counter
from sklearn.metrics import hamming_loss, accuracy_score
from rouge_score import rouge_scorer
from openai import OpenAI

from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

Finally, we create the OpenAI client object we will use to call the OpenAI API. To access the API, you will need the OpenAI API key.

client = OpenAI(api_key = OPENAI_API_KEY)

Text Summarization with GPT-4.1

We will first summarize articles in the News Article Summary dataset.

The following script imports the dataset into your application and displays its first five rows.

# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx


dataset = pd.read_excel(r"/content/summary_dataset.xlsx")
print(dataset.shape)
dataset.head()

Output:

The content column contains the article content, whereas the human_summary column contains …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

2 Months Ago

DeepSeek R1 vs Llama 3.1-405b for Text Classification and Summarization

In a previous article, I presented a comparison of DeepSeek-R1-Distill-Llama-70b with the DeepSeek-R1-Distill-Qwen-32B for text classification and summarization.

Both these models are distilled versions of the original DeepSeek R1 model. Recently, I wanted to try the original version of the DeepSeek R1 model using the DeepSeek API. However, I was not able to test it because the DeepSeek API was not allowing requesting the model due to high demand. My other option was to access the model via the Hugging Face API, but I could not run that either since it requires very high memory.

Finally, I found a solution via the FireworksAI API. Fireworks AI provides access to the DeepSeek R1 model hosted inside the United States, so you do not have to worry about sending your data to undesired locations.

In this article, you will see a comparison of the original DeepSeek R1 model with Llama 3.1-405B model for text classification and summarization.

So, let's begin without ado.

Importing and Installing Required Libraries

The following script installs the Fireworks Python library and the other libraries required to run scripts in this article.

!pip install --upgrade fireworks-ai
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The script below imports the required libraries into your Python application.

You will also need the Fireworks API key to access the Fireworks API via the Python library.

from fireworks.client import Fireworks
import os
import pandas as pd
import time
from rouge_score import rouge_scorer …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

3 Months Ago

Benchmarking DeepSeek R1 for Text Classification and Summarization

DeepSeek-R1 is a groundbreaking family of reinforcement learning (RL)-driven AI models developed by the Chinese AI firm DeepSeek. It is designed to rival industry leaders like OpenAI and Google in complex decision-making and optimization problems.

In this article, we will benchmark the DeepSeek R1 model for text classification and summarization as we did with Qwen and LLama models in a previous article.

So, let's begin without further ado.

Importing and Installing Required Libraries

We will use a distilled version of DeepSeek R1 model from the Hugging Face Inference API. You can use larger versions of DeepSeek models via the DeepSeek API.

The following script installs the required libraries.


!pip install huggingface_hub==0.24.7
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The script below imports the required libraries into your Python application.

from huggingface_hub import InferenceClient
import os
import pandas as pd
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score
from collections import defaultdict

Calling DeepSeek R1 Model Using Hugging Face Inference API

Calling the DeepSeek R1 model via the Hugging Face Inference API is similar to calling any text generation model. You need to create an object of the InferenceClient class and pass it the model ID.

hf_token = os.environ.get('HF_TOKEN')

#deepseek-R1-distill endpoint
#https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
deepseek_model_client = InferenceClient(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    token=hf_token
)

Let's test the DeepSeek model we just imported. The following script defines the make_prediction() function, which accepts the model object, the system role, and the user query and returns the model response.

…

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

5 Months Ago

Qwen 2.5-72b Vs. Llama 3.3-70b for Text Classification and Summarization

Open-source LLMs are gaining significant traction due to their ability to match the performance of advanced proprietary LLMs. These models are free to use and allow users to modify their source code or fine-tune them on their own systems, making them highly versatile for various applications.

Alibaba's Qwen and Meta's Llama series are two prominent contenders in the open-source LLM landscape.

In this article, we compare the performance of Qwen 2.5-72b and Llama 3.3-70b, which was released on December 6, 2024. Meta claims that Llama 3.3-70b achieves the same performance as its Llama 3.2 model with 405 billion parameters, making it a powerful and efficient choice for NLP tasks.

By the end of this article, you will understand which model best suits specific NLP needs.

So, let's dive right in!

Installing and Importing Required Libraries

We need to install some libraries to call the Hugging Face inference API to access the Qwen and LLama models. We will also use the rouge-score library to calculate ROUGE scores for text summarization tasks. Below is the script to install the necessary libraries:

!pip install huggingface_hub==0.24.7
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

After installation, import the required libraries as shown below:

from huggingface_hub import InferenceClient
import os
import pandas as pd
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score
from collections import defaultdict

Calling Qwen 2.5 and Llama 3.3 Models Using Hugging Face Inference API

To access models using the Hugging Face inference API, you …

Computer Science artificial-intelligence-llm python

sadiaafrin commented: good post +0

usmanmalik57 12 Junior Poster in Training

5 Months Ago

Question/Answering over SQL Data Using LangGraph Framework

This tutorial demonstrates how to build an AI agent that queries SQLite databases using natural language. You will see how to leverage the LangGraph framework and the OpenAI GPT-4o model to retrieve natural language answers from an SQLite database, given a natural language query.

So, let's begin without ado.

Importing and Installing Required Libraries

The following script imports the required libraries.


!pip install langchain-community
!pip install langchain-openai
!pip install langgraph

I ran the codes in this article on Google Colab[https://colab.research.google.com/], therefore I didnt have to install any other library.

The script below imports the required libraries into your Python application

import sqlite3
import pandas as pd
import csv
import os

from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
from langchain_openai import ChatOpenAI
from langchain import hub
from langgraph.graph import START, StateGraph

from typing_extensions import Annotated
from IPython.display import Image, display

from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

Creating a Dummy SQLite Database

We will perform question/answering using the SQLite database and LangGraph. The process remains the same for other database types.

We will create a dummy SQLite dataset using the Titanic dataset CSV file from Kaggle.

The following script converts the Titanic-Dataset.csv file into the titanic.db SQLite database.

dataset = pd.read_csv('Titanic-Dataset.csv')
dataset.head()

Output:


import pandas as pd
import sqlite3

def csv_to_sqlite(csv_file_path, database_name):

    try:
        # Read the CSV file into a pandas DataFrame with headers
        df = pd.read_csv(csv_file_path)

        # Connect to SQLite database (or create it if it doesn't exist)
        conn …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

5 Months Ago

Evaluating GPT-4o November Model for Text Classification and Summarization

On November 20, 2024, OpenAI updated its GPT-4o model, claiming it is more creative and accurate on several benchmarks.

In this article, I compare the GPT-4o November update with the previous version (August update) for text summarization and classification tasks.

By the end of this article, you will see whether the new update outperforms the previous one, particularly for text classification and summarization tasks.

So, let's begin without ado.

Importing and Installing Required Libraries

You must install the OpenAI Python library to access OpenAI models in Python. In addition, you need to install a few other libraries that will help you evaluate OpenAI models for text summarization and classification tasks.


!pip install openai
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The following script imports the required libraries in our Python application.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations
from collections import Counter
from sklearn.metrics import hamming_loss, accuracy_score
from rouge_score import rouge_scorer
from openai import OpenAI

from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

We will also define an OpenAI client object to call the OpenAI API.

client = OpenAI(api_key = OPENAI_API_KEY)

Comparison for Text Summarization

Let's first see the results of text summarization. We will use the News Article Summary dataset you can download from Kaggle to summarize the articles.



# Kaggle dataset download link
# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx


dataset = pd.read_excel(r"/content/dataset.xlsx")
print(dataset.shape)
dataset.head()

Output:

The dataset contains news articles and human …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

6 Months Ago

Fine-tuning OpenAI GPT-4o for Multi-label Text Classification

In my previous article, I presented a comparison of GPT-4o and Claude 3.5 Sonnet for multi-label text classification. The accuracies achieved by both models were relatively low.

Fine-tuning is one solution to overcome the low performance of large-language models. With fine-tuning, you can incorporate custom domain knowledge into an LLM's weights, leading to better performance on your custom dataset.

This article will show how to fine-tune the OpenAI GPT-4o model on the multi-label research paper classification dataset. It is the same dataset I used for zero-shot multi-label classification in my previous article. You will see significantly better results with the fine-tuned GPT-4o model.

So, let's begin without ado.

Importing and Installing Required Libraries

We will fine-tune the OpenAI GPT-4o model using the OpenAI API in Python. The following script installs the OpenAI Python library.


!pip install openai

The script below imports the required libraries into your Python application.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations
from collections import Counter
from sklearn.metrics import hamming_loss, accuracy_score
import json
import os
from openai import OpenAI

Importing and Preprocessing the Dataset

We will fine-tune the GPT-4o model using the same multi-label research paper classification dataset we used in the last article.

The following script imports the dataset into a Pandas dataframe and displays the dataset header.

## dataset download link
## https://www.kaggle.com/datasets/shivanandmn/multilabel-classification-dataset?select=train.csv

dataset = pd.read_csv(r"D:\Datasets\Multilabel Research Paper Classification\train.csv")
print(f"Dataset Shape: {dataset.shape}")
dataset.head()

Output:

The dataset has …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

6 Months Ago

OpenAI GPT-4o vs Claude 3.5 Sonnet for Multi-label Text Classification

In one of my previous articles, you saw a comparison of GPT-4o vs. Claude 3.5 sonnet for zero-shot text classification. In that article; we performed multi-class text classification where input tweets belonged to one of the three categories.

In this article, we will go a step further and perform zero-shot multi-label text classification with GPT-4o and Claude 3.5 sonnet models. We will compare the two models using accuracy and hamming loss criteria and see which model is suited for zero-shot multi-label text classification.

So, let's begin without ado.

Installing and Importing Required Libraries

We will call the Claude 3.5 sonnet and GPT-4o models using the Anthropic and OpenAI Python libraries.

The following script installs these libraries.


!pip install anthropic
!pip install openai

The script below imports the required libraries into your Python application.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations
from collections import Counter
from sklearn.metrics import hamming_loss, accuracy_score

import anthropic
from openai import OpenAI

from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')

Importing and Visualizing the Dataset

We will use a multi-label text classification dataset from Kaggle containing research paper titles and abstracts that can belong to one or more of the six output categories: Computer Science, Physics, Mathematics, Statistics, Quantitative Biology, and Quantitative Finance.

The following script imports the training set from the dataset and plots the dataset header.


## dataset download link
## https://www.kaggle.com/datasets/shivanandmn/multilabel-classification-dataset?select=train.csv

dataset = pd.read_csv("/content/train.csv") …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

6 Months Ago

Image Analysis Using Llama 3.2 Vision Instruct Model

On September 25, 2024, Meta released the Llama 3.2 series of multimodal models. The models are lightweight yet extremely powerful for image-to-text and text-to-text tasks.

In this article, you will learn how to use the Llama 3.2 Vision Instruct model for general image analysis, graph analysis, and facial sentiment prediction. You will see how to use the Hugging Face Inference API to call the Llama 3.2 Vision Instruct model.

The results are comparable with the proprietary Claude 3.5 Sonnet model as explained in this article.

So, let's begin without ado.

Importing Required Libraries

We will call the Llama 3.2 Vision Instruct model using the Hugging Face Inference API. To access the API, you need to install the following library.

pip install huggingface_hub==0.24.7

The following script imports the required libraries into your Python application.

import os
import base64
from IPython.display import display, HTML
from IPython.display import Image
from huggingface_hub import InferenceClient
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt

A Basic Image Analysis Example with Llama 3.2 Vision Instruct Model

Let's first see how to analyze an image using the Llama 3.2 vision instruct model using the Hugging Face Inference API.

We will analyze the following image.

image_url = r"https://healthier.stanfordchildrens.org/wp-content/uploads/2021/04/Child-climbing-window-scaled.jpg"
Image(url=image_url, width=600, height=600)

Output:

To analyze an image using the Hugging Face Inference, you must first create an object of the InferenceClient class from the huggingface_hub module. You must pass your Hugging Face access token …

Computer Science artificial-intelligence-llm python

Lihui Zhang commented: No understand but impressive... I just learned LeNet-5 +0

usmanmalik57 12 Junior Poster in Training

7 Months Ago

RAG with LangChain and Hugging Face Serverless Inference API

This article explains how to create a retrieval augmented generation (RAG) chatbot in LangChain using open-source models from Hugging Face serverless inference API.

You will see how to call large language models (LLMs) and embedding models from Hugging Face serverless inference API using LangChain. You will also see how to employ these LLMs and embedding models to create LangChain chatbots with and without memory.

So, let's begin without ado.

Installing and Import Required Libraries

We will first install the Python libraries required to run codes in this article.


!pip install langchain
!pip install langchain_community
!pip install pypdf
!pip install faiss-gpu
!pip install langchain-huggingface
!pip install --upgrade --quiet huggingface_hub

The script below imports the required libraries into your Python application.

Since we will be accessing the Hugging Face serverless inference API, you must obtain your access token from Hugging Face.

Note: The codes in this article are run with Google Colab, so I used the user data.get() method to access environment variables. You must use the method that is appropriate for your environment.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_community.embeddings import (
    HuggingFaceInferenceAPIEmbeddings,
)
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.memory import ChatMessageHistory

import os
from google.colab import userdata

hf_token = userdata.get('HF_API_TOKEN')

…

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

7 Months Ago

Qwen vs Llama - Who is winning the Open Source LLM Race

Open-source LLMS, owing to their comparable performance with advanced proprietary LLMs, have been gaining immense popularity lately. Open-source LLMs are free to use, and you can easily modify their source code or fine-tune them on your systems.

Alibaba's Qwen and Meta's Llama series of models are two major players in the open source LLM arena. In this article, we will compare the performance of Qwen 2.5-72b and Llama 3.1-70b models for zero-shot text classification and summarization.

By the end of this article, you will have a rough idea of which model to use for your NLP tasks.

So, lets begin without ado.

Installing and Importing Required Libraries

We will call the Hugging Face inference API to access the Qwen and LLama models. In addition, we will need the rouge-score library to calculate ROUGE scores for text summarization tasks. The script below installs the required libraries for this article.


!pip install huggingface_hub==0.24.7
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The script below installs the required libraries.


from huggingface_hub import InferenceClient
import os
import pandas as pd
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score
from collections import defaultdict

Calling Qwen 2.5 and Llama 3.1 Using Hugging Face Inference API

To access models via the Hugging Face inference API, you will need your Hugging Face User Access tokens.

Next, create a client object for the corresponding model using the InferenceClient class from the huggingface_hub library.
You …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

7 Months Ago

Fine-tuning OpenAI Vision Models for Visual Question-Answering

In my previous article, I explained how to fine-tune OpenAI GPT-4o model for natural language processing tasks.

In OpenAI DevDay, held on October 1, 2024, OpenAI announced that users can now fine-tune OpenAI vision and multimodal models such as GPT-4o and GPT-4o mini. The best part is that fine-tuning vision models are free until October 31.

In this article, you will see an example of how to vision fine-tune the GPT-4o model on your custom visual question-answering dataset. So, let's begin without ado.

Importing and Installing Required Libraries

You will need to install the OpenAI Python library.

!pip install openai

In this article, we will be using the following Python libraries. Run the following script to import them into your Python application.


from openai import OpenAI
import pandas as pd
import json
import os
from sklearn.utils import shuffle
from sklearn.metrics import accuracy_score

Importing and Preprocessing the Dataset

We will fine-tune the GPT-4o model on a visual question-answering dataset you can download from Kaggle.

The following script imports the CSV file containing the question, the image ID, and the corresponding answer to the question.

#Data download link
#https://www.kaggle.com/datasets/bhavikardeshna/visual-question-answering-computer-vision-nlp

dataset = pd.read_csv(r"D:\Datasets\dataset\data_train.csv")
dataset.head()

Output:

Here is the image with the id image100. You can see cups on the shelves.

For vision fine-tuning, you must pass image URLs to the OpenAI API. Hence, we will upload our images to a cloud service (Github for this article). The dataset consists of over 1500 …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

7 Months Ago

Image Generation with State of the Art Flux Diffusion Models

In one of my previous articles, I explained how to generate stunning images for free using diffusion models and showed how to generate Stability AI's diffusion models for text-to-image generation.

Since then, the AI domain has progressed considerably, particularly in image generation. Black Forest Labs has released Flux.1 series of state-of-the-art vision models.

In this article, you will see how to use Flux.1 models for text-to-image generation and text-to-image modification. You will import Flux models from Hugging Face and generate images using Python code.

So, let's begin without ado.

Installing and Importing Required Libraries

Flux models are gated on Hugging Face, meaning you have to log into your account to access Flux models. To do so from a Python application, particularly Jupyter Notebook, you need to download the huggingface_hub module. In addition, you need to download the diffusers module from Hugging Face.

The script below downloads these two modules.


!pip install huggingface_hub
!pip install git+https://github.com/huggingface/diffusers.git

Note: To run scripts in this article, you will need Nvidia GPUs. You can use Google Colab, which provides free Nvidia GPUs.

Next, let's import the required libraries into our Python application:


from huggingface_hub import notebook_login
import torch
import matplotlib.pyplot as plt
from diffusers import FluxPipeline
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

notebook_login() # you need to log into your hugging face account using access token

Text to Image Generation with Flux

Flux models have two variants: timestep-distilled (FLUX.1-schnell) and guidance-distilled (FLUX.1-dev). The timestep-distilled …

1280px-Taj_Mahal,_Agra,_India_edit3.jpg 226.49 KB

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

8 Months Ago

Text Classification and Summarization with Qwen 2.5 Model From Hugging Face

On September 19, 2024, Alibaba released the Qwen 2.5 series of models. The Qwen 2.5-72B base and instruct models outperformed larger state-of-the-art models like Llama 3.1-405B on multiple benchmarks. It is safe to assume that Qwen 2.5-72B is a state-of-the-art open-source large language model.

This article will show you how to use Qwen 2.5 models in your Python applications. You will see how to import Qwen 2.5 models from the Hugging Face library and generate responses. You will also see how to perform text classification and summarization tasks on custom datasets using the Qwen 2.5-7B. So, let's begin without ado.

Note: If you have a GPU with larger memory, you can also try Qwen 2.5-7B using the scripts provided in this code.

Installing and Importing Required Libraries

You can run the scripts in this article on Google Colab. In this case, you only need to install the following libraries.


!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The following script imports the libraries you need to run scripts in this article.

from transformers import AutoModelForCausalLM, AutoTokenizer
import pandas as pd
from sklearn.metrics import accuracy_score
from rouge_score import rouge_scorer

A Basic Example of Using Qwen 2.5 Instruct Model in Hugging Face

Before moving to text classification and summarization on custom datasets, let's first see how to generate a single response from the Qwen 2.5-7B model.

Importing the Model and Tokenizer from Hugging Face

The first step is to import the model weights and …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

8 Months Ago

Text and Image to Video Generation using Diffusion Models in Hugging Face

The AI wave has introduced a myriad of exciting applications. While text generation and natural language processing are leading the AI revolution, image, and vision-based technologies are quickly catching up. The intersection of text and vision applications has seen a rapid surge recently.

In this article, you'll learn how to generate videos using text and image inputs. We'll leverage open-source models from Hugging Face to bring these applications to life. So, without further ado, let's dive in!

Installing and Importing Required Libraries

We will use the Hugging Face diffusion models to generate videos from text and images. The following script installs the libraries you will need to import these models from Hugging Face.

!pip install --upgrade transformers accelerate diffusers imageio-ffmpeg

For text-to-video generation, we will use the CogVideoX-2b diffusion model. For image-to-video generation, we will use the Stability AI's img2vid model.

The following script imports the Hugging Face pipelines for the two models. We also import some utility classes to save videos and display images.


import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

Text to Video Generation with Hugging Face Diffusers

The first step is to create a Hugging Face pipeline that can access the CogVideoX-2b model. You can also use the CogVideoX-5b model, but it requires more space and memory.

The following script creates the pipeline for the CogVideoX-2b model. We also call some utility methods such as enable_model_cpu_offload(), enable_sequential_cpu_offload(), …

input-image-for-video-generation.png 518.91 KB

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

8 Months Ago

Extracting Structured Outputs from LLMs in LangChain

Large language models (LLMS) are trained to predict the next token (set of characters) following an input sequence of tokens. This makes LLMs suitable for unstructured textual responses.

However, we often need to extract structured information from unstructured text. With the Python LangChain module, you can extract structured information in a Python Pydantic object.

In this article, you will see how to extract structured information from news articles. You will extract the article's tone, type, country, title, and conclusion. You will also see how to extract structured information from single and multiple text documents.

So, let's begin without ado.

Installing and Importing Required Libraries

As always, we will first install and import the required libraries.
The script below installs the LangChain and LangChain OpenAI libraries. We will extract structured data from the news articles using the OpenAI GPT-4 latest LLM.


!pip install -U langchain
!pip install -qU langchain-openai

Next, we will import the required libraries in a Python application.


import pandas as pd
import os
from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

Importing the Dataset

We will extract structured information from the articles in the News Article with Summary dataset.

The following script imports the data into a Pandas DataFrame.


dataset = pd.read_excel(r"D:\Datasets\dataset.xlsx")
dataset.head(10)

Output:

Defining the Structured Output Format

To extract structured output, we need to define the attributes of …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

8 Months Ago

Enhancing RAG Functionalities using Tools and Agents in LangChain

Retrieval augmented generation (RAG) allows large language models (LLMs) to answer queries related to the data the models have not seen during training. In my previous article, I explained how to develop RAG systems using the Claude 3.5 Sonnet model.

However, RAG systems only answer queries about the data stored in the vector database. For example, you have a RAG system that answers queries related to financial documents in your database. If you ask it to search the internet for some information, it will not be able to do so.

This is where tools and agents come into play. Tools and agents enable LLMs to retrieve information from various external sources such as the internet, Wikipedia, YouTube, or virtually any Python method implemented as a tool in LangChain.

This article will show you how to enhance the functionalities of your RAG systems using tools and agents in the Python LangChain framework.

So, let's begin without an ado.

Installing and Importing Required Libraries

The following script installs the required libraries, including the Python LangChain framework and its associated modules and the OpenAI client.



!pip install -U langchain
!pip install langchain-core
!pip install langchainhub
!pip install -qU langchain-openai
!pip install pypdf
!pip install faiss-cpu
!pip install --upgrade --quiet  wikipedia
Requirement already satisfied: langchain in c:\us

The script below imports the required libraries into your Python application.


from langchain.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.tools.retriever import create_retriever_tool
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI, …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

8 Months Ago

How to Fine-tune the OpenAI GPT-4o Model - The Wait is Finally Over

On August 20, 2024, OpenAI enabled GPT-4o fine-tuning in the OpenAI playground and the OpenAI API. The much-awaited feature is free for fine-tuning 1 million daily tokens until September 23, 2024.

In this article, I will show you how to fine-tune the OpenAI GPT-4o model for text classification and summarization tasks.

It is important to note that in my previous articles I have already demonstrated results obtained for zero-shot text classification and zero-shot text summarization using default GPT-4o model. In this article, you will see that fine-tuning a GPT-4o model improves text classification and text summarization performance significantly.

So, let's begin without an ado.

Installing and Importing Required Libraries

The following script installs the Python libraries you need to run codes in this article.


!pip install openai
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The script below imports the required libraries into your Python application.


import os
import json
import time
import pandas as pd
from rouge_score import rouge_scorer
from sklearn.metrics import accuracy_score
from openai import OpenAI

Fine-tuning GPT-4o for Text Classification

In a previous article, I explained the process of fine-tuning GPT-4o mini and GPT-3.5 turbo models for zero-shot text classification.

The process remains the same for fine-tuning GPT-4o.
We will first import the text classification dataset, which in this article is the Twitter US Airline Sentiment Dataset.

The following script imports the dataset.


dataset = pd.read_csv(r"D:\Datasets\Tweets.csv")
dataset.head()

Output:

Next, …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

9 Months Ago

GPT-4o Snapshot vs Meta Llama 3.1 70b for Zero-Shot Text Summarization

In a previous article, I compared GPT-4o mini vs. GPT-4o and GPT-3.5 Turbo for zero-shot text summarization. The results showed that the GPT-4o mini achieves almost similar performance for zero-shot text classification at a much-reduced price compared to the other models.

I will compare Meta Llama 3.1 70b with OpenAI GPT-4o snapshot for zero-shot text summarization in this article. Meta Llama 3.1 series consists of Meta's state-of-the-art LLMs, including Llama 3.1 8b, Llama 3.1 70b, and Llama 3.1 405b. On the other hand, [OpenAI GPT-4o[(https://platform.openai.com/docs/models)] snapshot is OpenAIs latest LLM. We will use the Groq API to access Meta Llama 3.1 70b and the OpenAI API to access GPT-4o snapshot model.

So, let's begin without ado.

Installing and Importing Required Libraries

The following script installs the Python libraries you will need to run scripts in this article.


!pip install openai
!pip install groq
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The script below installs the required libraries into your Python application.


import os
import time
import pandas as pd
from rouge_score import rouge_scorer
from openai import OpenAI
from groq import Groq

Importing the Dataset

This article will summarize the text in the News Articles with Summary dataset. The dataset consists of article content and human-generated summaries.

The following script imports the CSV dataset file into a Pandas DataFrame.


# Kaggle dataset download link
# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx


dataset = pd.read_excel(r"D:\Datasets\dataset.xlsx")
dataset = dataset.sample(frac=1)
dataset['summary_length'] = dataset['human_summary'].apply(len)
average_length …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

9 Months Ago

Comparison of Fine-tuning GPT-4o mini vs GPT-3.5 for Text Classification

In my previous articles, I presented a comparison of OpenAI GPT-4o mini model with GPT-4o and GPT-3.5 turbo models for zero-shot text classification. The results showed that GPT-4o mini, while significantly cheaper than its counterparts, achieves comparable performance.

On 8 August 2024, OpenAI enabled GPT-4o mini fine-tuning for developers across usage tiers 1-5. You can now fine-tune GPT-4o mini for free until 23 September 2024, with a daily token limit of 2 million.

In this article, I will show you how to fine-tune the GPT-4o mini for text classification tasks and compare it to the fine-tuned GPT-3.5 turbo.

So, let's begin without ado.

Importing and Installing Required Libraries

The following script installs the OpenAI Python library you can use to make calls to the OpenAI API.


!pip install openai

The script below imports the required liberaries into your Python application.


from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from openai import OpenAI
import pandas as pd
import os
import json

Importing the Dataset

We will use the Twitter US Airline Sentiment dataset for fine-tuning the GPT-4o mini and GPT-3.5 turbo models.

The following script imports the dataset and defines the preprocess_data() function. This function takes in a dataset and an index value as inputs. It then divides the dataset by sentiment category, returning 34, 33, and 33 tweets from each category, beginning at the specified index. This approach ensures we have around 100 balanced records. You can use more number of records for …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

9 Months Ago

GPT-4o mini vs. GPT-4o vs GPT-3.5 Turbo for Text Summarization

In my previous article on GPT-4o mini, I compared the performance of GPT-4o mini against GPT-3.5 Turbo and GPT-4o for zero-shot text classification. We saw that GPT-4o mini, being 36% times cheaper, achieves only 2% less accuracy than GPT-4o. Furthermore, while being 1/3 of the price, the GPT-4o mini significantly outperformed the GPT-3.5 turbo model.

This article will compare GPT-4o mini, GPT-4o, and GPT-3.5 turbo for zero-shot text summarization. We will evaluate the models' text summarization capabilities using metrics such as ROUGE scores and LLM-based evaluation.

So, let's begin without ado.

Importing and Installing Required Libraries

You must install the following Python libraries to run code in this article.


!pip install openai
!pip install rouge-score
!pip install --upgrade openpyxl
!pip install pandas openpyxl

The script below imports the required libraries.


import os
import time
import pandas as pd
from rouge_score import rouge_scorer
from openai import OpenAI

Importing the Dataset

We will summarize the articles in the News Articles with Summary dataset. The dataset consists of article content and human-generated summaries.

The following script imports the CSV dataset file into a Pandas DataFrame.



# Kaggle dataset download link
# https://github.com/reddzzz/DataScience_FP/blob/main/dataset.xlsx


dataset = pd.read_excel(r"D:\Datasets\dataset.xlsx")
dataset = dataset.sample(frac=1)
print(dataset.shape)
dataset.head()

Output:

The content column contains the article content, while the human_summary column contains human-generated summaries of the article.

Next, we will find the average number of characters in all summaries. We will use this number to summarize articles using LLMs.


dataset['summary_length'] …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

10 Months Ago

GPT-4o mini - A Cheaper and Faster Alternative to GPT-4o

On July 18th, 2024, OpenAI released GPT-4o mini, their most cost-efficient small model. GPT-4o mini is around 60% cheaper than GPT-3.5 Turbo and around 97% cheaper than GPT-4o. As per OpenAI, GPT-4o mini outperforms GPT-3.5 Turbo on almost all benchmarks while being cheaper.

In this article, we will compare the cost, performance, and latency of GPT-4o mini with GPT-3.5 turbo and GPT-4o. We will perform a zero-shot tweet sentiment classification task to compare the models. By the end of this article, you will find out which of the three models is better for your use cases. So, let's begin without ado.

Importing and Installing Required Libraries

As a first step, we will install and import the required libraries.

Run the following script to install the OpenAI library.


!pip install openai

The following script imports the required libraries into your application.


import os
import time
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from openai import OpenAI

Importing and Preprocessing the Dataset

To compare the models, we will perform zero-shot classification on the Twitter US Airline Sentiment dataset, which you can download from kaggle.

The following script imports the dataset from a CSV file into a Pandas dataframe.


## Dataset download link
## https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment?select=Tweets.csv

dataset = pd.read_csv(r"D:\Datasets\tweets.csv")
print(dataset.shape)
dataset.head()

Output:

The dataset contains more than 14 thousand records. However, we will randomly select 100 records. Of these, 34, 33, and 33 will have neutral, positive, …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

10 Months Ago

Image Analysis Using Claude 3.5 Sonnet Model

In my article on Image Analysis Using OpenAI GPT-4o Model, I explained how GPT-4o model allows you to analyze images and answer questions related images precisely.

In this article, I will show you how to analyze images with the Anthropic Claude 3.5 Sonnet model, which has shown state-of-the-art performance for many text and vision problems. I will also share my insights on how Claude 3.5 Sonnet compares with GPT-4o for image analysis tasks. So, let's begin without ado.

Importing Required Libraries

You will need to install the anthropic Python library to access the Claude 3.5 Sonnet model in this article. In addition, you will need the Anthropic API key, which you can obtain here.

The following script installs the Anthropic Python library.


!pip install anthropic

The script below imports all the Python modules you will need to run scripts in this article.


import os
import base64
from IPython.display import display, HTML
from IPython.display import Image
from anthropic import Anthropic

General Image Analysis

Let's first perform a general image analysis. We will analyze the following image and ask Claude 3.5 Sonnet if it shows any potentially dangerous situation.


# image source: https://healthier.stanfordchildrens.org/wp-content/uploads/2021/04/Child-climbing-window-scaled.jpg

image_path = r"D:\Datasets\sofa_kid.jpg"
img = Image(filename=image_path, width=600, height=600)
img

Output:

Note: For comparison, the images we will analyze in this article are the same as those we analyzed with GPT-4o.

Next, we will define a method that converts an image into Base64 format. The Claude 3.5 …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

10 Months Ago

Extracting YouTube Channel Statistics in Python Using YouTube Data API

Are you interested in finding out what a YouTube channel mostly discusses? Do you want to analyze YouTube videos of a specific channel? If yes, we are in the same boat.

YouTube video titles are a great way to determine the channel's primary focus. Plotting a word cloud or a bar plot of the most frequently occurring words in YouTube video titles can give you precise insight into the nature of a YouTube channel. I will do exactly this in this tutorial using the Python programming language.

So, let's begin without ado.

Getting YouTube Data API Key

You can get information about a YouTube channel in Python programming via the YouTube Data API. However, to access the API, you must create a new project in Google Cloud Platform. You can do so for free.

Once you create a new project, click the Go to APIs overview link, as shown in the screenshot below.

Next, click the ENABLE APIS AND SERVICES link.

Search for youtube data api v3.

Click the ENABLE button.

You will need to create credentials. To do so, click the CREDENTIALS link.

If you have any existing credentials, you will see them. To create new credentials, click the + CREATE CREDENTIALS link and select API key.

Your API key will be generated. Copy and save it in a secure place.

Now, you can …

Computer Science data-science python

Salem commented: Interesting post +16

usmanmalik57 12 Junior Poster in Training

10 Months Ago

Retrieval Augmented Generation with Claude 3.5 Sonnet

In my previous article I presented results comparing Anthropic Claude 3.5 Sonnet and OpenAI GPT-4o models for zero-shot text classification. The results showed that the Claude 3.5 Sonnet significantly outperformed GPT-4o.

These results motivated me to develop a simple retrieval augmented generation system with LangChain that enables the Claude 3.5 Sonnet model to answer questions pertaining to custom documents.

By the end of this article, you will know how to develop a chatbot that uses the Claude 3.5 Sonnet LLM to answer questions on custom documents.

So, let's begin without ado.

Installing and Importing Required Libraries

The following script installs the libraries required to run scripts in this article.

!pip install -U langchain
!pip install -U langchain-anthropic
!pip install langchain-openai
!pip install pypdf
!pip install faiss-cpu

Subsequently, the script below imports the required libraries into your Python application.


from langchain_anthropic import ChatAnthropic

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
import os

Generating Default Response with Claude 3.5 Sonnet

Let's first generate a default response using Claude 3.5 Sonnet LLM in LangChain.

You will need an anthropic API key which you can get here.

Next, create an object of the ChatAnthropic class and pass the anthropic API key, the model ID, …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

10 Months Ago

Comparing GPT-4o vs Claude 3.5 Sonnet for Zero Shot Text Classification

On June 20, 2024, Anthropic released the Claude 3.5 sonnet large language model. Claude claims it to be the state-of-the-art model for many natural language processing tasks, surpassing the OpenAI GPT-4o model.

My first test for comparing two large language models is their zero-shot text classification ability. In this article, I will compare the Antropic Claude 3.5 sonnet with the OpenAI GPT-4o model for zero-shot tweet sentiment classification.

So, let's begin without ado.

Importing and Installing Required Libraries

The following script installs the Anthropic and OpenAI libraries to access the corresponding APIs.


!pip install anthropic
!pip install openai

The script below imports the required libraries into your Python application.


import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import anthropic
from openai import OpenAI

Importing and Preprocessing the Dataset

We will use the Twitter US Airline Sentiment dataset to perform zero-shot classification. You can download the dataset from Kaggle.

The following script imports the dataset into a Pandas DataFrame.

## Dataset download link
## https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment?select=Tweets.csv

dataset = pd.read_csv(r"D:\Datasets\tweets.csv")
print(dataset.shape)
dataset.head()

Output:

Tweet sentiment falls into three categories: neutral, positive, and negative. For comparison, we will filter 100 tweets. The neutral, positive, and negative categories will contain 34, 33, and 33 tweets, respectively.

# Remove rows where 'airline_sentiment' or 'text' are NaN
dataset = dataset.dropna(subset=['airline_sentiment', 'text'])

# Remove rows where 'airline_sentiment' or 'text' are empty strings
dataset = dataset[(dataset['airline_sentiment'].str.strip() != '') & (dataset['text'].str.strip() != …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

11 Months Ago

Tabular Data Classification with Hugging Face Meta Tree Transformer

As a data scientist, I have extensively used the Hugging Face library for processing unstructured data such as images, text, and audio. My previous blogs have covered various transformer models for these types of data. Lately, however, I discovered that Hugging Face also provides transformer models for tabular data. One such transformer is the Meta Tree Transformer.

This article will explore using the Meta Tree Transformer model to classify tabular data, detailing each process step and providing insights based on the Bank Note Authentication dataset.

Installing and Importing Required Libraries

You must install and import the following libraries to run the codes in this article.


!pip install metatreelib
!pip install --upgrade scikit-learn
!pip install imodels


from metatree.model_metatree import LlamaForMetaTree as MetaTree
from metatree.decision_tree_class import DecisionTree, DecisionTreeForest
from metatree.run_train import preprocess_dimension_patch
from transformers import AutoConfig
from sklearn.metrics import accuracy_score
import imodels # pip install imodels
import sklearn
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import random

Loading and Preprocessing the Dataset

The dataset used in this tutorial is the Bank Note Authentication dataset, which you can download from Kaggle. The dataset contains features extracted from images of banknotes and is used to classify whether a banknote is authentic or not.

The dataset consists of the following columns:

variance
skewness
curtosis
entropy
class

The class column is the target variable, indicating whether the banknote is authentic (1) or not (0).

First, we need …

Computer Science machine-learning python

usmanmalik57 12 Junior Poster in Training

11 Months Ago

Comparing Fine-tuned and Default GPT-3.5 Turbo for Text Classification

Comparison Between Fine-tuned and Default GPT-3 Turbo for Text Classification

In one of my previous articles, I showed you how to perform zero-shot text classification using OpenAI GPT-4o and Meta Llama 3 models. I used the default models for predicting sentiments of airline tweets. The default models perform substantially well out of the box. However, you can fine-tune them on your specialized tasks to further enhance their performance.

In this article, I will show you how to fine-tune the OpenAI GPT-3 Turbo model for airline tweet classification. The fine-tuned model's performance will substantially increase compared to the default model. I would have loved to fine-tune GPT-4, but OpenAI currently does not support that.

By the end of this article, you will know how to fine-tune the GPT-3.5 Turbo model for text classification on your custom dataset. So, let's begin without ado.

Importing Required Libraries

The first step is to install the OpenAI Python library. If you haven't already, you must create an OpenAI account to retrieve your OpenAI API Key.

pip install openai

The script below imports the required Python libraries into our application.

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from openai import OpenAI
import pandas as pd
import json

Importing the Dataset

We will fine-tune our model using the Twitter US Airline Sentiment dataset. This will help us compare the performance of the fine-tuned GPT-3.5 Turbo with the default GPT-4o model we used in a previous article.

The …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

11 Months Ago

Image Analysis Using OpenAI GPT-4o Model

OpenAI announced the GPT-4o (omni) model on May 13, 2024. The GPT-4o model, as the name suggests, can process multimodal inputs, such as text, image, and speech. As per OpenAI, GPT-4o is the state-of-the-art and best-performing large language model.

Among GPT-4o's many capabilities, I found its ability to analyze images and answer related questions highly astonishing. In this article, I demonstrate some of the tests I performed for image analysis using OpenAI GPT-4o.

Note: If you are interested in seeing how GPT-4o and Llama 3 compare for zero-shot text classification, check out my previous article.

So, let's begin without further ado.

Importing and Installing Required Libraries

The following script installs the OpenAI python library that you will use to access the OpenAI API.

pip install openai

The script below imports the libraries required to run code in this article.

import os
import base64
from IPython.display import display, HTML
from IPython.display import Image
from openai import OpenAI

General Image Analysis

Let's first take an image and ask some general questions about it. The script below displays the sample image we will use for this example.

# image source: https://healthier.stanfordchildrens.org/wp-content/uploads/2021/04/Child-climbing-window-scaled.jpg

image_path = r"D:\Datasets\sofa_kid.jpg"
img = Image(filename=image_path, width=600, height=600)
img

Next, we define the encode_image64() method that accepts an image path and converts the image into base64 format. OpenAI expects images in this format.

The script below also creates the OpenAI client object we will use to call the OpenAI API.

Output:

Computer Science artificial-intelligence-llm image python

usmanmalik57 12 Junior Poster in Training

11 Months Ago

OpenAI GPT-4o vs Meta Llama 3 for Zero Shot Text Classifiation

On April 18, 2024, Meta AI released Llama 3, which they claimed to be the most capable openly available LLM to date. Concurrently, OpenAI announced GPT-4o (omni) on May 13, 2024, which is touted as the state-of-the-art proprietary model for various NLP benchmarks.

As a guy who loves to compare open-source and proprietary models, I decided to test the performance of both these models on a simple zero-shot text classification task. I present my findings in this article.

Note: Checkout one of my previous articles to see the comparison of GPT-4 vs. Gemini-Pro vs. Claude-3 for zero shot text classification.

So, let’s begin comparing GPT-4o vs Llama 3.

Importing and Installing Required Libraries

The following script installs the libraries required to run the scripts in this article. We will call the GPT-4o model using the official OpenAI API, while the Llama 3 model will use the Groq API. Both require API keys, which you can obtain by signing up for the respective services.

!pip install openai
!pip install groq
!pip install pandas
!pip install scikit-learn

Next, we will import the required libraries.

import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from openai import OpenAI
from groq import Groq

Importing and Preprocessing the Data

We will use the same dataset we used to compare GPT-4 vs Claude 3 and Gemini Pro models. You can download the dataset from this Kaggle link. The dataset …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Text to Speech Conversion Using Hugging Face Transformers

Introduction

Text-to-speech (TTS) technology has revolutionized how we interact with devices, making accessing content through auditory means easier. TTS is vital in various applications such as virtual assistants, audiobooks, accessibility tools for the visually impaired, and language learning platforms.

This tutorial will explore how to convert text-to-speech using Hugging Face's MeloTTS transformer, a powerful model designed for high-quality TTS tasks.

We will walk through installing the necessary libraries, creating basic examples, experimenting with different accents and languages, adjusting speech speed, and ultimately, combining these elements into a comprehensive TTS function.

Note: Check out my article on how to generate stunning images from text if you are interested in text-to-image generation.

Installing Required Libraries

To begin, we must clone the MeloTTS repository from GitHub and install the required dependencies. This can be done with the following commands:


!git clone https://github.com/myshell-ai/MeloTTS.git
%cd MeloTTS
!pip install -e .
!python -m unidic download

In the above script, the git clone command fetches the MeloTTS repository, and we navigate into the cloned directory. The pip install -e . command installs the package in editable mode, allowing us to make changes if necessary. Finally, the unidic download command downloads the language dictionary required for text processing.

A Basic Example

Let's create a basic example of converting English text to speech using the MeloTTS model. In the following code, we import the TTS class from the melo.api module and set the speech speed to 1.0.

Notice that we pass the language …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Generate Stunning AI Images for Free Using Diffusion Models

In this tutorial, you will see how to generate stunning AI-generated images from text inputs using state-of-the-art diffusion models from Hugging Face. You'll learn about base diffusion models and how combining them with a refiner creates even more detailed, refined results. Diffusion models are powerful because they iteratively refine an image starting from pure noise.

Advanced generative AI tools like Midjourney and OpenAI DALL·E 3 use diffusion models to generate photo-realistic AI images. However, these models charge fees to generate AI images. With diffusion models from Hugging Face, you can generate AI images for free. So, let's dive in!

Installing Required Libraries

To begin, let's install the libraries necessary for this project. Execute the following commands to get all dependencies ready:

!pip install diffusers --upgrade
!pip install invisible_watermark transformers accelerate safetensors

Generating AI Images Using Base Diffusion Models

Most state-of-the-art text-to-image diffusion models consist of a base model and a refiner. We'll first generate an image using the base diffusion model. We will use the stabilityai/stable-diffusion-xl-base-1.0 (SDXL) model for image generation. SDXL employs an ensemble of expert models for latent diffusion. Initially, the base model generates (noisy) latent images, which are then refined by a specialized model during the final denoising stages. You can use any other text-to-image diffusions from Hugging Face.

The following Python script initializes a Hugging Face pipeline for the diffusion model and sets it up for GPU acceleration.


from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
                                         use_safetensors=True,
                                         variant="fp16") …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Summarizing YouTube Video Transcriptions Using Distil Whisper and LLM

In this tutorial, you will see how to summarize YouTube video transcriptions using Distil Whisper large V3 and Mistral-7b-Instruct. Both Distill Whisper Large V3 and Mistral-7B-Instruct models are open-source and free-to-use models.

The Distil Whisper large V3 model is a faster and smaller variant of the Whisper large V3 model, a state-of-the-art speech-to-text model. You will use this model to transcribe YouTube audio. Next, you will use the Mistral-7b-Instruct LLM to summarize the transcriptions. In the process, you will learn to extract audio from YouTube videos. We have many interesting things to see, so let's begin without ado.

Importing and Installing Required Libraries

As always, the first step is to install and import the required libraries. The following script installs libraries required to run codes in this tutorial.


!pip install -q -U transformers==4.38.0
!pip install -q -U bitsandbytes==0.42.0
!pip install -q -U accelerate==0.27.1
!pip install -q datasets
!pip install -q pytube

The script below imports the required libraries.


import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, logging
from datasets import load_dataset
from pytube import YouTube
from transformers import BitsAndBytesConfig

Extracting Audios from YouTube Videos

We will begin by extracting audio from the YouTube video we want to transcribe.
You can use the YouTube class from the pytube module, as shown in the following script.

youtube_video_url = "https://www.youtube.com/watch?v=5sLYAQS9sWQ"
youtube_video_content = YouTube(youtube_video_url)

The streams attribute of the YouTube class object returns various audio and …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Retrieval Augmented Generation with Hugging Face Models in LangChain

In my previous articles, I explained how to develop customized chatbots using Retrieval Augmented Generation (RAG) approach in LangChain. However, I used proprietary models such as OpenAI, which can be expensive when you try to scale.

In this article, I will show you how to use the open-source and free-of-cost models from Hugging Face to develop chatbot applications in LangChain. By the end of this tutorial, you will be able to import any Hugging Face Large Language Model (LLM) and embedding model in LangChain and develop your customized chatbot applications.

Importing and Installing Required Libraries

First, install and import the libraries and modules you will need to run codes in this tutorial.

The codes in this tutorial are run on Google Colab, where some of the libraries are preinstalled. You can install the rest of the libraries via the following pip command.


!pip install -q -U transformers==4.38.0
!pip install -q -U sentence-transformers
!pip install -q -U faiss-cpu
!pip install -q -U bitsandbytes==0.42.0
!pip install -q -U accelerate==0.27.1
!pip install -q -U huggingface_hub
!pip install -q -U langchain
!pip install -q -U pypdf

The script below imports the required libraries in your application.


from transformers import AutoModelForCausalLM, AutoTokenizer, logging, pipeline
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import PromptTemplate
from langchain.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from sentence_transformers import SentenceTransformer
from transformers import BitsAndBytesConfig …

Computer Science artificial-intelligence-llm

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Question Answering with YouTube Videos Using RAG in LangChain

In previous articles, I explained how to use natural language to interact with PDF documents and SQL databases, using the Python LangChain module and OpenAI API.

In this article, you will learn how to use LangChain and OpenAI API to create a question-answering application that allows you to retrieve information from YouTube videos. So, let's begin without ado.

Importing and Installing Required Libraries

Before diving into the code, let's set up our environment with the necessary libraries.

We will use the Langchain module to access the vector databases and execute queries on large language models to retrieve information about YouTube videos. We will also employ the YouTube Transcript API for fetching video transcripts, the Pytube library for downloading YouTube videos, and the FAISS vector index for efficient similarity search in large datasets.

The following script installs these modules and libraries.


!pip install -qU langchain
!pip install -qU langchain-community
!pip install -qU langchain-openai
!pip install -qU youtube-transcript-api
!pip install -qU pytube
!pip install -qU faiss-cpu

The script below imports the required libraries into our Python application.


from langchain_community.document_loaders import YoutubeLoader
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
import os

Creating Text Documents from YouTube Videos

The first step involves converting …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Using Natural Language to Query SQL Databases with Python LangChain Module

The advent of large language models (LLM) has replaced complex scripts with natural language for automating various tasks. You can now use LLM to interact with your databases using natural language, which makes life easier for people who do not have sufficient SQL knowledge.

In this article, you will learn how to retrieve information from SQL databases using natural language. For this purpose, you will use the Python LangChain library. The LangChain agents convert natural language questions into SQL queries and return the response in natural language.

Using natural language queries, you will learn how to interact with PostgreSQL, MySQL, and SQLite databases. You will retrieve information from the sample Northwind database. You can download the Northwind database samples for PostgreSQL, MySQL, and SQLite from Github. This article assumes you imported the Northwind database into the corresponding servers.

So, let's begin with ado.

Installing and Importing Required Libraries

To connect your Python application with PostgreSQL and MySQL, you must install the PostGreSQL and MySQL connectors. Execute the following script to download these connectors.

# connector for PostgreSQL
!pip install psycopg2

# connector for MySQL
!pip install mysql-connector-python

Defining the LLM and Agent

As previously said, I will use LangChain agents to execute natural language queries on various databases. To do so, we need a large language model (LLM) and database objects.

The following script imports the GPT-4 LLM via LangChain.


openai_key = os.environ.get('OPENAI_KEY2')

llm = ChatOpenAI(
    openai_api_key = openai_key ,
    model = 'gpt-4',
    temperature = 0.5 …

Computer Science artificial-intelligence-llm python sql

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Paris Olympics Ticket Information Chatbot with Memory Using LangChain

In my previous article, I explained how I developed a simple chatbot using LangChain and Chat-GPT that can answer queries related to Paris Olympics ticket prices.

However, one major drawback with that chatbot is that it can only generate a single response based on user queries. It can not answer follow-up questions. In short, the chatbot has no memory where it can store previous conversations and answer questions based on the information in the past conversation.

In this article, I will explain how to add memory to this chatbot and execute conversations where the chatbot can respond to queries considering the past conversation.

So, let's begin without further ado.

Installing and Importing Required Libraries

The following script installs the required libraries for this article.

!pip install -U langchain
!pip install langchain-openai
!pip install pypdf
!pip install faiss-cpu

The script below imports required libraries.


from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
import os

Paris Olympics Chatbot for Generating a Single Response

Let me briefly review how we developed a chatbot capable of generating a single response and its associated problems.

The following script creates an object of the ChatOpenAI llm with the GPT-4 model, a model that powers Chat-GPT.

openai_key = os.environ.get('OPENAI_KEY2') …

Computer Science artificial-intelligence-llm python

Prosigns commented: Thank you for sharing this. I have gone through your post and discussions. It helped me in my work I was stuck at. I am prosignshouston here +0

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Paris Olympics Chatbot- Get Ticket Information Using Chat-GPT and LangChain

I was searching for Paris Olympics ticket prices for tennis games recently. The official website directs you to a PDF document containing ticket prices and venues for all the games. However, I found the PDF document to be very hard to navigate. To make things easier, I developed a chatbot to search this PDF document and answer my queries in natural language. And this is what I am going to share in this article.

I used the OpenAI API to create document embeddings (convert documents to numeric values) and the Python LangChain library as the orchestration framework to develop this chatbot.

So, let's begin without ado.

Installing and Importing Required Libraries

The following script installs the libraries required to run scripts in this article.


!pip install -U langchain
!pip install langchain-openai
!pip install pypdf
!pip install faiss-cpu

The script below imports required libraries.


from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
import os

Generate Default Responses from Chat-GPT

Let's first generate some responses from Chat-GPT without augmenting its knowledge base with information about the Paris Olympics ticket price.

In a Python application, you will use the OpenAI API key to generate Chat-GPT responses. You can retrieve your API key by signing up for OpenAI API.

You can save your API …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Claude 3 Opus Vs. Google Gemini Vs. GPT-4 for Zero-Shot Text Classification

On March 4, 2024, Anthropic launched the Claude 3 family of large language models. Anthropic claimed that its Claude 3 Opus model outperforms GPT-4 on various benchmarks.

Intrigued by Anthropic's claim, I performed a simple test to compare the performances of Claude 3 Opus, Google Gemini Pro, and OpenAI's GPT-4 for zero-shot text classification. This article explains the experiment and the results obtained, along with my personal observations.

Note: I have already compared the performance of Google Gemini Pro and Chat-GPT on another dataset, in one of my previous articles. This article adds Claude 3 Opus to the list of compared models. In addition, the tests are performed on a significantly more difficult dataset.

So, let's begin without an ado.

Importing and Installing Required Libraries

The following script installs the corresponding APIs for importing Claude 3 Opus, Google Gemini Pro, and OpenAI GPT-4 models.


!pip install anthropic
!pip install --upgrade google-cloud-aiplatform
!pip install openai

The script below imports the required libraries.


import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import anthropic
from openai import OpenAI
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

Importing and Preprocessing the Dataset

We will use LLMs to make zero-shot predictions on the US Airline Sentiment dataset, which you can download from Kaggle.

The dataset consists of tweets regarding various US airlines. The tweets are manually annotated for positive, negative, or neutral sentiments. The text column contains the tweet …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

7 NLP Tasks to Perform for Free in Python with Mistral 7b LLM

In the rapidly evolving field of Natural Language Processing (NLP), open-source large language models (LLMs) are becoming increasingly popular as they are free to use. Among these, the Mistral family of models stands out as a state-of-the-art model that is freely accessible to the public.

Comparable in performance to the renowned GPT 3.5, Mistral 7b enables users to perform various NLP tasks, such as text generation, text classification, and more, without any cost.

While GPT 3.5 can be used for free in a browser, utilizing its functions in a Python application via OpenAI API incurs charges. This is where open-source Large Language Models (LLMs) like Mistral 7b become game-changers.

This article will explore leveraging the Mistral 7b Instruct model (seven billion parameters) to execute seven common NLP tasks within your Python applications using the HuggingFace library. So, let’s dive in without further ado.

Importing and Installing Required Libraries

The following script installs the libraries required to run scripts in this article.


!pip install git+https://github.com/huggingface/transformers
!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U accelerate==0.27.1

Since I am using Google Colab to run the scripts in this article, the rest of the libraries are pre-installed in the environment.

The following script imports the required libraries.


from transformers import AutoModelForCausalLM, AutoTokenizer, logging
from transformers import BitsAndBytesConfig
import torch

Importing and Configuring the Mistral 7b Instruct Model

Mistral 7b is a large model with seven billion parameters. We will quantize it by reducing its weight precisions to four …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Retrieval Augmented Generation (RAG) with Google Gemma From HuggingFace

In a previous article, I explained how to fine-tune Google's Gemma model for text classification. In this article, I will explain how you can improve performance of a pretrained large language model (LLM) using retrieval augmented generation (RAG) technique. So, let's begin without ado.

What is Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) enhances a language model's knowledge by integrating external information into the response generation process. By dynamically pulling relevant information from a vast corpus of data, RAG enables models to produce more informed, accurate, and contextually rich responses, bridging the gap between raw computational power and real-world knowledge.

RAG works in the following four steps:

Store data containing external knowledge into a vector database.
Convert the input query into corresponding vector embeddings and retrieve the text from the database having the highest similarity with the input query.
Formulate the query and the information retrieved from the vector database.
Pass the formulated query to an LLM and generate a response.

You will see how to perform the above steps in this tutorial.

RAG with Google Gemma from HuggingFace

We will first import the required libraries and then import our dataset from Kaggle. The dataset consists of Warren Buffet letters to investors from 1977 to 2021.

Next, we will split our dataset into chunks using the Pythhon LangChain module. Subsequently, we will import an embedding model from HuggingFace and create a dataset containing vector embeddings for the text chunks.

After that, we will …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Fine Tuning Google Gemma Model for Text Classification in Python

On February 21, 2024, Google released Gemma, a family of state-of-the-art open-source large language models (LLMs). As per initial results, its 7b (seven billion parameter) version is known to perform better than Meta's Llama 2, the previous state-of-the-art open-source LLM.

As always, my first test with any new open-source LLM is the text classification task. In this tutorial, I will show you how you can fine-tune the Google Gemma LLM for text classification tasks in Python. So, let's begin without ado.

Installing and Importing Required Libraries

The following script installs libraries required to run scripts in this article.

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0
!pip3 install -q -U datasets
!pip install huggingface-hub

The script below imports the required libraries into your Python application.


import os
import transformers
import torch
from google.colab import userdata
from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig, GemmaTokenizer
import pandas as pd
from datasets import Dataset

Finally, you must run the following script and enter your Hugging Face user access token.

!huggingface-cli login

Google Gemma is a new model, and you must agree to its terms of use before importing it from Hugging Face. You can agree to its terms of use on the Hugging Face Gemma model card.

Testing Google Gemma …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Extract Tabular Data from PDF Images using Hugging Face Table Transformer

In a previous article, I explained how to extract tabular data from PDF image documents using Multimodal Google Gemini Pro. However, there are a couple of disadvantages with Google Gemini Pro. First, Google Gemini Pro is not free, and second, it needs complex prompt engineering to retrieve table, columns, and row pixel coordinates.

To solve the problems above, in this article, you will see how to extract tables from PDF image documents using Microsoft's Table Transformer from the Hugging Face library. You will see how to detect tables, rows, and columns within a table, extract cell values from tables using an OCR, and save the table as CSV. So, let's begin without ado.

Installing and Importing Required Libraries

The first step is to install various libraries you will need to run scripts in this article.

!pip install transformers
!sudo apt install tesseract-ocr
!pip install pytesseract
!pip install easyocr
!sudo apt-get install -y poppler-utils
!pip install pdf2image
!wget "https://fonts.google.com/download?family=Roboto" -O roboto.zip
!unzip roboto.zip -d ./roboto

The following script imports the required libraries into your application.


from transformers import AutoImageProcessor, TableTransformerForObjectDetection
import torch
from PIL import Image, ImageDraw, ImageFont
import matplotlib.pyplot as plt
import csv
import numpy as np
import pandas as pd
from pdf2image import convert_from_path
from tqdm.auto import tqdm
import pytesseract
import easyocr

Table Detection with Table Transformer

The Table Transformer has two sub-models: table-transformer-detection, and table-structure-recognition-v1.1-all model. As a first step, we will detect tables within a PDF document using the table-transformer-detection model.

Importing …

Computer Science artificial-intelligence-llm pdf python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

PDF Image Table Extractor Web App with Google Gemini Pro and Streamlit

In my previous article, I explained how to convert PDF image to CSV using Multimodal Google Gemini Pro. To do so, I wrote a Python script that passes text command to Google Gemino Pro for extracting tables from PDF images and storing them in a CSV file.

In this article, I will build upon that script and develop a web application that allows users to upload images and submit text queries via a web browser to extract tables from PDF images. We will use the Python Streamlit library to develop web data applications.

So, let's begin without ado.

Installing Required Libraries

You must install the google-cloud-aiplatform library to access the Google Gemini Pro model. For Streamlit data application, you will need to install the streamlit library. The following script installs these libraries:


google-cloud-aiplatform
streamlit

Creating Google Gemini Pro Connector

I will divide the code into two Python files: geminiconnector.py and main.py. The geminiconnector.py library will contain the logic to connect to the Google Gemini Pro model and make API calls.

Code for geminiconnector.py

import os
from vertexai.preview.generative_models import GenerativeModel, Part
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r"PATH_TO_JSON_API_FILE"

model = GenerativeModel("gemini-pro-vision")
config={
    "max_output_tokens": 2048,
    "temperature": 0,
    "top_p": 1,
    "top_k": 32
}


def generate(img, prompt):

    input = img + [prompt]

    responses = model.generate_content(    
        input,
        generation_config= config,
        stream=True,
    )
    full_response = ""

    for response in responses:
        full_response += response.text

    return full_response

I have already explained the details for the above code in my previous article. Therefore I will not delve …

Computer Science artificial-intelligence-llm pdf python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Converting PDF Image to CSV Using Multimodal Google Gemini Pro

In this article, you will learn to use Google Gemini Pro, a state-of-the-art multimodal generative model, to extract information from PDF and convert it to CSV files. You will use a simple text prompt to tell Google Gemini Pro about the information you want to extract. This is a valuable skill for data analysis, reporting, and automation.

You will use Python language to call the Google Vertex AI API functions and extract information from PDF converted to JPEG images.

So, let's begin without ado.

Importing and Installing Required Libraries

I ran my code on Google Colab, where I only needed to install the Google Cloud APIs. You can install the Google Cloud API via the following script installs.

pip install --upgrade google-cloud-aiplatform

Note: You must create an account with Google Cloud Vertex AI and get your API keys before running the scripts in this tutorial. When you sign up for the Google cloud platform, you will get free credits worth $300.

The following script imports the required libraries into our application.


import base64
import glob
import csv
import os
import re
from vertexai.preview.generative_models import GenerativeModel, Part

Defining Helping Functions for Image Reading

Before using Google Gemini Pro to extract information from PDF tables, you must convert your PDF files to image formats, e.g. JPG, PNG, etc. Google Gemini Pro can only accept images as input, not PDF files. You can use any tool that can convert PDF files to JPG images, such as

Computer Science artificial-intelligence-llm pdf python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Comparing Google Gemini Pro with OpenAI GPT-4 for Zero-Shot Classification

In this article, we will compare two state-of-the-art large language models for zero-shot text classification: Google Gemini Pro and OpenAI GPT-4.

Zero-shot text classification is a task where a model is trained on a set of labeled examples but can then classify new examples from previously unseen classes. This is useful for situations where the labeled data is small, or the output classes are dynamic and unpredictable.

We will use the IMDB movie review dataset as an example and try to classify the reviews into positive or negative sentiments without using any labeled data. We will use the results to compare the speed, accuracy, and price of Google Gemini Pro and OpenAI GPT-4. By the end of this tutorial, you will know which model to select for your custom use cases.

Importing and Installing Required Libraries

The first step is to install the required libraries. I ran my code on Google Colab. Therefore, I only needed to install the Google Cloud and OpenAI APIs. The following script installs these libraries.

Note: It is important to mention that you must create an account with OpenAI and Google Cloud Vertex AI and get your API keys before running the scripts in this tutorial. OpenAI and Gemini Pro are paid LLMs, but you can get free credits for testing when you sign up.

pip install --upgrade google-cloud-aiplatform
pip install openai

The rest of the libraries come pre-installed with Google Colab.
The following script imports the libraries you …

Computer Science artificial-intelligence-llm python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

TensorFlow Keras Sequence Data Generator for Multimodal Classification

I recently tackled a challenging research task involving multimodal data for a classification problem using TensorFlow Keras. One of the trickiest aspects was figuring out how to load multimodal data in batches from storage efficiently.

While TensorFlow Keras offers helpful functions for batch-loading images from various sources, the documentation and online resources don't explicitly cover how to load images in combination with other data types like CSV files.

However, with some experimentation, I discovered a solution to this problem. In this article, I'll demonstrate how to create custom data loaders capable of batch-loading data from multiple sources, such as image directories and CSV files.

We will solve a multimodal classification problem with images and corresponding texts as inputs. We will train a Keras model that classifies this multimodal input into one of the three predefined categories. This is called multi-class classification.

So, let's begin without ado.

Importing Required Libraries

We will extract text and image features using Transformer models from the Huggingface library. The following script installs the Huggingface transformers library.


! pip install accelerate -U
! pip install datasets transformers[sentencepiece]

The script below imports the libraries required to execute scripts in this article. I did not have to install these libraries since I used a Google Colab notebook.

import pandas as pd
import os
import numpy as np

import tensorflow as tf

from transformers import AutoTokenizer, TFBertModel
from transformers import AutoImageProcessor, TFViTModel


from keras.utils import Sequence
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Input, …

Computer Science python tensorflow

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Track Faces from Videos with Margins Using Deep Learning in Python

In this article, you will learn how to track faces within a video using the Python DeepFace library. Additionally, you'll discover how to include portions of the video background in face tracking by implementing custom methods that utilize the DeepFace library's extract_faces() method for face extraction.

I explained how to extract faces from videos using the Python DeepFace library in [one of my previous articles. However, I recently encountered a couple of issues when working with DeepFace's extract_faces() method:

This method does not allow the extraction of portions of the face background. It also sometimes ignores the boundary features of a face, such as ears, hair, etc.
Videos created by stitching together faces extracted by DeepFace are often jittery, as the extracted frames frequently miss some boundary facial features.

In this article, I provide solutions to these two problems.

It is pertinent to mention that OpenCV library provides functionalities for video tracking. However, they use very naive methods, which are less accurate than deep learning methods provided by the DeepFace library. Hence, I preferred DeepFace over OpenCV.

Installing and Importing Required Libraries

The following script installs the DeepFace and MoviePy libraries. The DeepFace library will be used to extract faces from videos. You will use the MoviePy library to create a modified video that contains facial regions by stitching together individual image frames.

! pip install deepface
! pip install moviepy

The script imports the Python libraries required to run the code in …

custom-deepface-video-output.gif 1827.63 KB

Computer Science python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Reducing Video Frames and Frame Rates (FPS) in Python

A video is a series of images, or frames, shown in rapid succession. Its frame rate, measured in frames per second (FPS), dictates the display speed. For instance, a 30 FPS video shows 30 frames each second. The frame count and frame rate determine a video's detail, smoothness, file size, and the processing power needed for playback or editing.

Higher frame rates and more frames result in finer detail and smoother motion but at the cost of larger file sizes and greater processing requirements. Conversely, lower frame rates and fewer frames reduce detail and smoothness but save on storage and processing needs.

In this article, you will see how to reduce the frame rate per second for a video, and total number of frames in a video using the Python programming language.

But before that, let's see why you would want to reduce the number of frames and frame rate of a video.

Why Reduce the Number of Frames and the Frame Rate of a Video?

Reducing the number of frames and frame rate of a video can be beneficial for several reasons:

Storage Efficiency: Videos with fewer frames and lower frame rates take up less disk space, which is helpful when storage capacity is limited or for easier online sharing.

Bandwidth Conservation: Such videos use less network bandwidth, making them suitable for streaming over slow or unstable internet connections.

Performance Optimization: They require fewer computational resources, ideal for low-end devices or resource-intensive processes like deep learning algorithms.

Let's now …

Computer Science python

usmanmalik57 12 Junior Poster in Training

1 Year Ago

Custom Loss Functions in PyTorch: A Comprehensive Guide

Introduction

Loss functions are the driving force behind all machine learning algorithms. They quantify how well our models are performing by calculating the difference between the predicted and actual outcomes. The goal of every machine learning algorithm is to minimize this loss function, thereby improving the model’s accuracy.

Various libraries, such as PyTorch, TensorFlow, and Keras, provide a plethora of built-in loss functions like Mean Squared Error (MSE), Cross-Entropy, and many more. These built-in functions cover a wide range of tasks and are sufficient for many standard machine learning problems.

However, there are scenarios where these built-in loss functions may not suffice. This could be due to the unique nature of the problem at hand, or the need for a specific optimization strategy. In such cases, we need to design our own custom loss functions.

This article will guide you through the process of creating custom loss functions in PyTorch. So, Let’s get started!

Understanding Loss Functions

A loss function, alternatively referred to as a cost function, measures the degree of deviation between predicted outcomes and actual results. It serves as a metric to assess the effectiveness of an algorithm in modeling a given dataset. When predictions significantly diverge from actual values, the loss function yields a higher value. Conversely, a lower value is produced when predictions are relatively accurate.

In machine learning, the ultimate goal is to minimize this loss function. This process is known as optimization. By minimizing the loss, we are essentially fine-tuning our model to …

Computer Science python