The advent of large language models (LLM) has replaced complex scripts with natural language for automating various tasks. You can now use LLM to interact with your databases using natural language, which makes life easier for people who do not have sufficient SQL knowledge.
In this article, you will learn how to retrieve information from SQL databases using natural language. For this purpose, you will use the Python LangChain library. The LangChain agents convert natural language questions into SQL queries and return the response in natural language.
Using natural language queries, you will learn how to interact with PostgreSQL, MySQL, and SQLite databases. You will retrieve information from the sample Northwind database. You can download the Northwind database samples for PostgreSQL, MySQL, and SQLite from Github. This article assumes you imported the Northwind database into the corresponding servers.
So, let's begin with ado.
Installing and Importing Required Libraries
To connect your Python application with PostgreSQL and MySQL, you must install the PostGreSQL and MySQL connectors. Execute the following script to download these connectors.
# connector for PostgreSQL
!pip install psycopg2
# connector for MySQL
!pip install mysql-connector-python
Defining the LLM and Agent
As previously said, I will use LangChain agents to execute natural language queries on various databases. To do so, we need a large language model (LLM) and database objects.
The following script imports the GPT-4 LLM via LangChain.
openai_key = os.environ.get('OPENAI_KEY2')
llm = ChatOpenAI(
openai_api_key = openai_key ,
model = 'gpt-4',
temperature = 0.5
)
Next, we define a get_db_response
function that accepts the database object and the user query as input parameters. Inside the function, we create a LangChain SQL agent using the create_sql_agent()
function. You pass the LLM and database to the agent.
In addition, you need to set the agent_type
attribute to open_tools
, which tells the agent to use the OpenAI tools to process agent queries. Finally, we set the verbose
attribute to True
to see how the agent processes the input query.
Finally, you can call the invoke()
method to run the query via the agent.
def get_db_response(db, query):
agent_executor = create_sql_agent(LLM,
db=db,
agent_type="openai-tools",
verbose=True)
response = agent_executor.invoke(query)
return response
Generating Response from PostgreSQL Database
Generating a response from the LangChain SQL agent is straightforward. First, you must create a database object using the SQLDatabase
class. Then, you must pass the database connection string to the from_uri()
method of the SQLDatabase
class.
For PostgreSQL, the syntax of the connection string is as follows:
f"postgresql+psycopg2://postgres:<<password>>@<<server>>:<<port>>/<<database>>"
In the above, replace values for <<password>>
, <<server>>
, <<port>>
and <<database>>
.
Next, you can write a query string and pass the database object and the query string to the get_db_response()
function you defined earlier.
In the following script, we retrieve information about the top 10 employees with the highest sales.
postgres_uri = f"postgresql+psycopg2://postgres:mani123@localhost:5432/northwind"
pg_db = SQLDatabase.from_uri(postgres_uri)
query = "Return the top 10 employees who did most sales"
response = get_db_response(pg_db, query)
Output:
The above output shows the agent's actions in executing the query. It searches for the employee
, salesorder
, and orderdetail
tables to retrieve the required information.
You can use the output
key of the response dictionary to see the returned response.
print(response['output'])
Output:
The top 10 employees who made the most sales are:
1. Yael Peled with 9798 sales
2. Judy Lew with 7852 sales
3. Sara Davis with 7812 sales
4. Don Funk with 6055 sales
5. Maria Cameron with 5913 sales
6. Russell King with 4654 sales
7. Paul Suurs with 3527 sales
8. Sven Buck with 3036 sales
9. Zoya Dolgopyatova with 2670 sales
The above output shows the list of the top 10 employees with the most sales.
Generate Response from MySQL Database
Generating a response from a MySQL database is almost the same as PostgreSQL. The only thing that changes is the connection string, whose syntax should be as follows:
'mysql+mysqlconnector://<<user>>:<<password>>@<<server>>:<<port>>/<<database>>'
For example, in the following script, we retrieve the top 5 products with the least sales from the Northwind database in a MySQL server.
mysql_uri = 'mysql+mysqlconnector://root:mani123@localhost:3306/northwind'
mysql_db = SQLDatabase.from_uri(mysql_uri)
query = "Give me the 5 products with least sales"
response = get_db_response(mysql_db, query)
Output:
The above output shows the agent's thought process.
You can print the final output using the script below:
print(response['output'])
Output:
The 5 products with the least sales are:
1. Product AOZBW with 95 units sold.
2. Product KSZOI with 122 units sold.
3. Product EVFFA with 125 units sold.
4. Product MYNXN with 138 units sold.
5. Product XLXQF with 184 units sold.
Generate Response from SQLite Database
Finally, you can use a connection string with the following syntax to generate a response from an SQLite database:
"sqlite:///<<sqlite database path"
The script below returns the names of the top 10 customers with the most orders. The output shows a snapshot of the agent's thought process.
sqlite3_uri = "sqlite:///D:/Datasets/Northwind.db"
sqlite3_uri = SQLDatabase.from_uri(sqlite3_uri)
query = "Give me the name of top 10 customers with most number of orders"
response = get_db_response(sqlite3_uri, query)
Output:
The script below returns the final output:
print(response['output'])
Output:
The top 10 customers with the most number of orders are:
1. Customer LCOUJ with 31 orders
2. Customer THHDP with 30 orders
3. Customer IRRVL with 28 orders
4. Customer FRXZL with 19 orders
5. Customer CYZTN with 19 orders
6. Customer UMTLM with 18 orders
7. Customer NYUHS with 18 orders
8. Customer HGVLZ with 18 orders
9. Customer RTXGC with 17 orders
10. Customer ZHYOS with 15 orders
Conclusion
This article shows how you can interact with SQL databases using natural language via the Python LangChain agents. Though the agents described in the article are extremely useful for executing SELECT queries, you cannot use them to execute, update, or delete queries on databases. I will show you how to do that in one of my following tutorials.