AI Agent Development with Database Access
An AI agent with database access can independently query data, generate SQL queries based on natural language questions, and perform write operations with necessary validations. Text-to-SQL is a key technology that enables non-technical users to interact with data through dialogue.
Text-to-SQL: Converting Questions to SQL
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain.agents import create_sql_agent
# Connect to PostgreSQL
db = SQLDatabase.from_uri(
"postgresql://user:password@localhost:5432/company_db",
include_tables=["orders", "customers", "products", "inventory"],
sample_rows_in_table_info=3, # Include sample data in schema
)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
# SQL agent with automatic error correction
agent = create_sql_agent(
llm=llm,
toolkit=toolkit,
verbose=True,
handle_parsing_errors=True,
max_iterations=10,
)
# Query examples
result = agent.invoke({"input": "What are the top 5 customers by revenue in the last 3 months?"})
result = agent.invoke({"input": "Show products with less than 10 units in stock"})
Security: Read-Only Access
from langchain_community.utilities import SQLDatabase
from sqlalchemy import create_engine, text
# Read-only PostgreSQL user
READ_ONLY_USER_URI = "postgresql://readonly_user:pass@localhost:5432/db"
# Additional validation: prevent DML operations
def validate_sql_query(query: str) -> bool:
"""Check that the query is SELECT only"""
forbidden_keywords = ["INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER", "TRUNCATE"]
query_upper = query.upper()
for keyword in forbidden_keywords:
if keyword in query_upper:
return False
return True
class SafeSQLTool:
def __init__(self, db_uri: str):
self.engine = create_engine(db_uri)
def execute_query(self, query: str) -> str:
if not validate_sql_query(query):
return "ERROR: Only SELECT queries are allowed"
# Limit number of rows
if "LIMIT" not in query.upper():
query = f"{query.rstrip(';')} LIMIT 100"
with self.engine.connect() as conn:
result = conn.execute(text(query))
rows = result.fetchall()
columns = result.keys()
return str([dict(zip(columns, row)) for row in rows])
Database Schema with Context for LLM
Text-to-SQL quality depends critically on the quality of schema description:
SCHEMA_CONTEXT = """
Database tables:
1. orders
- id: PK, INTEGER
- customer_id: FK → customers.id
- status: VARCHAR (pending, confirmed, shipped, delivered, cancelled)
- total_amount: DECIMAL(12,2) — order amount in currency
- created_at: TIMESTAMP
- shipped_at: TIMESTAMP (NULL if not shipped)
2. customers
- id: PK
- name: VARCHAR — company name or full name
- inn: VARCHAR(12) — tax ID for legal entity/individual entrepreneur
- segment: VARCHAR (enterprise, mid, small) — customer segment
- manager_id: FK → employees.id — responsible manager
3. products
- sku: VARCHAR — article number
- name: VARCHAR
- category: VARCHAR
- price_rub: DECIMAL
- cost_rub: DECIMAL — cost price
IMPORTANT: Order statuses: 'delivered' = successfully completed. 'cancelled' = cancelled.
Revenue = sum of total_amount for orders with 'delivered' status.
"""
system_prompt = f"""You are a data analyst. Convert questions to SQL queries.
Use the following database schema:
{SCHEMA_CONTEXT}
Rules:
- SELECT queries only
- Always add LIMIT (no more than 1000)
- Use aliases for readability
- When aggregating — add ORDER BY"""
Practical Use Case: BI Agent for E-commerce
Task: analytical assistant for a commercial director — sales analysis, ABC analysis of assortment, order funnel, cohort retention.
Database: PostgreSQL, 15 tables, 3M orders.
Dialog examples:
User: "What is the conversion from pending to delivered in the last 30 days by customer segment?"
Agent generates:
SELECT
c.segment,
COUNT(*) FILTER (WHERE o.status = 'pending') AS pending,
COUNT(*) FILTER (WHERE o.status = 'delivered') AS delivered,
ROUND(
COUNT(*) FILTER (WHERE o.status = 'delivered')::decimal /
NULLIF(COUNT(*), 0) * 100, 1
) AS conversion_pct
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= NOW() - INTERVAL '30 days'
GROUP BY c.segment
ORDER BY conversion_pct DESC
LIMIT 100;
Results:
- Time to get analytics: 2 days → 3 minutes
- SQL accuracy (questions → correct SQL): 87%
- Typical errors: incorrect JOINs in complex queries (resolved by few-shot examples in prompt)
Few-Shot Examples for Improved Accuracy
FEW_SHOT_EXAMPLES = """
Examples of correct queries:
Question: Top 10 products by margin in the last quarter
SQL:
SELECT p.name, p.sku,
SUM(oi.quantity * (p.price_rub - p.cost_rub)) AS margin_rub
FROM order_items oi
JOIN products p ON oi.sku = p.sku
JOIN orders o ON oi.order_id = o.id
WHERE o.status = 'delivered'
AND o.created_at >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '3 months'
GROUP BY p.name, p.sku
ORDER BY margin_rub DESC
LIMIT 10;
Question: Average order value by month in 2025
SQL:
SELECT DATE_TRUNC('month', created_at) AS month,
ROUND(AVG(total_amount), 0) AS avg_check,
COUNT(*) AS order_count
FROM orders
WHERE status = 'delivered'
AND created_at BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY 1
ORDER BY 1;
"""
Timeline
- Develop Text-to-SQL agent: 2–3 weeks
- Configure schema and few-shot examples: 1 week
- Test on real queries: 1–2 weeks
- Total: 4–6 weeks







