Structured Retrieval: Database Queries (SQL)

Relational Retrieval and Text-to-SQL

Exposing relational databases to language models is a common pattern for structured retrieval. Unlike unstructured search, which relies on text similarity, databases require precise queries written in Structured Query Language (SQL). Under this pattern, the language model is used as a Text-to-SQL generator: it translates natural language requests into SQL statements, executes them via a tool, and reads the returned data to answer the user.

To perform this translation, the model must understand the schema of the database. This is achieved by including metadata in the model's prompt instructions:

Table DDL: The SQL statements used to create the tables (e.g., CREATE TABLE customers ...).
Column Descriptions: Explanations of specific fields, units, and relationships.
Sample Queries: Example inputs and their corresponding SQL outputs to establish formatting expectations.

The Write/Read Security Boundary

Exposing a database connection to an autonomous model introduces substantial security risks. While SQL is a powerful query language, it is not inherently read-only; it contains commands to write, modify, and delete data (INSERT, UPDATE, DELETE, DROP). Exposing an unrestricted database connection allows for potential SQL injection attacks or model hallucinations that can destroy data or compromise privacy.

[!IMPORTANT] Exposing SQL tools requires database-level user privilege constraints.

Developers must never rely on prompt instructions (e.g., "only write SELECT queries") to secure a database. The database connection profile used by the agent tool must be programmatically restricted to read-only access (such as SELECT permissions on a specific read-replica or restricted user account). Any attempt to write or alter the schema must trigger a database permission exception.

Relational Retrieval Architecture

The diagram below outlines the Text-to-SQL generation and validation loop:

Relational Retrieval: Text-to-SQL Loop

Generation: The model parses the schema definition and compiles the user's natural language request into a SELECT statement.
Privilege Enforcement: The execution environment executes the query using a connection profile that rejects non-SELECT queries.
Result Formatting: The matching rows are converted to a structured format (like a JSON array) and appended to the context.

Interactive Playground: Text-to-SQL Execution

The following playground initializes an in-memory database representing customer accounts and exposes a query tool. The tool simulates a database-level privilege engine by validating that only SELECT statements are executed, raising a permission error for write operations.

Try It Yourself

Notice how the query generated is structured specifically around the database schema (country = 'US' ORDER BY balance DESC LIMIT 1). The tool returns the raw matching row in JSON, which the model translates back into a conversational answer. Exposing the database as a SQL tool provides access to relational operators without forcing the model to store or recall millions of transactional records in its context window.