Many data leaders we speak with are reaching a point of AI exhaustion. Over the last year, they have been inundated with pitches for standalone AI platforms that promise to solve every business problem with a single click. However, in our experience, the most successful implementations do not happen in a vacuum. Instead, they happen when teams ask: How do I implement AI using the tools I already use like dbt and Terraform?
In practice, most successful AI initiatives rely on existing data engineering workflows rather than new standalone platforms. This shift reflects a maturing market where practitioners prioritize reliability, version control, and governance over flashy, black-box interfaces. By extending your Modern Data Stack (MDS), you can deploy production-grade AI while maintaining the same CI/CD (Continuous Integration and Continuous Deployment) rigor you apply to your SQL (Structured Query Language) pipelines.
How do I implement AI using the tools I already use like dbt and Terraform?
To implement AI using your existing stack, you must treat your AI components as extensions of your existing data lifecycle rather than separate entities. This means using dbt to transform raw data into AI-ready features and using Terraform to provision the infrastructure required for vector databases and model endpoints. We call this approach the MDS-AI Extension Model because it leverages the governed, tested data you already have in your warehouse to power Large Language Models (LLMs).
In our work with mid-market SaaS (Software as a Service) companies, we find that the primary barrier to AI adoption is not the model itself, but the data delivery mechanism. By using dbt for feature engineering, you ensure that the context fed into an LLM is subject to the same testing and documentation as your BI (Business Intelligence) reports. At the same time, using Terraform for infrastructure keeps your vector databases and API (Application Programming Interface) gateways versioned and reproducible.
| Component | Traditional MDS Role | AI Extension Role |
|---|---|---|
| dbt | Aggregating revenue for BI | Feature engineering and RAG context preparation |
| Terraform | Managing BigQuery datasets | Provisioning vector search and model endpoints |
| BigQuery | Analytical storage | Vector storage and similarity search execution |
| GitHub | Code versioning | Versioning prompt templates and infrastructure |
Using dbt for machine learning features
The most critical part of any AI system is the quality of the data it consumes. When we build RAG (Retrieval-Augmented Generation) systems, the "Retrieval" part is largely a data engineering problem. Instead of letting an AI platform ingest raw, messy data, we recommend building machine learning features in dbt by creating a dedicated "feature store" layer within your warehouse.
This involves building specific dbt models that flatten complex relational data into text chunks or feature vectors. For example, if you are building a support bot, you do not just send raw Zendesk tickets to the model. You use dbt to join ticket data with customer metadata, strip out PII (Personally Identifiable Information), and format the output into a clean string.
```sql
-- Example dbt model for LLM context (BigQuery dialect)
-- models/ai_features/support_ticket_context.sql
-- Assumes PII has already been removed upstream of this model.
WITH raw_tickets AS (
    SELECT * FROM {{ ref('stg_zendesk_tickets') }}
),
customer_segments AS (
    SELECT * FROM {{ ref('dim_customers') }}
)
SELECT
    t.ticket_id,
    t.updated_at,
    -- A NULL in any field makes the whole string NULL, which a not_null test will catch
    'Customer Tier: ' || c.subscription_tier ||
    ' | Subject: ' || t.subject ||
    ' | Description: ' || t.description AS llm_context_string
FROM raw_tickets AS t
JOIN customer_segments AS c ON t.customer_id = c.customer_id
WHERE t.status = 'solved'
    -- Assumes updated_at is a TIMESTAMP column
    AND t.updated_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
```

By defining these models in dbt, you gain access to dbt tests and documentation. You can set alerts if your LLM context strings are null or if the volume of data drops, reducing the risk that your AI starts hallucinating because of a broken upstream pipeline. If you are still in the early stages of this journey, our AI Stack Audit can help you identify which parts of your existing dbt project are ready for AI integration.
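The example model concatenates the ticket subject and description directly, so the PII-stripping step described earlier has to happen somewhere upstream. A minimal sketch of one way to do that in BigQuery SQL follows. The model name `stg_zendesk_tickets_redacted` and the column list are assumptions for illustration, and the regular expressions are deliberately simple: they mask email addresses and phone-like digit runs, not every form of PII.

```sql
-- models/staging/stg_zendesk_tickets_redacted.sql  (hypothetical model name)
-- BigQuery dialect. Illustrative patterns only: names, addresses, and
-- account numbers need a dedicated approach.
SELECT
    ticket_id,
    customer_id,
    status,
    updated_at,
    -- Mask email addresses in the subject line
    REGEXP_REPLACE(subject, r'[\w.+-]+@[\w-]+(\.[\w-]+)+', '[EMAIL]') AS subject,
    -- Mask email addresses first, then phone-like digit runs, in the body
    REGEXP_REPLACE(
        REGEXP_REPLACE(description, r'[\w.+-]+@[\w-]+(\.[\w-]+)+', '[EMAIL]'),
        r'\+?\d[\d\s().-]{7,}\d',
        '[PHONE]'
    ) AS description
FROM {{ ref('stg_zendesk_tickets') }}
```

With a model like this in place, the context model would reference the redacted model instead of the raw staging model, so unmasked text never reaches the prompt.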
Building AI infrastructure with Terraform and dbt
Once your data is prepared, you need a place for it to go. Building AI infrastructure with Terraform and dbt allows you to manage the entire lifecycle of an AI application from a single repository. Terraform acts as the foundation, provisioning the necessary resources such as Google Cloud Vertex AI, Pinecone vector indexes, or AWS SageMaker endpoints.
The advantage of using Terraform is that your AI environment becomes reproducible. If you need to spin up a UAT (User Acceptance Testing) environment for a new model version, you can do so by simply changing a variable in your Terraform configuration. This prevents the "it works on my machine" syndrome that often plagues AI research projects.
Our team advocates for a "Warehouse-First" approach to AI infrastructure. If you are already using BigQuery or Snowflake, you may not even need a separate vector database. Both warehouses offer built-in vector search, and Terraform can provision the warehouse resources and permissions it relies on, reducing the TCO (Total Cost of Ownership) and simplifying your security model.
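To make the warehouse-first idea concrete, here is a minimal sketch of a similarity search run entirely inside BigQuery with its `VECTOR_SEARCH` function. The dbt models `support_ticket_embeddings` and `query_ticket_embedding`, and the `embedding` column, are hypothetical names: the sketch assumes the first holds one embedding per solved ticket and the second holds a single row with the embedding of the new ticket you want to match.

```sql
-- analyses/similar_solved_tickets.sql  (hypothetical file name)
-- Returns the 5 solved tickets whose embeddings are closest to the query ticket.
SELECT
    base.ticket_id,
    base.llm_context_string,
    distance
FROM VECTOR_SEARCH(
    TABLE {{ ref('support_ticket_embeddings') }},      -- table to search
    'embedding',                                       -- embedding column in that table
    (SELECT embedding FROM {{ ref('query_ticket_embedding') }}),
    top_k => 5,
    distance_type => 'COSINE'
)
ORDER BY distance
```

Because the query column has the same name as the searched column, no extra mapping argument is needed. Whether this is fast enough without a vector index depends on table size, which is the main thing to measure before ruling out a dedicated vector database.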
Why use Terraform for AI pipeline deployment?
The deployment phase is where many AI projects fail to transition from a notebook to production. Using Terraform for AI pipeline deployment means that your API key references, IAM (Identity and Access Management) roles, and resource quotas are declared in version-controlled code, where they can be reviewed and audited.
Consider a scenario where you are deploying a custom LLM endpoint. Without Terraform, someone on the team might manually create a GPU (Graphics Processing Unit) instance in the console, set up a public IP (Internet Protocol), and forget to rotate the credentials. By using Terraform, you define the instance type, the VPC (Virtual Private Cloud) settings, and the service accounts in code.
```hcl
# Example Terraform block for an AI endpoint resource
variable "project_id" {
  type = string
}

resource "google_vertex_ai_endpoint" "support_model" {
  # The google provider expects a numeric endpoint ID here; this value is a placeholder.
  name         = "1001"
  display_name = "support-ticket-classifier"
  location     = "us-central1"
  project      = var.project_id
}

# At the time of writing, the google provider exposes deployed_models as a
# read-only attribute of this resource. The model version and its serving
# resources (for example a "v1-stable" deployment on n1-standard-4 with
# 1 to 3 replicas) are therefore attached in a separate, scripted deploy
# step, such as a CI job that runs after `terraform apply`.
```

This level of control is essential for mid-market data teams who must answer to security and compliance departments. When we implement these patterns for our clients, we integrate them into their existing CI/CD pipelines, allowing them to deploy AI infrastructure with the same confidence with which they deploy a new SQL table.
Comparing the TCO: Existing stack vs. AI platforms
One of the most compelling reasons to stick with your existing tools is the financial impact. New AI platforms often charge a significant premium, sometimes including a percentage of your total cloud spend or a high per-user license fee. By leveraging dbt and Terraform, you primarily pay for the underlying compute and storage you are already consuming.
In our experience, the implementation of a custom AI workflow using existing tools fits within the $5,000-$8,000 range for an initial build. In contrast, dedicated enterprise AI platforms often carry a substantial annual license cost before you even run your first query.
| Cost Factor | AI-Only Platform | MDS-AI Extension (dbt/Terraform) |
|---|---|---|
| Licensing | Substantial annual license fee | $0 (Open Source / Existing) |
| Data Gravity | High (Data must move to their cloud) | Low (Data stays in your warehouse) |
| Learning Curve | High (New UI, proprietary logic) | Low (SQL and HCL knowledge) |
| Vendor Lock-in | High | Low (Portable code) |
| Implementation | Large upfront cost and 3-6 months | $5,000-$8,000 and 1-2 weeks |
By staying within the MDS, you also avoid the "Data Silo" problem. When your AI logic lives in dbt, your BI tools can easily report on AI performance. You can track the ROI (Return on Investment) of your AI models directly alongside your revenue metrics in your existing dashboards. We teach these specific patterns in our Learn AI Bootcamp, helping data teams make the transition from data engineers to AI builders.
How do I ensure AI data quality using dbt?
Data quality is the most frequently cited concern for teams moving AI into production. If a SQL query fails, a report is blank. If an AI pipeline fails, the model might give a confident, incorrect answer that costs the business money.
To mitigate this, we use dbt tests to validate the data going into the LLM. We check for:
- Freshness: Is the context data recent?
- Completeness: Are required fields for the prompt present?
- Volume: Did the number of available records drop unexpectedly?
By applying these standard data engineering practices to AI, we treat the LLM as just another downstream consumer of our data, no different from a Tableau dashboard or a CRM (Customer Relationship Management) sync. This approach allows the data team to own the "Truth" layer, while the developers focus on the application logic.
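The three checks listed above can be sketched as a single dbt singular test, which is a SQL file in the `tests/` directory that fails when it returns any rows. The file name, the two-day freshness window, and the 100-row floor are assumptions for illustration, and the query assumes BigQuery and a TIMESTAMP `updated_at` column on the `support_ticket_context` model.

```sql
-- tests/assert_support_ticket_context_healthy.sql  (hypothetical file name)
-- Returns one row per failed check; dbt marks the test as failed if any row comes back.
WITH stats AS (
    SELECT
        COUNT(*) AS row_count,
        COUNTIF(llm_context_string IS NULL) AS null_contexts,
        MAX(updated_at) AS latest_update
    FROM {{ ref('support_ticket_context') }}
)

-- Freshness: the newest context row is older than the assumed 2-day window
SELECT 'freshness' AS failed_check
FROM stats
WHERE latest_update IS NULL
    OR latest_update < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)

UNION ALL

-- Completeness: at least one prompt string is missing
SELECT 'completeness' AS failed_check
FROM stats
WHERE null_contexts > 0

UNION ALL

-- Volume: fewer rows than the assumed minimum of 100
SELECT 'volume' AS failed_check
FROM stats
WHERE row_count < 100
```

Running `dbt test` before the step that refreshes the model's context means a failure stops the pipeline instead of silently feeding stale or partial data to the LLM. In a real project you may prefer separate tests per check so that each failure is reported on its own.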
Frequently Asked Questions About AI with dbt and Terraform
Can I use dbt to manage vector embeddings?
Yes, you can use dbt to manage the metadata and text chunks that will be embedded. While dbt itself does not generate the embeddings (which usually requires an API call to a model provider), it is well suited to orchestrating the preparation of the text. Many teams use dbt models to call remote or external functions in BigQuery or Snowflake that perform the embedding generation via SQL.
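As a rough sketch of that pattern in BigQuery, a dbt model can wrap the `ML.GENERATE_EMBEDDING` function. The remote model `my_project.ai_models.text_embedding` and the output model name are hypothetical; the sketch assumes the remote model has already been created (for example as part of your infrastructure setup) and that `support_ticket_context` exposes `ticket_id`, `updated_at`, and `llm_context_string`.

```sql
-- models/ai_features/support_ticket_embeddings.sql  (hypothetical model name)
SELECT
    ticket_id,
    updated_at,
    content AS llm_context_string,
    ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
    MODEL `my_project.ai_models.text_embedding`,  -- hypothetical remote model
    (
        -- The function expects the text to embed in a column named content
        SELECT
            ticket_id,
            updated_at,
            llm_context_string AS content
        FROM {{ ref('support_ticket_context') }}
    ),
    STRUCT(TRUE AS flatten_json_output)
)
-- Keep only rows where the embedding call succeeded (empty status string)
WHERE ml_generate_embedding_status = ''
```

Every row passed to the function triggers a billable model call, so it is worth testing the upstream context model first and embedding only rows that pass.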
Is Terraform necessary for AI if we only use one or two models?
While you can manually set up a few models, Terraform becomes essential as you scale. If you plan to move from a single prototype to a suite of AI agents, managing those resources, API keys, and permissions manually becomes a bottleneck and a security risk. Terraform provides the documentation and audit trail needed for production systems.
How does using dbt for AI affect my warehouse costs?
Using dbt for AI features will increase your warehouse compute usage, but it is typically more cost-effective than using a third-party data processing tool. By keeping the transformation logic within the warehouse, you avoid data egress charges and leverage the highly optimized execution engines of modern warehouses like BigQuery or Snowflake.
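One common way to keep that extra compute in check is dbt's incremental materialization, which reprocesses only rows that changed since the last run. The sketch below is an incremental variant of the support ticket context model, assuming BigQuery and a TIMESTAMP `updated_at` column. It drops the 90-day window from the earlier example for brevity, so treat it as an illustration of the pattern rather than a drop-in replacement.

```sql
-- models/ai_features/support_ticket_context.sql, incremental variant (a sketch)
{{ config(materialized='incremental', unique_key='ticket_id') }}

SELECT
    t.ticket_id,
    t.updated_at,
    'Customer Tier: ' || c.subscription_tier ||
    ' | Subject: ' || t.subject ||
    ' | Description: ' || t.description AS llm_context_string
FROM {{ ref('stg_zendesk_tickets') }} AS t
JOIN {{ ref('dim_customers') }} AS c ON t.customer_id = c.customer_id
WHERE t.status = 'solved'
{% if is_incremental() %}
    -- On incremental runs, only rebuild context for tickets updated since the last run
    AND t.updated_at > (
        SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01 00:00:00 UTC')
        FROM {{ this }}
    )
{% endif %}
```

The `unique_key` setting makes dbt update the existing row when a ticket changes instead of appending a duplicate. The first run, or a run with `--full-refresh`, still processes the whole table.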
Do I need a dedicated vector database if I use Terraform?
Not necessarily. Many modern warehouses have integrated vector search capabilities. Terraform can provision these features within your existing database. However, if you have extremely low-latency requirements (sub-50ms) or massive scale, Terraform can also be used to provision and manage a specialized vector database like Pinecone or Weaviate alongside your warehouse.
How do I manage LLM prompt versions with these tools?
We recommend managing prompt templates as code within the same repository as your dbt models or your Terraform configuration. This allows you to link specific data transformations to specific prompt versions. When you update a dbt model that changes the context format, you can simultaneously update the prompt template in the same PR (Pull Request).
Ready to build your AI foundation?
Building production AI does not require you to abandon the tools that have made your data team successful. By extending your existing dbt and Terraform workflows, you can ship reliable, governed, and cost-effective AI systems that the business can trust.
If you are looking to accelerate this transition, our team provides a clear path forward. We offer a hands-on Learn AI Bootcamp designed specifically for data engineers and analysts who want to master these patterns. If you prefer a more tailored approach, you can book a free consultation to discuss your specific architecture and how we can help you implement AI using the tools you already know and love.