Skip to main content

Inserts

An INSERT statement is functionally the same as using dbt to SELECT from an existing source or other dbt model. If you are faced with an INSERT-SELECT statement, the easiest way to convert the statement is to just create a new dbt model, and pull the SELECT portion of the INSERT statement out of the procedure and into the model. That’s basically it!

To really break it down, let’s consider a simple example:

INSERT INTO returned_orders (order_id, order_date, total_return)

SELECT order_id, order_date, total FROM orders WHERE type = 'return'

Converting this with a first pass to a dbt model (in a file called returned_orders.sql) might look something like:

SELECT
order_id as order_id,
order_date as order_date,
total as total_return

FROM {{ ref('orders') }}

WHERE type = 'return'

Functionally, this would create a model (which could be materialized as a table or view depending on needs) called returned_orders that contains three columns: order_id, order_date, total_return) predicated on the type column. It achieves the same end as the INSERT, just in a declarative fashion, using dbt.

A note on FROM clauses

In dbt, using a hard-coded table or view name in a FROM clause is one of the most serious mistakes new users make. dbt uses the ref and source macros to discover the ordering that transformations need to execute in, and if you don’t use them, you’ll be unable to benefit from dbt’s built-in lineage generation and pipeline execution. In the sample code throughout the remainder of this article, we’ll use ref statements in the dbt-converted versions of SQL statements, but it is an exercise for the reader to ensure that those models exist in their dbt projects.

Sequential INSERTs to an existing table can be UNION ALL’ed together

Since dbt models effectively perform a single CREATE TABLE AS SELECT (or if you break it down into steps, CREATE, then an INSERT), you may run into complexities if there are multiple INSERT statements in your transformation that all insert data into the same table. Fortunately, this is a simple thing to handle in dbt. Effectively, the logic is performing a UNION ALL between the INSERT queries. If I have a transformation flow that looks something like (ignore the contrived nature of the scenario):

CREATE TABLE all_customers

INSERT INTO all_customers SELECT * FROM us_customers

INSERT INTO all_customers SELECT * FROM eu_customers

The dbt-ified version of this would end up looking something like:

SELECT * FROM {{ ref('us_customers') }}

UNION ALL

SELECT * FROM {{ ref('eu_customers') }}

The logic is functionally equivalent. So if there’s another statement that INSERTs into a model that I’ve already created, I can just add that logic into a second SELECT statement that is just UNION ALL'ed with the first. Easy!

0