Streaming Inserts

PostgreSQL Only

Streaming inserts use PostgreSQL's COPY FROM STDIN protocol. This feature is not available for other databases.

Streaming inserts use PostgreSQL's COPY protocol to load data significantly faster than individual INSERT statements. Data is text-encoded in batches and streamed directly to the server, bypassing the overhead of prepared statements.

streamingInsert.of() returns an Operation<Long> that can be transacted like any other operation. The COPY participates in the current transaction, so it can be composed with other operations atomically.

Single Column

For simple cases, obtain the PgText encoder directly from the type:

// Insert a list of strings using COPY
fun insertNames(names: Iterator<String>, tx: Transactor): Long {
    return streamingInsert
        .of("COPY users(name) FROM STDIN", 1000, names, PgTypes.text.pgText())
        .transact(tx)
}
Parameter      Description
copyCommand    A PostgreSQL COPY ... FROM STDIN command
batchSize      Number of rows to buffer before flushing to the server
rows           An Iterator over your data
text           A PgText<T> encoder for your row type

Multi-Column Rows

For rows with multiple columns, derive a PgText encoder from a RowCodec:

// Define a RowCodec for your row type
val productCodec: RowCodec<ProductRow> = RowCodec.builder<ProductRow>()
    .field(PgTypes.text, ProductRow::name)
    .field(PgTypes.numeric, ProductRow::price)
    .field(PgTypes.int4, ProductRow::quantity)
    .build(::ProductRow)

// PgText.from() derives a text encoder from the RowCodec
val productText: PgText<ProductRow> = PgText.from(productCodec.underlying)

fun insertProducts(products: Iterator<ProductRow>, tx: Transactor): Long {
    return streamingInsert
        .of("COPY products(name, price, quantity) FROM STDIN", 1000, products, productText)
        .transact(tx)
}

PgText.from(rowCodec) uses each column's text encoder to produce tab-delimited COPY format. The same RowCodec you use for reading rows can drive bulk loading.
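For intuition, the framing that COPY's text mode expects can be sketched in plain Kotlin. The helpers below are illustrative only and are not part of the library; they show what PostgreSQL's COPY text format defines: tab-delimited fields, one row per line, `\N` for NULL, and backslash-escaping of tab, newline, carriage return, and backslash itself.

```kotlin
// Illustrative sketch of COPY text-format encoding (hypothetical helpers,
// not the library's PgText implementation).
fun encodeCopyField(value: String?): String {
    if (value == null) return "\\N" // NULL is written as \N
    val sb = StringBuilder(value.length)
    for (c in value) {
        when (c) {
            '\\' -> sb.append("\\\\")
            '\t' -> sb.append("\\t")
            '\n' -> sb.append("\\n")
            '\r' -> sb.append("\\r")
            else -> sb.append(c)
        }
    }
    return sb.toString()
}

// One row: fields joined by tabs, terminated by a newline.
fun encodeCopyRow(fields: List<String?>): String =
    fields.joinToString("\t", postfix = "\n") { encodeCopyField(it) }
```

For example, `encodeCopyRow(listOf("widget", "9.99", null))` yields the line `widget<TAB>9.99<TAB>\N` followed by a newline, which is what the server parses on the other end of the COPY stream.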

Supported Types

Most PostgreSQL types support text encoding for COPY. Types that don't (such as jsonb) will throw UnsupportedOperationException at encode time.

Batch Size

The batchSize parameter controls how many rows are buffered in memory before being flushed to PostgreSQL. A larger batch size reduces network round-trips but uses more memory. A value between 1,000 and 10,000 is a reasonable starting point.
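One way to pick a batchSize is to work backward from a memory budget. The helper below is a rough, hypothetical sketch, not a library API; the average encoded row size is whatever your rows measure in practice.

```kotlin
// Rough sizing sketch (illustrative, not part of the library):
// derive a batchSize from a per-batch memory budget and an average
// encoded row size, never going below a floor of one row.
fun batchSizeForBudget(budgetBytes: Long, avgEncodedRowBytes: Int, minRows: Int = 1): Int =
    maxOf(minRows, (budgetBytes / avgEncodedRowBytes).toInt())
```

With a 16 MiB per-batch budget and rows averaging 200 encoded bytes, this gives a batch of 83,886 rows; for such small rows you would likely cap it lower (e.g. 10,000) since the round-trip savings flatten out well before that.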