Welcome to the Daft blog

Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Daft v0.7.7: Image Dedup Hashing, Parquet Cache Fix, and df.shuffle()
Announcements
April 3, 2026

Daft v0.7.7 ships five perceptual image hashing algorithms for deduplication, fixes a Parquet streaming regression, and adds df.shuffle() for ML data prep.

Daft v0.7.6: Every Major Lake Format, O(1) Scalars, and Swordfish Plan Caching
Announcements
March 31, 2026

Daft natively reads and writes every major open lake format — Iceberg, Delta Lake, Hudi, and now Apache Paimon. Plus O(1) scalar columns, fingerprint-based plan caching in Swordfish, and production observability.

Daft UDF Patterns: Four Patterns, One Notebook
Product
March 30, 2026

Row-wise, generator, async, and stateful UDFs — one notebook, one dataset, runnable side by side.

GPU Inference with @daft.cls
Product
March 23, 2026

Run GPU models on millions of rows without OOM. Real patterns from ByteDance, Essential AI, and more.

Stateful UDFs with daft.cls: Python Classes that Scale
Product
March 17, 2026

Turn any Python class into a distributed operator. Hold models, connections, and clients across rows with one decorator.

Daft v0.7.5: A Plugin System, 5x Faster Parquet, and a Real-Time Query Debugger
Engineering
March 11, 2026

Native extensions via a stable C ABI, a live query dashboard, and 2-5x faster Parquet reads on nested types.

Stateless UDFs with daft.func - four patterns, one decorator
Product
March 10, 2026

Row-wise, async, generator, and batch UDFs in Daft — one decorator, zero boilerplate, local or distributed.

Daft UDFs: What is a UDF and why do you need one?
Product
March 3, 2026

Daft user-defined functions (UDFs) let you run custom Python inside a distributed DataFrame pipeline, in row-wise, async, generator, and batch forms.

How We're Making Observability Better in Daft
Engineering
March 2, 2026

Daft Observability Roadmap: metrics, OTEL integration, real-time dashboards, and DataFrame APIs for debugging and monitoring distributed pipelines.
