Join us as we explore innovative ways to handle multimodal datasets, optimize performance, and simplify your data workflows.

Daft v0.7.7 ships five perceptual image hashing algorithms for deduplication, fixes a Parquet streaming regression, and adds df.shuffle() for ML data prep.

Daft natively reads and writes every major open lake format — Iceberg, Delta Lake, Hudi, and now Apache Paimon. Plus O(1) scalar columns, fingerprint-based plan caching in Swordfish, and production observability.

Row-wise, generator, async, and stateful UDFs — one notebook, one dataset, runnable side by side.

Run GPU models on millions of rows without OOM. Real patterns from ByteDance, Essential AI, and more.

Turn any Python class into a distributed operator. Hold models, connections, and clients across rows with one decorator.

Native extensions via a stable C ABI, a live query dashboard, and 2-5x faster Parquet reads on nested types.

Row-wise, async, generator, and batch UDFs in Daft — one decorator, zero boilerplate, local or distributed.

Daft User Defined Functions (UDFs) let you run custom Python inside a distributed DataFrame pipeline. Choose from row-wise, async, generator, and batch UDFs.

Daft Observability Roadmap: metrics, OTEL integration, real-time dashboards, and DataFrame APIs for debugging and monitoring distributed pipelines.