Join us for the next Data Engineer Things (DET) Meetup as we explore the world of data and AI. This event will feature discussions on the latest trends and practices in the field and on how these technologies are shaping the future of data engineering. Whether you are a seasoned data professional or just starting out in the industry, this meetup is a great opportunity to network with like-minded people and learn from experts.
At DET events, we cover topics such as AI, analytics, data visualization, orchestration, open-source tech, data sharing, multi-model databases, and other essential skills for any data engineer. Our speakers will share their experiences and insights, giving you practical knowledge to apply in your own projects. Don't miss this chance to connect with fellow data enthusiasts and expand your skills in the exciting world of data engineering!
Agenda:
5 PM - 5:30 PM - Enjoy snacks and meet your Seattle DET community
5:30 PM - 5:40 PM - DET Overview, Resources, Opportunities
5:40 PM - 7 PM - Listen to industry experts on Data Engineering topics
7 PM - 8 PM - Network with data enthusiasts
Speakers:
Dipankar Mazumdar
Topic - Redefining Open Lakehouse Architecture with Apache Hudi 1.0
Abstract - Apache Hudi is an open lakehouse platform with a high-performance table format designed for fast, reliable data ingestion, incremental processing, and advanced query capabilities on data lakes. In this talk, we’ll trace Apache Hudi’s journey from its origins at Uber to powering mission-critical workloads at Amazon, ByteDance, Notion, and beyond.
We’ll then dive into the innovations introduced in Hudi 1.0, a transformative release that redefines its storage engine to meet the demands of modern data systems. Key architectural advancements include the new Non-Blocking Concurrency Control (NBCC) mechanism, purpose-built for high-throughput and streaming workloads. NBCC eliminates locking delays, enabling seamless concurrent writes and updates. Hudi 1.0 also introduces an LSM tree-style timeline, which optimizes metadata access and enables scalable, infinite time travel. In addition, new indexing capabilities, including expression-based and secondary indexes, significantly improve query performance across diverse workloads. Together, these innovations make Apache Hudi a powerful, open-source foundation for building modern lakehouse platforms.
Jack Ye
Topic - Multimodal AI Lakehouse with Lance & LanceDB
Abstract - The next wave of AI applications demands not just structured data but seamless access to images, text, embeddings, and other complex modalities, often at scale. Yet with current open lakehouse solutions, many teams still resort to loading data into closed systems for vector search, full-text search, or feature engineering, reintroducing the very data silos that data lakes were meant to eliminate.
In this talk, we introduce Lance, a next-generation columnar data format optimized for AI, and LanceDB, a vector-native query engine built on top of Lance. We’ll explore how Lance enables fast random access, zero-copy reads, and efficient data evolution, making it well suited to ML feature stores, retrieval-augmented generation (RAG), and hybrid search.
We’ll walk through real-world use cases and demonstrate how Lance unifies SQL analytics, vector search, full-text search, data loading, and feature engineering in a single open storage layer, unlocking truly multimodal analytics and AI workflows.
Call for Speakers!
Do you have a killer talk or an interesting use case you’ve been working on? We want to hear from YOU! If you're interested in speaking at future DET events, submit your talk here: http://meetup.dataengineerthings.org/cfp
Event sponsored by: the Onehouse team