AI in the Wild: Streaming, Serving, and Scaling Production Systems

Jun 12

We just wrapped up a lovely evening at Adyen for our meetup "AI in the Wild: Streaming, Serving, and Scaling Production Systems" 🎉

In the first talk, "Accelerating and protecting shoppers’ payments with Apache Flink at Adyen", Vitalii Zhebrakovskyi and Matteo Tonelli showed us how a Flink-powered engineering solution enables Adyen to use ML models for fraud detection. 🔍💳

They built a data streaming pipeline based on Kafka, Flink, and Cassandra that serves ML features with a latency of a few minutes, replacing the daily batch jobs on Hadoop + Spark that they used before. ⚡
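To make the latency difference concrete, here is a minimal sketch of a streaming feature in plain Python, assuming a per-card transaction count and sum over a 10-minute window as the example feature. It is not Adyen's code and does not use the Flink API: `PaymentEvent`, `SlidingWindowFeature`, and the dict standing in for Cassandra are all hypothetical. Each event updates the stored feature as soon as it arrives, which is what lets features lag by minutes rather than a day.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class PaymentEvent:
    card_id: str
    amount: float
    ts: int  # event time in seconds


class SlidingWindowFeature:
    """Per-key count and sum over the last `window_s` seconds of event time.

    Hypothetical stand-in for a streaming job: every event refreshes the
    feature immediately instead of waiting for a daily batch run.
    """

    def __init__(self, window_s: int):
        self.window_s = window_s
        self.events = defaultdict(deque)  # card_id -> deque of (ts, amount)
        self.store = {}  # stand-in for a key-value feature store such as Cassandra

    def process(self, ev: PaymentEvent) -> None:
        q = self.events[ev.card_id]
        q.append((ev.ts, ev.amount))
        # Evict events outside the window (ts - window_s, ts].
        # Assumes events arrive in order per key.
        while q and q[0][0] <= ev.ts - self.window_s:
            q.popleft()
        # Write the fresh feature values so the model can read them right away.
        self.store[ev.card_id] = {
            "tx_count": len(q),
            "tx_sum": sum(a for _, a in q),
        }

    def lookup(self, card_id: str) -> dict:
        # Values reflect the window as of the last event seen for this key.
        return self.store.get(card_id, {"tx_count": 0, "tx_sum": 0.0})


feat = SlidingWindowFeature(window_s=600)
for ev in [
    PaymentEvent("c1", 10.0, 0),
    PaymentEvent("c1", 5.0, 300),
    PaymentEvent("c1", 20.0, 700),  # the event at ts=0 falls out of the window here
]:
    feat.process(ev)
print(feat.lookup("c1"))
```

A real Flink job would also need to handle out-of-order events (for example with watermarks) and keep its windows in fault-tolerant state; the sketch assumes in-order events per key and keeps everything in memory.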

Here are some of the main takeaways from their talk:
🧩 A DSL abstraction for declarative feature engineering lets data scientists write infrastructure-agnostic features
🏗️ The Flink platform is managed centrally, and available to all teams that need it
🛡️ Redundancy, isolation, and region-level abstraction make the Flink jobs resilient and let them fail over gracefully with their state intact
📈 The first iteration of this new infrastructure yielded a 17-percentage-point improvement in model performance
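The talk summary does not show the DSL itself, so the following is only a hypothetical sketch of the idea behind declarative feature engineering: features are declared as data (what to aggregate, grouped by which key, over which window), and a separate backend decides how to execute them. `Feature`, `AGGREGATIONS`, and `evaluate` are invented names, and the in-memory evaluator stands in for whatever engine (batch or streaming) actually runs the specs.

```python
from dataclasses import dataclass
from typing import Callable

# Supported aggregations, keyed by the name used in feature specs.
AGGREGATIONS: dict[str, Callable[[list[float]], float]] = {
    "count": lambda xs: float(len(xs)),
    "sum": lambda xs: float(sum(xs)),
    "max": lambda xs: float(max(xs)) if xs else 0.0,
}


@dataclass(frozen=True)
class Feature:
    """Declarative feature: *what* to compute, with no execution details."""
    name: str
    key: str       # field to group by, e.g. "card_id"
    value: str     # field to aggregate, e.g. "amount"
    agg: str       # one of the keys in AGGREGATIONS
    window_s: int  # look-back window in seconds


def evaluate(features: list[Feature], events: list[dict],
             key_value: str, now: int) -> dict[str, float]:
    """Hypothetical in-memory backend; other backends could compile the same specs."""
    out = {}
    for f in features:
        # Select values for this key inside the window (now - window_s, now].
        xs = [
            e[f.value]
            for e in events
            if e[f.key] == key_value and now - f.window_s < e["ts"] <= now
        ]
        out[f.name] = AGGREGATIONS[f.agg](xs)
    return out


features = [
    Feature("card_tx_count_10m", key="card_id", value="amount", agg="count", window_s=600),
    Feature("card_max_amount_1h", key="card_id", value="amount", agg="max", window_s=3600),
]
events = [
    {"card_id": "c1", "amount": 10.0, "ts": 0},
    {"card_id": "c1", "amount": 50.0, "ts": 3000},
    {"card_id": "c2", "amount": 99.0, "ts": 3100},
    {"card_id": "c1", "amount": 5.0, "ts": 3500},
]
print(evaluate(features, events, "c1", now=3500))
```

Because the spec carries no execution details, the same `Feature` list could in principle be compiled to a streaming job for serving and to a batch job for building training data, which is the infrastructure-agnostic property the DSL aims for.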

In the second talk, "From Notebook to Living Room: Building a Self-Hosted AI Stack that replaces SaaS", Calogero Zarbo (head of Advanced Computing at Sandbox Wealth) presented experiments he ran with some data-minded friends, aiming to escape the SaaS subscription trap by building a self-hosted AI and productivity stack. 🏠🔧

Amongst the rich variety of tools and experiments he reviewed, here are some favourites:
🖥️ Hardware: Ubuntu server on a single machine, equipped with an RTX 5090. Tailscale VPN for secure access without exposing the setup to the internet
🤖 AI stack: OpenWebUI (chat interface) + Cline (coding agent) + Vane (AI search), with SearXNG as a metasearch engine. A LLama scheduler dynamically routes requests between MoE models (for quick tasks) and a dense model (for coding and deep reasoning)
🗜️ KV cache quantization: after several experiments, they opted for a hybrid approach, keeping keys at a higher 8-bit precision while compressing values with TurboQuant
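To illustrate what a hybrid KV cache scheme means, here is a NumPy sketch that quantizes keys to 8 bits and values to 4 bits, then compares single-query attention against the full-precision result. Note the substitution: plain per-token uniform quantization stands in for TurboQuant, which is a different and more sophisticated method, and the 4-bit value width is an assumption, since the talk summary does not state it. One common rationale for favouring keys is that key errors shift the softmax attention weights, while value errors are only averaged.

```python
import numpy as np


def quantize(x: np.ndarray, bits: int):
    """Asymmetric uniform quantization with one scale and offset per row (token)."""
    levels = 2**bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    # Guard against constant rows, where hi == lo would give a zero scale.
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0).astype(np.float32)
    q = np.round((x - lo) / scale).astype(np.uint8)  # codes in [0, levels]
    return q, scale, lo


def dequantize(q: np.ndarray, scale: np.ndarray, lo: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale + lo


def attention(q_vec: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Single-query scaled dot-product attention."""
    scores = K @ q_vec / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w /= w.sum()
    return w @ V


rng = np.random.default_rng(0)
seq_len, head_dim = 128, 64
K = rng.standard_normal((seq_len, head_dim)).astype(np.float32)
V = rng.standard_normal((seq_len, head_dim)).astype(np.float32)
q_vec = rng.standard_normal(head_dim).astype(np.float32)

# Hybrid scheme: higher precision for keys, more aggressive compression for values.
K_hat = dequantize(*quantize(K, bits=8))
V_hat = dequantize(*quantize(V, bits=4))

print("max key error:   ", np.abs(K - K_hat).max())
print("max value error: ", np.abs(V - V_hat).max())
print("max output error:", np.abs(attention(q_vec, K, V) - attention(q_vec, K_hat, V_hat)).max())
```

The codes are stored in `uint8` for simplicity, so this sketch does not actually save memory; a real implementation would pack two 4-bit codes per byte and run attention directly on the compressed cache.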

Thanks to the speakers and all the participants! 🙏

Kally Chung

PyData Amsterdam