AI in the Wild: Streaming, Serving, and Scaling Production Systems

We just wrapped up a lovely evening at Adyen for our meetup "AI in the Wild: Streaming, Serving and Scaling Production Systems" 🎉

In the first talk, "Accelerating and protecting shoppers’ payments with Apache Flink at Adyen", Vitalii Zhebrakovskyi and Matteo Tonelli showed us how a Flink-powered engineering solution enables Adyen to use ML models for fraud detection. 🔍💳

They built a data streaming pipeline based on Kafka, Flink, and Cassandra to serve ML features with a latency of a few minutes, compared with the daily-refreshing batch jobs based on Hadoop + Spark they were using before. ⚡

Here are some of the main takeaways from their talk:
🧩 They built a DSL abstraction for declarative feature engineering, to allow data scientists to write infrastructure-agnostic features
🏗️ The Flink platform is managed centrally, and available to all teams that need it
🛡️ Redundancy, isolation, and region-level abstraction are used to make the Flink jobs resilient and gracefully handle failovers with state
📈 The first iteration of this new infrastructure yielded a 17 percentage point improvement in model performance
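
To make the idea of a declarative, infrastructure-agnostic feature more concrete, here is a minimal Python sketch. It is not Adyen's DSL: the `Feature` class, its fields, the `evaluate` helper, and the event layout are all hypothetical, chosen only to illustrate separating what a feature is from how it gets computed.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical spec: none of these names come from the talk.
@dataclass(frozen=True)
class Feature:
    name: str
    key: str                        # event field to group by, e.g. "card_id"
    value: str                      # event field to aggregate
    agg: Callable[[list], float]    # aggregation applied to the windowed values
    window_s: int                   # sliding window length in seconds


def evaluate(feature: Feature, events: list, key_value: str, now: int) -> float:
    """Toy in-memory backend. A streaming engine could interpret the same spec."""
    in_window = [
        e[feature.value]
        for e in events
        if e[feature.key] == key_value and now - feature.window_s < e["ts"] <= now
    ]
    return feature.agg(in_window)


# Feature definitions carry no reference to the engine that runs them.
spend_1h = Feature("card_spend_1h", key="card_id", value="amount", agg=sum, window_s=3600)
tx_count_1h = Feature("card_tx_count_1h", key="card_id", value="amount", agg=len, window_s=3600)

events = [
    {"card_id": "c1", "amount": 20.0, "ts": 100},
    {"card_id": "c1", "amount": 5.0, "ts": 3000},
    {"card_id": "c2", "amount": 99.0, "ts": 3100},
    {"card_id": "c1", "amount": 7.5, "ts": 4000},
]

print(evaluate(spend_1h, events, "c1", now=4000))     # 12.5
print(evaluate(tx_count_1h, events, "c1", now=4000))  # 2
```

Because the definitions are plain data, the same spec could in principle be handed to a batch job or a streaming job without the data scientist touching either.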

In the second talk, "From Notebook to Living Room: Building a Self-Hosted AI Stack that replaces SaaS", Calogero Zarbo (Head of Advanced Computing at Sandbox Wealth) presented experiments he ran with a few data-minded friends. Their aim was to escape the SaaS subscription trap by building a self-hosted AI and productivity stack. 🏠🔧

Amongst the rich variety of tools and experiments he reviewed, here are some favourites:
🖥️ Hardware: Ubuntu server on a single machine, equipped with an RTX 5090. Tailscale VPN for secure access without exposing the setup to the internet
🤖 AI stack: OpenWebUI (chat interface) + Cline (coding agent) + Vane (AI search). SearXNG as a metasearch engine. LLama scheduler to dynamically route requests, with MoE models (for quick tasks) + a dense model (for coding and deep reasoning)
🗜️ KV cache quantization: after several experiments, they opted for a hybrid approach, keeping keys at higher 8-bit precision while using TurboQuant to compress the values

Thanks to the speakers and all the participants! 🙏
