AI in the Wild: Streaming, Serving, and Scaling Production Systems
We just wrapped up a lovely evening at Adyen for our meetup "AI in the Wild: Streaming, Serving, and Scaling Production Systems".
In the first talk, "Accelerating and protecting shoppers’ payments with Apache Flink at Adyen", Vitalii Zhebrakovskyi and Matteo Tonelli showed us how a Flink-powered engineering solution enables Adyen to use ML models for fraud detection. 💳
They built a data streaming pipeline based on Kafka, Flink, and Cassandra to serve ML features with a latency of a few minutes, compared with the daily-refreshing batch jobs based on Hadoop + Spark they were using before. ⚡
Here are some of the main takeaways from their talk:
🧩 They built a DSL abstraction for declarative feature engineering, so that data scientists can write infrastructure-agnostic features
🏗️ The Flink platform is managed centrally, and available to all teams that need it
🛡️ Redundancy, isolation, and region-level abstraction are used to make the Flink jobs resilient and gracefully handle failovers with state
📈 The first iteration of this new infrastructure yielded a 17-percentage-point improvement in model performance
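To make the DSL takeaway concrete, here is a minimal sketch of what a declarative feature definition could look like. It is not Adyen's DSL: the `Feature` class, the `evaluate` helper, and the event fields (`ts`, `card_id`, `amount`) are all hypothetical, and the toy in-memory evaluator stands in for whatever a streaming engine would do.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical event shape: a dict with "ts" (epoch seconds) plus payload fields.
Event = dict


@dataclass(frozen=True)
class Feature:
    """Declarative description of a windowed aggregate; no engine-specific code."""
    name: str
    key: str                      # field to group by, e.g. "card_id"
    value: str                    # field to aggregate, e.g. "amount"
    agg: Callable[[list], float]  # aggregation applied to the values in the window
    window_s: int                 # look-back window in seconds


def evaluate(feature: Feature, events: Iterable[Event], key_value, now: int) -> float:
    """Toy in-memory backend; a streaming engine would supply its own."""
    values = [
        e[feature.value]
        for e in events
        if e[feature.key] == key_value and now - feature.window_s < e["ts"] <= now
    ]
    return feature.agg(values)


# Feature definitions say what to compute, not where or how it runs.
spend_1h = Feature("spend_1h", key="card_id", value="amount", agg=sum, window_s=3600)
tx_count_1h = Feature("tx_count_1h", key="card_id", value="amount", agg=len, window_s=3600)

events = [
    {"ts": 100, "card_id": "A", "amount": 20.0},
    {"ts": 2000, "card_id": "A", "amount": 5.0},
    {"ts": 2500, "card_id": "B", "amount": 99.0},
    {"ts": 4000, "card_id": "A", "amount": 7.5},
]

print(evaluate(spend_1h, events, "A", now=4000))
print(evaluate(tx_count_1h, events, "A", now=4000))
```

Because the definitions carry no execution logic, the same `Feature` objects could in principle be handed to a batch backend or a streaming one, which is the sense in which such features are infrastructure-agnostic.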
In the second talk, "From Notebook to Living Room: Building a Self-Hosted AI Stack that replaces SaaS", Calogero Zarbo (head of Advanced Computing at Sandbox Wealth) presented experiments he ran with a few data-minded friends. Their aim was to escape the SaaS subscription trap by building a self-hosted AI and productivity stack. 🏠
Amongst the rich variety of tools and experiments he reviewed, here are some favourites:
🖥️ Hardware: Ubuntu server on a single machine, equipped with an RTX 5090. Tailscale VPN for secure access without exposing the setup to the internet
🤖 AI stack: OpenWebUI (chat interface) + Cline (coding agent) + Vane (AI search). SearXNG as a metasearch engine. LLama scheduler to dynamically route requests, with MoE models (for quick tasks) + a dense model (for coding and deep reasoning)
🗜️ KV cache quantization: after several experiments, they opted for a hybrid approach: keeping the keys at the higher 8-bit precision while using TurboQuant to compress the values
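For readers new to KV cache quantization, the hybrid idea can be sketched in a few lines. This is only an illustration under loud assumptions: it does not implement TurboQuant, it uses plain symmetric uniform quantization as a stand-in, and the 4-bit width for values is an assumption (the talk recap only states 8-bit for keys). All function names and the example vectors are hypothetical.

```python
def quantize(vec, bits):
    """Symmetric uniform quantization: floats -> signed integer codes plus one scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    peak = max(abs(x) for x in vec)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in vec]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floats."""
    return [c * scale for c in codes]


def max_error(vec, bits):
    """Largest absolute round-trip error for one vector at the given bit width."""
    codes, scale = quantize(vec, bits)
    return max(abs(a - b) for a, b in zip(vec, dequantize(codes, scale)))


# Made-up key and value vectors for a single cached token.
key = [0.12, -0.80, 0.33, 0.05]
value = [0.50, -0.25, 0.90, -0.10]

key_err = max_error(key, bits=8)      # keys kept at 8-bit
value_err = max_error(value, bits=4)  # values compressed harder (assumed 4-bit stand-in)

print(f"key error (8-bit):   {key_err:.5f}")
print(f"value error (4-bit): {value_err:.5f}")
```

The round-trip error is bounded by half the quantization step, so the 8-bit keys are reconstructed far more precisely than the more aggressively compressed values, which is the trade-off a hybrid scheme accepts.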
Thanks to the speakers and all the participants!