Profitability AI: Build it right. Make it fast. Keep it cheap.
Scaling Personalized Push Notifications
Floris Fok, Senior AI Engineer @ Prosus
This talk explores how we productionized personalized push notifications at scale, moving from proof-of-concept to serving 130 billion tokens per day to nearly half of Brazil's population.
We'll share the journey from traditional CRM systems to personalized-powered notifications, covering the data processing pipeline, key architectural decisions, and operational challenges. Learn the trade-offs we navigated between latency and personalization depth, how we achieved a cost per order under 10 cents, and practical insights into productionizing foundation models for commerce.
LLM distillation explained: Make smarter, cheaper, and deployable AI for enterprises
Mashrur Haider, Tech PM @ Nebius AI Studio
Running large LLMs in production is expensive, and often unnecessarily so. In this masterclass, Mashrur Haider breaks down how distillation, a popular post-training technique, can cut inference costs by up to 70% while maintaining enterprise-grade performance. You'll learn how distillation compares to quantization and fine-tuning, backed by real benchmarks. Key takeaways:
- Distillation 101: how it works and why enterprises use it.
- Benchmarks: cost savings without accuracy trade-offs.
- Workflow: from data prep to deployment on Nebius Token Factory.
- Scaling: running distilled models in production with compliance and reliability.
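For readers unfamiliar with the technique, the core idea of distillation is to train a small "student" model to match the softened output distribution of a large "teacher" model. The sketch below is a minimal, framework-free illustration of the classic temperature-scaled distillation loss (all function names here are illustrative, not from any Nebius API):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T produces a softer
    # distribution that exposes the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) between temperature-softened
    # distributions, scaled by T^2 per the standard formulation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# A student that matches the teacher incurs (near) zero loss;
# a mismatched student is penalized.
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
mismatched = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In practice this loss is minimized over a training corpus (often combined with a standard cross-entropy term on ground-truth labels), which is what lets a much smaller student approach the teacher's accuracy at a fraction of the inference cost.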