Profitable AI: Build it right. Make it fast. Keep it cheap.

Scaling Personalized Push Notifications

Floris Fok, Senior AI Engineer @ Prosus

This talk explores how we productionized personalized push notifications at scale, moving from proof of concept to serving 130 billion tokens per day to nearly half of Brazil's population.

We'll share the journey from traditional CRM systems to LLM-powered personalized notifications, covering the data processing pipeline, key architectural decisions, and operational challenges. Learn about the trade-offs we navigated between latency and personalization depth, how we achieved a cost per order of under 10 cents, and practical insights into productionizing foundation models for commerce.


LLM distillation explained: Make smarter, cheaper, and deployable AI for enterprises

Mashrur Haider, Tech PM @ Nebius AI Studio

Running large LLMs in production is expensive, and often unnecessarily so. In this masterclass, Mashrur Haider breaks down how distillation, a popular post-training technique, can cut inference costs by up to 70% while maintaining enterprise-grade performance. You'll learn how distillation compares to quantization and fine-tuning, backed by real benchmarks.

Key takeaways:
- Distillation 101: how it works and why enterprises use it.
- Benchmarks: cost savings without accuracy trade-offs.
- Workflow: from data prep to deployment on Nebius Token Factory.
- Scaling: running distilled models in production with compliance and reliability.
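The abstract doesn't include code, but the core idea behind logit-based distillation can be sketched in a few lines: the student is trained to match the teacher's temperature-softened output distribution via a KL-divergence loss. This is a minimal NumPy illustration of that loss, not Nebius Token Factory's API; all function names here are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Higher temperatures expose the teacher's "dark knowledge" (relative
    probabilities of wrong classes); the T^2 factor keeps gradient
    magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_teacher = np.log(p_teacher + 1e-12)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return (temperature ** 2) * kl.mean()
```

In practice this soft-label term is usually mixed with the ordinary cross-entropy on ground-truth labels, and the loss drives gradient updates to the student only; the teacher's weights stay frozen.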
