Episode notes
What if you could skip half of your LLM’s computations—and still get the same output?
In this episode of One-Shot, we sit down with Tomasz Kolinko, the Warsaw-based founder of Effort Engine—a new AI inference algorithm that dynamically adjusts precision in real time.
This isn’t quantization. It’s something weirder—and maybe more useful.
Tomasz walks us through how he:
- Built a custom algorithm that runs 2–3x faster on MacBooks
- Developed a system that can skip 50%+ of model computations dynamically
- Created heatmaps to visualize token-level divergence
- Benchmarked everything himself… and shared the code
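To make the "skip 50%+ of computations dynamically" idea concrete, here is a minimal NumPy sketch of effort-style approximate matrix-vector multiplication. This is an illustration only, not Tomasz's actual Effort Engine implementation: it simply keeps the largest-magnitude weight-times-activation products and drops the rest, with `effort` standing in for the fraction of multiplications performed.

```python
import numpy as np

def effort_matvec(W, x, effort=0.3):
    """Approximate W @ x using only the largest |w_ij * x_j| products.

    Hypothetical illustration of dynamic computation skipping, NOT the
    real Effort Engine algorithm. `effort` is the fraction of
    multiplications kept; effort=1.0 reproduces the exact result.
    """
    # Score each individual multiplication by its magnitude.
    scores = np.abs(W * x)                      # shape (rows, cols)
    k = max(1, int(effort * scores.size))
    # Keep only the top-k products; zero out everything else.
    cutoff = np.partition(scores.ravel(), -k)[-k]
    mask = scores >= cutoff
    return (W * mask) @ x

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
x = rng.standard_normal(64)

exact = W @ x
approx = effort_matvec(W, x, effort=0.3)
```

At lower effort settings the output drifts from the exact result token by token, which is what the divergence heatmaps mentioned above visualize. A real implementation would pre-sort the weights offline so the top products can be found without scoring every multiplication at runtime.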
You’ll also see:
- Live demos of inference tuning from 100% to 5%
- Why AI models still work (sometimes better!) with just 30% effort
- How a DIY hac ...