How Tomasz Kolinko Is Rewriting the Rules of AI Inference

AI Tinkerers - "One-Shot" by Joe Heitzeberg

Episode notes

What if you could skip half of your LLM’s computations—and still get the same output?

In this episode of One-Shot, we sit down with Tomasz Kolinko, the Warsaw-based founder of Effort Engine—a new AI inference algorithm that dynamically adjusts precision in real time.

This isn’t quantization. It’s something weirder—and maybe more useful.

Tomasz walks us through how he:

- Built a custom algorithm that runs 2–3x faster on MacBooks

- Developed a system that can skip 50%+ of model computations dynamically

- Created heatmaps to visualize token-level divergence

- Benchmarked everything himself… and shared the code
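The core idea described above can be illustrated with a toy sketch. This is not the actual Effort Engine implementation (that code operates on sorted model weights with Metal kernels); it is a minimal NumPy illustration of the general principle of dynamically skipping low-impact multiplications in a matrix-vector product, where the `effort` parameter is an assumed stand-in for the 100%-to-5% dial mentioned in the episode.

```python
import numpy as np

def approx_matvec(W, x, effort=0.3):
    """Toy sketch of effort-style skipping: keep only the input
    coordinates with the largest magnitudes and skip the rest of
    the multiplications entirely. `effort` is the fraction of
    input dimensions actually computed (1.0 = full precision)."""
    k = max(1, int(effort * len(x)))      # how many coordinates to keep
    idx = np.argsort(-np.abs(x))[:k]      # indices of the largest-|x| entries
    return W[:, idx] @ x[idx]             # multiply only those columns

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))
x = rng.standard_normal(256)

full = W @ x
approx = approx_matvec(W, x, effort=0.3)  # 30% effort, as in the demo range
rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)
print(f"relative error at 30% effort: {rel_err:.2f}")
```

At `effort=1.0` the sketch reduces to the exact matrix-vector product; lowering `effort` trades accuracy for skipped work, which is the knob the live demos turn.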

You’ll also see:

- Live demos of inference tuning from 100% to 5%

- Why AI models still work (sometimes better!) with just 30% effort

- How a DIY hac ... 

Keywords
Local AI inference, AI optimization, LLM performance, machine learning efficiency, model computation, AI algorithms, Effort Engine, AI acceleration, deep learning performance, AI research and innovation