pplpod
How engineers shrink massive AI models
Episode notes

The science of Model Compression traces the transition from over-packed data centers to the high-stakes study of Pruning and the architecture of mobile intelligence. This episode of pplpod analyzes the evolution of Quantization, exploring the mechanics of Low-Rank Factorization alongside the mathematical precision of SVD and Deep Compression. We begin by stripping away the "steamer trunk" facade to reveal a surgical process in which lossy compression lets a smartphone run advanced neural networks without melting the processor. The deep dive then turns to the "Jenga" methodology, deconstructing how engineers use Hessian values and magnitude metrics to identify non-load-bearing parameters and set them to exactly zero.
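The "Jenga" idea above can be sketched in a few lines. This is a minimal illustration of magnitude-based pruning (the function name and the 50% sparsity target are illustrative choices, not from the episode): the smallest-magnitude fraction of a weight matrix is treated as non-load-bearing and set to exactly zero.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights.

    A minimal sketch of magnitude-based pruning: parameters whose
    absolute value falls at or below the sparsity threshold are
    treated as non-load-bearing and set to exactly zero.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(weights) <= threshold] = 0.0
    return pruned

layer = np.array([[0.9, -0.02, 0.4],
                  [0.01, -0.7, 0.05]])
pruned = magnitude_prune(layer, sparsity=0.5)
# half the entries (the three smallest magnitudes) are now exactly zero
```

Hessian-based criteria refine this by ranking weights on estimated loss impact rather than raw magnitude, but the zeroing step is the same.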

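The lossy compression the episode describes for Quantization can likewise be sketched. This is an assumed, minimal symmetric 8-bit scheme (not a specific library's API): each 32-bit float weight is rounded to one of 256 integer levels plus a single scale factor, quartering storage at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric 8-bit quantization: int8 values plus one float scale.

    Illustrative sketch: the largest magnitude maps to 127, and every
    other weight is rounded to the nearest of the 255 levels in between.
    The rounding is what makes the compression lossy.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# each recovered weight is within one quantization step of the original
```

The same round-trip error bound (at most one step of size `scale`) is why 8-bit inference usually preserves accuracy while shrinking the model 4x.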