pplpod
How engineers shrink massive AI models
Episode notes

The science of Model Compression traces the transition from over-packed data centers to the high-stakes study of Pruning and the architecture of mobile intelligence. This episode of pplpod analyzes the evolution of Quantization, exploring the mechanics of Low-Rank Factorization alongside the mathematical precision of SVD and Deep Compression. We begin by stripping away the "steamer trunk" facade to reveal a surgical process in which lossy compression lets a smartphone run advanced neural networks without melting the processor. The deep dive then turns to the "Jenga" methodology, deconstructing how engineers use Hessian values and magnitude metrics to identify non-load-bearing parameters and set them to exactly zero.
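The "Jenga" idea above can be sketched in a few lines. This is a minimal illustration of magnitude-based pruning (the function name and the 50% sparsity target are illustrative choices, not from the episode): the smallest-magnitude fraction of a weight matrix is treated as non-load-bearing and set to exactly zero.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights.

    A minimal sketch of magnitude-based pruning: parameters whose
    absolute value falls at or below the sparsity threshold are
    treated as non-load-bearing and set to exactly zero.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(weights) <= threshold] = 0.0
    return pruned

layer = np.array([[0.9, -0.02, 0.4],
                  [0.01, -0.7, 0.05]])
pruned = magnitude_prune(layer, sparsity=0.5)
# half the entries (the three smallest magnitudes) are now exactly zero
```

Hessian-based criteria refine this by ranking weights on estimated loss impact rather than raw magnitude, but the zeroing step is the same.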

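The lossy compression the episode describes for Quantization can likewise be sketched. This is an assumed, minimal symmetric 8-bit scheme (not a specific library's API): each 32-bit float weight is rounded to one of 256 integer levels plus a single scale factor, quartering storage at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric 8-bit quantization: int8 values plus one float scale.

    Illustrative sketch: the largest magnitude maps to 127, and every
    other weight is rounded to the nearest of the 255 levels in between.
    The rounding is what makes the compression lossy.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# each recovered weight is within one quantization step of the original
```

The same round-trip error bound (at most one step of size `scale`) is why 8-bit inference usually preserves accuracy while shrinking the model 4x.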