Tech Frontier
Jamba: A Hybrid Transformer-Mamba Language Model

Tech Frontier by Julien Rineau

03:38

Episode notes

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tok ...
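
To make the interleaving idea concrete, here is a minimal sketch of how such a layer layout could be described. The names `LayerSpec` and `build_layout`, and the values used (8 layers, attention in every 4th layer, MoE in every 2nd layer), are hypothetical and chosen only for illustration; the notes above do not state the actual ratios, so this should not be read as the configuration of the released model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    """One layer of a hybrid stack (illustrative, not the authors' code)."""
    mixer: str  # "attention" (Transformer) or "mamba" (state-space)
    mlp: str    # "moe" (mixture-of-experts) or "dense"


def build_layout(n_layers: int, attn_every: int, moe_every: int) -> list[LayerSpec]:
    """Interleave attention and Mamba layers, adding MoE to some of them.

    attn_every and moe_every are assumed knobs: the last layer of every
    group of `attn_every` layers uses attention, and the last layer of every
    group of `moe_every` layers uses an MoE feed-forward block.
    """
    layout = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        mlp = "moe" if i % moe_every == moe_every - 1 else "dense"
        layout.append(LayerSpec(mixer, mlp))
    return layout


# Hypothetical values, for illustration only.
layout = build_layout(n_layers=8, attn_every=4, moe_every=2)
for index, spec in enumerate(layout):
    print(index, spec.mixer, spec.mlp)
```

Changing `attn_every` and `moe_every` is one way to picture the "resource- and objective-specific configurations" mentioned above: fewer attention layers reduce the key-value cache that must be kept in memory, while MoE layers add total capacity without every parameter being active for every token.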

Keywords

Language processing
