
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

AI Papers Podcast Daily by AIPPD

Episode notes

This document describes Hunyuan-Large, a large open-source language model developed by Tencent. The model uses a Mixture of Experts (MoE) architecture, in which each input is routed to a subset of specialized sub-models (experts), so only part of the network is active at a time (52 billion activated parameters, per the title). Hunyuan-Large was trained on a massive dataset that includes a significant amount of synthetic data, and it relies on several optimization techniques, such as key-value cache compression, expert routing, and expert-specific learning rate scaling. The model is evaluated on a wide range of benchmarks and is reported to perform strongly in areas such as language understanding, generation, logical reasoning, mathematics, coding, ...
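
For listeners unfamiliar with expert routing, here is a minimal sketch of generic top-k gating in plain Python. It is an illustration under stated assumptions, not Hunyuan-Large's actual routing strategy: the function names (`route_top_k`, `moe_forward`), the toy experts, and the router scores are all hypothetical and do not come from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs for a scalar input x."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Toy experts: simple functions standing in for feed-forward sub-networks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]

# Hypothetical router scores for a single token (assumed, not from the paper).
logits = [0.1, 2.0, -1.0, 2.0]

selected = route_top_k(logits, k=2)               # experts 1 and 3 tie for the top score
output = moe_forward(3.0, experts, logits, k=2)   # only the 2 selected experts are evaluated
```

Only the selected experts run for a given input, which is why an MoE model's activated parameter count can be much smaller than its total parameter count.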

Keywords
AI, ai research papers, ai research, arxiv, arxiv.org, ai papers, latest ai research, arXiv AI papers, AI breakthroughs, latest AI developments, AI research summaries