S2E5 | What senses do agents need to act? — The Shift Podcast by Microsoft Azure

The Shift: Your open questions about agents, honest discussi... by Microsoft

S2 · E5

Apr 7, 2026

25:05

Episode notes

For AI agents to move from reasoning to action, they need more than text alone. But what “senses” actually matter?

In this episode of The Shift Podcast: Agentic Edition, members of the Microsoft Foundry team discuss how multimodal inputs—such as text, vision, and speech—shape how agents perceive and interact with the world. The conversation explores what’s practical today, rather than assuming fully autonomous systems.

The discussion covers:

· Why multimodal AI expands what agents can understand.

· How vision, voice, and text models are combined in applications.

· The role of tools and APIs in enabling agent action.

· Where modality adds value—and where it introduces complexity.

Rather than framing modalities as future cap ...

Keywords

generative AI, agents, agentic AI, Agentic Systems, ai agents, building agents, ai platforms, Multimodal AI, AI app development, API management, API orchestration