New Security Risk: Why LLM Scale Doesn't Deter Backdoor Attacks

Cybersecurity Podcast di Sagamore.ai

Note sull'episodio

Today, we are discussing a startling finding that fundamentally challenges how we think about protecting large language models (LLMs) from malicious attacks. We’re diving into a joint study released by Anthropic, the UK AI Security Institute, and The Alan Turing Institute.

As you know, LLMs like Claude are pretrained on immense amounts of public text from across the internet, including blog posts and personal websites. This creates a significant risk: malicious actors can inject specific text to make a model learn undesirable or dangerous behaviors, a process widely known as poisoning. One major example of this is the introduction of backdoors. These are specific phrases, like the trigger

Now, previous research often assumed that attackers needed to control a percentage of the training data. If true ... 

 ...  Leggi dettagli
Parole chiave
VulnerabilityCybersecurity RiskArtificial IntelligenceRisk Management