Tag: LLM Safety

Adversarial Reinforcement Learning for LLM Agent Safety
As large language models evolve from passive assistants into tool-using agents, a new class of risk emerges. These agents can browse the web, read emails, query databases, and take actions…

