Publications
Below are some recent publications:
- Probing the Robustness of Large Language Models Safety to Latent Perturbations. arXiv 2026 (under review).
- SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs. arXiv 2026 (under review).
- GhostEI-Bench: Do Mobile Agents Withstand Environmental Injection in Dynamic On-Device Environments? arXiv 2026 (under review).
- A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports. arXiv 2026 (under review).
- FreezeVLA: Action-Freezing Attacks on Vision-Language-Action Models. arXiv 2026 (under review).
- FA2RM: Adversarial-Augmented Reward Model. arXiv 2026 (under review).
- The Other Mind: How Language Models Exhibit Human Temporal Cognition. AAAI 2026.
- SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law. arXiv 2025.
- Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? ACM MM 2025.
- SafeVid: Toward Safety Aligned Video Large Multimodal Models. NeurIPS 2025.
- JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models. NeurIPS 2025.
- LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models. NeurIPS 2025.
- Reflection-Bench: Evaluating Epistemic Agency in Large Language Models. ICML 2025.
- A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos. ACL Findings 2025.
- From Evasion to Concealment: Stealthy Knowledge Unlearning for LLMs. ACL Findings 2025.
- Collectivism and Individualism Political Bias in Large Language Models: A Two-Step Approach. Big Data & Society 2025.
- HoneypotNet: Backdoor Attacks Against Model Extraction. AAAI 2025.
- Chain of Risks Evaluation (CORE): A Framework for Safer Large Language Models in Public Mental Health. Psychiatry and Clinical Neurosciences 2025.
- IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves. ICCV 2025.
- StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data. ICCV 2025.
- MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models. NeurIPS 2024.
- ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models. EMNLP 2024.
- Fake Alignment: Are LLMs Really Aligned Well? NAACL 2024.
- Flames: Benchmarking Value Alignment of LLMs in Chinese. NAACL 2024.
- From Pixels to Principles: A Decade of Progress and Landscape in Trustworthy Computer Vision. Science and Engineering Ethics 2024.
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities. Report 2024.