TENG Yan's Homepage

TENG Yan

Shanghai Artificial Intelligence Laboratory

TENG Yan is currently a research scientist at the Safe & Trustworthy Center at Shanghai AI Lab. She received her PhD from Delft University of Technology. Her research focuses on AI safety, value-driven design and alignment of AI, AI governance, and interdisciplinary AI research.

We have open positions for full-time employees, interns, and joint PhD students.

Recent Research

I. Safety Alignment

SafeVid: Toward Safety Aligned Video Large Multimodal Models
NeurIPS, 2025
A2RM: Adversarial-Augmented Reward Model
Submitted to ICLR, 2026
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law
arXiv, 2025
The Other Mind: How Language Models Exhibit Human Temporal Cognition
AAAI, 2026
Fake Alignment: Are LLMs Really Aligned Well?
NAACL, 2024

II. Advanced Evaluation

SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs
Submitted to ICLR, 2026
A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports
Submitted to ICLR, 2026
GhostEI-Bench: Do Mobile Agents Withstand Environmental Injection in Dynamic On-Device Environments?
Submitted to ICLR, 2026
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
ACM MM, 2025
LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models
arXiv, 2025
Reflection-Bench: Evaluating Epistemic Agency in Large Language Models
ICML, 2025

III. Attack and Defense

Probing the Robustness of Large Language Models Safety to Latent Perturbations
Submitted to ICLR, 2026
FreezeVLA: Action-Freezing Attacks on Vision-Language-Action Models
Submitted to ICLR, 2026
JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
NeurIPS, 2025
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos
ACL Findings, 2025
From Evasion to Concealment: Stealthy Knowledge Unlearning for LLMs
ACL Findings, 2025
HoneypotNet: Backdoor Attacks Against Model Extraction
AAAI, 2025
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
ICCV, 2025
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
ICCV, 2025

IV. Interdisciplinary Research

The Other Mind: How Language Models Exhibit Human Temporal Cognition
AAAI, 2026
From Pixels to Principles: A Decade of Progress and Landscape in Trustworthy Computer Vision
Science and Engineering Ethics, 2024
Collectivism and individualism political bias in large language models: A two-step approach
Big Data & Society, 2025
Chain of Risks Evaluation (CORE): A framework for safer large language models in public mental health
Psychiatry and Clinical Neurosciences, 2025