Publications
Below are some recent publications:
- Probing the Robustness of Large Language Models Safety to Latent Perturbations. arXiv 2026 (under review).
- SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs. arXiv 2026 (under review).
- GhostEI-Bench: Do Mobile Agents Withstand Environmental Injection in Dynamic On-Device Environments? arXiv 2026 (under review).
- A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports. arXiv 2026 (under review).
- FreezeVLA: Action-Freezing Attacks on Vision-Language-Action Models. arXiv 2026 (under review).
- FA2RM: Adversarial-Augmented Reward Model. arXiv 2026 (under review).
- The Other Mind: How Language Models Exhibit Human Temporal Cognition. AAAI 2026.
- SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law. arXiv 2025.
- Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? ACM MM 2025.
- SafeVid: Toward Safety Aligned Video Large Multimodal Models. NeurIPS 2025.
- JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models. NeurIPS 2025.
- LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models. NeurIPS 2025.
- Reflection-Bench: Evaluating Epistemic Agency in Large Language Models. ICML 2025.
- A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos. ACL Findings 2025.
- From Evasion to Concealment: Stealthy Knowledge Unlearning for LLMs. ACL Findings 2025.
- Collectivism and Individualism Political Bias in Large Language Models: A Two-Step Approach. Big Data & Society 2025.
- HoneypotNet: Backdoor Attacks Against Model Extraction. AAAI 2025.
- Chain of Risks Evaluation (CORE): A Framework for Safer Large Language Models in Public Mental Health. Psychiatry and Clinical Neurosciences 2025.
- IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves. ICCV 2025.
- StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data. ICCV 2025.
- MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models. NeurIPS 2024.
- ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models. EMNLP 2024.
- Fake Alignment: Are LLMs Really Aligned Well? NAACL 2024.
- Flames: Benchmarking Value Alignment of LLMs in Chinese. NAACL 2024.
- From Pixels to Principles: A Decade of Progress and Landscape in Trustworthy Computer Vision. Science and Engineering Ethics 2024.
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities. Report 2024.