Publications
Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects
ICML 2025 Workshop on Reliable and Responsible Foundation Models
We evaluate the capability of large language models to conduct personalized phishing attacks and compare their performance with human experts. AI-automated attacks performed on par with human experts (54% click-through rate) and outperformed the control group by 350%. Our custom-built tool automates the entire spear phishing process, from gathering information to creating a personalized vulnerability profile for each target.
Can AI Models Be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation
The 3rd International AI Governance Workshop (AIGOV), held in conjunction with AAAI 2026
We demonstrate how attackers can exploit weaknesses in AI safety measures to target vulnerable populations. Testing the guardrails of six leading language models against four distinct attack categories, we found critical failures: several models were almost completely susceptible to certain attack vectors. In a study of 108 senior participants, AI-generated phishing emails successfully compromised 11% of them.