Model extraction attacks pose significant security threats to deployed language models, potentially compromising intellectual property and user privacy. This survey provides a comprehensive taxonomy of LLM-specific extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt security attacks. We analyze various attack methodologies including API-based knowledge distillation, direct querying, parameter recovery, and prompt stealing techniques that exploit transformer architectures. We then examine defense mechanisms organized into model protection, data privacy protection, and prompt security strategies, evaluating their effectiveness across different deployment scenarios. We propose specialized metrics for evaluating both attack effectiveness and defense performance, addressing the specific challenges of generative language models. Through our analysis, we identify critical limitations in current approaches and propose promising research directions, including integrated attack methodologies and adaptive defense mechanisms that balance security with model utility. This work serves NLP researchers, ML engineers, and security professionals seeking to protect language models in production environments.
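To make the functionality-extraction category concrete, the sketch below shows the typical API-based knowledge distillation loop covered in the tutorial: an adversary samples prompts, queries the victim model's public interface, and logs the responses as supervision for a student model. This is a minimal illustrative sketch; the `query_victim` helper, the seed-prompt list, and the JSONL output format are assumptions for illustration, not details drawn from the survey.

```python
# Minimal sketch of API-based knowledge distillation ("functionality extraction").
# Assumptions (illustrative, not from the tutorial): a hypothetical `query_victim`
# wrapper around the target model's public API, and JSONL as the student
# fine-tuning data format.
import json
import random


def query_victim(prompt: str) -> str:
    """Hypothetical wrapper around the victim model's public API.

    In practice this would call a hosted completion/chat endpoint and
    return the generated text.
    """
    raise NotImplementedError("replace with a real API call")


def build_distillation_set(seed_prompts, n_samples, out_path="distill.jsonl"):
    """Query the victim on sampled prompts and store (prompt, response) pairs
    that a student model can later be fine-tuned on."""
    with open(out_path, "w", encoding="utf-8") as f:
        for _ in range(n_samples):
            prompt = random.choice(seed_prompts)
            response = query_victim(prompt)
            f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")


# Usage (hypothetical): build_distillation_set(["Summarize: ...", "Translate: ..."], n_samples=10_000)
```

Defenses discussed later in the tutorial (e.g., query auditing and output perturbation) target exactly this collection loop, which is why the attack/defense trade-off is framed around query budgets and response fidelity.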
| Time | Speaker | Title |
|---|---|---|
| 01:00 PM - 01:20 PM | Lincan Li, Kaize Ding, Yue Zhao | Opening and Welcome |
| 01:20 PM - 01:50 PM | Lincan Li | Background and Motivation: Model Extraction in the Age of LLMs |
| 01:50 PM - 02:20 PM | Lincan Li | Taxonomy of Model Extraction Attacks in LLMs |
| 02:20 PM - 02:50 PM | Lincan Li | Defense Techniques Against Model Extraction |
| 02:50 PM - 03:10 PM | Lincan Li | Evaluation Metrics and Trade-offs |
| 03:10 PM - 03:40 PM | Lincan Li, Kaize Ding, Yue Zhao | Case Studies and Real-World Scenarios |
| 03:40 PM - 04:00 PM | Lincan Li, Kaize Ding, Yue Zhao | Research Gaps and Future Directions |