Empowering Large Language Models to Edge Intelligence: A Survey of Edge Efficient LLMs and Techniques

Authors: Rui Wang ¹, Zhiyong Gao ¹, Liuyang Zhang ¹, Shuaibing Yue ¹, Ziyi Gao ¹
Affiliations:

1. University of Science and Technology Beijing
Corresponding author: Rui Wang (Email: wangrui@ustb.edu.cn)
Submitted: 2024-11-25 10:02:34

Abstract:
Large language models (LLMs) have showcased exceptional capabilities across various natural language processing (NLP) tasks in recent years, such as machine translation, text summarization, and question answering. Despite their impressive performance, the deployment of these models on edge devices, such as mobile phones, IoT devices, and edge computing nodes, is significantly hindered by their substantial computational and memory requirements. This survey provides a comprehensive overview of the state-of-the-art techniques and strategies for enabling efficient inference of LLMs on edge devices. We explore approaches including the development of small language models (SLMs), model compression techniques, inference optimization strategies, and dedicated frameworks for edge deployment. Our goal is to highlight the advancements and ongoing challenges in this field, offering valuable insights for researchers and practitioners striving to bring the power of LLMs to edge environments.

Keywords: Large Language Model; Edge Intelligence; Small Language Model; Model Compression; Efficient Inference; On-device LLM

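Submitted by: 王睿 (Rui Wang)
Category: Computer Science >> Natural Language Understanding and Machine Translation
Submission status: Submitted to a journal
Cite as: ChinaXiv:202411.00258 (or this version, ChinaXiv:202411.00258V1)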
DOI:10.12074/202411.00258
CSTR:32003.36.ChinaXiv.202411.00258
科创链 (Sci-Tech Innovation Chain) TXID: a436ec51-cd78-4432-bc14-e51c0ba61b2d
Recommended citation: Rui Wang, Zhiyong Gao, Liuyang Zhang, Shuaibing Yue, Ziyi Gao. Empowering Large Language Models to Edge Intelligence: A Survey of Edge Efficient LLMs and Techniques. ChinaXiv (Chinese Academy of Sciences preprint platform). [DOI:10.12074/202411.00258]

Version history

[V1] 2024-11-25 10:02:34, ChinaXiv:202411.00258V1

