Abstract
Breakthroughs in artificial intelligence (AI) techniques have accelerated their application in a wide range of industries, such as security protection, transportation, agriculture, and medical care. With the support of edge computing environments, providing latency-guaranteed AI as a Service (AIaaS) can accelerate the deployment of data-intensive and computation-intensive AI applications and reduce customers' investment costs. However, the deployment architecture, working mechanism design, and performance optimization problems specific to AIaaS with configurable data quality and model complexity have not been studied in existing work. To address these problems, we propose a configurable model deployment architecture (CMDA) for edge AIaaS and present a flexible working mechanism that enables the joint configuration of data quality ratios (DQRs) and model complexity ratios (MCRs) for AI tasks. Combined with commonly used resource allocation operations, this lets the manager improve the energy and delay performance of AI services while meeting the desired quality of results (QoR). We formulate an energy-delay minimization problem under the CMDA framework and propose a polynomial-regression-based relaxation method to solve the task configuration subproblem. We conduct experiments and simulations on the ImageNet classification and Common Objects in Context (COCO) object detection tasks using state-of-the-art deep learning models, and present the corresponding result quality tables (RQTs) and QoR regression models to illustrate the proposed method. The results for single-task configuration and for multi-task configuration with resource allocation on both tasks demonstrate that the proposed method achieves over 5× HDEC improvement compared with non-optimized schemes, and that joint configuration of DQR and MCR achieves over 1.2× HDEC improvement compared with methods that configure only DQR or only MCR.
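The polynomial-regression relaxation mentioned in the abstract can be illustrated roughly as follows. This is only a minimal sketch and not the paper's implementation: the RQT values are synthetic, the degree-2 feature set, the function names, and the toy cost `d * m` (standing in for an energy-delay cost) are all assumptions made for illustration. The idea shown is to fit a smooth QoR model to a discrete RQT so that DQR and MCR can be treated as continuous variables during the search.

```python
import numpy as np

def poly_features(d, m):
    """Degree-2 polynomial features of DQR d and MCR m (assumed model form)."""
    d = np.asarray(d, dtype=float)
    m = np.asarray(m, dtype=float)
    return np.stack([np.ones_like(d), d, m, d * d, d * m, m * m], axis=-1)

def fit_qor_model(dqrs, mcrs, rqt):
    """Least-squares fit of QoR over the discrete (DQR, MCR) grid of an RQT."""
    D, M = np.meshgrid(dqrs, mcrs, indexing="ij")
    X = poly_features(D.ravel(), M.ravel())
    coef, *_ = np.linalg.lstsq(X, rqt.ravel(), rcond=None)
    return coef

def predict_qor(coef, d, m):
    """Evaluate the fitted QoR polynomial at continuous (d, m)."""
    return poly_features(d, m) @ coef

def cheapest_config(coef, target_qor, steps=101):
    """Search the relaxed (continuous) box for the cheapest feasible point.

    The cost d * m is a hypothetical stand-in for an energy-delay cost.
    """
    grid = np.linspace(0.2, 1.0, steps)
    D, M = np.meshgrid(grid, grid, indexing="ij")
    feasible = predict_qor(coef, D, M) >= target_qor
    if not feasible.any():
        return None
    cost = np.where(feasible, D * M, np.inf)
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return float(D[i, j]), float(M[i, j])

# Synthetic RQT (not measured data): rows index DQR, columns index MCR.
dqrs = np.array([0.25, 0.5, 0.75, 1.0])
mcrs = np.array([0.25, 0.5, 0.75, 1.0])
D0, M0 = np.meshgrid(dqrs, mcrs, indexing="ij")
rqt = 0.9 - 0.3 * (1 - D0) ** 2 - 0.2 * (1 - M0) ** 2

coef = fit_qor_model(dqrs, mcrs, rqt)
best = cheapest_config(coef, target_qor=0.8)
```

Here `best` holds a continuous (DQR, MCR) pair whose predicted QoR meets the target. In a real system the pair would still need to be mapped back to a supported discrete configuration, and the cost would come from measured energy and delay models rather than the toy product used above.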
Original language | English |
---|---|
Pages (from-to) | 1954-1969 |
Number of pages | 16 |
Journal | IEEE Transactions on Cloud Computing |
Volume | 11 |
Issue number | 2 |
State | Published - Apr 1 2023 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Keywords
- AI as a Service
- delay-energy optimization
- edge computing
- resource allocation
- task configuration
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture
- Computer Networks and Communications
- Computer Science Applications