Reducing NLP Model Embeddings for Deployment in Embedded Systems

Karolyn Babalola, Arnaja Mitra, Jing Qin

Research output: Chapter › peer-review

Abstract

State-of-the-art natural language processing (NLP) models have revolutionized the way machines understand, generate, and summarize human language; however, these modern techniques take advantage of the general abundance of available computing resources. Deploying such models into resource-restricted and/or embedded systems is severely limited by their memory, network, and power demands. When these models are deployed in resource-limited environments, users must determine the maximum performance degradation they are willing to withstand to meet the requirements of the implementation. This study builds on prior research that assessed the effectiveness of smaller BERT models for use in resource-limited settings. It evaluates the performance of reduced-size BERT models on named-entity recognition (NER) tasks. The main focus is on investigating whether reducing the token embedding size of a model through various dimension-reduction methods can maintain a tolerable level of performance while enabling deployment to more restricted compute environments. In particular, this study employs principal component analysis (PCA), truncated singular value decomposition (TSVD), agglomerative clustering (AC), and uniform manifold approximation and projection (UMAP) to reduce the embedding matrix of pre-trained DistilBERT and discusses optimal hyperparameters.
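The two linear methods named in the abstract, PCA and TSVD, can be sketched as projections of the token embedding matrix onto a lower-dimensional subspace. The snippet below is a minimal illustration, not the authors' pipeline: it uses a small random matrix as a stand-in for DistilBERT's actual 30522 × 768 embedding table (so it runs without downloading the model), and the reduced dimension of 128 is an arbitrary example value.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Stand-in for a pre-trained token embedding matrix. DistilBERT's real
# table is 30522 tokens x 768 dimensions; a small random matrix keeps
# this sketch self-contained.
rng = np.random.default_rng(0)
vocab_size, hidden_dim, reduced_dim = 1000, 768, 128
embeddings = rng.standard_normal((vocab_size, hidden_dim))

# PCA: mean-center, then project each token vector onto the top
# principal components of the embedding matrix.
pca = PCA(n_components=reduced_dim)
reduced_pca = pca.fit_transform(embeddings)

# Truncated SVD: a similar low-rank projection computed without
# mean-centering.
tsvd = TruncatedSVD(n_components=reduced_dim, random_state=0)
reduced_tsvd = tsvd.fit_transform(embeddings)

print(reduced_pca.shape)   # (1000, 128)
print(reduced_tsvd.shape)  # (1000, 128)
```

In a real deployment the reduced matrix would replace the model's embedding layer, typically paired with a learned or pseudo-inverse mapping back to the hidden dimension the downstream transformer layers expect.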

Original language: English
Host publication title: Association for Women in Mathematics Series
Pages: 227-241
Number of pages: 15
DOI
Status: Published - 2025

Publication series

Name: Association for Women in Mathematics Series
Volume: 37
ISSN (Print): 2364-5733
ISSN (Electronic): 2364-5741

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Funding

The authors would like to thank the Women in Data Science and Mathematics Research Workshop (WiSDM) hosted by UCLA in 2023 for the support of this collaboration and also UCLA IPAM for sponsoring access to an HPC cluster. The research of Qin is supported by the NSF grant DMS-1941197.

Funders and funder numbers:
University of California, Los Angeles
National Science Foundation Arctic Social Science Program: DMS-1941197

ASJC Scopus subject areas

• Gender Studies
• General Mathematics
