Reducing NLP Model Embeddings for Deployment in Embedded Systems

Karolyn Babalola, Arnaja Mitra, Jing Qin

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

State-of-the-art natural language processing (NLP) models have revolutionized the way machines understand, generate, and summarize human language; however, these modern techniques rely on an abundance of available computing resources. Deployment of such models in resource-restricted and/or embedded systems is severely limited by their memory, network, and power demands. When these models are deployed in resource-limited environments, users must determine the maximum performance degradation they are willing to tolerate to meet the requirements of the implementation. This study builds on prior research that assessed the effectiveness of smaller BERT models in resource-limited settings and evaluates the performance of reduced-size BERT models on named-entity recognition (NER) tasks. The main focus is on investigating whether reducing a model's token embedding size through various dimension-reduction methods can maintain a tolerable level of performance while enabling deployment to more restricted compute environments. In particular, this study employs principal component analysis (PCA), truncated singular value decomposition (TSVD), agglomerative clustering (AC), and uniform manifold approximation and projection (UMAP) to reduce the embedding matrix of pre-trained DistilBERT and discusses optimal hyperparameters.
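
As one concrete illustration of the embedding-reduction step described in the abstract, the sketch below uses PCA to project DistilBERT's token-embedding matrix onto its leading principal components. This is a minimal sketch, not the authors' pipeline: the `distilbert-base-uncased` checkpoint, the target dimension k = 256, and the Hugging Face/scikit-learn calls are illustrative assumptions; the chapter itself compares PCA with TSVD, AC, and UMAP and discusses hyperparameter choices.

```python
from sklearn.decomposition import PCA
from transformers import DistilBertModel

# Load pre-trained DistilBERT (checkpoint name is an illustrative assumption).
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

# Token-embedding matrix: one row per vocabulary entry, shape (30522, 768).
emb = model.embeddings.word_embeddings.weight.detach().cpu().numpy()

# Project every token vector onto the top-k principal components.
k = 256  # illustrative target dimension, not a value taken from the chapter
pca = PCA(n_components=k)
emb_reduced = pca.fit_transform(emb)  # shape (30522, 256)

print(emb_reduced.shape)
print(f"variance retained: {pca.explained_variance_ratio_.sum():.3f}")
```

Note that shrinking the embedding width alone leaves the transformer layers expecting 768-dimensional inputs, so a reduced matrix must be paired with a projection back to the original width or with adapted downstream layers; the chapter evaluates how such reductions affect NER performance.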

Original language: English
Title of host publication: Association for Women in Mathematics Series
Pages: 227-241
Number of pages: 15
DOIs
State: Published - 2025

Publication series

Name: Association for Women in Mathematics Series
Volume: 37
ISSN (Print): 2364-5733
ISSN (Electronic): 2364-5741

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Funding

The authors would like to thank the Women in Data Science and Mathematics Research Workshop (WiSDM) hosted by UCLA in 2023 for the support of this collaboration, and UCLA IPAM for sponsoring access to an HPC cluster. The research of Qin is supported by NSF grant DMS-1941197.

Funders and funder numbers:
• University of California, Los Angeles
• National Science Foundation Arctic Social Science Program (DMS-1941197)

Keywords

• BERT model
• Embedded system
• NLP
• Named-entity recognition
• Token embedding

ASJC Scopus subject areas

• Gender Studies
• General Mathematics
