Meta learning with language models: Challenges and opportunities in the classification of imbalanced text

Vassilev, Apostol; Jin, Honglan; Hasan, Munawar

DOI: https://doi.org/10.48550/arXiv.2310.15019

Other

Date Published: 10/23/2023

Author(s)

Apostol Vassilev (NIST), Honglan Jin (NIST), Munawar Hasan (NIST)

Abstract

Detecting out-of-policy speech (OOPS) content is important but difficult. While machine learning is a powerful tool for this task, it is hard to break the performance ceiling because of limits on the quantity and quality of training data and inconsistencies in the OOPS definition and data labeling. To realize the full potential of the limited resources available, we propose a meta learning technique (MLT) that combines individual models built with different text representations. We show analytically that the resulting technique is numerically stable and produces reasonable combining weights. We combine the MLT with a threshold-moving (TM) technique to further improve the performance of the combined predictor on highly imbalanced in-distribution and out-of-distribution datasets. We also provide computational results to show the statistically significant advantages of the proposed MLT approach.
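
Illustrative sketch (not taken from the paper): the abstract names two ingredients, a weighted combination of individual models and threshold moving on the combined score. This record does not give the paper's formulas for the combining weights, so the Python sketch below assumes fixed equal weights as a placeholder and uses a simple grid search over F1 as one common form of threshold moving. The function names `combine`, `f1_at`, and `best_threshold`, and all of the numbers, are hypothetical.

```python
from typing import List, Optional, Sequence


def combine(probs_per_model: Sequence[Sequence[float]],
            weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model positive-class probabilities.

    The weights are normalized to sum to 1. How the paper derives its
    weights is not stated on this page; they are plain inputs here.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    n_items = len(probs_per_model[0])
    return [
        sum(w * probs[i] for w, probs in zip(norm, probs_per_model))
        for i in range(n_items)
    ]


def f1_at(scores: Sequence[float], labels: Sequence[int],
          threshold: float) -> float:
    """F1 of the positive class when predicting 1 for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: Optional[Sequence[float]] = None) -> float:
    """Threshold moving by grid search: pick the candidate with highest F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at(scores, labels, t))


# Made-up, imbalanced validation set: 2 positives out of 10 items.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
model_a = [0.05, 0.10, 0.20, 0.15, 0.30, 0.10, 0.25, 0.45, 0.05, 0.40]
model_b = [0.10, 0.05, 0.10, 0.25, 0.20, 0.15, 0.35, 0.35, 0.10, 0.70]

combined = combine([model_a, model_b], weights=[0.5, 0.5])
tuned = best_threshold(combined, labels)

print("F1 at default threshold 0.5:", f1_at(combined, labels, 0.5))
print("tuned threshold:", tuned, "F1:", f1_at(combined, labels, tuned))
```

On this toy data the default threshold of 0.5 misses one of the two positives, while a lower threshold recovers it. In practice the threshold would be selected on held-out validation data rather than on the data used for evaluation.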

Keywords

natural language processing; out-of-policy speech detection; meta learning; deep learning; large language models

Control Families

None selected

Documentation

Publication:
https://doi.org/10.48550/arXiv.2310.15019

Supplemental Material:
None available

Document History:
10/23/23: Other (Final)

Topics

Technologies

artificial intelligence