Improving Multi-Label Business Text Classification with Imbalanced Data: Adjusted BCE Weighting and Threshold Optimization for Rare Labels in BERT Models

Document Type : Original Article

Authors

1 Apadana Institute, Shiraz, Iran

2 Assistant Professor, Apadana Institute, Shiraz, Iran

Abstract

Multi-label classification of business texts under imbalanced label distributions remains a significant challenge in Natural Language Processing. Tail labels, which have very few training samples, typically exhibit weak predictive performance even with advanced transformer-based models such as BERT. This limitation hinders the reliable identification of rare but potentially valuable business opportunities in large-scale textual data. The present study aims to enhance tail-label performance by introducing an adjusted weighting strategy into the Binary Cross-Entropy (BCE) loss function. The proposed approach has two components. First, a label-specific weight is computed as the ratio of negative to positive samples for each label and constrained to a predefined range so that neither frequent nor rare labels dominate the loss. Second, an optimal decision threshold is determined by grid search over the interval [0.1, 0.9], improving the balance between precision and recall across labels. Experiments are conducted on an English multi-label dataset of 1,000 samples and 20 imbalanced labels, with per-label frequencies ranging from 5 to 180 instances. The data are split into 80% training and 20% testing sets. Results show that the weighted BERT model achieves a Hamming accuracy of 0.623, a macro-F1 score of 0.091, and a tail-label F1 score of 0.025. Notably, using only one twenty-eighth of the baseline dataset size, the model retains approximately 70% of the baseline accuracy while improving tail-label performance over the unweighted setting. The method offers a practical, computationally efficient solution for data-scarce and resource-constrained environments.
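The two components described above admit a compact implementation. The following is a minimal sketch, assuming a PyTorch/scikit-learn stack; the clipping bounds `w_min` and `w_max`, the 0.05 grid step, and the choice of a separate threshold per label (rather than one global threshold) are illustrative assumptions, since the abstract specifies only the negative-to-positive ratio, a predefined clipping range, and the [0.1, 0.9] search interval.

```python
import numpy as np
import torch
from sklearn.metrics import f1_score

def label_weights(y_train: np.ndarray, w_min: float = 1.0, w_max: float = 10.0) -> torch.Tensor:
    """Per-label weight = (#negatives / #positives), clipped to [w_min, w_max]."""
    pos = y_train.sum(axis=0)             # positive count per label
    neg = y_train.shape[0] - pos          # negative count per label
    w = neg / np.clip(pos, 1, None)       # guard against labels with zero positives
    return torch.as_tensor(np.clip(w, w_min, w_max), dtype=torch.float32)

def best_thresholds(probs: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Grid search over [0.1, 0.9] for the threshold maximizing each label's F1."""
    grid = np.arange(0.1, 0.91, 0.05)
    return np.array([
        max(grid, key=lambda t: f1_score(y_true[:, j], probs[:, j] >= t, zero_division=0))
        for j in range(y_true.shape[1])
    ])

# Illustrative usage with synthetic label shapes (20 labels, as in the study):
Y_train = (np.random.rand(800, 20) < 0.05).astype(np.float32)
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=label_weights(Y_train))
```

In training, `criterion(logits, targets)` would replace the unweighted BCE over the model's sigmoid logits; clipping the weights keeps very rare labels from dominating the loss, matching the rationale stated in the abstract, and `best_thresholds` would be fitted on held-out probabilities before test-time evaluation.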


