[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017, December 4-9).
Attention is all you need. 31st Conference on Neural Information Processing Systems, Long Beach, California, USA.
https://doi.org/10.48550/arXiv.1706.03762
[2] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding.
Computation and Language, 1-16.
https://doi.org/10.48550/arXiv.1810.04805
[3] Agerri, R., Vicente, I. S., Campos, J. A., Barrena, A., Saralegi, X., Soroa, A., & Agirre, E. (2020, May 11-16).
Give your text representation models some love: The case for Basque. Proceedings of the 12th Conference on Language Resources and Evaluation, Marseille, France.
https://doi.org/10.48550/arXiv.2004.00033
[4] Martin, L., Muller, B., Suárez, P. J. O., Dupont, Y., Romary, L., de La Clergerie, É. V., Seddah, D., & Sagot, B. (2020, July 5-10).
CamemBERT: a tasty French language model. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, Washington.
http://dx.doi.org/10.18653/v1/2020.acl-main.645
[5] Virtanen, A., Kanerva, J., Ilo, R., Luoma, J., Luotolahti, J., Salakoski, T., Ginter, F., & Pyysalo, S. (2019). Multilingual is not enough: BERT for Finnish.
Computation and Language, 1-14.
https://doi.org/10.48550/arXiv.1912.07076
[6] Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020, July 5-10).
Unsupervised cross-lingual representation learning at scale. The 58th Annual Meeting of the Association for Computational Linguistics, Seattle, Washington.
https://doi.org/10.48550/arXiv.1911.02116
[7] Farahani, M., Gharachorloo, M., Farahani, M., & Manthouri, M. (2021). ParsBERT: Transformer-based Model for Persian Language Understanding.
Neural Processing Letters, 53(6), 3831-3847.
https://doi.org/10.1007/s11063-021-10528-4
[8] Taghizadeh, N., Doostmohammadi, E., Seifossadat, E., Rabiee, H. R., & Tahaei, M. S. (2021). SINA-BERT: a pre-trained language model for analysis of medical texts in Persian.
Computation and Language, 1-9.
https://doi.org/10.48550/arXiv.2104.07613
[9] Huang, G., & Hu, H. (2019). c-RNN: A Fine-Grained Language Model for Image Captioning.
Neural Processing Letters, 49(2), 683-691.
https://doi.org/10.1007/s11063-018-9836-2
[10] Niu, J., Yang, Y., Zhang, S., Sun, Z., & Zhang, W. (2019). Multi-task Character-Level Attentional Networks for Medical Concept Normalization.
Neural Processing Letters, 49(3), 1239-1256.
https://doi.org/10.1007/s11063-018-9873-x
[11] Dai, A. M., & Le, Q. V. (2015, December 7-12).
Semi-supervised sequence learning. Annual Conference on Neural Information Processing Systems 2015, Montreal, Quebec, Canada.
https://proceedings.neurips.cc/paper_files/paper/2015/hash/7137debd45ae4d0ab9aa953017286b20-Abstract.html
[12] Ramachandran, P., Liu, P. J., & Le, Q. V. (2017, September 7-11).
Unsupervised pretraining for sequence to sequence learning. Conference on Empirical Methods in Natural Language Processing 2017, Denmark.
https://doi.org/10.48550/arXiv.1611.02683
[13] Sutskever, I., Vinyals, O., & Le, Q. V. (2014, December 8-13).
Sequence to sequence learning with neural networks. 28th Annual Conference on Neural Information Processing Systems 2014, Montreal, Canada.
https://proceedings.neurips.cc/paper_files/paper/2014/hash/a14ac55a4f27472c5d894ec1c3c743d2-Abstract.html
[14] Howard, J., & Ruder, S. (2018, July 15-20).
Universal language model fine-tuning for text classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia.
https://doi.org/10.18653/v1/P18-1031
[15] Graves, A. (2012). Long Short-Term Memory. In A. Graves (Ed.),
Supervised Sequence Labelling with Recurrent Neural Networks. Springer Berlin Heidelberg.
https://doi.org/10.1007/978-3-642-24797-2_4
[16] Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
University of British Columbia, 12, 1-12.
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=dOad5HoAAAAJ&citation_for_view=dOad5HoAAAAJ:W7OEmFMy1HYC
[17] Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., & Le, Q. V. (2019, December 8-14).
XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada.
https://proceedings.neurips.cc/paper/2019/hash/dc6a7e655d7e5840e66733e9ee67cc69-Abstract.html
[18] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach.
Computation and Language, 1-13.
https://doi.org/10.48550/arXiv.1907.11692
[19] Lample, G., & Conneau, A. (2019, December 13-14).
Cross-lingual language model pretraining. The 33rd Annual Conference on Neural Information Processing Systems, Vancouver, Canada.
https://doi.org/10.48550/arXiv.1901.07291
[20] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer.
The Journal of Machine Learning Research, 21(1), 5485-5551.
https://arxiv.org/abs/1910.10683
[21] Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020, April 26-30).
ALBERT: A lite BERT for self-supervised learning of language representations. 8th International Conference on Learning Representations, Addis Ababa, Ethiopia.
https://doi.org/10.48550/arXiv.1909.11942
[22] Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A pretrained language model for scientific text.
Computation and Language, 1-6.
https://doi.org/10.48550/arXiv.1903.10676
[23] Araci, D. (2019).
FinBERT: Financial sentiment analysis with pre-trained language models [Master, Amsterdam]. Netherlands.
https://arxiv.org/abs/1908.10063
[24] Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2019). BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics, 36(4), 1234-1240.
https://doi.org/10.1093/bioinformatics/btz682
[25] Huang, K., Altosaar, J., & Ranganath, R. (2020, April 2-4).
ClinicalBERT: Modeling clinical notes and predicting hospital readmission. Conference on Health, Inference, and Learning 2020, Toronto, Ontario, Canada.
https://arxiv.org/abs/1904.05342
[26] Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., & Androutsopoulos, I. (2020, November 16-20).
LEGAL-BERT: The muppets straight out of law school. The 2020 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.
https://arxiv.org/abs/2010.02559
[27] De Vries, W., Van Cranenburgh, A., Bisazza, A., Caselli, T., Van Noord, G., & Nissim, M. (2019). BERTje: A Dutch BERT model.
Computation and Language, 1-6.
https://doi.org/10.48550/arXiv.1912.09582
[28] Polignano, M., Basile, P., De Gemmis, M., Semeraro, G., & Basile, V. (2019, November 13-19).
AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets. 6th Italian Conference on Computational Linguistics, Bari, Italy.
https://iris.unito.it/handle/2318/1759767
[29] Antoun, W., Baly, F., & Hajj, H. (2020, May 11-16).
AraBERT: Transformer-based model for Arabic language understanding. Proceedings of the Twelfth International Conference on Language Resources and Evaluation, Marseille, France.
https://doi.org/10.48550/arXiv.2003.00104
[30] Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., & Levy, O. (2020). SpanBERT: Improving Pre-training by Representing and Predicting Spans.
Transactions of the Association for Computational Linguistics, 8, 64-77.
https://doi.org/10.1162/tacl_a_00300