This repo contains the data and code for our EMNLP-2025 paper M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis.
This is a dataset suitable for the multilingual ABSA task with triplet extraction.
All datasets are stored in the data/ folder:
- The datasets cover 7 domains.
domains = ["coursera", "hotel", "laptop", "restaurant", "phone", "sight", "food"]
- Each dataset contains 21 languages.
langs = ["ar", "da", "de", "en", "es", "fr", "hi", "hr", "id", "ja", "ko", "nl", "pt", "ru", "sk", "sv", "sw", "th", "tr", "vi", "zh"]
- The labels are triplets of the form [aspect term, aspect category, sentiment polarity]. Each line holds a sentence and its labels separated by "####": the part before the separator is the sentence, and the part after it is the list of corresponding triplets. Here is an example:
This coffee brews up a nice medium roast with exotic floral and berry notes .####[['coffee', 'food quality', 'positive']]
- Each dataset is divided into training, validation, and test sets.
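The line format described above can be read with a few lines of Python. This is a minimal sketch, not code from this repo: the helper name `parse_line` is hypothetical, and it assumes every line has exactly the "sentence####label list" shape shown in the example.

```python
import ast

SEPARATOR = "####"


def parse_line(line):
    """Split one dataset line into (sentence, triplets).

    Hypothetical helper: assumes the 'sentence####[[...], ...]' format
    shown in the example above.
    """
    sentence, _, label_text = line.rstrip("\n").partition(SEPARATOR)
    # The label part is written as a Python list literal, so
    # ast.literal_eval can read it without executing arbitrary code.
    triplets = ast.literal_eval(label_text)
    return sentence, triplets


example = (
    "This coffee brews up a nice medium roast with exotic floral "
    "and berry notes .####[['coffee', 'food quality', 'positive']]"
)
sentence, triplets = parse_line(example)
for term, category, polarity in triplets:
    print(term, category, polarity)
```

`ast.literal_eval` is used instead of `eval` so that a malformed or unexpected label string raises an error rather than running as code.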
We recommend installing the specified versions of the following packages:
- transformers==4.0.0
- sentencepiece==0.1.91
- pytorch_lightning==0.8.1
- Set up the environment as described in the above section.
- Download the pre-trained mT5-base model from https://huggingface.co/google/mt5-base and place it in the mT5-base/ folder.
- Run command bash run.sh, which trains the model on the source language for the UABSA/TASD task.
- Run command bash evaluate.sh, which tests the model on the target language for the UABSA/TASD task.
Detailed Usage: In the paper, we conduct experiments on two ABSA subtasks with the M-ABSA dataset. You can change the parameters in run.sh to try them:
- task: tasd for triplet extraction, uabsa for (aspect term - sentiment polarity) pair extraction
- dataset: one of the seven datasets in [food, restaurant, coursera, laptop, sight, phone, hotel]
python main.py --task tasd \
--dataset hotel \
--model_name_or_path mt5-base \
--paradigm extraction \
--n_gpu 0 \
--do_train \
--do_direct_eval \
--train_batch_size 16 \
--gradient_accumulation_steps 2 \
--eval_batch_size 16 \
--learning_rate 3e-4 \
--num_train_epochs 5
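The two task settings differ in which label fields are extracted: tasd targets the full triplet, while uabsa targets only the (aspect term, sentiment polarity) pair. The sketch below illustrates that relationship by dropping the aspect category from each triplet. The function name `to_uabsa_pairs` is hypothetical, and the assumption that uabsa targets can be derived this way is ours; the preprocessing in this repo may differ.

```python
def to_uabsa_pairs(triplets):
    """Reduce [term, category, polarity] triplets to (term, polarity) pairs.

    Illustrative only: drops the aspect category and removes duplicate
    pairs while keeping their original order.
    """
    pairs = []
    for term, _category, polarity in triplets:
        pair = (term, polarity)
        if pair not in pairs:
            pairs.append(pair)
    return pairs


tasd_labels = [["coffee", "food quality", "positive"]]
print(to_uabsa_pairs(tasd_labels))
```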
- Set up the environment as described in the above section.
- Download the LLMs from Hugging Face and enter the path to each model in the "TODO" placeholder of the corresponding Python file for LLM evaluation.
Detailed Usage: In the paper, we conduct experiments on two ABSA subtasks with the M-ABSA dataset. You can change the parameters directly on the command line:
- model: one of the models in [gemma, llama, mistral, qwen]
- task: tasd for triplet extraction, uabsa for (aspect term - sentiment polarity) pair extraction
- test_lang: one of the languages in [ar, da, de, en, es, fr, hi, hr, id, ja, ko, nl, pt, ru, sk, sv, sw, th, tr, vi, zh]
- type: one of the seven datasets in [food, restaurant, coursera, laptop, sight, phone, hotel]
python {model}_{task}.py --test_lang "en" --type "food"
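When running many model, task, language, and dataset combinations, it can help to validate the arguments before launching a script. The following is a minimal sketch under that assumption. The helper `build_llm_command` is hypothetical and not part of this repo; the allowed values and the `{model}_{task}.py` naming pattern are taken from the options listed above.

```python
MODELS = ["gemma", "llama", "mistral", "qwen"]
TASKS = ["tasd", "uabsa"]
LANGS = ["ar", "da", "de", "en", "es", "fr", "hi", "hr", "id", "ja", "ko",
         "nl", "pt", "ru", "sk", "sv", "sw", "th", "tr", "vi", "zh"]
TYPES = ["food", "restaurant", "coursera", "laptop", "sight", "phone", "hotel"]


def build_llm_command(model, task, test_lang, data_type):
    """Return the evaluation command as an argument list.

    Hypothetical helper: raises ValueError if any argument is not one
    of the documented options.
    """
    checks = [
        ("model", model, MODELS),
        ("task", task, TASKS),
        ("test_lang", test_lang, LANGS),
        ("type", data_type, TYPES),
    ]
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    script = f"{model}_{task}.py"
    return ["python", script, "--test_lang", test_lang, "--type", data_type]


cmd = build_llm_command("gemma", "tasd", "en", "food")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the evaluation from a Python loop.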
If the code or dataset is used in your research, please star our repo and cite our paper as follows:
@inproceedings{wu-etal-2025-absa,
title = "{M}-{ABSA}: A Multilingual Dataset for Aspect-Based Sentiment Analysis",
author = "Wu, ChengYan and
Ma, Bolei and
Liu, Yihong and
Zhang, Zheyu and
Deng, Ningyuan and
Li, Yanshu and
Chen, Baolan and
Zhang, Yi and
Xue, Yun and
Plank, Barbara",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.128/",
doi = "10.18653/v1/2025.emnlp-main.128",
pages = "2530--2557",
ISBN = "979-8-89176-332-6",
}