swaggy66/M-ABSA

M-ABSA

This repo contains the data and code for our EMNLP-2025 paper M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis.

Data Description:

This is a dataset suitable for the multilingual ABSA task with triplet extraction.

All datasets are stored in the data/ folder:

  • The datasets cover 7 domains.
domains = ["coursera", "hotel", "laptop", "restaurant", "phone", "sight", "food"]
  • Each dataset contains 21 languages.
langs = ["ar", "da", "de", "en", "es", "fr", "hi", "hr", "id", "ja", "ko", "nl", "pt", "ru", "sk", "sv", "sw", "th", "tr", "vi", "zh"]
  • Each line holds a sentence and its labels, separated by "####": the part before the separator is the sentence, and the part after it is a list of triplets of the form [aspect term, aspect category, sentiment polarity]. For example:
This coffee brews up a nice medium roast with exotic floral and berry notes .####[['coffee', 'food quality', 'positive']]
  • Each dataset is divided into training, validation, and test sets.
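
The line format described above can be read with a few lines of Python. This is a minimal sketch rather than the repository's own loader: the function name `parse_line` is hypothetical, and it assumes the label part is always a valid Python list literal, as in the example line.

```python
import ast


def parse_line(line):
    """Split one dataset line into (sentence, list of triplets)."""
    # Everything before "####" is the sentence, everything after is the label list.
    sentence, _, label_part = line.rstrip("\n").partition("####")
    # The label part looks like a Python list literal, so literal_eval reads it
    # without executing arbitrary code.
    triplets = [tuple(item) for item in ast.literal_eval(label_part)]
    return sentence.strip(), triplets


example = (
    "This coffee brews up a nice medium roast with exotic floral and berry notes ."
    "####[['coffee', 'food quality', 'positive']]"
)
sentence, triplets = parse_line(example)
for term, category, polarity in triplets:
    print(f"{term!r} | {category!r} | {polarity!r}")
```

A line without the "####" separator makes `ast.literal_eval` raise an exception, which is usually preferable to silently returning an empty label list.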

Code Requirements

We recommend installing the following package versions:

  • transformers==4.0.0
  • sentencepiece==0.1.91
  • pytorch_lightning==0.8.1
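
To confirm that an environment matches these pins before training, a small check along the following lines may help. It is a sketch, not part of the repository: `find_mismatches` and `installed_version` are hypothetical helpers, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata

# Versions recommended in this README.
PINNED = {
    "transformers": "4.0.0",
    "sentencepiece": "0.1.91",
    "pytorch_lightning": "0.8.1",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.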

Quick Start for the Baseline

  • Set up the environment as described in the above section.
  • Download the pre-trained mT5-base model from https://huggingface.co/google/mt5-base and place it under the folder mT5-base/ .
  • Run bash run.sh, which trains the model on the source language for the UABSA/TASD task.
  • Run bash evaluate.sh, which tests the model on the target language for the UABSA/TASD task.

Detailed Usage: The paper reports experiments on two ABSA subtasks with the M-ABSA dataset. To try them, change the following parameters in run.sh:

  • task: tasd for triplet extraction, uabsa for (aspect term - sentiment polarity) pair extraction
  • dataset: one of the seven datasets in [food, restaurant, coursera, laptop, sight, phone, hotel]
python main.py --task tasd \
               --dataset hotel \
               --model_name_or_path mt5-base \
               --paradigm extraction \
               --n_gpu 0 \
               --do_train \
               --do_direct_eval \
               --train_batch_size 16 \
               --gradient_accumulation_steps 2 \
               --eval_batch_size 16 \
               --learning_rate 3e-4 \
               --num_train_epochs 5
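
The two tasks differ only in what is extracted: tasd targets the full triplet, while uabsa targets (aspect term, sentiment polarity) pairs. The sketch below illustrates that relationship by dropping the aspect category from each triplet. It is an assumption about how the two label formats relate, not a description of how main.py builds its targets, and `triplets_to_pairs` is a hypothetical name.

```python
def triplets_to_pairs(triplets):
    """Reduce (term, category, polarity) triplets to (term, polarity) pairs.

    Order is preserved, and pairs that become identical once the category
    is dropped are kept only once.
    """
    pairs = []
    for term, _category, polarity in triplets:
        pair = (term, polarity)
        if pair not in pairs:
            pairs.append(pair)
    return pairs


tasd_labels = [("coffee", "food quality", "positive")]
print(triplets_to_pairs(tasd_labels))
```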

Quick Start for the LLM Evaluation

  • Set up the environment as described in the above section.
  • Download the LLMs from Hugging Face and enter the path to each model at the "TODO" placeholder in the corresponding Python file for LLM evaluation.

Detailed Usage: The paper reports experiments on two ABSA subtasks with the M-ABSA dataset. To try them, set the following parameters directly on the command line:

  • model: one of the models in [gemma, llama, mistral, qwen]
  • task: tasd for triplet extraction, uabsa for (aspect term - sentiment polarity) pair extraction
  • test_lang: one of the languages in [ar, da, de, en, es, fr, hi, hr, id, ja, ko, nl, pt, ru, sk, sv, sw, th, tr, vi, zh]
  • type: one of the seven datasets in [food, restaurant, coursera, laptop, sight, phone, hotel]
python {model}_{task}.py  --test_lang "en" --type "food"
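
When running many model, task, language, and domain combinations, it can be convenient to assemble the command programmatically. The following sketch only builds and validates the argument list from the options listed above. `build_command` is a hypothetical helper, and it assumes the evaluation scripts follow the {model}_{task}.py naming pattern shown in the command.

```python
import sys

MODELS = ["gemma", "llama", "mistral", "qwen"]
TASKS = ["tasd", "uabsa"]
LANGS = ["ar", "da", "de", "en", "es", "fr", "hi", "hr", "id", "ja", "ko",
         "nl", "pt", "ru", "sk", "sv", "sw", "th", "tr", "vi", "zh"]
DOMAINS = ["food", "restaurant", "coursera", "laptop", "sight", "phone", "hotel"]


def build_command(model, task, test_lang, domain):
    """Return the argv list for one LLM evaluation run, validating each option."""
    checks = [
        ("model", model, MODELS),
        ("task", task, TASKS),
        ("test_lang", test_lang, LANGS),
        ("type", domain, DOMAINS),
    ]
    for label, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{label} must be one of {allowed}, got {value!r}")
    script = f"{model}_{task}.py"
    return [sys.executable, script, "--test_lang", test_lang, "--type", domain]


print(" ".join(build_command("qwen", "tasd", "en", "food")[1:]))
```

The returned list can be passed to `subprocess.run` from the directory that contains the evaluation scripts.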

Citation

If the code or dataset is used in your research, please star our repo and cite our paper as follows:

@inproceedings{wu-etal-2025-absa,
    title = "{M}-{ABSA}: A Multilingual Dataset for Aspect-Based Sentiment Analysis",
    author = "Wu, ChengYan  and
      Ma, Bolei  and
      Liu, Yihong  and
      Zhang, Zheyu  and
      Deng, Ningyuan  and
      Li, Yanshu  and
      Chen, Baolan  and
      Zhang, Yi  and
      Xue, Yun  and
      Plank, Barbara",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.128/",
    doi = "10.18653/v1/2025.emnlp-main.128",
    pages = "2530--2557",
    ISBN = "979-8-89176-332-6",
}
