This repository is the official implementation of the paper entitled "A Logical Fallacy-Informed Framework for Argument Generation", by Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut and Boi Faltings, to be published at NAACL 2025.
The pipeline supports any causal and sequence-to-sequence models from HuggingFace and consists of the following stages:
- Data Collection with ChatGPT
- Supervised Fine-Tuning (SFT)
- Preference Optimization with existing methods (DPO, KTO, CPO, PPO) and our method (FIPO)
- Win-rate and fallacy-rate evaluations with GPT-4
We use a Python environment managed with virtualenv. To create the environment and install the required packages, run:
pip install virtualenv
virtualenv fallacies
source fallacies/bin/activate
pip install -r requirements.txt
The code for this step is in src/augment_arguments.py.
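If it helps to see the shape of this step, below is a minimal, hypothetical sketch of prompting ChatGPT through the openai client. The model name, prompt, and generate_argument helper are illustrative only; the actual logic lives in src/augment_arguments.py.

```python
# Hypothetical sketch of ChatGPT-based argument generation (not the repo's script).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_argument(topic: str, stance: str = "against") -> str:
    """Ask ChatGPT for a short argument on the given topic."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a debater who writes concise arguments."},
            {"role": "user", "content": f"Write a short argument {stance} the topic: {topic}"},
        ],
    )
    return response.choices[0].message.content


print(generate_argument("social media bans for minors"))
```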
To run SFT, use:
python src/preference-optimization/trainer.py --train-using=SFT --train-data=data/sft/train.json --model-name=<HF model-id> --use-peft=True
You can also override additional training arguments, each of which has a default value, including:
python src/preference-optimization/trainer.py ... --n-epochs=<> --batch-size=<> --gradient-accumulation-steps=<> --learning-rate=<> --warmup-steps=<> --weight-decay=<> --logging-steps=<> --save-steps=<> --output-dir=<>
By default, running SFT saves the trained model to models/sft_<model-name>; e.g., if you train with Llama-2-7b (meta-llama/Llama-2-7b-hf), the model is saved as models/sft_Llama-2-7b-hf.
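Once SFT has finished, you can sanity-check the saved model by generating an argument from it. This is a minimal sketch assuming the model was trained with --use-peft=True (so the output directory contains PEFT adapters); the prompt and generation settings are illustrative.

```python
# Minimal sketch: load the saved SFT checkpoint and generate an argument.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

sft_path = "models/sft_Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(sft_path)
model = AutoPeftModelForCausalLM.from_pretrained(
    sft_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Generate an argument against the topic: school uniforms."  # illustrative prompt format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```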
For methods requiring a reference model, you don't have to specify the model-id; instead, you must provide the path to the previously trained reference (SFT) model. The required arguments are the following (you can also add the extra training arguments mentioned above):
python src/preference-optimization/trainer.py --train-using=DPO --beta=<> --ref-model-path=<Path to SFT model> --train-data=data/preference-data/train.json
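For reference, --beta and --ref-model-path map onto the standard DPO objective (general formulation, not code from this repo): $y_w$ is the preferred argument, $y_l$ the dispreferred one, $\pi_{\mathrm{ref}}$ is the SFT reference model, and $\beta$ controls how far the policy may drift from it.

```math
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```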
For KTO, there are two additional arguments, both of which default to 1.0:
python src/preference-optimization/trainer.py --train-using=KTO --beta=<> --ref-model-path=<Path to SFT model> --desirable-weight=<> --undesirable-weight=<> --train-data=data/preference-data/train.json
PPO requires an explicit reward model. We use a binary fallacy classifier (0: not a fallacy; 1: fallacy) trained on the preferred and dispreferred arguments in the preference dataset. You can train the classifier with:
python src/fallacy_classifier/train.py --language-model=<> --epochs=<> --batch-size=<> --val-batch-size=<> --lr=<> --data-dir=<> --gradient-accumulation=<> --train-data=data/preference-data/train.json
By default, we use the howey/electra-large-mnli language model. The classifier is then saved in the folder models/fallacy_clf/.
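To use the trained classifier directly (e.g., to spot-check an argument), something like the following works. This is a sketch: the checkpoint directory name is hypothetical, so point clf_path at whatever src/fallacy_classifier/train.py wrote under models/fallacy_clf/.

```python
# Sketch: score a single argument with the trained binary fallacy classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

clf_path = "models/fallacy_clf/electra-large-mnli"  # hypothetical checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(clf_path)
model = AutoModelForSequenceClassification.from_pretrained(clf_path)

argument = "Everyone I know agrees with this, so it must be true."
inputs = tokenizer(argument, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): [not a fallacy, fallacy]
probs = logits.softmax(dim=-1)
print(f"P(fallacy) = {probs[0, 1].item():.3f}")
```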
You can then run PPO using
python src/preference-optimization/trainer.py --train-using=PPO --ref-model-path=<Path to SFT model> --reward-model-path=models/fallacy_clf/<> --train-data=data/preference-data/train.json
The rewards used during the PPO phase are the "not a fallacy" logits, i.e.:
tokens = reward_tokenizer(generated_responses, return_tensors="pt", padding=True, truncation=True)
logits = reward_model(**tokens).logits  ## the model outputs two logits per sample: [not a fallacy logit, is a fallacy logit]
rewards = logits[:, 0]  ## we use the "not a fallacy" logits as rewards
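For a rough picture of how these rewards feed into PPO, here is a sketch of a single step using the classic trl PPOTrainer interface (trl < 0.12). The ppo_trainer, tokenizer, reward_tokenizer, reward_model, and dataloader are assumed to already exist; the actual training loop lives in src/preference-optimization/trainer.py.

```python
# Sketch of one PPO step: generate, score with the fallacy classifier, and update.
import torch

for batch in ppo_dataloader:
    query_tensors = [q for q in batch["input_ids"]]            # list of prompt tensors
    response_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, max_new_tokens=128
    )
    responses = tokenizer.batch_decode(response_tensors, skip_special_tokens=True)

    # Reward = "not a fallacy" logit from the classifier, one scalar tensor per sample.
    tokens = reward_tokenizer(responses, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = reward_model(**tokens).logits                 # (batch, 2)
    rewards = [logit[0] for logit in logits]

    ppo_trainer.step(query_tensors, response_tensors, rewards)
```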
These methods do not need a reference model, so they are simpler to run. Instead of a reference model path, specify the HuggingFace model-id (e.g., meta-llama/Llama-2-7b-hf):
python src/preference-optimization/trainer.py --train-using=CPO --model-name=<HF model id> --beta=<> --train-data=data/preference-data/train.json
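For context, the standard CPO objective (general formulation, not code from this repo) drops the reference model and combines a sigmoid preference term with a negative log-likelihood term on the preferred argument, which is why only the policy model and $\beta$ need to be specified:

```math
\mathcal{L}_{\mathrm{CPO}} =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\big(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\big)
  \right]
  \;-\; \mathbb{E}_{(x,\, y_w)\sim\mathcal{D}}\big[\log \pi_\theta(y_w \mid x)\big]
```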
This is the method we introduce in our paper. It uses CPO as the backbone preference optimization method and adds a classification loss on top; the overall objective takes the form $\mathcal{L}_{\mathrm{FIPO}} = \mathcal{L}_{\mathrm{CPO}} + \lambda \, \mathcal{L}_{\mathrm{clf}}$, where in our case $\mathcal{L}_{\mathrm{clf}}$ is a weighted cross-entropy loss over fallacy types and $\lambda$ is set with --lambda-value.
You can also specify the weighting scheme: either uniform or frequency. The frequency scheme works better, as it gives a larger weight to fallacy types that occur more often, teaching the model to learn more from those types rather than applying the same penalty to all of them. To run FIPO (a sketch of the weighting scheme follows the command below):
python src/preference-optimization/trainer.py --train-using=FIPO --model-name=<HF model id> --lambda-value=<> --weighting-scheme=<frequency or uniform> --beta=<> --train-data=data/preference-data/train.json
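As an illustration of the weighting schemes above, here is a hypothetical sketch of frequency-based class weights plugged into a weighted cross-entropy. The fallacy labels and counts are made up; in FIPO this classification term is scaled by --lambda-value and added to the CPO loss.

```python
# Hypothetical sketch of the frequency weighting scheme: fallacy types that occur
# more often in the training data receive a larger weight in the classification loss.
from collections import Counter
import torch
import torch.nn.functional as F

fallacy_types = ["ad hominem", "appeal to emotion", "false causality", "none"]   # illustrative
counts = Counter({"ad hominem": 120, "appeal to emotion": 300,
                  "false causality": 80, "none": 500})                           # made-up counts

freqs = torch.tensor([counts[t] for t in fallacy_types], dtype=torch.float)
weights = freqs / freqs.sum()                                            # "frequency" scheme
# weights = torch.full((len(fallacy_types),), 1.0 / len(fallacy_types))  # "uniform" scheme

def classification_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Weighted cross-entropy over fallacy types."""
    return F.cross_entropy(logits, labels, weight=weights)

# Combined objective (sketch): loss = cpo_loss + lambda_value * classification_loss(logits, labels)
```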
Here is an example of running the pipeline:
- SFT -
python src/preference-optimization/trainer.py --train-using=SFT --train-data=data/sft/train.json --model-name=meta-llama/Llama-2-7b-hf --use-peft=True
- DPO -
python src/preference-optimization/trainer.py --train-using=DPO --train-data=data/preference-data/train.json --ref-model-path=models/sft_Llama-2-7b-hf
If you find this code or our paper useful for your research, please cite:
@article{mouchel2024logical,
title={A logical fallacy-informed framework for argument generation},
author={Mouchel, Luca and Paul, Debjit and Cui, Shaobo and West, Robert and Bosselut, Antoine and Faltings, Boi},
journal={arXiv preprint arXiv:2408.03618},
year={2024}
}