
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF

AAMAS 2025


Code release progress (updating...)

⚠️ We are still testing the setup and configuration scripts. We will update this page once everything is ready. ⚠️

  • [IMDb experiments] code setup and scripts uploaded
  • [IMDb experiments] code cleanup, testing, and detailed instructions
  • [MovieLens experiments] code setup and scripts upload
  • [MovieLens experiments] code cleanup, testing, and detailed instructions

Overview

This repository contains the official implementation of FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized Reinforcement Learning with Human Feedback, as presented at AAMAS 2025.

FedRLHF combines federated learning principles with reinforcement learning from human feedback (RLHF) to provide:

  • Privacy-Preserving Training: Train models collaboratively without sharing raw data between clients.
  • Personalized Reinforcement Learning: Incorporate each client's human feedback to personalize its policy.
  • Convergence Guarantees: Rigorous convergence proofs in the federated setting.

Features

  • Two Benchmark Tasks:
    • IMDb (Sentiment Analysis + Reward Modeling)
    • MovieLens (Recommendation Systems with Federated Policies)
  • Components:
    • Federated Server and Client implementations.
    • Reward Modeling and Policy Optimization.
    • Tools for Visualization and Performance Analysis.
  • Live Demo: Showcase of personalization using pre-trained models (planned).

Repository Structure

FedRLHF/
├── IMDb/                        # IMDb-based federated RLHF task
│   ├── centralized_training.py
│   ├── server.py
│   ├── client.py
│   ├── config.py
│   ├── plot_combined_performance.py
│   ├── visualize_rewards_trends.py
│   ├── req.txt                  # Dependencies
│   └── start_multiple_clients.bash
├── MovieLens/                   # MovieLens-based federated RLHF task
│   ├── fed_rlhf/                # Core FedRLHF Implementation
│   │   ├── server.py
│   │   └── client.py
│   ├── utils/                   # Utilities for metrics and visualization
│   ├── models/                  # Reward and Base Models
│   ├── data/                    # Dataset loading
│   ├── environment.yml          # Conda environment setup
│   └── main.py                  # Entry point for MovieLens experiments
└── README.md

Getting Started

Prerequisites

Ensure you have the following installed:

  • Python >= 3.8

  • Conda or Virtualenv (for environment setup)

  • Flower (the federated learning framework; see the install sketch after this list)

  • (updating...)
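Detailed prerequisite instructions are still being written. As a minimal sketch: Flower is published on PyPI as flwr, so a baseline setup might look like the following (the bare pip install is illustrative; pin versions per IMDb/req.txt or MovieLens/environment.yml once those are finalized):

    python3 --version    # expect Python >= 3.8
    pip install flwr     # Flower, the federated learning framework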

Setup

  1. Clone the Repository:

    git clone https://github.com/flint-xf-fan/Federated-RLHF.git
    cd Federated-RLHF

  2. Install Dependencies:

    For IMDb:

    pip install -r IMDb/req.txt

    For MovieLens:

    conda env create -f MovieLens/environment.yml
    conda activate fedrlhf
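Once either environment is installed, a quick import check (a sanity-check sketch, assuming the environment from step 2 is active) confirms Flower resolves:

    python -c "import flwr; print('Flower import OK')"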

Running Experiments

IMDb

TODO
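Detailed, tested instructions are pending. As a rough sketch based on the files listed under IMDb/ above (the entry points and any arguments are assumptions until the official instructions land):

    cd IMDb
    python server.py &                   # assumed: starts the federated server
    bash start_multiple_clients.bash     # assumed: spawns multiple client processes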

MovieLens

TODO
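Also pending. MovieLens/main.py is documented above as the entry point, so a run presumably looks like this (any command-line options are unverified):

    cd MovieLens
    conda activate fedrlhf    # environment from the setup step
    python main.py            # entry point per the repository structure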


Citation

If you use this code in your research, please cite the following paper (preprint):

@article{fan2024fedrlhf,
  title={FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF},
  author={Fan, Flint Xiaofeng and Tan, Cheston and Ong, Yew-Soon and Wattenhofer, Roger and Ooi, Wei-Tsang},
  journal={arXiv preprint arXiv:2412.15538},
  year={2024}
}

TODO: replace the preprint entry with the AAMAS version.


License

See the AAMAS license documentation.


Contact

For questions or collaboration inquiries, please contact:


Enjoy using FedRLHF!
