CORL (Clean Offline Reinforcement Learning)

🧵 CORL is an Offline Reinforcement Learning library that provides high-quality and easy-to-follow single-file implementations of SOTA offline reinforcement learning algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. CORL is heavily inspired by CleanRL for online RL, so check it out too! The key features of CORL are:

📜 Single-file implementation
📈 Benchmarked implementations (11+ offline algorithms, 5+ offline-to-online algorithms, 30+ datasets with detailed logs)
🖼 Weights and Biases integration

You can read more about CORL's design and main results in our technical paper.

Tip

⭐ If you're interested in __discrete control__, make sure to check out our new library, [Katakomba](https://github.com/corl-team/katakomba). It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning Environment.

Info

**Minari** and **Gymnasium** support: [Farama-Foundation/Minari](https://github.com/Farama-Foundation/Minari) is the
next generation of D4RL. It will continue to be maintained and will introduce new features and datasets.
Please see their [announcement](https://farama.org/Announcing-Minari) for further details.
We are gradually migrating to Minari, and the progress
can be tracked [here](https://github.com/corl-team/CORL/issues/2). This will allow us to significantly update dependencies,
simplify installation, and give users access to many new datasets out of the box!

Warning

CORL (similarly to CleanRL) is not a modular library and therefore is not meant to be imported.
At the cost of duplicated code, we make all implementation details of an ORL algorithm variant easy
to understand. You should consider using CORL if you want to 1) understand and control all implementation details
of an algorithm or 2) rapidly prototype advanced features that other modular ORL libraries do not support.
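
To make the single-file idea concrete, here is a minimal sketch of what such a script can look like: config, data, update rule, and entry point all in one place. It is not taken from CORL. `TrainConfig`, `make_toy_dataset`, `train`, and the toy one-parameter cloning task are hypothetical and exist only to illustrate the layout.

```python
"""Hypothetical single-file layout: config, data, update rule, and entry point in one script."""
import random
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hyperparameters live next to the code that uses them (names are illustrative).
    seed: int = 0
    learning_rate: float = 0.1
    num_steps: int = 500
    batch_size: int = 32


def make_toy_dataset(rng, size=256):
    # Stand-in for an offline dataset: states in [-1, 1], dataset action = 2 * state.
    states = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return [(s, 2.0 * s) for s in states]


def train(config: TrainConfig) -> float:
    rng = random.Random(config.seed)
    dataset = make_toy_dataset(rng)
    weight = 0.0  # one-parameter linear policy: action = weight * state
    for _ in range(config.num_steps):
        batch = rng.sample(dataset, config.batch_size)
        # Gradient of the mean squared error between predicted and dataset actions.
        grad = sum(2.0 * (weight * s - a) * s for s, a in batch) / len(batch)
        weight -= config.learning_rate * grad
    return weight


if __name__ == "__main__":
    print(f"learned weight: {train(TrainConfig()):.3f}")
```

Because nothing is hidden behind imports from other modules, changing the algorithm means editing this one file, which is the trade-off the warning above describes.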

Algorithms Implemented

| Algorithm | Variants Implemented | Wandb Report |
|---|---|---|
| **Offline and Offline-to-Online** | | |
| ✅ Conservative Q-Learning for Offline Reinforcement Learning (CQL) | `offline/cql.py` `finetune/cql.py` docs | `Offline` `Offline-to-online` |
| ✅ Accelerating Online Reinforcement Learning with Offline Datasets (AWAC) | `offline/awac.py` `finetune/awac.py` docs | `Offline` `Offline-to-online` |
| ✅ Offline Reinforcement Learning with Implicit Q-Learning (IQL) | `offline/iql.py` `finetune/iql.py` docs | `Offline` `Offline-to-online` |
| **Offline-to-Online only** | | |
| ✅ Supported Policy Optimization for Offline Reinforcement Learning (SPOT) | `finetune/spot.py` docs | `Offline-to-online` |
| ✅ Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (Cal-QL) | `finetune/cal_ql.py` docs | `Offline-to-online` |
| **Offline only** | | |
| ✅ Behavioral Cloning (BC) | `offline/any_percent_bc.py` docs | `Offline` |
| ✅ Behavioral Cloning-10% (BC-10%) | `offline/any_percent_bc.py` docs | `Offline` |
| ✅ A Minimalist Approach to Offline Reinforcement Learning (TD3+BC) | `offline/td3_bc.py` docs | `Offline` |
| ✅ Decision Transformer: Reinforcement Learning via Sequence Modeling (DT) | `offline/dt.py` docs | `Offline` |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble (SAC-N) | `offline/sac_n.py` docs | `Offline` |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble (EDAC) | `offline/edac.py` docs | `Offline` |
| ✅ Revisiting the Minimalist Approach to Offline Reinforcement Learning (ReBRAC) | `offline/rebrac.py` docs | `Offline` |
| ✅ Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size (LB-SAC) | `offline/lb_sac.py` docs | `Offline Gym-MuJoCo` |
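
BC and BC-10% share one file because they differ only in which data they train on: BC-10% runs behavioral cloning on the top 10% of trajectories ranked by return. The sketch below shows that filtering step in isolation. It is an illustration of the idea, not the code in `offline/any_percent_bc.py`; the function name `keep_top_trajectories` and its `frac` parameter are hypothetical.

```python
from typing import List, Sequence


def keep_top_trajectories(returns: Sequence[float], frac: float) -> List[int]:
    """Return the indices of the top `frac` fraction of trajectories by return."""
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    # Keep at least one trajectory even when frac * len(returns) < 1.
    num_keep = max(1, int(len(returns) * frac))
    ranked = sorted(range(len(returns)), key=lambda i: returns[i], reverse=True)
    # Restore dataset order for the selected trajectories.
    return sorted(ranked[:num_keep])


# Toy per-trajectory returns (made-up numbers for illustration).
episode_returns = [12.0, 95.0, 40.0, 88.0, 5.0, 61.0, 77.0, 30.0, 99.0, 54.0]
print(keep_top_trajectories(episode_returns, frac=0.1))  # [8]
print(keep_top_trajectories(episode_returns, frac=0.3))  # [1, 3, 8]
```

With `frac=1.0` every trajectory is kept, which corresponds to plain BC on the full dataset.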

Citing CORL

If you use CORL in your work, please cite it using the following BibTeX entry:

@inproceedings{
tarasov2022corl,
  title={CORL: Research-oriented Deep Offline Reinforcement Learning Library},
  author={Denis Tarasov and Alexander Nikulin and Dmitry Akimov and Vladislav Kurenkov and Sergey Kolesnikov},
  booktitle={3rd Offline RL Workshop: Offline RL as a ''Launchpad''},
  year={2022},
  url={https://openreview.net/forum?id=SyAS49bBcv}
}