- Soft-Actor-Critic-and-Extensions: SAC with PER, ERE, Munchausen, D2RL, parallel envs
- CQL: Conservative Q-Learning for offline RL (DQN-CQL & SAC-CQL)
- DQN-Atari-Agents: Modular DDQN, Dueling, Noisy, C51, Rainbow, DRQN
- IQN-and-Extensions: Implicit Quantile Networks with PER, Noisy, N-step, Dueling
- Deep-Reinforcement-Learning-Algorithm-Collection: Reference implementations across deep RL
- Upside-Down-Reinforcement-Learning: Schmidhuber's ⅂ꓤ in PyTorch
- bricksrl: LEGO-based platform for democratizing robotics and RL research · project page
- Autonomous-Robocar: Self-driving RC car using a Raspberry Pi and a CNN that predicts steering and throttle from camera images
- torchtrade: Modular RL framework for algorithmic trading · project page
- DistRL-LLM: Distributed RL for LLM fine-tuning across multiple GPUs
- SCoRe: Training language models to self-correct via RL
- artificial-agent-lab: Autonomous research lab: PI and PhD agents run experiments and write papers
- sft-kl-lora-trainer: `trl.SFTTrainer` with a KL divergence loss between LoRA adapter and base model
- Agent-Tool-RL: Teaching small language models to use tools with RL
- CoT-Decoding: Chain-of-Thought reasoning without prompting
- nanoDiff: Minimal, hackable diffusion language model — nanoGPT for the LLaDA recipe




