Foundational RL

Spring 2026

1. Program Overview and Objectives

The Foundational Deep Reinforcement Learning program develops and maintains the lab's core reinforcement learning algorithms and experimental infrastructure. Its role is to provide a stable, well-engineered algorithmic backbone that supports both fundamental research in learning and downstream applied efforts in autonomy, spaceflight operations, and learning-enabled astrodynamics.

The program's primary objective is to produce clean, readable, and reproducible reference implementations of modern deep reinforcement learning methods, with sufficient flexibility to enable rapid algorithmic iteration, ablation studies, and benchmarking. A parallel objective is to standardize how experiments are configured, executed, logged, and evaluated across the lab, so that results are comparable and cumulative rather than fragmented. The program is also responsible for ensuring that these workflows scale from local development environments to HPC systems, and that algorithms can transition smoothly from canonical benchmarks to domain-specific simulators such as Basilisk.

This program is explicitly focused on foundational, 6.1-style algorithmic research. However, engineering discipline is treated as a first-class concern: algorithms are expected to be trustworthy, results reproducible, and code reusable by others in the lab.


2. Scope and Non-Goals

Within scope are reference implementations of state-of-the-art model-free and model-based reinforcement learning algorithms, along with the experiment infrastructure required to train, evaluate, visualize, profile, and export them. The program owns standardized environment interfaces and wrappers, logging and visualization utilities, and workflows for executing experiments on HPC systems.

Out of scope are mission- or domain-specific reward engineering, environment-specific heuristics tailored to spaceflight problems, and end-user autonomy pipelines. One-off experiment scripts that cannot be reused or integrated into the shared infrastructure are also explicitly excluded.


3. Core Deliverables for the First Semester

By the end of the semester, the program is expected to deliver a shared MLDS-RL repository that serves as the lab's canonical reinforcement learning codebase. This repository should be clearly structured and explicitly designed for extension.

The core of the repository consists of reference algorithm implementations, including modern world-model–based methods and risk-sensitive model-free and model-based approaches. These implementations should prioritize clarity, modularity, and ease of modification over absolute performance, with the expectation that they will be read, modified, and extended by other researchers.

The repository must also provide standardized environment wrappers that allow algorithms to be run consistently across common benchmarks (such as MiniGrid, NovGrid, the DeepMind Control Suite, and Gymnasium) as well as Basilisk-based environments maintained in external repositories. Environment integration should be treated as a stable interface rather than an ad hoc adaptation.

Shared tooling is a core deliverable. This includes experiment configuration management and deterministic seeding, integration with Weights & Biases for logging and analysis, Ray-based execution support where appropriate, SLURM and bash utilities for large-scale runs, and well-documented model export workflows (e.g., ONNX) for downstream use.
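A minimal sketch of the configuration and seeding pieces, under the assumption that configs are plain serializable records logged alongside metrics. The field names in `ExperimentConfig` are illustrative, and a real `seed_everything` would also seed NumPy and the deep learning framework in use:

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Illustrative experiment record; field names are placeholders."""
    algorithm: str = "dreamer"
    env_id: str = "MiniGrid-Empty-5x5-v0"
    seed: int = 0
    total_steps: int = 100_000


def seed_everything(seed: int) -> random.Random:
    """Seed the global RNG and return a local one.

    A full implementation would also seed numpy and torch here.
    """
    random.seed(seed)
    return random.Random(seed)


cfg = ExperimentConfig(seed=42)

# Determinism check: two RNGs seeded identically produce identical streams.
rng_a = seed_everything(cfg.seed)
rng_b = seed_everything(cfg.seed)
assert [rng_a.random() for _ in range(3)] == [rng_b.random() for _ in range(3)]

# Serializing the config makes every run's settings reproducible from its logs
# (e.g. as the config dict passed to a Weights & Biases run).
print(json.dumps(asdict(cfg)))
```

Freezing the dataclass prevents silent mid-run mutation of settings, which is one common source of irreproducible results.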

The program is also responsible for providing visualization utilities for rollout inspection, GIF or video generation, and introspection of learned world models and latent spaces. These tools should support both qualitative debugging and research communication.
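One possible shape for the rollout-inspection side, sketched with a hypothetical `RolloutRecorder` that buffers frames and rewards during an episode. The frame strings below are placeholders for RGB arrays from `env.render()`, and the final comment shows where a GIF writer such as `imageio.mimsave` would plug in:

```python
from dataclasses import dataclass, field


@dataclass
class RolloutRecorder:
    """Minimal rollout buffer for qualitative inspection (illustrative)."""
    frames: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

    def record(self, frame, reward: float) -> None:
        self.frames.append(frame)
        self.rewards.append(reward)

    def summary(self) -> dict:
        """Quick episode statistics for debugging and run comparison."""
        return {
            "length": len(self.rewards),
            "return": sum(self.rewards),
            "max_reward": max(self.rewards, default=0.0),
        }


rec = RolloutRecorder()
for t in range(4):
    # In practice `frame` would be an RGB array from env.render().
    rec.record(frame=f"frame_{t}", reward=float(t))

print(rec.summary())  # {'length': 4, 'return': 6.0, 'max_reward': 3.0}
# GIF export would then be one call, e.g.:
#   imageio.mimsave("rollout.gif", rec.frames, fps=10)
```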

Benchmarking and evaluation support must be in place, including a small but carefully selected benchmark suite with fast turnaround times, scripts for controlled comparisons between baseline and modified algorithms, and standardized metrics and logging conventions. In parallel, the repository should include documented workflows for running experiments on Zaratan or similar clusters, profiling CPU/GPU utilization and memory usage, and scaling experiments across different SLURM configurations.
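The controlled-comparison scripts described above reduce, at their core, to paired evaluation: baseline and variant are run on the same seeds so that shared noise cancels. The sketch below illustrates only that structure; `evaluate` is a stand-in that returns a noisy score and would be replaced by real rollouts, and the 0.2 gap between the two `policy_scale` values is fabricated for the example:

```python
import random
import statistics


def evaluate(policy_scale: float, seed: int, episodes: int = 5) -> float:
    """Stand-in evaluation: a noisy score seeded per run.

    In the real suite this would roll out a trained policy and
    average episodic returns.
    """
    rng = random.Random(seed)
    return statistics.mean(policy_scale + rng.gauss(0.0, 0.1)
                           for _ in range(episodes))


def paired_comparison(seeds: list[int]) -> dict:
    """Run baseline and variant on identical seeds so noise is shared."""
    baseline = [evaluate(1.0, s) for s in seeds]
    variant = [evaluate(1.2, s) for s in seeds]
    diffs = [v - b for v, b in zip(variant, baseline)]
    return {"mean_diff": statistics.mean(diffs), "per_seed": diffs}


result = paired_comparison(seeds=[0, 1, 2, 3, 4])
print(f"mean improvement: {result['mean_diff']:.3f}")
```

Because both conditions reuse each seed's noise, the per-seed differences isolate the effect of the modification, which is why paired designs need far fewer seeds than independent runs for the same statistical power.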

Comprehensive documentation is required. This includes a "getting started" guide that walks a new student through running a minimal end-to-end experiment, tutorials for adding new algorithms and environments, and documentation for running experiments locally and on HPC systems. Design documentation should explain core abstractions, assumptions, and intended extension points. The repository must also include basic software quality infrastructure, including unit tests for core components, continuous integration for automated checks, and clearly articulated coding standards and contribution guidelines.


4. User Stories and Intended Usage

The infrastructure developed by this program should support a range of common research workflows. A researcher should be able to implement a candidate modification to an existing algorithm such as Dreamer and evaluate it against a baseline on a suite of fast benchmarks. Adding a new environment should automatically enable evaluation of existing algorithms, with results logged consistently for later analysis. Researchers should be able to load trained policies for qualitative rollout visualization, retrieve policies trained on HPC systems for local debugging, and profile algorithms at scale to identify computational bottlenecks.

The tooling should also support resource characterization, allowing researchers to run controlled experiments across multiple SLURM configurations to gather accurate data for allocation requests. Finally, trained policies should be exportable through a supported and well-documented workflow, and algorithms should be runnable on newly developed Basilisk environments with minimal integration overhead.


5. Development Tasks and Execution Expectations

Over the course of the semester, the program is expected to define and execute an architectural roadmap for the MLDS-RL repository. This includes selecting and justifying algorithm baselines and benchmark tasks, designing core abstractions for algorithms, models, and environments, and implementing at least one complete algorithmic pipeline end-to-end with benchmarking support.

Additional effort should be devoted to formalizing experiment configuration and logging conventions, implementing HPC launch and profiling workflows, developing visualization and inspection tools, writing onboarding documentation and tutorials, and setting up testing and continuous integration. Program members are also expected to identify and document interfaces with applied programs to ensure smooth downstream adoption.


6. Collaboration, Meetings, and Workflow Expectations

Program members meet every two weeks for at least one hour on Fridays. These meetings are used to review algorithmic changes and experimental results, discuss negative results and failed hypotheses, review software design decisions, and collectively read and discuss recent literature.

All substantive work is expected to follow disciplined version-control practices. Program members are responsible for filing issues and pull requests, reviewing each other's code, and actively using the shared infrastructure in their own research to expose weaknesses and missing functionality.


7. Faculty Interface and Code Review

At least once per semester, the program will conduct a structured design and code review with the PI. Program members should be prepared to present a high-level overview of the repository architecture, demonstrate representative experiment configurations and results, and discuss technical debt, limitations, and planned refactors. This review is intended to provide senior-level feedback on correctness, design quality, and long-term sustainability.


8. Undergraduate Integration and Mentorship

Program members are responsible for supervising undergraduates contributing to foundational reinforcement learning tasks. Undergraduate work should focus on running benchmarks, analyzing results, implementing small features or tests, and validating reproducibility of published results. Graduate students are expected to provide well-scoped tasks with clear success criteria and to treat undergraduate engagement as a stress test for usability, documentation quality, and clarity.


9. End-of-Semester Deliverables Checklist

By the end of the Spring 2026 semester, the following items should be complete and verifiable.

Repository and Architecture

  • A single MLDS-RL repository designated as the lab's canonical RL codebase

  • Clear, documented abstractions for algorithms, models, and environments

  • Consistent repository structure aligned with stated design goals

Algorithms and Benchmarks

  • At least one full DRL algorithm implemented end-to-end

  • A baseline implementation and at least one modified or extended variant

  • A small, fast benchmark suite with standardized metrics

Environment Integration

  • Working wrappers for at least two benchmark environment families

  • A documented interface for running algorithms on external Basilisk environments

Experiment Infrastructure

  • Deterministic experiment configuration and seeding

  • Weights & Biases logging integrated and documented

  • Scripts for running experiments locally and on HPC

HPC and Scalability

  • Documented Zaratan (or equivalent) execution workflow

  • Profiling support for CPU/GPU utilization and memory

  • Example SLURM scripts spanning multiple resource configurations

Visualization and Analysis

  • Rollout visualization utilities functional and documented

  • At least one example of world-model or latent-space inspection

Documentation and Onboarding

  • A "getting started" guide that runs end-to-end

  • Tutorials for adding a new algorithm and a new environment

  • Design documentation describing key abstractions and assumptions

Software Quality

  • Unit tests covering core training and environment interfaces

  • Continuous integration pipeline running automatically

  • Coding standards and contribution guidelines documented

Collaboration and Oversight

  • Regular biweekly program meetings held

  • Active use of issues and pull requests by multiple contributors

  • One structured code and design review conducted with the PI

Undergraduate Integration

  • At least one undergraduate successfully running benchmarks or experiments

  • Undergraduate-facing scripts or documentation requiring minimal setup