DRL Spacecraft Ops

Spring 2026

1. Program Overview and Objectives

The Deep Reinforcement Learning for Spaceflight Operations program focuses on applying state-of-the-art deep reinforcement learning methods to realistic, high-fidelity spaceflight problems. The program serves as the lab's primary integration layer between foundational reinforcement learning research and operationally relevant astrodynamics, guidance, and spacecraft autonomy use cases.

The central objective of this program is to develop reusable, lightweight, and well-tested simulation environments that capture the essential structure of real spaceflight operations problems without becoming mission-specific or monolithic. These environments are intended to establish best practices for formulating spaceflight problems as Markov or semi-Markov decision processes, including clear definitions of state, action, reward, constraints, and termination logic. The program is also responsible for providing shared abstractions, tooling, and examples that downstream research projects can extend without re-implementing core functionality.

A key goal is to enable graduate and undergraduate researchers to rapidly prototype, train, evaluate, and analyze DRL agents in realistic mission scenarios, while maintaining tight integration with the lab's broader ecosystem. This includes direct compatibility with the Foundational DRL program's algorithms and tooling, as well as alignment with astrodynamics latent models and high-fidelity flight dynamics simulators. Although the program is application-driven, environments and tools must remain sufficiently general to support algorithmic research, ablation studies, and operational trade analyses rather than single-use demonstrations.

2. Scope and Non-Goals

This program encompasses environment design and problem formulation for spaceflight operations, integration with high-fidelity dynamics and flight software simulators, and the development of clean interfaces to modern reinforcement learning libraries. It owns metrics, logging, and evaluation pipelines tailored to spaceflight tasks, as well as documentation, tutorials, and onboarding materials that lower the barrier to entry for new users.

Explicitly excluded from scope are new reinforcement learning algorithm development, mission-specific autonomy pipelines, and tightly coupled operational software that cannot be reused across scenarios. The program is not intended to produce flight-qualified autonomy systems, nor to absorb one-off environments that do not align with shared abstractions and reuse goals.

3. Core Deliverables for the First Semester

By the end of the semester, the program is expected to deliver a shared environment repository that serves as the lab's canonical collection of DRL-ready spaceflight environments. This repository should be lightweight and modular, with a clear separation between spacecraft dynamics, sensing, actuation, reward logic, operational constraints, and agent interfaces. Dependencies should be kept minimal and justified, with a strong preference for composability and clarity over feature breadth.
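One way to picture the separation described above is a small sketch in which each concern is an independent, swappable function and the agent-facing class only wires them together. Everything here is an illustrative assumption rather than the repository's design: the 1-D double-integrator "station keeping" toy, the names (ComposedEnv, double_integrator, clip_thrust, perfect_sensor, quadratic_reward, within_bounds), and all numeric limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[float, float]  # (position error, velocity error); toy units


def double_integrator(state: State, accel: float, dt: float) -> State:
    """Dynamics: propagate a 1-D double integrator one step (semi-implicit Euler)."""
    pos, vel = state
    vel = vel + accel * dt
    pos = pos + vel * dt
    return (pos, vel)


def clip_thrust(command: float, max_accel: float = 0.1) -> float:
    """Actuation: saturate the commanded acceleration (hypothetical limit)."""
    return max(-max_accel, min(max_accel, command))


def perfect_sensor(state: State) -> State:
    """Sensing: return the true state; a noisy model could be swapped in."""
    return state


def quadratic_reward(state: State, accel: float) -> float:
    """Reward: penalize position error and control effort."""
    return -(state[0] ** 2) - 0.1 * accel ** 2


def within_bounds(state: State, limit: float = 10.0) -> bool:
    """Constraint: True while the position error stays inside the allowed box."""
    return abs(state[0]) <= limit


@dataclass
class ComposedEnv:
    """Agent interface: composes the independent components above."""
    dynamics: Callable[[State, float, float], State]
    actuate: Callable[[float], float]
    sense: Callable[[State], State]
    reward: Callable[[State, float], float]
    constraint_ok: Callable[[State], bool]
    dt: float = 1.0
    state: State = (1.0, 0.0)

    def reset(self, state: State = (1.0, 0.0)) -> State:
        self.state = state
        return self.sense(self.state)

    def step(self, command: float):
        accel = self.actuate(command)
        self.state = self.dynamics(self.state, accel, self.dt)
        terminated = not self.constraint_ok(self.state)  # constraint violation ends the episode
        return self.sense(self.state), self.reward(self.state, accel), terminated


env = ComposedEnv(double_integrator, clip_thrust, perfect_sensor,
                  quadratic_reward, within_bounds)
observation = env.reset()
observation, reward, terminated = env.step(-0.05)
```

Because each component is passed in rather than hard-coded, a noisy sensor or a different reward could be substituted without touching the dynamics, which is the kind of composability the repository is meant to favor.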

The repository should include a small set of well-documented reference environments representing canonical spaceflight operations problems. These are expected to span multiple mission classes, such as rendezvous and proximity operations, agile Earth observation (including scheduling and strip-imaging constraints), station keeping and orbit maintenance, small-body proximity operations, and cislunar scenarios such as NRHO operations. Each environment should clearly state its problem formulation and assumptions, define observation and action spaces, specify reward and termination logic, and include a baseline controller or heuristic policy for comparison.
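As a rough illustration of what a baseline for comparison could look like, the sketch below scores a simple proportional-derivative heuristic against a do-nothing policy on a toy 1-D double integrator. The policies, gains, reward, and the rollout_return helper are all hypothetical placeholders, not part of any existing environment.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (position error, velocity error); toy units


def pd_baseline(state: State, kp: float = 0.5, kd: float = 1.0) -> float:
    """Heuristic baseline: proportional-derivative feedback on the error state."""
    pos, vel = state
    return -kp * pos - kd * vel


def coast(state: State) -> float:
    """Trivial reference policy: never thrust."""
    return 0.0


def rollout_return(policy: Callable[[State], float],
                   state: State = (1.0, 0.0),
                   steps: int = 50,
                   dt: float = 1.0) -> float:
    """Sum of per-step rewards (negative squared position error) over one rollout."""
    pos, vel = state
    total = 0.0
    for _ in range(steps):
        accel = policy((pos, vel))
        vel += accel * dt          # semi-implicit Euler update
        pos += vel * dt
        total += -(pos ** 2)
    return total


baseline_score = rollout_return(pd_baseline)
coast_score = rollout_return(coast)
print(f"PD baseline: {baseline_score:.3f}, coast: {coast_score:.3f}")
```

A trained agent's return on the same rollout settings can then be reported next to these two numbers, which gives a floor (coast) and a sensible heuristic target (PD) for interpreting results.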

In addition to the environments themselves, the program must provide a standardized training and evaluation pipeline that allows DRL agents to be trained using external RL libraries. This includes logging, checkpointing, and evaluation scripts, as well as explicit support for both standard episodic MDP and semi-MDP formulations where appropriate.
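To make the semi-MDP requirement concrete, here is a minimal sketch of how a pipeline might treat a decision that lasts several base steps: rewards inside the decision are discounted per step, and the bootstrap target discounts the next value by gamma raised to the elapsed duration. The function names (run_option, smdp_target, toy_step) and the scalar toy dynamics are assumptions made for illustration only.

```python
from typing import Callable, Tuple


def toy_step(state: float, action: float) -> Tuple[float, float]:
    """Hypothetical base-rate step: move the state and penalize distance from zero."""
    next_state = state + action
    return next_state, -abs(next_state)


def run_option(step_fn: Callable[[float, float], Tuple[float, float]],
               state: float,
               action: float,
               duration: int,
               gamma: float) -> Tuple[float, float, int]:
    """Hold one high-level action for `duration` base steps.

    Returns (next_state, discounted reward accumulated during the option, duration).
    """
    total = 0.0
    for k in range(duration):
        state, reward = step_fn(state, action)
        total += (gamma ** k) * reward
    return state, total, duration


def smdp_target(option_reward: float, duration: int,
                next_value: float, gamma: float) -> float:
    """Semi-MDP bootstrap target: R + gamma**duration * V(s')."""
    return option_reward + (gamma ** duration) * next_value


next_state, option_reward, tau = run_option(toy_step, 3.0, -1.0, duration=3, gamma=0.5)
target = smdp_target(option_reward, tau, next_value=4.0, gamma=0.5)
```

The key difference from a fixed-step formulation is that the duration travels with each transition, so logging and replay storage would need to record it alongside the state, action, and reward.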

Documentation is a required deliverable. This includes a "getting started" tutorial that walks a new student from installation through training an agent in at least one environment, as well as end-to-end example scripts or notebooks for each reference environment. Design documentation should explain architectural decisions, environment abstractions, and intended extension points. The repository must also include basic software quality infrastructure, including unit tests for core components such as environment stepping, reward computation, and constraint enforcement, along with a continuous integration pipeline and documented coding standards.
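The three test targets named above (environment stepping, reward computation, and constraint enforcement) could be covered along the following lines. This is only a sketch using the standard-library unittest module against a hypothetical toy step and reward function; the real tests would target the repository's actual components, and MAX_ACCEL is an assumed actuator limit.

```python
import unittest

MAX_ACCEL = 0.1  # hypothetical actuator limit


def step(state, command, dt=1.0):
    """Toy environment step: saturate the command, then propagate a double integrator."""
    pos, vel = state
    accel = max(-MAX_ACCEL, min(MAX_ACCEL, command))
    vel = vel + accel * dt
    pos = pos + vel * dt
    return (pos, vel), accel


def reward(state, accel):
    """Toy reward: penalize position error and control effort."""
    return -(state[0] ** 2) - 0.1 * accel ** 2


class TestCoreComponents(unittest.TestCase):
    def test_step_is_deterministic(self):
        self.assertEqual(step((1.0, 0.0), 0.05), step((1.0, 0.0), 0.05))

    def test_zero_command_coasts(self):
        state, accel = step((1.0, 0.5), 0.0)
        self.assertAlmostEqual(state[0], 1.5)
        self.assertAlmostEqual(state[1], 0.5)
        self.assertEqual(accel, 0.0)

    def test_reward_is_maximal_at_origin(self):
        self.assertEqual(reward((0.0, 0.0), 0.0), 0.0)
        self.assertLess(reward((1.0, 0.0), 0.0), 0.0)

    def test_constraint_saturates_command(self):
        _, accel = step((0.0, 0.0), 5.0)
        self.assertAlmostEqual(accel, MAX_ACCEL)
        _, accel = step((0.0, 0.0), -5.0)
        self.assertAlmostEqual(accel, -MAX_ACCEL)
```

Saved as a test module, a file like this can be run with python -m unittest, which is also a simple command for a continuous integration job to invoke.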

4. User Stories and Intended Usage

The tools produced by this program should support a range of realistic research workflows. A graduate researcher should be able to instantiate a representative CubeSat-class spacecraft operating in a cislunar environment and train an autonomous agent to manage station keeping and operations under realistic constraints. A researcher studying interplanetary autonomy should be able to formulate a semi-MDP problem in which discrete decision-making occurs alongside continuously evolving low-thrust dynamics. New students should be able to reproduce baseline experiments directly from documentation and then extend environments with confidence. Undergraduate researchers should be able to run preconfigured experiments and analyze results without modifying core infrastructure. Finally, senior developers and the PI should be able to quickly assess correctness, extensibility, and maintainability through code, tests, and documentation.

5. Development Tasks and Execution Expectations

During the semester, the program is expected to define and execute a clear roadmap for the shared environment repository, including explicit definition of the MVP scope and future extensions. This includes selecting and justifying core software dependencies and DRL interfaces, designing environment abstractions and configuration patterns, and implementing at least one environment end-to-end as a reference example.
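One possible configuration pattern, offered as a sketch rather than a decision, is an immutable dataclass that validates itself on construction. Variants for ablation studies are then derived with dataclasses.replace, and the full configuration can be dumped to a dictionary for logging. The EnvConfig class and its fields (dt, max_steps, max_accel, semi_mdp) are hypothetical.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical environment configuration; field names and defaults are placeholders."""
    dt: float = 10.0          # base integration step, seconds
    max_steps: int = 500      # episode length limit
    max_accel: float = 0.1    # actuator saturation, toy units
    semi_mdp: bool = False    # whether decisions span multiple base steps

    def __post_init__(self):
        # Fail fast on invalid settings instead of deep inside a training run.
        if self.dt <= 0:
            raise ValueError("dt must be positive")
        if self.max_steps <= 0:
            raise ValueError("max_steps must be positive")
        if self.max_accel < 0:
            raise ValueError("max_accel must be non-negative")


base = EnvConfig()
# Derive an ablation variant without mutating the base configuration.
variant = replace(base, semi_mdp=True, dt=60.0)
# A plain dict is convenient to store next to checkpoints and logs.
logged = asdict(variant)
```

Since replace constructs a new instance, the validation in __post_init__ runs again for every derived variant.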

Additional effort should be devoted to developing baseline policies or heuristics, writing onboarding documentation and tutorials, setting up unit testing and continuous integration, and identifying opportunities for reuse across other lab programs. Ownership of individual tasks may be distributed, but all areas must be addressed collectively.

6. Collaboration, Meetings, and Workflow Expectations

Program members meet every two weeks for at least one hour on Fridays. These meetings are intended for code walkthroughs and design reviews, discussion of roadblocks and architectural decisions, reading and discussing relevant literature, and aligning environment development with emerging research needs.

All substantive work is expected to follow disciplined version-control practices. Program members are responsible for filing issues and pull requests, reviewing each other's code, and actively integrating shared tools into their own research workflows to expose design gaps and usability issues early.

7. Faculty Interface and Code Review

At least once per semester, the program will conduct a dedicated code and design review with the PI. Program members should prepare a concise technical overview of the repository, highlight key abstractions and extension points, and identify known limitations or areas where senior-level feedback is explicitly requested. This review is intended to ensure architectural coherence and long-term sustainability.

8. Undergraduate Integration and Mentorship

Program members are responsible for overseeing undergraduate researchers within the program. Undergraduate work should primarily involve using existing environments and tools for experiments and analysis, as well as implementing small extensions, scripts, or evaluation studies. Graduate students are expected to ensure that undergraduates can run experiments with minimal friction, provide clear task definitions and success criteria, and treat undergraduate use cases as a stress test for usability and documentation quality.

9. End-of-Semester Deliverables Checklist

By the end of the Spring 2026 semester, the following items should be complete and verifiable.

Repository and Architecture

A single shared repository designated as the canonical DRL spaceflight environments codebase
Clear separation between dynamics, sensing, actuation, rewards, constraints, and agent interfaces
Documented environment abstractions and configuration patterns

Reference Environments

At least three fully functional and documented spaceflight environments
Environments spanning at least two distinct mission classes
Baseline controllers or heuristic policies implemented for each environment

Training and Evaluation

Standardized training interface compatible with external RL libraries
Logging, checkpointing, and evaluation scripts functional and documented
Support demonstrated for at least one semi-MDP formulation

Documentation and Onboarding

A "getting started" guide that runs end-to-end
At least one complete example script or notebook per reference environment
Design documentation describing environment structure and extension points

Software Quality

Unit tests covering core environment functionality
Continuous integration pipeline running tests automatically
Coding standards and contribution guidelines documented

Collaboration and Process

Program meetings held regularly every two weeks
Active use of issues and pull requests by multiple contributors
Evidence that program members are using shared environments in their own research

Faculty Review and Oversight

One structured code and design review conducted with the PI
Summary of current capabilities, limitations, and next steps produced

Undergraduate Integration

At least one undergraduate successfully running experiments using shared environments
Undergraduate-facing scripts or documentation requiring minimal setup