
TALOS v0.3.0 Technology Roadmap

Date: April 2026
Version: 0.3.0
Authors: Libre Space Foundation
Scope: Forward-looking technology assessment, migration paths, and prioritized roadmap for the TALOS ground station network


Table of Contents

  1. Current Tech Stack Assessment
  2. Orbital Propagation
  3. Visualization
  4. Messaging
  5. Scheduling
  6. Frontend Architecture
  7. Edge Computing
  8. AI/ML Opportunities
  9. Prioritized Roadmap

1. Current Tech Stack Assessment

What is working well

| Component | Technology | Verdict |
| --- | --- | --- |
| API server | FastAPI 0.100+ / Uvicorn | Excellent. Async-native, OpenAPI docs for free, strong ecosystem. No reason to move. |
| ORM / schema | SQLModel 0.0.14+ / Pydantic v2 | Good fit. Single model definition for DB + API. Alembic migrations in place. |
| Orbital math | Skyfield + NumPy | Correct and well-tested. Pure-Python SGP4 with validated ephemeris loading. |
| Design system | Custom CSS on Astro UXDS tokens | Professional space-operations aesthetic. Lightweight, no framework lock-in. |
| CI/CD | GitLab CI with DAG stages, Docker 24, Fly.io deploy | Mature pipeline: lint, unit, integration, security scanning, Pages docs. |
| Auth | Magic-link email + JWT sessions | Passwordless, simple, appropriate for current scale. |
| Database | PostgreSQL 15 via psycopg2-binary | Solid choice. Room to leverage JSONB, PostGIS, and pg_cron as complexity grows. |

What needs attention

| Component | Technology | Concern |
| --- | --- | --- |
| Agent | Pure Python, synchronous threads, raw sockets | High resource use on Raspberry Pi. No reconnect logic. Global mutable state. |
| Frontend | Server-rendered Jinja2 + vanilla JS (~800 lines inline) | Dashboard JS is monolithic. No component reuse, no type checking, no bundler. |
| MQTT client (browser) | Paho MQTT JS 1.0.1 (Eclipse, 2014) | Unmaintained. No MQTT 5.0 support. WebSocket-only with manual reconnect. |
| Propagation | Serial Skyfield calls per satellite | O(n) wall-clock time for n satellites. No batch propagation. |
| Scheduling | Manual station-mission assignment via UI | No automated conflict detection, no optimization, no multi-pass campaign planning. |
| Map | Leaflet 1.9.4 (2D) | Adequate for ground tracks, but no 3D orbit visualization, no terrain, no sensor cones. |

2. Orbital Propagation

Current: Skyfield SGP4

TALOS uses Skyfield's EarthSatellite for all orbital computations: pass prediction (find_events), Doppler shift, ground tracks, and footprint circles. The code in director/physics.py is clean and stateless, but each satellite is propagated serially. With 48 time steps per ground track and potentially hundreds of satellites across campaigns, this becomes a bottleneck.

Option A: dSGP4 (ESA) for batch/GPU propagation

dSGP4 (PyTorch-based, released 2024 by ESA's Advanced Concepts Team) offers:

  • Batch CPU propagation: Propagate 1,000 TLEs across 100 time steps in a single vectorized call. On CPU alone this is 10-50x faster than serial Skyfield loops.
  • GPU acceleration: With a CUDA-capable GPU, batch propagation scales to 10,000+ objects trivially. Relevant if TALOS grows to full-catalog scheduling.
  • ML-corrected model: Neural network corrections trained on historical SP data reduce SGP4's known drag-modeling errors by up to 40% for LEO objects.
  • Differentiability: Enables gradient-based orbit determination if TALOS ever ingests station observations.

Trade-offs: Adds PyTorch as a dependency (~800 MB). The ML correction model requires periodic retraining. API is lower-level than Skyfield -- no built-in find_events, so pass prediction logic must be reimplemented using root-finding on elevation angles.

Recommended migration path: Keep Skyfield for single-satellite interactive operations (Doppler, real-time tracking). Introduce dSGP4 as an optional backend in the Director for bulk pass prediction across campaigns. Abstract propagation behind a Propagator protocol so both backends are swappable.
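A minimal sketch of such a `Propagator` protocol, including the env-var gating described below in the recommendation. All names and signatures here are illustrative, not the actual `director/physics.py` API:

```python
import os
from datetime import datetime
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class Propagator(Protocol):
    """Backend-agnostic propagation interface (hypothetical sketch)."""

    def positions(
        self, tles: Sequence[tuple[str, str]], times: Sequence[datetime]
    ) -> list[list[tuple[float, float, float]]]:
        """Return [satellite][time] -> (lat_deg, lon_deg, alt_km)."""
        ...


class SkyfieldPropagator:
    """Default serial backend; a dSGP4 batch backend would satisfy the
    same protocol and be selected via the TALOS_PROPAGATOR env var."""

    def positions(self, tles, times):
        # Placeholder: real code would loop over skyfield EarthSatellite
        return [[(0.0, 0.0, 400.0) for _ in times] for _ in tles]


def get_propagator() -> Propagator:
    # TALOS_PROPAGATOR=dsgp4 gating is planned for v0.5 (see roadmap)
    if os.environ.get("TALOS_PROPAGATOR") == "dsgp4":
        raise NotImplementedError("dSGP4 batch backend planned for v0.5")
    return SkyfieldPropagator()
```

Because both backends only need to satisfy the protocol, the Director can switch between serial and batch propagation without touching call sites.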

Option B: sgp4-rs (Rust SGP4 with Python bindings)

The sgp4 crate on crates.io provides a Rust-native SGP4 implementation with Python bindings via PyO3. Benchmarks show 3-5x speedup over the C-extension SGP4 used by Skyfield, with zero additional dependencies beyond a small compiled wheel.

Trade-off: Faster than Skyfield but still serial per TLE. No batch vectorization. Best suited for the edge agent where PyTorch is impractical.

Recommendation

Phase 1 (v0.4): Extract a Propagator interface in director/physics.py. Keep Skyfield as default.
Phase 2 (v0.5): Add dSGP4 batch backend for campaign-wide pass prediction. Gate behind TALOS_PROPAGATOR=dsgp4 env var.
Phase 3 (v0.6+): Evaluate sgp4-rs for the edge agent if it is rewritten in Rust.


3. Visualization

Current: Leaflet 1.9.4 (2D)

The dashboard uses Leaflet with a dark tile layer for a 2D Mercator map. Ground tracks are polylines, stations are markers, and footprints are circles. This works well for operational awareness but cannot represent orbital altitude, sensor cones, or 3D conjunction geometry.

Option A: CesiumJS for 3D globe

CesiumJS (open-source, Apache 2.0) provides a WebGL-based 3D globe with:

  • CZML streaming: Native format for time-dynamic satellite positions. The Director could emit CZML documents over MQTT, and the browser renders interpolated orbits without per-frame position messages.
  • Sensor volumes: Visualize station antenna patterns as 3D cones, showing coverage overlap.
  • Terrain and imagery: Built-in Cesium World Terrain and various imagery providers.
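As a sketch of what the Director might emit for the CZML path, here is a minimal time-dynamic document built in Python. The satellite id, availability window, and position samples are illustrative placeholders, not real ephemeris:

```python
import json
from datetime import datetime, timezone

epoch = datetime(2026, 4, 1, tzinfo=timezone.utc)

# CZML is a JSON array: a "document" packet followed by entity packets.
# Position samples are [t_offset_s, x, y, z, ...] in metres; Cesium
# interpolates between samples, so no per-frame position messages needed.
czml = [
    {"id": "document", "name": "TALOS pass", "version": "1.0"},
    {
        "id": "sat/25544",
        "name": "ISS (ZARYA)",
        "availability": "2026-04-01T00:00:00Z/2026-04-01T01:30:00Z",
        "position": {
            "epoch": epoch.isoformat().replace("+00:00", "Z"),
            "interpolationAlgorithm": "LAGRANGE",
            "interpolationDegree": 5,
            "referenceFrame": "INERTIAL",
            # Illustrative samples only -- real values come from the propagator
            "cartesian": [0, 6778000, 0, 0, 60, 6770000, 300000, 150000],
        },
        "point": {"pixelSize": 8},
    },
]

document = json.dumps(czml)  # publish this over MQTT for the 3D view
```

One document per TLE update replaces a 1 Hz position stream, which is the main bandwidth win of the CZML approach.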

Trade-offs: Large library (~3 MB gzipped). Requires a Cesium Ion token for terrain (free tier: 75,000 monthly tiles). Steeper learning curve than Leaflet. Mobile performance degrades with many entities.

Option B: Leaflet + deck.gl overlay

Keep Leaflet for the base map but add deck.gl layers for GPU-accelerated rendering of thousands of satellite points and arcs. Lighter than CesiumJS, stays in 2D, but handles higher entity counts than native Leaflet markers.

Option C: Resium (React + CesiumJS)

If the frontend moves to React (see Section 6), Resium provides declarative React components wrapping CesiumJS. This is the cleanest integration path if both visualization and frontend architecture change together.

Recommendation

Phase 1 (v0.4): Keep Leaflet. Optimize ground track rendering by caching polylines and updating only on TLE change (already noted in physics.py comments).
Phase 2 (v0.5): Add CesiumJS as an opt-in "3D view" alongside the existing 2D map. Serve CZML from the Director. Do not replace Leaflet yet -- operators may prefer the faster 2D view for daily operations.
Phase 3 (v0.7+): If the frontend migrates to React/Svelte, evaluate Resium or a Svelte CesiumJS wrapper as the primary visualization.


4. Messaging

Current: MQTT 3.1.1 via Eclipse Mosquitto

TALOS uses Mosquitto as the MQTT broker with Paho Python clients (v2.0) in the Core, Director, and Agent. The browser uses Paho MQTT JS 1.0.1 over WebSocket. The topic hierarchy (talos/gs/{id}/cmd/#, talos/gs/{id}/telemetry/#) is well-structured.

Option A: Upgrade to MQTT 5.0

Mosquitto already supports MQTT 5.0. The Python Paho 2.0 client supports MQTT 5.0 properties. Benefits:

  • Shared subscriptions ($share/group/topic): Allow multiple Director instances to load-balance command processing without duplicate handling. Critical for horizontal scaling.
  • Request/response correlation: Built-in correlation IDs replace the current pattern of publishing a command and waiting for a separate status topic.
  • Message expiry: Stale rotator commands auto-expire instead of being delivered to a station that reconnects after an outage.
  • User properties: Attach metadata (mission ID, priority) to messages without encoding them in the topic or payload.

Trade-off: The browser Paho MQTT JS library does not support MQTT 5.0 and is unmaintained. Must replace with MQTT.js (npm mqtt, actively maintained, MQTT 5.0 support, ~50 KB gzipped).

Option B: Migrate to NATS

NATS offers higher throughput, built-in request/reply, and JetStream for persistence. However, it loses MQTT compatibility with existing agents and the SatNOGS ecosystem. The NATS Python client (nats-py) is less battle-tested in IoT contexts than Paho. Only justified if TALOS scales to hundreds of stations where NATS's queue groups outperform MQTT 5.0 shared subscriptions.

Recommendation

Phase 1 (v0.4): Replace browser Paho MQTT JS with MQTT.js. Enable MQTT 5.0 on the Mosquitto broker. Update Python clients to use MQTT 5.0 properties (shared subscriptions, message expiry, correlation IDs).
Phase 2 (v0.6+): Evaluate NATS only if MQTT 5.0 shared subscriptions prove insufficient for multi-Director scaling. The SatNOGS ecosystem alignment strongly favors staying on MQTT.


5. Scheduling

Current: Manual assignment

Station-mission assignments are created manually through the UI. There is no conflict detection (double-booking a station), no optimization across campaigns, and no awareness of pass windows during assignment.

Approach: Google OR-Tools constraint solver

OR-Tools (BSD-3, actively maintained by Google) provides a CP-SAT solver suitable for satellite pass scheduling:

Problem formulation:

  • Variables: Binary assignment of each (pass, station) pair.
  • Constraints: No overlapping passes on the same station. Minimum elevation mask. Rotator slew time between consecutive passes. Station availability windows.
  • Objective: Maximize total scheduled contact time (or priority-weighted coverage).

Integration path:

  1. The Director computes all pass windows for a campaign's satellites over a time horizon (using batch propagation from Section 2).
  2. Pass windows and station constraints are fed to the CP-SAT solver.
  3. The solver returns optimal assignments, which are written to the database as Assignment records.
  4. Operators review and approve the schedule via the UI.

Performance: CP-SAT solves a 50-station, 200-satellite, 24-hour problem in under 10 seconds. For larger horizons, the solver supports time limits with best-found-so-far solutions.

Trade-off: OR-Tools is a ~50 MB compiled dependency. The constraint model must be carefully designed to avoid over-constraining or under-constraining. Requires validation against manually scheduled passes.

Alternative: Greedy scheduler first

For v0.4, a greedy scheduler that assigns passes by priority and rejects conflicts avoids the OR-Tools dependency while eliminating double-booking.
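A minimal sketch of such a greedy scheduler, assuming a hypothetical PassWindow record (the real Assignment model may differ):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassWindow:
    # Hypothetical shape; real records would reference Assignment rows
    pass_id: str
    station: str
    start_s: int
    end_s: int
    priority: int


def greedy_schedule(passes: list[PassWindow]) -> list[PassWindow]:
    """Assign passes highest-priority first, rejecting any pass that
    overlaps an already-accepted pass on the same station."""
    accepted: dict[str, list[PassWindow]] = {}
    for p in sorted(passes, key=lambda p: (-p.priority, p.start_s)):
        busy = accepted.setdefault(p.station, [])
        # Accept only if p is disjoint from every accepted pass here
        if all(p.end_s <= q.start_s or q.end_s <= p.start_s for q in busy):
            busy.append(p)
    return sorted((p for ps in accepted.values() for p in ps),
                  key=lambda p: p.start_s)
```

This guarantees a conflict-free schedule but not an optimal one, which is exactly the gap the CP-SAT solver closes in Phase 2.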

Recommendation

Phase 1 (v0.4): Implement conflict detection on assignment creation (database constraint + API validation). Add a greedy auto-scheduler for single campaigns.
Phase 2 (v0.5): Introduce OR-Tools CP-SAT for multi-campaign optimization. Expose solver parameters (time limit, priority weights) in the campaign settings UI.
Phase 3 (v0.7+): Add rolling-horizon scheduling that re-optimizes as TLEs update and station availability changes.


6. Frontend Architecture

Current: Server-rendered Jinja2 + inline JavaScript

The dashboard is a single Jinja2 template with ~800 lines of inline JavaScript handling MQTT subscriptions, Leaflet map updates, station card rendering, countdown timers, and modal interactions. There is no module system, no type checking, and no component reuse across pages.

Option A: HTMX + Alpine.js (incremental enhancement)

Replace inline JS with HTMX for server-driven partial updates and Alpine.js for client-side reactivity. Benefits:

  • Minimal rewrite: Keep Jinja2 templates. Add hx-* attributes for dynamic content (station cards, pass countdowns). Use Alpine.js for local state (modal open/close, form validation).
  • No build step: Ship plain HTML/JS. Works with the existing FastAPI static file serving.
  • Trade-off: HTMX relies on HTTP round-trips for updates, which adds latency compared to MQTT-pushed state. The real-time dashboard (1 Hz telemetry updates) is a poor fit for HTMX polling. MQTT-driven elements must remain in vanilla JS or Alpine.js reactive stores.

Option B: Svelte SPA with MQTT

Replace Jinja2 dashboard with a Svelte single-page application:

  • Reactive by design: Svelte's reactive declarations naturally handle MQTT-pushed state updates. A writable store backed by MQTT.js gives components automatic re-rendering.
  • Small bundle: Svelte compiles to vanilla JS with no runtime. A full dashboard would likely be 80-120 KB gzipped.
  • Component reuse: Station cards, telemetry panels, map views become reusable .svelte components.
  • Trade-off: Requires a build step (Vite). The Core must serve the SPA and provide a JSON API (FastAPI already does both). Two deployment artifacts (API + SPA static files). Learning curve for contributors unfamiliar with Svelte.

Option C: React SPA

React has the largest ecosystem (including Resium for CesiumJS), but the runtime overhead is larger than Svelte and the boilerplate-to-value ratio is worse for a focused operational dashboard. Not recommended unless CesiumJS/Resium becomes the primary visualization.

Recommendation

Phase 1 (v0.4): Extract inline JS into ES modules served from /static/js/. No framework change, just module boundaries: map.js, mqtt.js, stations.js, countdowns.js. Add JSDoc type annotations.
Phase 2 (v0.5): Introduce HTMX for non-real-time pages (stations list, campaign management, settings). Keep the dashboard as vanilla JS + ES modules.
Phase 3 (v0.6+): Evaluate Svelte for the dashboard if component complexity grows beyond manageable vanilla JS. The JSON API already exists in FastAPI; the migration is primarily a frontend concern.


7. Edge Computing

Current: Python agent with threading

The agent (agent/agent.py) is a ~110-line Python script using Paho MQTT, raw TCP sockets to rotctld, and a threaded telemetry loop. It runs on Raspberry Pi 3/4 at ground stations.

Problems: Python 3.10 runtime requires ~50-70 MB resident memory. No reconnection logic for MQTT or rotctld -- a transient network failure kills the session. Global mutable state makes testing difficult. No watchdog or health monitoring.

Option A: Rust agent

Rewrite using rumqttc (async MQTT) and tokio: ~2 MB binary, ~5 MB resident (10x reduction). Built-in reconnection with exponential backoff. Cross-compile for armv7/aarch64 from CI with no Python runtime on device.

Trade-off: Steep learning curve. Fewer Libre Space contributors are Rust-proficient. Debugging on remote Pi hardware is harder without Python's REPL.

Option B: Go agent

Go offers a middle ground: compiled binary (~8 MB), garbage collected, strong concurrency via goroutines, and trivial cross-compilation (GOOS=linux GOARCH=arm). Larger binary than Rust but simpler to learn. A fallback if the Rust path proves too steep.

Option C: Improve the Python agent

Refactor to use asyncio with aiomqtt (async Paho wrapper) and structured concurrency. Add reconnection, health checks, and a systemd watchdog notifier. This keeps the existing language and contributor familiarity.

  • Trade-off: Still ~50 MB resident and still requires a Python runtime on the Pi, but this is the lowest-effort migration.
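The reconnection logic this option adds can be sketched as a pure-asyncio supervision loop. `run_agent` and its `connect` parameter are hypothetical stand-ins for the real aiomqtt session and rotctld socket handling:

```python
import asyncio


async def run_agent(connect, base_delay: float = 1.0,
                    max_delay: float = 60.0) -> list[float]:
    """Supervise a station session: retry `connect` with exponential
    backoff on transient failure, return once it completes cleanly.
    The returned delay list exists only to make this sketch observable;
    a real agent would loop forever and notify the systemd watchdog."""
    delays: list[float] = []
    delay = base_delay
    while True:
        try:
            await connect()        # runs until clean shutdown or error
            return delays
        except ConnectionError:
            delays.append(delay)
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Replacing the current "die on first socket error" behavior with this loop addresses the transient-failure problem without changing languages.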

Recommendation

Phase 1 (v0.4): Refactor the Python agent to async (asyncio + aiomqtt). Add reconnection logic, structured state management (dataclass instead of globals), and systemd integration. This is a 1-2 week effort.
Phase 2 (v0.6+): Prototype a Rust agent for one station. Compare memory, CPU, and reliability over a 30-day trial. If validated, migrate all stations over v0.7-v0.8.
Deferred: Go agent is the fallback if the Rust prototype proves too difficult to maintain.


8. AI/ML Opportunities

8.1 Anomaly detection on telemetry

Station telemetry (rotator position, signal strength, temperature) streams at 1 Hz over MQTT. A lightweight anomaly detector could flag:

  • Rotator stalls (position not tracking commanded azimuth/elevation).
  • Signal dropouts during expected passes.
  • Thermal excursions in outdoor enclosures.

Approach: A simple statistical model (z-score on rolling window, or Isolation Forest from scikit-learn) running in the Director. No deep learning needed for structured 1 Hz time series with clear failure modes. Alerts published to talos/alerts/{station_id}.
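A rolling z-score detector of this kind fits in a few lines of standard-library Python. The window size, baseline minimum, and threshold below are illustrative defaults, not tuned values:

```python
from collections import deque
from statistics import fmean, pstdev


def zscore_flags(samples, window: int = 60, threshold: float = 3.0):
    """Flag sample indices that deviate more than `threshold` standard
    deviations from the trailing window -- a minimal stand-in for the
    Director-side detector on 1 Hz rotator/signal telemetry."""
    history: deque[float] = deque(maxlen=window)
    flags = []
    for i, x in enumerate(samples):
        if len(history) >= 10:            # need a minimal baseline first
            mu, sigma = fmean(history), pstdev(history)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                flags.append(i)
        history.append(x)
    return flags
```

Each flagged index would trigger a publish to talos/alerts/{station_id}; Isolation Forest becomes worth the scikit-learn dependency only if multivariate correlations (e.g. temperature vs. signal strength) matter.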

Effort: 2-3 weeks. Requires labeled examples of known failures for validation.

8.2 Signal classification

If TALOS integrates IQ sample capture (via SatNOGS flowgraphs or GNU Radio), ML classifiers can identify modulation type, detect interference, or flag unexpected signals. Existing open-source models:

  • TorchSig (MIT, 2024): PyTorch library for RF signal classification. Pre-trained models for common modulation schemes.
  • SigMF + SatNOGS DB: The SatNOGS observation database contains labeled waterfall plots that could serve as training data.

Trade-off: Requires IQ sample pipeline, which TALOS does not currently have. This is a v1.0+ capability dependent on GNU Radio integration.

8.3 Predictive maintenance

Given 3-6 months of telemetry history, predict rotator bearing wear, LNA degradation, or cable weathering from trend analysis on rolling signal-to-noise ratio correlated with rotator age and weather data. Requires telemetry persistence (currently ephemeral MQTT) -- add TimescaleDB before ML models can train.

Recommendation

Phase 1 (v0.5): Add telemetry persistence to PostgreSQL with TimescaleDB. Implement z-score anomaly detection on rotator telemetry.
Phase 2 (v0.7): Evaluate TorchSig integration if IQ sample capture is available.
Phase 3 (v1.0+): Predictive maintenance models once 6+ months of telemetry history exists.


9. Prioritized Roadmap

Phase 1: v0.4 -- Foundation hardening (Q3 2026, ~6 weeks)

| Task | Effort | Depends on | Impact |
| --- | --- | --- | --- |
| Extract dashboard JS into ES modules | 1 week | None | Maintainability, testability |
| Replace Paho MQTT JS with MQTT.js | 3 days | JS modularization | MQTT 5.0 support, maintained library |
| Enable MQTT 5.0 on Mosquitto broker | 1 day | None | Shared subscriptions, message expiry |
| Refactor Python agent to asyncio | 2 weeks | None | Reliability, reconnection, testability |
| Add assignment conflict detection | 1 week | None | Data integrity |
| Extract Propagator protocol in Director | 3 days | None | Enables future backend swaps |
| Greedy single-campaign auto-scheduler | 1 week | Conflict detection | Reduces manual work |

Phase 2: v0.5 -- Intelligence layer (Q4 2026, ~8 weeks)

| Task | Effort | Depends on | Impact |
| --- | --- | --- | --- |
| dSGP4 batch propagation backend | 3 weeks | Propagator protocol | 10-50x faster campaign planning |
| OR-Tools CP-SAT multi-campaign scheduler | 3 weeks | Batch propagation, conflict detection | Optimal pass scheduling |
| TimescaleDB telemetry persistence | 1 week | None | Enables ML, historical analysis |
| Z-score anomaly detection on rotator telemetry | 2 weeks | TimescaleDB | Early failure detection |
| HTMX for non-real-time pages | 2 weeks | JS modularization | Reduced page-load JS, server-driven UI |
| CesiumJS opt-in 3D view | 3 weeks | None | 3D orbit visualization |

Phase 3: v0.6-v0.7 -- Scale and polish (H1 2027, ~12 weeks)

| Task | Effort | Depends on | Impact |
| --- | --- | --- | --- |
| Rust edge agent prototype | 4 weeks | None | 10x memory reduction on Pi |
| Svelte dashboard (if complexity warrants) | 6 weeks | MQTT.js, JSON API | Component reuse, type safety |
| NATS evaluation (if MQTT 5.0 insufficient) | 2 weeks | Multi-Director deployment | Higher-throughput messaging |
| Rolling-horizon re-optimization | 3 weeks | OR-Tools scheduler | Adaptive scheduling |
| SatNOGS API v2 integration | 2 weeks | None | Updated satellite/transmitter data |

Phase 4: v1.0+ -- Advanced capabilities (H2 2027+)

| Task | Effort | Depends on | Impact |
| --- | --- | --- | --- |
| IQ sample capture pipeline | 8 weeks | GNU Radio integration | Signal analysis capability |
| TorchSig signal classification | 4 weeks | IQ pipeline | Automated modulation ID |
| Predictive maintenance models | 4 weeks | 6+ months telemetry | Proactive maintenance |
| WebRTC waterfall streaming | 4 weeks | IQ pipeline | Live signal monitoring |
| Full-catalog SSA integration | 3 weeks | dSGP4 batch backend | Conjunction screening |

Technology Decision Matrix

| Decision | Current | Recommended | When | Risk |
| --- | --- | --- | --- | --- |
| Propagation | Skyfield (serial) | Skyfield + dSGP4 (batch) | v0.5 | Medium: PyTorch dependency size |
| Browser MQTT | Paho JS 1.0.1 | MQTT.js | v0.4 | Low: drop-in replacement |
| Broker protocol | MQTT 3.1.1 | MQTT 5.0 | v0.4 | Low: Mosquitto already supports it |
| Scheduling | Manual | Greedy, then OR-Tools CP-SAT | v0.4, v0.5 | Medium: constraint model design |
| 3D visualization | None | CesiumJS (opt-in) | v0.5 | Low: additive, not replacement |
| Frontend | Jinja2 + inline JS | ES modules, then HTMX/Svelte | v0.4, v0.6 | Medium: migration effort |
| Edge agent | Python threads | Async Python, then Rust | v0.4, v0.6 | High: Rust contributor availability |
| Telemetry storage | Ephemeral MQTT | TimescaleDB | v0.5 | Low: PostgreSQL extension |
| Messaging backbone | Mosquitto MQTT | Stay MQTT 5.0; evaluate NATS later | v0.4, v0.7 | Low: incremental upgrade |

Guiding Principles

  1. Incremental migration over big rewrites. Every phase must leave the system deployable and functional. No "dark periods" where the dashboard is half-Svelte, half-Jinja2 without a working state.

  2. Optimize the bottleneck, not the framework. Batch propagation and automated scheduling deliver more user value than a frontend rewrite. Prioritize compute and operations over aesthetics.

  3. SatNOGS ecosystem alignment. TALOS exists within the Libre Space ecosystem. Technology choices (MQTT over NATS, Python over Go, open data formats) should maintain interoperability with SatNOGS Network, DB, and community tools.

  4. Measure before migrating. Before adopting dSGP4, Rust agents, or NATS, run benchmarks against the current stack with realistic workloads (50 stations, 200 satellites, 24-hour horizon). Migrate only when the measured bottleneck justifies the complexity.

  5. Contributor accessibility. Libre Space is a volunteer-driven community. Prefer technologies with broad adoption (Python, TypeScript, PostgreSQL) over niche tools that limit the contributor pool.