TALOS v0.3.0 Technology Roadmap¶
Date: April 2026
Version: 0.3.0
Authors: Libre Space Foundation
Scope: Forward-looking technology assessment, migration paths, and prioritized roadmap for the TALOS ground station network
Table of Contents¶
- Current Tech Stack Assessment
- Orbital Propagation
- Visualization
- Messaging
- Scheduling
- Frontend Architecture
- Edge Computing
- AI/ML Opportunities
- Prioritized Roadmap
1. Current Tech Stack Assessment¶
What is working well¶
| Component | Technology | Verdict |
|---|---|---|
| API server | FastAPI 0.100+ / Uvicorn | Excellent. Async-native, OpenAPI docs for free, strong ecosystem. No reason to move. |
| ORM / schema | SQLModel 0.0.14+ / Pydantic v2 | Good fit. Single model definition for DB + API. Alembic migrations in place. |
| Orbital math | Skyfield + NumPy | Correct and well-tested. Pure-Python SGP4 with validated ephemeris loading. |
| Design system | Custom CSS on Astro UXDS tokens | Professional space-operations aesthetic. Lightweight, no framework lock-in. |
| CI/CD | GitLab CI with DAG stages, Docker 24, Fly.io deploy | Mature pipeline: lint, unit, integration, security scanning, Pages docs. |
| Auth | Magic-link email + JWT sessions | Passwordless, simple, appropriate for current scale. |
| Database | PostgreSQL 15 via psycopg2-binary | Solid choice. Room to leverage JSONB, PostGIS, and pg_cron as complexity grows. |
What needs attention¶
| Component | Technology | Concern |
|---|---|---|
| Agent | Pure Python, synchronous threads, raw sockets | High resource use on Raspberry Pi. No reconnect logic. Global mutable state. |
| Frontend | Server-rendered Jinja2 + vanilla JS (~800 lines inline) | Dashboard JS is monolithic. No component reuse, no type checking, no bundler. |
| MQTT client (browser) | Paho MQTT JS 1.0.1 (Eclipse, 2014) | Unmaintained. No MQTT 5.0 support. WebSocket-only with manual reconnect. |
| Propagation | Serial Skyfield calls per satellite | O(n) wall-clock time for n satellites. No batch propagation. |
| Scheduling | Manual station-mission assignment via UI | No automated conflict detection, no optimization, no multi-pass campaign planning. |
| Map | Leaflet 1.9.4 (2D) | Adequate for ground tracks, but no 3D orbit visualization, no terrain, no sensor cones. |
2. Orbital Propagation¶
Current: Skyfield SGP4¶
TALOS uses Skyfield's EarthSatellite for all orbital computations: pass prediction (find_events), Doppler shift, ground tracks, and footprint circles. The code in director/physics.py is clean and stateless, but each satellite is propagated serially. With 48 time steps per ground track and potentially hundreds of satellites across campaigns, this becomes a bottleneck.
Option A: dSGP4 (ESA) for batch/GPU propagation¶
dSGP4 (PyTorch-based, released 2024 by ESA's Advanced Concepts Team) offers:
- Batch CPU propagation: Propagate 1,000 TLEs across 100 time steps in a single vectorized call. On CPU alone this is 10-50x faster than serial Skyfield loops.
- GPU acceleration: With a CUDA-capable GPU, batch propagation scales to 10,000+ objects trivially. Relevant if TALOS grows to full-catalog scheduling.
- ML-corrected model: Neural network corrections trained on historical SP data reduce SGP4's known drag-modeling errors by up to 40% for LEO objects.
- Differentiability: Enables gradient-based orbit determination if TALOS ever ingests station observations.
Trade-offs: Adds PyTorch as a dependency (~800 MB). The ML correction model requires periodic retraining. API is lower-level than Skyfield -- no built-in find_events, so pass prediction logic must be reimplemented using root-finding on elevation angles.
Recommended migration path: Keep Skyfield for single-satellite interactive operations (Doppler, real-time tracking). Introduce dSGP4 as an optional backend in the Director for bulk pass prediction across campaigns. Abstract propagation behind a Propagator protocol so both backends are swappable.
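One possible shape for that abstraction -- a sketch only; the Propagator and PassWindow names do not exist in the current codebase:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Protocol, Sequence


@dataclass(frozen=True)
class PassWindow:
    """A single contact opportunity (field names are illustrative)."""
    satellite_id: str
    station_id: str
    rise: datetime
    culminate: datetime
    set_time: datetime
    max_elevation_deg: float


class Propagator(Protocol):
    """Backend-agnostic interface the Director codes against.

    A SkyfieldPropagator and a Dsgp4Propagator would both satisfy this
    protocol, making the backends swappable without touching callers.
    """

    def predict_passes(
        self,
        tles: Sequence[tuple[str, str, str]],  # (name, line1, line2)
        station_id: str,
        start: datetime,
        end: datetime,
        min_elevation_deg: float = 10.0,
    ) -> list[PassWindow]:
        ...


def select_backend(name: str) -> str:
    """Mirror of the proposed TALOS_PROPAGATOR env-var switch (sketch)."""
    if name not in ("skyfield", "dsgp4"):
        raise ValueError(f"unknown propagator backend: {name}")
    return name
```

Because Protocol uses structural typing, neither backend needs to import or subclass anything from the other.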
Option B: sgp4-rs (Rust SGP4 with Python bindings)¶
The sgp4 crate on crates.io provides a Rust-native SGP4 implementation with Python bindings via PyO3. Benchmarks show 3-5x speedup over the C-extension SGP4 used by Skyfield, with zero additional dependencies beyond a small compiled wheel.
Trade-off: Faster than Skyfield but still serial per TLE. No batch vectorization. Best suited for the edge agent where PyTorch is impractical.
Recommendation¶
Phase 1 (v0.4): Extract a Propagator interface in director/physics.py. Keep Skyfield as default.
Phase 2 (v0.5): Add dSGP4 batch backend for campaign-wide pass prediction. Gate behind TALOS_PROPAGATOR=dsgp4 env var.
Phase 3 (v0.6+): Evaluate sgp4-rs for the edge agent if it is rewritten in Rust.
3. Visualization¶
Current: Leaflet 1.9.4 (2D)¶
The dashboard uses Leaflet with a dark tile layer for a 2D Mercator map. Ground tracks are polylines, stations are markers, and footprints are circles. This works well for operational awareness but cannot represent orbital altitude, sensor cones, or 3D conjunction geometry.
Option A: CesiumJS for 3D globe¶
CesiumJS (open-source, Apache 2.0) provides a WebGL-based 3D globe with:
- CZML streaming: Native format for time-dynamic satellite positions. The Director could emit CZML documents over MQTT, and the browser renders interpolated orbits without per-frame position messages.
- Sensor volumes: Visualize station antenna patterns as 3D cones, showing coverage overlap.
- Terrain and imagery: Built-in Cesium World Terrain and various imagery providers.
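To make the CZML idea concrete, a minimal document the Director could emit -- the packet layout follows the CZML format, while the satellite ID, time window, and position samples are made up for illustration:

```python
import json
from datetime import datetime, timedelta, timezone


def iso(t: datetime) -> str:
    return t.isoformat().replace("+00:00", "Z")


epoch = datetime(2026, 4, 1, tzinfo=timezone.utc)
window = f"{iso(epoch)}/{iso(epoch + timedelta(hours=2))}"

czml = [
    # Every CZML stream starts with a document packet.
    {"id": "document", "name": "talos-campaign", "version": "1.0",
     "clock": {"interval": window, "multiplier": 60}},
    # One packet per satellite. Cesium interpolates between the samples,
    # so the browser needs no per-frame position messages.
    {"id": "sat/12345", "name": "DEMO-SAT",
     "availability": window,
     "position": {
         "epoch": iso(epoch),
         "interpolationAlgorithm": "LAGRANGE",
         "interpolationDegree": 5,
         "referenceFrame": "INERTIAL",
         # Flat list of [seconds-since-epoch, x, y, z] in ECI metres.
         "cartesian": [0, 6871e3, 0, 0,
                       300, 6800e3, 950e3, 120e3],
     },
     "path": {"show": True}},
]

payload = json.dumps(czml)  # publish to e.g. a talos/czml/{campaign_id} topic
```

The Director would regenerate and republish the document only when TLEs change, not on every telemetry tick.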
Trade-offs: Large library (~3 MB gzipped). Requires a Cesium Ion token for terrain (free tier: 75,000 monthly tiles). Steeper learning curve than Leaflet. Mobile performance degrades with many entities.
Option B: Leaflet + deck.gl overlay¶
Keep Leaflet for the base map but add deck.gl layers for GPU-accelerated rendering of thousands of satellite points and arcs. Lighter than CesiumJS, stays in 2D, but handles higher entity counts than native Leaflet markers.
Option C: Resium (React + CesiumJS)¶
If the frontend moves to React (see Section 6), Resium provides declarative React components wrapping CesiumJS. This is the cleanest integration path if both visualization and frontend architecture change together.
Recommendation¶
Phase 1 (v0.4): Keep Leaflet. Optimize ground track rendering by caching polylines and updating only on TLE change (already noted in physics.py comments).
Phase 2 (v0.5): Add CesiumJS as an opt-in "3D view" alongside the existing 2D map. Serve CZML from the Director. Do not replace Leaflet yet -- operators may prefer the faster 2D view for daily operations.
Phase 3 (v0.7+): If the frontend migrates to React/Svelte, evaluate Resium or a Svelte CesiumJS wrapper as the primary visualization.
4. Messaging¶
Current: MQTT 3.1.1 via Eclipse Mosquitto¶
TALOS uses Mosquitto as the MQTT broker with Paho Python clients (v2.0) in the Core, Director, and Agent. The browser uses Paho MQTT JS 1.0.1 over WebSocket. The topic hierarchy (talos/gs/{id}/cmd/#, talos/gs/{id}/telemetry/#) is well-structured.
Option A: Upgrade to MQTT 5.0¶
Mosquitto already supports MQTT 5.0. The Python Paho 2.0 client supports MQTT 5.0 properties. Benefits:
- Shared subscriptions ($share/group/topic): Allow multiple Director instances to load-balance command processing without duplicate handling. Critical for horizontal scaling.
- Request/response correlation: Built-in correlation IDs replace the current pattern of publishing a command and waiting for a separate status topic.
- Message expiry: Stale rotator commands auto-expire instead of being delivered to a station that reconnects after an outage.
- User properties: Attach metadata (mission ID, priority) to messages without encoding them in the topic or payload.
Trade-off: The browser Paho MQTT JS library does not support MQTT 5.0 and is unmaintained. Must replace with MQTT.js (npm mqtt, actively maintained, MQTT 5.0 support, ~50 KB gzipped).
Option B: Migrate to NATS¶
NATS offers higher throughput, built-in request/reply, and JetStream for persistence. However, it loses MQTT compatibility with existing agents and the SatNOGS ecosystem. The NATS Python client (nats-py) is less battle-tested in IoT contexts than Paho. Only justified if TALOS scales to hundreds of stations where NATS's queue groups outperform MQTT 5.0 shared subscriptions.
Recommendation¶
Phase 1 (v0.4): Replace browser Paho MQTT JS with MQTT.js. Enable MQTT 5.0 on the Mosquitto broker. Update Python clients to use MQTT 5.0 properties (shared subscriptions, message expiry, correlation IDs).
Phase 2 (v0.6+): Evaluate NATS only if MQTT 5.0 shared subscriptions prove insufficient for multi-Director scaling. The SatNOGS ecosystem alignment strongly favors staying on MQTT.
5. Scheduling¶
Current: Manual assignment¶
Station-mission assignments are created manually through the UI. There is no conflict detection (double-booking a station), no optimization across campaigns, and no awareness of pass windows during assignment.
Approach: Google OR-Tools constraint solver¶
OR-Tools (BSD-3, actively maintained by Google) provides a CP-SAT solver suitable for satellite pass scheduling:
Problem formulation:
- Variables: Binary assignment of each (pass, station) pair.
- Constraints: No overlapping passes on the same station. Minimum elevation mask. Rotator slew time between consecutive passes. Station availability windows.
- Objective: Maximize total scheduled contact time (or priority-weighted coverage).
Integration path:
1. The Director computes all pass windows for a campaign's satellites over a time horizon (using batch propagation from Section 2).
2. Pass windows and station constraints are fed to the CP-SAT solver.
3. The solver returns optimal assignments, which are written to the database as Assignment records.
4. Operators review and approve the schedule via the UI.
Performance: CP-SAT solves a 50-station, 200-satellite, 24-hour problem in under 10 seconds. For larger horizons, the solver supports time limits with best-found-so-far solutions.
Trade-off: OR-Tools is a ~50 MB compiled dependency. The constraint model must be carefully designed to avoid over-constraining or under-constraining. Requires validation against manually scheduled passes.
Alternative: Greedy scheduler first¶
For v0.4, a greedy scheduler that assigns passes by priority and rejects conflicts avoids the OR-Tools dependency while eliminating double-booking.
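A minimal sketch of such a greedy scheduler -- field names are illustrative, not the actual Assignment model:

```python
from typing import NamedTuple


class Pass(NamedTuple):
    station: str
    start: float   # minutes from horizon start
    end: float
    priority: int


def greedy_schedule(passes: list[Pass]) -> list[Pass]:
    """Take passes in priority order; reject any that double-book a station."""
    booked: dict[str, list[Pass]] = {}
    # Highest priority first; earlier start breaks ties.
    for p in sorted(passes, key=lambda p: (-p.priority, p.start)):
        station = booked.setdefault(p.station, [])
        # Accept only if it overlaps nothing already on this station.
        if all(p.end <= q.start or p.start >= q.end for q in station):
            station.append(p)
    return sorted((p for ps in booked.values() for p in ps),
                  key=lambda p: p.start)


demo = [Pass("gs-1", 0, 10, 3), Pass("gs-1", 5, 15, 5), Pass("gs-2", 0, 12, 2)]
print(greedy_schedule(demo))  # keeps the priority-5 pass, drops its overlap
```

This is not optimal (a high-priority pass can block two lower-priority passes whose combined value is greater), but it eliminates double-booking with zero new dependencies.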
Recommendation¶
Phase 1 (v0.4): Implement conflict detection on assignment creation (database constraint + API validation). Add a greedy auto-scheduler for single campaigns.
Phase 2 (v0.5): Introduce OR-Tools CP-SAT for multi-campaign optimization. Expose solver parameters (time limit, priority weights) in the campaign settings UI.
Phase 3 (v0.7+): Add rolling-horizon scheduling that re-optimizes as TLEs update and station availability changes.
6. Frontend Architecture¶
Current: Server-rendered Jinja2 + inline JavaScript¶
The dashboard is a single Jinja2 template with ~800 lines of inline JavaScript handling MQTT subscriptions, Leaflet map updates, station card rendering, countdown timers, and modal interactions. There is no module system, no type checking, and no component reuse across pages.
Option A: HTMX + Alpine.js (incremental enhancement)¶
Replace inline JS with HTMX for server-driven partial updates and Alpine.js for client-side reactivity. Benefits:
- Minimal rewrite: Keep Jinja2 templates. Add hx-* attributes for dynamic content (station cards, pass countdowns). Use Alpine.js for local state (modal open/close, form validation).
- No build step: Ship plain HTML/JS. Works with the existing FastAPI static file serving.
- Trade-off: HTMX relies on HTTP round-trips for updates, which adds latency compared to MQTT-pushed state. The real-time dashboard (1 Hz telemetry updates) is a poor fit for HTMX polling. MQTT-driven elements must remain in vanilla JS or Alpine.js reactive stores.
Option B: Svelte SPA with MQTT¶
Replace Jinja2 dashboard with a Svelte single-page application:
- Reactive by design: Svelte's reactive declarations naturally handle MQTT-pushed state updates. A writable store backed by MQTT.js gives components automatic re-rendering.
- Small bundle: Svelte compiles to vanilla JS with no runtime. A full dashboard would likely be 80-120 KB gzipped.
- Component reuse: Station cards, telemetry panels, map views become reusable .svelte components.
- Trade-off: Requires a build step (Vite). The Core must serve the SPA and provide a JSON API (FastAPI already does both). Two deployment artifacts (API + SPA static files). Learning curve for contributors unfamiliar with Svelte.
Option C: React SPA¶
React has the largest ecosystem (including Resium for CesiumJS), but the runtime overhead is larger than Svelte and the boilerplate-to-value ratio is worse for a focused operational dashboard. Not recommended unless CesiumJS/Resium becomes the primary visualization.
Recommendation¶
Phase 1 (v0.4): Extract inline JS into ES modules served from /static/js/. No framework change, just module boundaries: map.js, mqtt.js, stations.js, countdowns.js. Add JSDoc type annotations.
Phase 2 (v0.5): Introduce HTMX for non-real-time pages (stations list, campaign management, settings). Keep the dashboard as vanilla JS + ES modules.
Phase 3 (v0.6+): Evaluate Svelte for the dashboard if component complexity grows beyond manageable vanilla JS. The JSON API already exists in FastAPI; the migration is primarily a frontend concern.
7. Edge Computing¶
Current: Python agent with threading¶
The agent (agent/agent.py) is a ~110-line Python script using Paho MQTT, raw TCP sockets to rotctld, and a threaded telemetry loop. It runs on Raspberry Pi 3/4 at ground stations.
Problems: Python 3.10 runtime requires ~50-70 MB resident memory. No reconnection logic for MQTT or rotctld -- a transient network failure kills the session. Global mutable state makes testing difficult. No watchdog or health monitoring.
Option A: Rust agent¶
Rewrite using rumqttc (async MQTT) and tokio: ~2 MB binary, ~5 MB resident (10x reduction). Built-in reconnection with exponential backoff. Cross-compile for armv7/aarch64 from CI with no Python runtime on device.
Trade-off: Steep learning curve. Fewer Libre Space contributors are Rust-proficient. Debugging on remote Pi hardware is harder without Python's REPL.
Option B: Go agent¶
Go offers a middle ground: compiled binary (~8 MB), garbage collected, strong concurrency via goroutines, and trivial cross-compilation (GOOS=linux GOARCH=arm). Larger binary than Rust but simpler to learn. A fallback if the Rust path proves too steep.
Option C: Improve the Python agent¶
Refactor to use asyncio with aiomqtt (async Paho wrapper) and structured concurrency. Add reconnection, health checks, and a systemd watchdog notifier. This keeps the existing language and contributor familiarity.
- Trade-off: Still ~50 MB resident and still requires a Python runtime on the Pi, but this is the lowest-effort migration path.
Recommendation¶
Phase 1 (v0.4): Refactor the Python agent to async (asyncio + aiomqtt). Add reconnection logic, structured state management (dataclass instead of globals), and systemd integration. This is a 1-2 week effort.
Phase 2 (v0.6+): Prototype a Rust agent for one station. Compare memory, CPU, and reliability over a 30-day trial. If validated, migrate all stations over v0.7-v0.8.
Deferred: Go agent is the fallback if the Rust prototype proves too difficult to maintain.
8. AI/ML Opportunities¶
8.1 Anomaly detection on telemetry¶
Station telemetry (rotator position, signal strength, temperature) streams at 1 Hz over MQTT. A lightweight anomaly detector could flag:
- Rotator stalls (position not tracking commanded azimuth/elevation).
- Signal dropouts during expected passes.
- Thermal excursions in outdoor enclosures.
Approach: A simple statistical model (z-score on rolling window, or Isolation Forest from scikit-learn) running in the Director. No deep learning needed for structured 1 Hz time series with clear failure modes. Alerts published to talos/alerts/{station_id}.
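A sketch of the rolling z-score variant -- window size, warm-up length, and threshold are starting points, not tuned values:

```python
from collections import deque
from statistics import mean, stdev


class ZScoreDetector:
    """Flag 1 Hz telemetry samples that deviate from a rolling baseline."""

    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.samples: deque = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Feed one sample; return True if it is anomalous vs. the window."""
        anomalous = False
        if len(self.samples) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True  # would publish to talos/alerts/{station_id}
        self.samples.append(value)
        return anomalous


det = ZScoreDetector()
quiet = [180.0 + 0.1 * (i % 3) for i in range(60)]  # nominal azimuth jitter
alerts = [det.update(v) for v in quiet]
print(any(alerts), det.update(270.0))  # steady data: no alert; jump: alert
```

Running one detector per telemetry channel per station keeps state small enough to live inside the Director process.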
Effort: 2-3 weeks. Requires labeled examples of known failures for validation.
8.2 Signal classification¶
If TALOS integrates IQ sample capture (via SatNOGS flowgraphs or GNU Radio), ML classifiers can identify modulation type, detect interference, or flag unexpected signals. Existing open-source models:
- TorchSig (MIT, 2024): PyTorch library for RF signal classification. Pre-trained models for common modulation schemes.
- SigMF + SatNOGS DB: The SatNOGS observation database contains labeled waterfall plots that could serve as training data.
Trade-off: Requires IQ sample pipeline, which TALOS does not currently have. This is a v1.0+ capability dependent on GNU Radio integration.
8.3 Predictive maintenance¶
Given 3-6 months of telemetry history, predict rotator bearing wear, LNA degradation, or cable weathering from trend analysis on rolling signal-to-noise ratio correlated with rotator age and weather data. Requires telemetry persistence (currently ephemeral MQTT) -- add TimescaleDB before ML models can train.
Recommendation¶
Phase 1 (v0.5): Add telemetry persistence to PostgreSQL with TimescaleDB. Implement z-score anomaly detection on rotator telemetry.
Phase 2 (v0.7): Evaluate TorchSig integration if IQ sample capture is available.
Phase 3 (v1.0+): Predictive maintenance models once 6+ months of telemetry history exists.
9. Prioritized Roadmap¶
Phase 1: v0.4 -- Foundation hardening (Q3 2026, ~6 weeks)¶
| Task | Effort | Depends on | Impact |
|---|---|---|---|
| Extract dashboard JS into ES modules | 1 week | None | Maintainability, testability |
| Replace Paho MQTT JS with MQTT.js | 3 days | JS modularization | MQTT 5.0 support, maintained library |
| Enable MQTT 5.0 on Mosquitto broker | 1 day | None | Shared subscriptions, message expiry |
| Refactor Python agent to asyncio | 2 weeks | None | Reliability, reconnection, testability |
| Add assignment conflict detection | 1 week | None | Data integrity |
| Extract Propagator protocol in Director | 3 days | None | Enables future backend swaps |
| Greedy single-campaign auto-scheduler | 1 week | Conflict detection | Reduces manual work |
Phase 2: v0.5 -- Intelligence layer (Q4 2026, ~8 weeks)¶
| Task | Effort | Depends on | Impact |
|---|---|---|---|
| dSGP4 batch propagation backend | 3 weeks | Propagator protocol | 10-50x faster campaign planning |
| OR-Tools CP-SAT multi-campaign scheduler | 3 weeks | Batch propagation, conflict detection | Optimal pass scheduling |
| TimescaleDB telemetry persistence | 1 week | None | Enables ML, historical analysis |
| Z-score anomaly detection on rotator telemetry | 2 weeks | TimescaleDB | Early failure detection |
| HTMX for non-real-time pages | 2 weeks | JS modularization | Reduced page-load JS, server-driven UI |
| CesiumJS opt-in 3D view | 3 weeks | None | 3D orbit visualization |
Phase 3: v0.6-v0.7 -- Scale and polish (H1 2027, ~12 weeks)¶
| Task | Effort | Depends on | Impact |
|---|---|---|---|
| Rust edge agent prototype | 4 weeks | None | 10x memory reduction on Pi |
| Svelte dashboard (if complexity warrants) | 6 weeks | MQTT.js, JSON API | Component reuse, type safety |
| NATS evaluation (if MQTT 5.0 insufficient) | 2 weeks | Multi-Director deployment | Higher-throughput messaging |
| Rolling-horizon re-optimization | 3 weeks | OR-Tools scheduler | Adaptive scheduling |
| SatNOGS API v2 integration | 2 weeks | None | Updated satellite/transmitter data |
Phase 4: v1.0+ -- Advanced capabilities (H2 2027+)¶
| Task | Effort | Depends on | Impact |
|---|---|---|---|
| IQ sample capture pipeline | 8 weeks | GNU Radio integration | Signal analysis capability |
| TorchSig signal classification | 4 weeks | IQ pipeline | Automated modulation ID |
| Predictive maintenance models | 4 weeks | 6+ months telemetry | Proactive maintenance |
| WebRTC waterfall streaming | 4 weeks | IQ pipeline | Live signal monitoring |
| Full-catalog SSA integration | 3 weeks | dSGP4 batch backend | Conjunction screening |
Technology Decision Matrix¶
| Decision | Current | Recommended | When | Risk |
|---|---|---|---|---|
| Propagation | Skyfield (serial) | Skyfield + dSGP4 (batch) | v0.5 | Medium: PyTorch dependency size |
| Browser MQTT | Paho JS 1.0.1 | MQTT.js | v0.4 | Low: drop-in replacement |
| Broker protocol | MQTT 3.1.1 | MQTT 5.0 | v0.4 | Low: Mosquitto already supports it |
| Scheduling | Manual | Greedy, then OR-Tools CP-SAT | v0.4, v0.5 | Medium: constraint model design |
| 3D visualization | None | CesiumJS (opt-in) | v0.5 | Low: additive, not replacement |
| Frontend | Jinja2 + inline JS | ES modules, then HTMX/Svelte | v0.4, v0.6 | Medium: migration effort |
| Edge agent | Python threads | Async Python, then Rust | v0.4, v0.6 | High: Rust contributor availability |
| Telemetry storage | Ephemeral MQTT | TimescaleDB | v0.5 | Low: PostgreSQL extension |
| Messaging backbone | Mosquitto MQTT | Stay MQTT 5.0; evaluate NATS later | v0.4, v0.7 | Low: incremental upgrade |
Guiding Principles¶
- Incremental migration over big rewrites. Every phase must leave the system deployable and functional. No "dark periods" where the dashboard is half-Svelte, half-Jinja2 without a working state.
- Optimize the bottleneck, not the framework. Batch propagation and automated scheduling deliver more user value than a frontend rewrite. Prioritize compute and operations over aesthetics.
- SatNOGS ecosystem alignment. TALOS exists within the Libre Space ecosystem. Technology choices (MQTT over NATS, Python over Go, open data formats) should maintain interoperability with SatNOGS Network, DB, and community tools.
- Measure before migrating. Before adopting dSGP4, Rust agents, or NATS, run benchmarks against the current stack with realistic workloads (50 stations, 200 satellites, 24-hour horizon). Migrate only when the measured bottleneck justifies the complexity.
- Contributor accessibility. Libre Space is a volunteer-driven community. Prefer technologies with broad adoption (Python, TypeScript, PostgreSQL) over niche tools that limit the contributor pool.