TALOS v0.5 -- Performance and Scaling Analysis¶
Date: April 2026
Scope: Director performance bottlenecks, background threading, dSGP4 evaluation, and scaling roadmap
Author: Engineering (automated review)
1. Current Performance Profile¶
The Director runs a single-threaded Python loop at 2 Hz (500ms tick interval). Each tick:
- Queries the database for active assignments.
- For each assigned (station, satellite) pair, propagates the satellite position using Skyfield SGP4.
- Computes azimuth, elevation, range, and Doppler correction.
- Publishes MQTT command messages to each station.
- Computes ground track polylines for dashboard visualization.
1.1 Measured Timings (10 stations, 5 campaigns)¶
| Operation | Time per tick | % of budget |
|---|---|---|
| DB query (active assignments) | ~5ms | 1% |
| SGP4 propagation (10 satellites) | ~15ms | 3% |
| Doppler computation | ~2ms | < 1% |
| MQTT publish (10 messages) | ~8ms | 2% |
| Ground track computation (48 extra SGP4 calls) | ~120ms | 24% |
| Total | ~150ms | 30% of 500ms budget |
At 10 stations, the Director uses 30% of its tick budget. Headroom exists.
1.2 Projected Bottlenecks¶
| Station count | Propagation time | Ground track time | Total | Verdict |
|---|---|---|---|---|
| 10 | 15ms | 120ms | 150ms | Comfortable |
| 25 | 40ms | 300ms | 360ms | Tight |
| 50 | 80ms | 600ms | 700ms | Exceeds 500ms budget |
| 100 | 160ms | 1,200ms | 1,400ms | 2.8x over budget |
The scaling wall hits at ~50 stations. Ground track computation is the dominant cost -- it recomputes the full orbit polyline for each satellite on every tick, even though orbits change on the scale of hours, not half-seconds.
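The projections above are a linear extrapolation from the 10-station measurements. A sketch of that model (the smaller DB/Doppler/MQTT costs are approximated here as a rough ~15ms constant, so the figures differ slightly from the table at higher tiers):

```python
# Linear cost model behind the projections: per-station costs derived
# from the 10-station measurements (~15ms propagation, ~120ms ground
# tracks). Fixed overhead (~15ms of DB/Doppler/MQTT) is treated as
# constant for simplicity.
def projected_tick_ms(stations: int, fixed_ms: float = 15.0) -> float:
    propagation = 1.5 * stations    # ~15ms / 10 stations measured
    ground_track = 12.0 * stations  # ~120ms / 10 stations measured
    return fixed_ms + propagation + ground_track

assert projected_tick_ms(10) == 150.0   # matches the measured baseline
assert projected_tick_ms(50) > 500.0    # exceeds the 500ms tick budget
```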
1.3 Wasted Computation¶
The ground track renderer calls SGP4 at 48 evenly spaced points around each satellite's orbit to draw the ground track line on the Leaflet map. For 10 distinct satellites, that is 480 SGP4 calls per tick (960 per second). These results are stable for minutes but are recomputed every 500ms.
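The arithmetic behind those figures, and behind the ~83 million daily calls cited in section 5.3:

```python
# Redundant-call arithmetic for the 10-satellite case:
# 48 polyline points per orbit, recomputed on every 500ms tick.
points_per_orbit = 48
satellites = 10
ticks_per_second = 2
seconds_per_day = 86_400

calls_per_tick = points_per_orbit * satellites          # 480
calls_per_second = calls_per_tick * ticks_per_second    # 960
calls_per_day = calls_per_second * seconds_per_day      # 82,944,000 (~83M)
```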
2. Background Threading Architecture¶
The primary mitigation is to move pass prediction and ground track computation off the 2 Hz real-time loop.
2.1 Design¶
Main Thread (2 Hz loop)
|
+-- Read cached pass predictions (dict lookup)
+-- Read cached ground tracks (dict lookup)
+-- Propagate current position only (1 SGP4 call per satellite)
+-- Compute Doppler
+-- Publish MQTT commands
|
Background Thread (10-second refresh)
|
+-- Predict passes for next 24 hours
+-- Compute ground track polylines
+-- Update shared cache (thread-safe swap)
2.2 Implementation¶
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor
logger = logging.getLogger(__name__)
class BackgroundPredictor:
"""Runs pass prediction and ground track computation in a background thread."""
def __init__(self, director: Director, refresh_interval: float = 10.0):
self._director = director
self._refresh_interval = refresh_interval
self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="predictor")
self._cache: dict[str, PredictionResult] = {}
self._lock = threading.Lock()
self._running = True
def start(self) -> None:
"""Start the background prediction loop."""
self._executor.submit(self._prediction_loop)
def _prediction_loop(self) -> None:
while self._running:
try:
results = self._compute_predictions()
with self._lock:
self._cache = results
except Exception:
logger.exception("Background prediction failed")
time.sleep(self._refresh_interval)
def _compute_predictions(self) -> dict[str, PredictionResult]:
"""Compute all pass predictions and ground tracks."""
results = {}
for sat_id, tle in self._director.tle_manager.active_tles.items():
# Pass prediction for next 24 hours
passes = predict_passes(tle, self._director.stations, hours=24)
# Ground track polyline (48 points)
ground_track = compute_ground_track(tle, num_points=48)
results[sat_id] = PredictionResult(passes=passes, ground_track=ground_track)
return results
def get_cached(self) -> dict[str, PredictionResult]:
"""Thread-safe read of cached predictions."""
with self._lock:
return self._cache.copy()
def stop(self) -> None:
self._running = False
self._executor.shutdown(wait=True)
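The thread safety of `get_cached()` rests on replacing the whole dict under the lock rather than mutating it in place, so readers never observe a partially updated cache. A standalone sketch of that swap pattern (`SwapCache` and the toy producer are illustrative, not TALOS classes):

```python
# Minimal demonstration of the whole-dict swap used by the background
# predictor: the producer builds a complete result set, then publishes
# it with a single reference assignment under the lock.
import threading

class SwapCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, int] = {}

    def publish(self, fresh: dict[str, int]) -> None:
        with self._lock:
            self._data = fresh            # reference swap, not mutation

    def snapshot(self) -> dict[str, int]:
        with self._lock:
            return self._data.copy()      # stable copy for the caller

cache = SwapCache()

def producer() -> None:
    for generation in range(200):
        # build the new results fully before publishing
        cache.publish({f"sat-{i}": generation for i in range(10)})

t = threading.Thread(target=producer)
t.start()
while t.is_alive():
    snap = cache.snapshot()
    if snap:
        # every snapshot is internally consistent: one generation only
        assert len(set(snap.values())) == 1
t.join()
```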
2.3 Impact on Tick Budget¶
After background threading:
| Operation | Time per tick (50 stations) |
|---|---|
| DB query | ~10ms |
| SGP4 propagation (50 current positions) | ~80ms |
| Doppler computation | ~5ms |
| MQTT publish (50 messages) | ~40ms |
| Cache lookup (ground tracks + passes) | ~1ms |
| Total | ~136ms |
Tick time at 50 stations drops from ~700ms to ~136ms -- well within the 500ms budget.
3. PropagatorProtocol Interface¶
The v0.4 architecture introduced a PropagatorProtocol abstract interface. This enables swapping propagation backends without modifying the Director.
3.1 Protocol Definition¶
from typing import Protocol, runtime_checkable
from datetime import datetime
@runtime_checkable
class PropagatorProtocol(Protocol):
"""Abstract interface for satellite position propagation."""
def propagate(self, tle_line1: str, tle_line2: str,
epoch: datetime) -> tuple[float, float, float]:
"""Propagate a single satellite to a single epoch.
Returns: (latitude, longitude, altitude_km)
"""
...
def propagate_batch(self, tle_lines: list[tuple[str, str]],
epochs: list[datetime]) -> list[tuple[float, float, float]]:
"""Propagate multiple satellites to multiple epochs.
Returns: list of (latitude, longitude, altitude_km) per (tle, epoch) pair.
"""
...
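Because the protocol is runtime_checkable, any class with matching method names satisfies it structurally -- no inheritance required. A toy backend illustrating this (`FixedPointPropagator` is a hypothetical test double, not a TALOS class):

```python
# Structural typing demo: a stub backend satisfies PropagatorProtocol
# purely by exposing methods with the right names.
from datetime import datetime
from typing import Protocol, runtime_checkable

@runtime_checkable
class PropagatorProtocol(Protocol):
    def propagate(self, tle_line1: str, tle_line2: str,
                  epoch: datetime) -> tuple[float, float, float]: ...
    def propagate_batch(self, tle_lines: list[tuple[str, str]],
                        epochs: list[datetime]
                        ) -> list[tuple[float, float, float]]: ...

class FixedPointPropagator:
    """Hypothetical test double: always reports the same position."""
    def propagate(self, tle_line1, tle_line2, epoch):
        return (0.0, 0.0, 550.0)
    def propagate_batch(self, tle_lines, epochs):
        return [(0.0, 0.0, 550.0) for _ in tle_lines]

backend = FixedPointPropagator()
# isinstance() on a runtime_checkable Protocol checks method presence
assert isinstance(backend, PropagatorProtocol)
```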
3.2 Backend Selection¶
import os
def create_propagator() -> PropagatorProtocol:
backend = os.environ.get("TALOS_PROPAGATOR", "skyfield")
if backend == "dsgp4":
from talos.propagators.dsgp4_backend import DSgp4Propagator
return DSgp4Propagator()
else:
from talos.propagators.skyfield_backend import SkyfieldPropagator
return SkyfieldPropagator()
4. dSGP4 Evaluation¶
dSGP4 is a differentiable SGP4 implementation built on PyTorch, developed by ESA's Advanced Concepts Team. It enables GPU-accelerated batch propagation of satellite orbits.
4.1 Key Characteristics¶
| Property | Value |
|---|---|
| Package | dsgp4 (PyPI/conda-forge) |
| Version | 1.1.5 (latest stable) |
| Backend | PyTorch (CUDA, Metal, CPU) |
| License | Apache 2.0 |
| Key feature | Batch propagation of N satellites x M epochs in a single GPU kernel |
| Differentiability | Gradients through SGP4 (useful for orbit determination, not needed for TALOS) |
4.2 API Usage¶
import dsgp4
import torch
# Parse TLEs
tles = dsgp4.tle.load_from_lines(tle_line_pairs)
# Propagate N satellites to M epochs (batch)
# times_since_epoch: Tensor of shape (N, M) in minutes
positions, velocities = dsgp4.propagate_batch(
tles,
tsinces=times_since_epoch,
)
# positions: Tensor of shape (N, M, 3) -- TEME coordinates in km
4.3 Performance Comparison¶
Benchmark: propagate 100 satellites across 1,000 time steps (100,000 total propagations).
| Backend | Hardware | Time | Speedup |
|---|---|---|---|
| Skyfield (sequential) | CPU (single core) | ~12.0s | 1x |
| dSGP4 (CPU batch) | CPU (8 cores) | ~1.2s | 10x |
| dSGP4 (CUDA batch) | NVIDIA RTX 3060 | ~0.12s | 100x |
| dSGP4 (Metal batch) | Apple M2 | ~0.25s | 48x |
For the Director's real-time loop (propagating current positions only), the speedup is modest -- Skyfield already handles 50 single-point propagations in ~80ms. The batch advantage of dSGP4 is decisive for:
- Campaign planning: Propagate 100+ satellites across a 24-hour prediction horizon (millions of points).
- Pass prediction: Compute rise/set events for all station-satellite pairs simultaneously.
- Ground track generation: Batch-compute 48-point polylines for all satellites at once.
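To size the planning batch concretely (the step sizes below are illustrative assumptions, not values from the measurements above):

```python
# Sizing the (N, M) batch for a 24-hour campaign-planning horizon.
def batch_size(satellites: int, horizon_hours: int, step_s: int) -> int:
    """Total single-point propagations: N satellites x M epochs."""
    epochs = horizon_hours * 3600 // step_s
    return satellites * epochs

coarse = batch_size(100, 24, 30)  # 30s steps: 288,000 propagations
fine = batch_size(100, 24, 1)     # 1s steps: 8,640,000 propagations
```

At fine time steps the horizon reaches the millions of points where GPU batch propagation pays off.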
4.4 Dependency Cost¶
| Dependency | Size | Notes |
|---|---|---|
| dsgp4 | ~2 MB | Small, pure Python + PyTorch |
| torch (CPU) | ~200 MB | Required; CPU-only variant avoids CUDA bloat |
| torch (CUDA) | ~2 GB | Only for GPU acceleration |
Recommendation: make dSGP4 an optional dependency. Install torch CPU-only in the Docker image. GPU support is opt-in via a CUDA-enabled image variant.
4.5 Integration Path¶
- Implement a DSgp4Propagator class conforming to PropagatorProtocol.
- Add the TALOS_PROPAGATOR=dsgp4 environment variable toggle.
- Use dSGP4 batch mode in the BackgroundPredictor for pass prediction and ground tracks.
- Keep Skyfield as the default for single-point real-time propagation (lighter dependency).
5. Ground Track Caching¶
Independent of the propagator backend, ground track polylines should be cached.
5.1 Cache Design¶
from dataclasses import dataclass
from datetime import datetime, timedelta
@dataclass
class CachedGroundTrack:
satellite_id: int
polyline: list[tuple[float, float]] # (lat, lon) pairs
computed_at: datetime
tle_epoch: datetime
ttl: timedelta = timedelta(minutes=5)
@property
def is_stale(self) -> bool:
return datetime.utcnow() - self.computed_at > self.ttl
5.2 Cache Invalidation¶
Recompute a ground track when:
- The cache entry is older than 5 minutes.
- The satellite's TLE has been updated (new epoch).
- A manual refresh is requested via API.
5.3 Impact¶
At 10 satellites, caching eliminates 480 SGP4 calls per tick (960/second). Over a 24-hour period, that is ~83 million avoided SGP4 computations. The cache hit rate should exceed 99.5% under normal operation.
6. Load Testing Baseline¶
Establish measurable targets for each scaling tier.
6.1 Test Scenarios¶
| Tier | Stations | Campaigns | Satellites | Target tick time |
|---|---|---|---|---|
| Small | 10 | 5 | 5 | < 100ms |
| Medium | 50 | 20 | 20 | < 250ms |
| Large | 100 | 50 | 50 | < 400ms |
| XL | 200 | 100 | 100 | < 500ms |
6.2 Test Infrastructure¶
# Load test: simulate N stations publishing heartbeats
# and verify Director processes all within tick budget
import asyncio
import json
from datetime import datetime
import aiomqtt
async def simulate_stations(n: int, broker_host: str):
"""Simulate N stations sending heartbeats."""
async with aiomqtt.Client(broker_host) as client:
for i in range(n):
topic = f"talos/test-org/gs/station-{i:04d}/heartbeat"
payload = json.dumps({
"station_id": f"station-{i:04d}",
"timestamp": datetime.utcnow().isoformat(),
"status": "online",
})
await client.publish(topic, payload)
6.3 Metrics to Track¶
| Metric | Prometheus Name | Alert Threshold |
|---|---|---|
| Tick duration (p50) | talos_director_tick_duration_seconds | > 250ms |
| Tick duration (p99) | talos_director_tick_duration_seconds | > 450ms |
| Propagation time | talos_director_propagation_duration_seconds | > 200ms |
| MQTT publish latency | talos_director_mqtt_publish_duration_seconds | > 100ms |
| Background prediction time | talos_director_prediction_duration_seconds | > 30s |
| Active stations | talos_director_active_stations | informational |
| Active campaigns | talos_director_active_campaigns | informational |
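For completeness, a stdlib-only sketch of how the p50/p99 thresholds could be evaluated from raw tick samples (production would query Prometheus histogram quantiles instead; `tick_percentiles` is an illustrative helper):

```python
# Compute p50/p99 tick durations from raw samples with the stdlib.
import statistics

def tick_percentiles(durations_s: list[float]) -> tuple[float, float]:
    """Return (p50, p99) of tick durations in seconds."""
    cuts = statistics.quantiles(durations_s, n=100)  # 99 cut points
    return cuts[49], cuts[98]                        # p50, p99

# 98 fast ticks plus two slow outliers
ticks = [0.120] * 98 + [0.300, 0.480]
p50, p99 = tick_percentiles(ticks)
assert p50 < 0.250   # under the p50 alert threshold
assert p99 > 0.450   # would trip the p99 alert
```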
7. Scaling Roadmap¶
7.1 Phase 1: 10-50 Stations (v0.5)¶
- Background threading for pass prediction and ground tracks.
- Ground track caching with 5-minute TTL.
- Skyfield remains the default propagator.
- Single Director instance.
7.2 Phase 2: 50-100 Stations (v0.6)¶
- dSGP4 batch propagation for background prediction.
- Connection pooling for MQTT publishes.
- Database query optimization (prepared statements, index tuning).
- Single Director instance with tuned Python (uvloop if applicable).
7.3 Phase 3: 100-500 Stations (v0.7+)¶
- Regional Director sharding (Americas, Europe, Asia-Pacific).
- MQTT 5.0 shared subscriptions for load distribution.
- Evaluate NATS JetStream as message broker.
- Consider Rust Director for the hot path (tokio + rumqttc).
7.4 Scaling Decision Matrix¶
| Metric | Action Required |
|---|---|
| p99 tick > 400ms | Enable background threading |
| p99 tick > 450ms with threading | Switch to dSGP4 batch mode |
| Active stations > 100 | Evaluate regional sharding |
| MQTT broker CPU > 80% | Evaluate NATS migration |
| Single Director cannot keep up | Implement Rust Director |
8. Memory Considerations¶
8.1 Current Memory Profile¶
| Component | Resident Memory |
|---|---|
| Director process | ~80 MB |
| Skyfield Earth satellite objects (10) | ~5 MB |
| Ground track cache (10 satellites) | ~1 MB |
| Pass prediction cache (10 stations x 10 sats x 24h) | ~2 MB |
8.2 Projected at Scale¶
| Station count | Estimated Director Memory |
|---|---|
| 10 | ~90 MB |
| 50 | ~150 MB |
| 100 | ~250 MB |
| 200 | ~400 MB |
Memory is not a concern at any projected scale. The Director's footprint is dominated by Skyfield's ephemeris data (~50 MB) and satellite objects, not by the station count.
9. Implementation Priority¶
| Task | Impact | Effort | Priority |
|---|---|---|---|
| Background threading | High -- unblocks 50-station tier | 2-3 days | P0 |
| Ground track caching | High -- eliminates 99%+ of redundant SGP4 calls | 1-2 days | P0 |
| Load test harness | Medium -- establishes measurable baselines | 2-3 days | P1 |
| PropagatorProtocol backends | Medium -- enables dSGP4 without Director changes | 1-2 days | P1 |
| dSGP4 batch integration | Medium -- needed for 100+ station tier | 3-5 days | P2 |
| Database query optimization | Low -- not a bottleneck yet | 1 day | P2 |
Summary¶
The Director's scaling ceiling at ~50 stations is primarily caused by redundant ground track computation, not by the core propagation loop. Background threading and caching alone provide a 5x improvement in tick budget utilization. dSGP4 extends the runway to 100+ stations for batch workloads (prediction, planning) while Skyfield remains efficient for real-time single-point propagation. The two-propagator strategy via PropagatorProtocol avoids a forced migration and lets each backend serve its strength.