Media decisions
This page records the current product and architecture decisions behind Toposync live viewing, media transports, Home Assistant playback, camera publication, and spatial video.
It is intentionally stricter than a how-to guide. If a future change conflicts with one of these decisions, update this page as part of the same change instead of hiding the conflict in local UI heuristics or fallback code.
Related user-facing pages in the canonical English documentation:
Decision summary
| Area | Decision |
|---|---|
| User model | Users manage camera sources, publications, live views, and variants. Transmission is an advanced/generated artifact. |
| Stability | Stable visible playback is more important than latency or peak quality. |
| Liveness | A stream is live only when it has a fresh selected frame, selected writer, and healthy active output. |
| Transports | Transport choice is contextual. HLS remains the compatibility baseline. |
| MSE | MSE is a demand-started web transport through go2rtc and Toposync's signed proxy. |
| WebRTC | WebRTC is for explicit low-latency/PTZ contexts, not every dashboard tile. |
| JSMpeg | JSMpeg is the final visual fallback, session-scoped and low quality by design. |
| Home Assistant Cloud | Cloud-friendly playback goes through native Home Assistant camera entities. |
| Spatial video | Spatial projection is a separate extension concern that consumes mapped cameras and live-view playback. |
| Core boundary | Core stays generic; streaming, camera publication, and spatial video policy live in extensions. |
Decision 1: camera playback starts from published sources
Normal camera playback starts from a camera source, not from a manually created technical transmission.
camera source
-> publication intent
-> reconciler
-> implicit pipeline
-> stream.publish_video
-> transmission runtime
-> media output
-> viewer
Users should not need to understand transmission_id, engine_path,
output_id, or quality_profile_id to view an ordinary camera. Those values
remain necessary diagnostics and internal contracts, but they are not the
primary product model.
A published source has:
- an enabled publication intent;
- a role: main, sub, zoom, or custom;
- a visible label;
- host/server affinity;
- quality and transport policy hints.
Implications:
- camera discovery should default useful video sources toward publication where that is safe;
- saving a camera source should be enough to reconcile its live-view artifacts;
- generated pipelines and transmissions should be read-only or advanced in the normal UI;
- a broken implicit pipeline is a product-level problem, not something the user should fix by hand in the transmission editor.
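The published-source fields above can be sketched as a small value type. This is a minimal illustration with hypothetical names (PublishedSource, PUBLICATION_ROLES, policy_hints), not Toposync's actual model. Transmission and output identifiers are deliberately absent because the reconciler derives them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Roles as listed above; the constant name is hypothetical.
PUBLICATION_ROLES = frozenset({"main", "sub", "zoom", "custom"})


@dataclass(frozen=True)
class PublishedSource:
    """Hypothetical user-facing shape of a published camera source.

    Technical ids (transmission_id, engine_path, output_id, ...) are not
    user fields here: the reconciler derives them.
    """

    camera_source_id: str
    role: str
    label: str
    enabled: bool = True
    host_affinity: str | None = None
    # Quality/transport hints; keys such as "enable_webrtc" are illustrative.
    policy_hints: dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.role not in PUBLICATION_ROLES:
            raise ValueError(f"unknown publication role: {self.role!r}")
        if not self.label.strip():
            raise ValueError("a published source needs a visible label")
```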
Decision 2: manual pipeline publication is still first-class
Automatic camera publication is the primary flow, but users can still publish custom rendered video from pipelines.
The stream.publish_video operator should let a pipeline publish a video as a
live-view group or variant. The user-facing fields are:
- live-view name;
- variant name;
- role;
- visual profile;
- dashboard visibility;
- Home Assistant visibility.
The reconciler owns the generated Transmission and may write the deterministic
transmission_id back into the node. Two manual pipelines may publish into the
same live-view group with different roles, such as one main variant and one
sub variant.
Implications:
- pipeline save/delete events should trigger streaming reconciliation without making the core know about streaming semantics;
- disabling a manual pipeline disables its generated publication artifacts;
- deleting a manual pipeline removes only artifacts generated by that pipeline;
- the reconciler must not delete user-authored pipelines just because a streaming artifact is missing or stale.
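Two reconciler properties described above can be sketched together: a deterministic transmission_id that stays stable across repeated reconciliation, and deletion scoped to the artifacts a given pipeline generated. The hashing scheme, the tx_ prefix, and the GeneratedArtifact type are assumptions for illustration only.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def deterministic_transmission_id(pipeline_id: str, live_view: str, variant: str) -> str:
    """Stable id: reconciling the same publication twice writes back the same value."""
    key = f"{pipeline_id}\x00{live_view}\x00{variant}".encode()
    return f"tx_{hashlib.sha256(key).hexdigest()[:16]}"


@dataclass(frozen=True)
class GeneratedArtifact:
    artifact_id: str
    kind: str  # e.g. "transmission" or "output"
    generated_by: str | None  # generating pipeline id, or None if user-authored


def artifacts_to_delete(artifacts: list[GeneratedArtifact], deleted_pipeline_id: str) -> list[str]:
    """Remove only what the deleted pipeline generated; user-authored items are never touched."""
    return [a.artifact_id for a in artifacts if a.generated_by == deleted_pipeline_id]
```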
Decision 3: liveness is frame freshness, not process presence
Toposync must never mark a frozen last frame as live just because FFmpeg, MediaMTX, go2rtc, or another process is running.
Live playback requires all of the following:
- a recent selected frame;
- an active selected writer;
- a healthy selected media output;
- transport-specific health when playback is active.
Important classifications:
| Classification | Meaning |
|---|---|
| no_frame | No selected writer/frame is feeding the transmission. |
| source_pipeline_stale | The selected writer exists, but the selected frame is too old. |
| publisher_down | Frames exist, but the media publisher is not running for the selected output. |
| event_gated_idle | The pipeline is intentionally idle because an event gate is closed. |
| app_player_lifecycle | A recent player-side event indicates warmup, stall, or playback failure. |
Primary user hints should be ordered by actionable cause:
- blocking URL/auth errors for the active transport;
- no selected writer/frame;
- stale selected frame;
- publisher down while a writer exists;
- active transport failure;
- technical warnings for inactive transports.
Inactive transport warnings must not become the main error while the active transport is healthy.
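The classification and hint-ordering rules can be sketched as follows. The freshness threshold, the hint names, and the "live" result are hypothetical; app_player_lifecycle is omitted because it comes from player-side events rather than runtime state.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative freshness budget; this page does not fix a number.
MAX_FRAME_AGE_S = 2.0

# Hint kinds in the priority order listed above (names are hypothetical).
HINT_PRIORITY = [
    "active_transport_url_auth",
    "no_frame",
    "source_pipeline_stale",
    "publisher_down",
    "active_transport_failure",
    "inactive_transport_warning",
]


@dataclass(frozen=True)
class RuntimeSnapshot:
    has_selected_writer: bool
    frame_age_s: float | None  # None when there is no selected frame
    publisher_running: bool
    event_gate_closed: bool = False


def classify(s: RuntimeSnapshot) -> str:
    """Frame freshness decides liveness; a running process alone never does."""
    if s.event_gate_closed:
        return "event_gated_idle"
    if not s.has_selected_writer or s.frame_age_s is None:
        return "no_frame"
    if s.frame_age_s > MAX_FRAME_AGE_S:
        return "source_pipeline_stale"
    if not s.publisher_running:
        return "publisher_down"
    return "live"  # transport-specific health is still checked separately during playback


def primary_hint(hints: list[str], active_transport_healthy: bool) -> str | None:
    """Most actionable hint first; inactive-transport warnings never lead while the active one is healthy."""
    candidates = [
        h for h in hints
        if not (active_transport_healthy and h == "inactive_transport_warning")
    ]
    return min(candidates, key=HINT_PRIORITY.index, default=None)
```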
Decision 4: transport policy is contextual
There is no single best browser transport for every situation.
| Context | Preferred order | Rationale |
|---|---|---|
| Web grid/passive dashboard | MSE -> HLS -> JSMpeg | Avoid opening WebRTC for every tile. Prefer smooth web playback, keep stable fallback. |
| Web fullscreen | MSE -> HLS -> JSMpeg | Fullscreen usually needs quality and stability more than sub-second latency. |
| Web PTZ or explicit low latency | WebRTC -> MSE -> HLS -> JSMpeg | Low latency is valuable when the user is controlling the camera. |
| Home Assistant ingress | HLS -> MSE -> JSMpeg | Ingress should stay stable and proxy-friendly. Direct browser WebRTC is blocked by default. |
| Home Assistant entity/Cloud | Native Home Assistant camera contract | Home Assistant chooses playback from its camera entity model. |
| App/mobile/PiP | HLS -> JSMpeg | HLS is predictable; WebRTC is explicit; MSE depends on the wrapper. |
| Fixed debug page | User-selected transport only | Diagnostics must not hide failures behind automatic switching. |
RTSP is not a browser transport. It remains the internal and ecosystem contract for Home Assistant Core, VLC/ffplay, Frigate/dev, go2rtc, and diagnostics.
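A sketch of contextual transport selection, with the preferred orders copied from the table. The context keys and the function name are hypothetical. A fixed debug transport is returned as-is, even when unavailable, so its failure stays visible; RTSP never appears because it is not a browser transport.

```python
from __future__ import annotations

# Preferred orders from the table above; context keys are hypothetical.
# The Home Assistant entity/Cloud context is absent: Home Assistant chooses playback itself.
TRANSPORT_ORDER: dict[str, list[str]] = {
    "web_grid": ["mse", "hls", "jsmpeg"],
    "web_fullscreen": ["mse", "hls", "jsmpeg"],
    "web_ptz": ["webrtc", "mse", "hls", "jsmpeg"],
    "ha_ingress": ["hls", "mse", "jsmpeg"],
    "app": ["hls", "jsmpeg"],
}


def transport_candidates(context: str, available: set[str], debug_transport: str | None = None) -> list[str]:
    """Transports to try, in order. A fixed debug transport is never replaced by a fallback."""
    if debug_transport is not None:
        return [debug_transport]  # even if unavailable: the failure is the diagnostic
    return [t for t in TRANSPORT_ORDER[context] if t in available]
```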
Decision 5: HLS is the stable baseline
HLS is the compatibility baseline for:
- unknown browser/network conditions;
- Home Assistant ingress;
- app/mobile playback;
- fallback after MSE/WebRTC failure;
- diagnostic confirmation that a stream is viewable.
HLS health requires more than a signed URL:
- the playlist responds;
- media sequence advances;
- the tail segment is retrievable;
- the selected runtime frame remains fresh.
Initial buffer stalls can be part of warmup. If playback recovers and the stream has fresh frames, the initial stall should not remain the primary user error.
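A minimal sketch of the HLS health checks, assuming two playlist polls taken at least one target duration apart. #EXT-X-MEDIA-SEQUENCE is the standard HLS tag (RFC 8216), which defaults to 0 when absent; the function names, the tail-fetch flag, and the 2-second frame budget are illustrative assumptions.

```python
from __future__ import annotations


def media_sequence(playlist: str) -> int:
    """Read #EXT-X-MEDIA-SEQUENCE from a media playlist (0 when the tag is absent)."""
    for line in playlist.splitlines():
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            return int(line.split(":", 1)[1])
    return 0


def tail_segment(playlist: str) -> str | None:
    """Last segment URI: the final non-empty line that is not a tag."""
    uris = [line.strip() for line in playlist.splitlines() if line.strip() and not line.startswith("#")]
    return uris[-1] if uris else None


def hls_healthy(
    first_poll: str,
    second_poll: str,
    tail_fetch_ok: bool,
    frame_age_s: float,
    max_frame_age_s: float = 2.0,
) -> bool:
    """A signed URL is not enough: the sequence must advance, the tail must load, frames must be fresh."""
    advanced = media_sequence(second_poll) > media_sequence(first_poll)
    has_tail = tail_segment(second_poll) is not None
    return advanced and has_tail and tail_fetch_ok and frame_age_s <= max_frame_age_s
```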
Decision 6: MSE is a demand-started web transport
MSE is the preferred passive web transport when it is available and compatible. It is implemented through go2rtc, but go2rtc does not own Toposync's camera domain.
Rules:
- go2rtc consumes internal MediaMTX RTSP paths;
- go2rtc must not connect directly to cameras;
- the browser connects to Toposync signed WebSocket proxy URLs, not go2rtc directly;
- MSE output URLs may be returned when the sidecar is startable, even if the go2rtc process is currently stopped;
- opening the signed MSE WebSocket may start or update go2rtc on demand;
- no MSE viewer means no MSE-specific work should be required.
MSE is synthetic. It is derived from a real backing output and should not be
persisted as TransmissionOutput(protocol="mse").
Reasons:
- the user should not pay sidecar cost before a viewer requests it;
- a stopped sidecar is a normal idle state, not a broken state;
- Toposync remains responsible for auth, playback policy, demand, and health.
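The demand-started MSE rules can be sketched with a hypothetical sidecar handle. The proxy route, the sign callback, and the Go2rtcSidecar class are assumptions. The MSE entry is derived on the fly from a backing output, offered while the sidecar is merely startable, and only opening the signed WebSocket starts go2rtc.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Go2rtcSidecar:
    """Hypothetical handle: `startable` is configuration, `running` is process state."""

    startable: bool
    running: bool = False
    starts: int = 0

    def ensure_running(self) -> None:
        if not self.running:
            # A real implementation would launch or reconfigure go2rtc here,
            # pointing it at internal MediaMTX RTSP paths, never at cameras.
            self.running = True
            self.starts += 1


def mse_output_for(backing_output: dict, sidecar: Go2rtcSidecar, sign: Callable[[str], str]) -> dict | None:
    """Derive a synthetic MSE entry; it is returned to clients, never persisted."""
    if not sidecar.startable:
        return None
    # A stopped sidecar is a normal idle state, so the URL is offered anyway.
    output_id = backing_output["output_id"]
    return {
        "protocol": "mse",
        "backing_output_id": output_id,
        "url": sign(f"/api/streaming/mse/{output_id}"),  # hypothetical proxy route
    }


def on_mse_websocket_open(sidecar: Go2rtcSidecar) -> None:
    """Opening the signed proxy WebSocket is the only thing that creates MSE work."""
    sidecar.ensure_running()
```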
Decision 7: WebRTC is contextual low latency
WebRTC/WHEP is valuable for PTZ, autotrack, and explicit low-latency inspection. It is not the default for every live tile.
Reasons:
- WebRTC depends on ICE candidates, UDP reachability, NAT, and port mapping;
- Home Assistant add-on ingress makes direct browser WebRTC especially fragile;
- starting WebRTC for every grid tile increases cost and failure noise;
- HLS or MSE can be healthy while WebRTC correctly reports a low-latency networking warning.
Generated outputs should include WebRTC for zoom/PTZ publications or when
transport_policy.enable_webrtc=true. WebRTC warnings become primary only when
the user chose WebRTC, requested low latency, or no stable active transport is
available.
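The WebRTC output and warning rules reduce to two predicates. The function names are hypothetical, and the ptz_capable flag stands in for however a publication is known to drive a PTZ camera.

```python
def include_webrtc_output(role: str, ptz_capable: bool, transport_policy: dict) -> bool:
    """Generate a WebRTC output only for zoom/PTZ publications or an explicit policy opt-in."""
    return role == "zoom" or ptz_capable or bool(transport_policy.get("enable_webrtc", False))


def webrtc_warning_is_primary(user_chose_webrtc: bool, low_latency_requested: bool, stable_transport_active: bool) -> bool:
    """A WebRTC networking warning leads only when WebRTC actually matters to this viewer."""
    return user_chose_webrtc or low_latency_requested or not stable_transport_active
```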
Decision 8: JSMpeg is the final visual fallback
JSMpeg exists so the user can still see something when better transports fail. It is not a quality path.
Rules:
- WebSocket MPEG-TS with MPEG-1 video;
- no audio;
- low resolution and low FPS;
- each active session owns its FFmpeg process;
- the source is the selected Toposync runtime frame or an explicit placeholder;
- the encoder stops when the WebSocket closes.
JSMpeg should not connect to cameras directly and should not be persisted as
TransmissionOutput(protocol="jsmpeg").
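A sketch of the per-session encoder command, assuming the selected runtime frame (or a placeholder) is piped to FFmpeg as JPEG images. The options are standard FFmpeg flags; the resolution, frame rate, and bitrate are illustrative, and -bf 0 keeps B-frames off as a precaution for JSMpeg's decoder.

```python
def jsmpeg_ffmpeg_args(width: int = 640, height: int = 360, fps: int = 10, bitrate_k: int = 800) -> list[str]:
    """FFmpeg argv: JPEG frames on stdin in, MPEG-TS with MPEG-1 video on stdout out."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        # Assumption: runtime frames (or a placeholder) arrive as JPEG images on stdin.
        "-f", "image2pipe", "-c:v", "mjpeg", "-i", "pipe:0",
        "-an",  # no audio
        "-c:v", "mpeg1video",
        "-vf", f"scale={width}:{height}",  # low resolution by design
        "-r", str(fps),  # low frame rate by design
        "-b:v", f"{bitrate_k}k",
        "-bf", "0",  # no B-frames
        "-f", "mpegts", "pipe:1",
    ]
```

Each WebSocket session would start its own process, for example with subprocess.Popen(jsmpeg_ffmpeg_args(), stdin=subprocess.PIPE, stdout=subprocess.PIPE), and call terminate() when the socket closes, so no encoder outlives its viewer.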
Decision 9: media work must be demand-scoped
Expensive media work should only exist while it serves a real session or active publication requirement.
Demand includes:
- playback session id;
- transport;
- output id;
- live-view/transmission;
- lease/heartbeat time to live.
Examples:
- dashboard tiles renew demand while mounted;
- debug pages renew demand for the fixed selected transport only;
- Home Assistant entity playback renews demand while stream_source(), still, or WebRTC offer handling is active;
- MSE starts go2rtc only when a signed MSE WebSocket opens;
- JSMpeg starts FFmpeg only while a WebSocket session exists.
This rule prevents one viewer or one transport from accidentally keeping an unrelated camera source, output, or variant hot.
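A sketch of lease-based demand keyed by the fields listed above. DemandKey, DemandRegistry, and the 15-second TTL are hypothetical; the clock is injectable so expiry can be tested. Because the transport is part of the key, an HLS lease never keeps MSE or JSMpeg work alive.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DemandKey:
    session_id: str
    transport: str
    output_id: str
    live_view: str


class DemandRegistry:
    """Each key stays active only while heartbeats arrive within its TTL."""

    def __init__(self, ttl_s: float = 15.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires: dict[DemandKey, float] = {}

    def renew(self, key: DemandKey) -> None:
        self._expires[key] = self._clock() + self._ttl_s

    def release(self, key: DemandKey) -> None:
        self._expires.pop(key, None)

    def active_outputs(self, transport: str) -> set[str]:
        """Outputs with an unexpired lease for this transport; expired leases are dropped."""
        now = self._clock()
        self._expires = {k: t for k, t in self._expires.items() if t > now}
        return {k.output_id for k in self._expires if k.transport == transport}
```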
Decision 10: Home Assistant Cloud uses native entities
The Toposync UI inside Home Assistant ingress and native Home Assistant camera entities are different playback surfaces.
Ingress/sidebar:
- is the Toposync web UI;
- remains HLS-first;
- can use MSE through the Toposync proxy when available;
- can use JSMpeg as a visual fallback;
- does not rely on direct browser WebRTC by default.
Native Home Assistant integration:
- exports published Toposync live views as camera.* entities;
- uses internal Toposync/MediaMTX RTSP for stream_source();
- uses Toposync still endpoints for thumbnails;
- never exposes direct camera credentials or direct camera RTSP URLs;
- keeps native WebRTC offer handling opt-in until validated for the target network and Home Assistant Cloud path.
Decision:
Toposync publication -> Home Assistant camera entity -> Home Assistant stream component -> Home Assistant UI / Cloud
Do not treat a direct WebRTC player inside a Toposync ingress iframe as the normal Home Assistant Cloud strategy.
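A plain-Python sketch of what the native entity exposes. A real integration would subclass homeassistant.components.camera.Camera, advertise CameraEntityFeature.STREAM, and implement the stream_source() and async_camera_image() coroutines that Home Assistant calls; this stand-in avoids the dependency. The MediaMTX host, the path layout, and the fetch_still callback are assumptions, and 8554 is MediaMTX's default RTSP port, which a deployment may change.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable
from urllib.parse import quote


class ToposyncCameraSketch:
    """Stand-in for a Home Assistant camera entity backed by a Toposync live view."""

    def __init__(
        self,
        live_view_path: str,
        mediamtx_host: str,
        fetch_still: Callable[[str, int | None, int | None], Awaitable[bytes | None]],
    ) -> None:
        self._path = live_view_path
        self._host = mediamtx_host
        self._fetch_still = fetch_still  # hypothetical call to a Toposync still endpoint

    async def stream_source(self) -> str | None:
        # Internal MediaMTX RTSP path only: never a camera URL or camera credentials.
        return f"rtsp://{self._host}:8554/{quote(self._path)}"

    async def async_camera_image(self, width: int | None = None, height: int | None = None) -> bytes | None:
        # Thumbnails come from Toposync's still endpoint, not from the camera.
        return await self._fetch_still(self._path, width, height)


if __name__ == "__main__":
    async def _demo_still(path: str, width: int | None, height: int | None) -> bytes | None:
        return b""

    camera = ToposyncCameraSketch("toposync/front_main", "127.0.0.1", _demo_still)
    print(asyncio.run(camera.stream_source()))
```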
Decision 11: spatial video is a separate extension concern
Spatial video combines two domains:
- camera mapping and composition geometry;
- live-view playback and media texture lifecycle.
It belongs in the spatial_video extension. It should consume existing
composition, camera, PTZ, and streaming APIs instead of pushing spatial video
rules into the core or the streaming extension.
Current rules:
- only mapped cameras with active live-view publications are projected;
- 2D and 3D spatial views share projection, PTZ, stream texture, clipping, and marker logic where possible;
- video is projected above the floor/areas and below walls/objects where depth allows it;
- overlapping camera projections are allowed for now;
- z-fighting must be prevented through geometry offsets, material settings, and render ordering.
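The selection and layering rules can be sketched as follows. The layer numbers, z offsets, and names are hypothetical scene-unit values; in a WebGL renderer such as three.js they would typically map to renderOrder, a small geometry lift, and polygonOffset on the material.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical draw order: larger values draw later; walls/objects sit above video.
RENDER_ORDER = {"floor": 0, "areas": 1, "video": 2, "walls": 3, "objects": 4}
# Small lift above the floor plane, plus a per-camera step so overlaps never share a depth.
VIDEO_Z_OFFSET = 0.002
CAMERA_Z_STEP = 0.0005


@dataclass(frozen=True)
class CameraState:
    camera_id: str
    mapped: bool
    has_active_publication: bool


def projection_layers(cameras: list[CameraState]) -> list[tuple[str, float, int]]:
    """(camera_id, z_offset, render_order) for each camera that may be projected.

    Only mapped cameras with an active live-view publication qualify. Overlaps
    are allowed, so each camera gets a distinct z step in a stable order.
    """
    projectable = sorted(c.camera_id for c in cameras if c.mapped and c.has_active_publication)
    return [
        (cid, VIDEO_Z_OFFSET + i * CAMERA_Z_STEP, RENDER_ORDER["video"])
        for i, cid in enumerate(projectable)
    ]
```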
Decision 12: calibration is global transform plus local refinement
Camera mapping should keep a simple global base while allowing local correction.
The calibration model is:
- four corner pairs define the base projection;
- global move, rotate, and corner dragging adjust the whole view;
- internal refinement points apply local deformation;
- each PTZ view can have its own calibration and refinement points.
The local refinement model is part of calibration, not just a visual effect. It must affect spatial projection and camera-to-world mapping used by pipelines.
Lens correction is intentionally not a separate user workflow yet. Future lens models can be added incrementally, but the current user path is manual spatial refinement.
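A sketch of the calibration model, assuming the four corner pairs define a planar perspective homography and that refinement points add offsets with a linear falloff inside a radius. Both the falloff model and the function names are assumptions; this page does not specify the deformation math. Each PTZ view would store its own corners and refinement points.

```python
from __future__ import annotations

Point = tuple[float, float]


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: list[Point], dst: list[Point]) -> list[float]:
    """Row-major 3x3 perspective transform (h[8] = 1) from four corner pairs."""
    a: list[list[float]] = []
    b: list[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    return _solve(a, b) + [1.0]


def apply_homography(h: list[float], p: Point) -> Point:
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    return (h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w


def refine(p: Point, refinements: list[tuple[Point, Point]], radius: float) -> Point:
    """Local deformation: each (anchor, offset) fades linearly to zero at `radius`."""
    x, y = p
    dx = dy = 0.0
    for (ax, ay), (ox, oy) in refinements:
        d = ((x - ax) ** 2 + (y - ay) ** 2) ** 0.5
        w = max(0.0, 1.0 - d / radius)
        dx += w * ox
        dy += w * oy
    return x + dx, y + dy
```

Applying refine() after apply_homography() keeps the four-corner base global while letting refinement points bend the mapping locally; points farther than radius from every refinement point are unaffected.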
Decision 13: PTZ mapping can synthesize poses, with warnings
PTZ cameras rarely stay exactly on a calibrated preset. Spatial video may use synthetic poses when the current PTZ state is between or near calibrated views.
Resolution states:
| State | Meaning |
|---|---|
| matched | Current pose matches a calibrated view. |
| interpolated | Current pose is inside the calibrated envelope. |
| extrapolated | Current pose is slightly outside the envelope and still conservative. |
| nearest_reference | Current pose is too far; render nearest view with a strong warning. |
| single_reference | Only one view exists; render it with a strong warning. |
| fallback | Pose data is incomplete but a visual fallback exists. |
| unmatched | No usable projection data exists. |
Transport errors have higher visual priority than pose-quality warnings. A bad mapping warning should not look like a broken stream.
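A sketch of pose resolution, assuming normalized pan/tilt/zoom values and treating the calibrated envelope as the per-axis bounding box of the calibrated views. The tolerances and function names are hypothetical. The overlay helper encodes the last rule: a transport error always wins over a pose-quality warning.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Illustrative tolerances in normalized pan/tilt/zoom units, not Toposync values.
MATCH_TOL = 0.01
EXTRAPOLATE_TOL = 0.05
STRONG_WARNING_STATES = {"nearest_reference", "single_reference", "fallback", "unmatched"}


@dataclass(frozen=True)
class Pose:
    pan: float
    tilt: float
    zoom: float

    def as_tuple(self) -> tuple[float, float, float]:
        return (self.pan, self.tilt, self.zoom)


def _envelope_excess(pose: Pose, views: list[Pose]) -> float:
    """How far the pose lies outside the per-axis bounding box of the views (0 inside)."""
    excess = 0.0
    for i, value in enumerate(pose.as_tuple()):
        axis = [v.as_tuple()[i] for v in views]
        excess = max(excess, min(axis) - value, value - max(axis))
    return excess


def resolve_state(pose: Pose | None, views: list[Pose], has_visual_fallback: bool = False) -> str:
    if pose is None or not views:
        return "fallback" if (views or has_visual_fallback) else "unmatched"
    if min(math.dist(pose.as_tuple(), v.as_tuple()) for v in views) <= MATCH_TOL:
        return "matched"
    if len(views) == 1:
        return "single_reference"
    excess = _envelope_excess(pose, views)
    if excess == 0.0:
        return "interpolated"
    if excess <= EXTRAPOLATE_TOL:
        return "extrapolated"
    return "nearest_reference"


def overlay_message(transport_error: str | None, pose_state: str) -> str | None:
    """Transport errors outrank pose warnings, so bad mapping never looks like a broken stream."""
    if transport_error:
        return transport_error
    if pose_state in STRONG_WARNING_STATES:
        return f"mapping: {pose_state}"
    return None
```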
Decision 14: area clipping is geometric, not media correction
Spatial video can clip a camera projection to one area from the same composition. The clipping happens when generating projection geometry, not in a per-frame shader or framebuffer path.
Reasons:
- predictable performance;
- stable UV interpolation;
- simpler interaction with 2D and 3D views;
- no extra GPU pass per active stream.
Area clipping is a spatial crop. It should not be used to fix letterboxing, camera aspect mismatch, or transport padding.
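Because clipping happens at geometry time, it can be a classic polygon clip on the projection mesh. This sketch uses Sutherland-Hodgman, which assumes a convex, counter-clockwise area polygon (non-convex areas would need triangulation or a general clipper). UVs are interpolated at each new edge point so texture mapping stays stable. The vertex layout and function names are assumptions.

```python
from __future__ import annotations

Vertex = tuple[float, float, float, float]  # (x, y) on the floor plane, (u, v) texture coords


def _side(a: tuple[float, float], b: tuple[float, float], p: tuple[float, ...]) -> float:
    """Cross product: >= 0 when p is on or left of edge a->b (inside for a CCW area)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(p: Vertex, q: Vertex, a: tuple[float, float], b: tuple[float, float]) -> Vertex:
    """Where segment p->q crosses line a->b; position and UV are interpolated together."""
    sp, sq = _side(a, b, p), _side(a, b, q)
    t = sp / (sp - sq)
    return tuple(pc + t * (qc - pc) for pc, qc in zip(p, q))


def clip_projection(subject: list[Vertex], area: list[tuple[float, float]]) -> list[Vertex]:
    """Sutherland-Hodgman: clip a projection polygon to a convex, counter-clockwise area."""
    output = list(subject)
    for i in range(len(area)):
        a, b = area[i], area[(i + 1) % len(area)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            cur_in, prev_in = _side(a, b, cur) >= 0, _side(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_intersect(prev, cur, a, b))
            prev = cur
    return output
```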
Decision 15: content rect is media metadata
The calibration snapshot can differ from the streamed video if an output uses
resize_mode="contain" and adds black padding. The fix is metadata, not user
recalibration.
Playback output URL entries include:
{
"content_rect": { "x": 0, "y": 0, "width": 1, "height": 1 }
}
content_rect is the useful video rectangle, normalized in output texture
coordinates. Spatial video remaps UVs to this rectangle before projection.
Rules:
- no user setting should be required for normal letterbox correction;
- the calculation should use the same contain math as streaming resize;
- black-border detection is only a defensive fallback when metadata is missing;
- area clipping remains separate and can still make the projected video look cropped.
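A sketch of the contain math and UV remap, assuming continuous (unrounded) scaling; if the streaming resize rounds to even pixel sizes, this calculation must round the same way. ContentRect mirrors the JSON fields above, and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContentRect:
    x: float
    y: float
    width: float
    height: float


def contain_content_rect(src_w: int, src_h: int, out_w: int, out_h: int) -> ContentRect:
    """Normalized rectangle of the real picture inside a resize_mode="contain" output."""
    scale = min(out_w / src_w, out_h / src_h)
    w, h = src_w * scale, src_h * scale
    # Padding is split evenly on both sides, as in centered letterboxing.
    return ContentRect((out_w - w) / 2 / out_w, (out_h - h) / 2 / out_h, w / out_w, h / out_h)


def remap_uv(u: float, v: float, rect: ContentRect) -> tuple[float, float]:
    """Map calibration-space UV (0..1 over the real picture) into output texture UV."""
    return rect.x + u * rect.width, rect.y + v * rect.height
```

For example, a 1920x1080 source in a 640x640 output becomes a 640x360 picture with 140 pixels of padding above and below, so content_rect is {x: 0, y: 0.21875, width: 1, height: 0.5625}.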
Decision 16: debug views are fixed-transport tools
The stream debug route exists to validate one transport against one stream or variant. It must not silently switch transport.
Expected behavior:
- opening transport=hls tests HLS only;
- opening transport=mse tests MSE only;
- opening transport=webrtc tests WebRTC only;
- opening transport=jsmpeg tests JSMpeg only.
Some transports are expected to fail in some environments. That failure is the point of the tool.
Debug output should include:
- API events;
- demand/heartbeat events;
- playback events;
- transport events;
- probe results;
- first-frame or visual validation status when possible.
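A sketch of pinning the debug transport from the URL, using only the standard library. The route path and function name are hypothetical; a missing or unknown value is an error rather than a reason to fall back.

```python
from urllib.parse import parse_qs, urlparse

DEBUG_TRANSPORTS = {"hls", "mse", "webrtc", "jsmpeg"}


def debug_transport(url: str) -> str:
    """Return the single transport a debug URL pins; never substitute another one."""
    values = parse_qs(urlparse(url).query).get("transport", [])
    if len(values) != 1 or values[0] not in DEBUG_TRANSPORTS:
        raise ValueError(f"debug page needs exactly one transport from {sorted(DEBUG_TRANSPORTS)}")
    return values[0]
```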
Decision 17: Home Assistant ingress paths are part of the contract
Any browser-visible route, link, API URL, WebSocket URL, EventSource URL, extension asset URL, or file URL must work under Home Assistant ingress and other non-root deployments.
Rules:
- use the host/router/base-path helpers instead of hardcoded root paths;
- compare logical routes only after accounting for the public base path;
- extension bundles must be rebuilt when extension UI source changes;
- diagnostics links must preserve the ingress prefix.
A feature that works at http://localhost:5173/ but breaks inside Home
Assistant ingress is incomplete.
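A sketch of base-path handling, assuming the public base path is known (under Home Assistant ingress it has the form /api/hassio_ingress/<token>/). The helper names are hypothetical and shown in Python for consistency with the other sketches; browser-side helpers would apply the same logic.

```python
def normalize_base(base_path: str) -> str:
    """'/api/hassio_ingress/abc/' -> '/api/hassio_ingress/abc'; '/' -> ''."""
    trimmed = base_path.strip("/")
    return "/" + trimmed if trimmed else ""


def public_url(base_path: str, route: str) -> str:
    """Prefix a logical route such as '/streams/debug' with the deployment base path."""
    return normalize_base(base_path) + "/" + route.lstrip("/")


def logical_route(base_path: str, pathname: str) -> str:
    """Strip the public base path before comparing routes."""
    base = normalize_base(base_path)
    if base and (pathname == base or pathname.startswith(base + "/")):
        pathname = pathname[len(base):]
    return pathname or "/"
```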
Decision 18: validation must inspect media when media visibility is the claim
Metadata-only validation is not enough when the change claims that video is visible, aligned, cropped, or stable.
Preferred validation:
- targeted unit tests for policy, tokens, resize math, geometry, clipping, and PTZ pose resolution;
- browser validation for HLS/MSE/WebRTC/JSMpeg when relevant;
- frame extraction or screenshots/contact sheets for transport fixes;
- visual inspection for spatial projection and content_rect changes;
- Home Assistant ingress checks whenever routes, WebSockets, or assets change.
Use the smallest reliable test set for the change, but verify the layer that actually failed.
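For transport fixes, a contact sheet is a cheap way to inspect media rather than metadata. This sketch builds an FFmpeg command using the standard fps, scale, and tile filters; the sampling interval, thumbnail width, grid size, and function name are illustrative. On a live stream the command finishes once the grid has filled, which takes cols * rows sampling intervals.

```python
def contact_sheet_args(source_url: str, out_png: str, cols: int = 3, rows: int = 3, every_s: int = 1) -> list[str]:
    """FFmpeg argv: sample one frame every `every_s` seconds and tile them into one image."""
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-i", source_url,
        # fps samples frames, scale shrinks them (even height), tile packs them into a grid.
        "-vf", f"fps=1/{every_s},scale=320:-2,tile={cols}x{rows}",
        "-frames:v", "1",  # stop after the first complete sheet
        out_png,
    ]
```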
Rejected alternatives
Use go2rtc as the main media engine
Rejected for now. go2rtc is useful for MSE/WebRTC browser playback, but MediaMTX remains the main publication/distribution engine. Toposync should own camera ingestion, publication reconciliation, auth, demand, and health.
Make WebRTC the default for all web playback
Rejected. WebRTC is excellent for low latency, but it is fragile across NAT, Home Assistant ingress, add-on port mapping, and multi-tile dashboards.
Persist MSE and JSMpeg as real transmission outputs
Rejected. Both are synthetic browser transports derived from real backing outputs or runtime frames. Persisting them as first-class outputs would blur engine ownership and make reconciliation more complex.
Ask users to manually fix letterbox padding
Rejected. The padding is introduced by media resizing, so the correction belongs to media metadata. Manual calibration should stay focused on real spatial mapping.
Put spatial video policy in the core
Rejected. Core should provide generic composition, plugin, route, event, and pipeline primitives. Projection policy belongs to the spatial video extension.
When to update this page
Update this page when a change:
- changes the default transport order;
- changes when WebRTC, MSE, or JSMpeg are considered available;
- changes camera publication or manual pipeline publication semantics;
- changes liveness classification or primary user hint priority;
- changes Home Assistant Cloud or ingress behavior;
- changes spatial video projection, PTZ pose synthesis, clipping, or media UV rules;
- moves behavior across the core/extension boundary.