Media decisions

This page records the current product and architecture decisions behind Toposync live viewing, media transports, Home Assistant playback, camera publication, and spatial video.

It is intentionally stricter than a how-to guide. If a future change conflicts with one of these decisions, update this page as part of the same change instead of hiding the conflict in local UI heuristics or fallback code.


Decision summary

Area	Decision
User model	Users manage camera sources, publications, live views, and variants. `Transmission` is an advanced/generated artifact.
Stability	Stable visible playback is more important than latency or peak quality.
Liveness	A stream is live only when it has a fresh selected frame, selected writer, and healthy active output.
Transports	Transport choice is contextual. HLS remains the compatibility baseline.
MSE	MSE is a demand-started web transport through go2rtc and Toposync's signed proxy.
WebRTC	WebRTC is for explicit low-latency/PTZ contexts, not every dashboard tile.
JSMpeg	JSMpeg is the final visual fallback, session-scoped and low quality by design.
Home Assistant Cloud	Cloud-friendly playback goes through native Home Assistant `camera` entities.
Spatial video	Spatial projection is a separate extension concern that consumes mapped cameras and live-view playback.
Core boundary	Core stays generic; streaming, camera publication, and spatial video policy live in extensions.

Decision 1: camera playback starts from published sources

Normal camera playback starts from a camera source, not from a manually created technical transmission.

camera source
  -> publication intent
  -> reconciler
  -> implicit pipeline
  -> stream.publish_video
  -> transmission runtime
  -> media output
  -> viewer

Users should not need to understand transmission_id, engine_path, output_id, or quality_profile_id to view an ordinary camera. Those values remain necessary for diagnostics and as internal contracts, but they are not the primary product model.

A published source has:

an enabled publication intent;
a role: main, sub, zoom, or custom;
a visible label;
host/server affinity;
quality and transport policy hints.

Implications:

camera discovery should default useful video sources toward publication where that is safe;
saving a camera source should be enough to reconcile its live-view artifacts;
generated pipelines and transmissions should be read-only or advanced in the normal UI;
a broken implicit pipeline is a product-level problem, not something the user should fix by hand in the transmission editor.

Decision 2: manual pipeline publication is still first-class

Automatic camera publication is the primary flow, but users can still publish custom rendered video from pipelines.

The stream.publish_video operator should let a pipeline publish a video as a live-view group or variant. The user-facing fields are:

live-view name;
variant name;
role;
visual profile;
dashboard visibility;
Home Assistant visibility.

The reconciler owns the generated Transmission and may write the deterministic transmission_id back into the node. Two manual pipelines may publish into the same live-view group with different roles, such as one main variant and one sub variant.

Implications:

pipeline save/delete events should trigger streaming reconciliation without making the core know about streaming semantics;
disabling a manual pipeline disables its generated publication artifacts;
deleting a manual pipeline removes only artifacts generated by that pipeline;
the reconciler must not delete user-authored pipelines just because a streaming artifact is missing or stale.

Decision 3: liveness is frame freshness, not process presence

Toposync must never mark a frozen last frame as live just because FFmpeg, MediaMTX, go2rtc, or another process is running.

Live playback requires all of the following:

a recent selected frame;
an active selected writer;
a healthy selected media output;
transport-specific health when playback is active.

Important classifications:

Classification	Meaning
`no_frame`	No selected writer/frame is feeding the transmission.
`source_pipeline_stale`	The selected writer exists, but the selected frame is too old.
`publisher_down`	Frames exist, but the media publisher is not running for the selected output.
`event_gated_idle`	The pipeline is intentionally idle because an event gate is closed.
`app_player_lifecycle`	A recent player-side event indicates warmup, stall, or playback failure.
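
The classifications above can be read as an ordered check over one runtime snapshot. The following is a minimal sketch, not the actual implementation: the `RuntimeSnapshot` fields, the 5-second freshness threshold, the `live` result label, and the order in which the event gate is checked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed freshness threshold; the real value is a policy decision.
MAX_FRAME_AGE_S = 5.0


@dataclass
class RuntimeSnapshot:
    """Hypothetical view of one transmission's selected writer and output."""
    has_writer: bool
    frame_age_s: Optional[float]      # None when no frame was ever selected
    publisher_running: bool
    event_gate_closed: bool = False
    player_event: Optional[str] = None  # e.g. "warmup", "stall", "playback_failure"


def classify(snapshot: RuntimeSnapshot, max_frame_age_s: float = MAX_FRAME_AGE_S) -> str:
    """Return a liveness classification based on frame freshness, not process presence."""
    if snapshot.event_gate_closed:
        return "event_gated_idle"
    if not snapshot.has_writer or snapshot.frame_age_s is None:
        return "no_frame"
    if snapshot.frame_age_s > max_frame_age_s:
        # A running publisher does not rescue a frozen last frame.
        return "source_pipeline_stale"
    if not snapshot.publisher_running:
        return "publisher_down"
    if snapshot.player_event in {"warmup", "stall", "playback_failure"}:
        return "app_player_lifecycle"
    return "live"  # assumed label for the healthy case
```

Note that a snapshot with a running publisher and a 60-second-old frame classifies as `source_pipeline_stale`, which is the behavior this decision exists to guarantee.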

Primary user hints should be ordered by actionable cause:

blocking URL/auth errors for the active transport;
no selected writer/frame;
stale selected frame;
publisher down while a writer exists;
active transport failure;
technical warnings for inactive transports.

Inactive transport warnings must not become the main error while the active transport is healthy.
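
One way to express this ordering is a fixed priority list with a filter for inactive-transport warnings. This is a sketch under assumptions: `no_frame`, `source_pipeline_stale`, and `publisher_down` come from the classification table, while the other hint names and the `primary_hint` helper are hypothetical.

```python
from typing import Iterable, Optional

# Ordered from most to least actionable, following the list above.
HINT_PRIORITY = [
    "active_transport_auth_error",   # hypothetical name: blocking URL/auth error
    "no_frame",
    "source_pipeline_stale",
    "publisher_down",
    "active_transport_failure",      # hypothetical name
    "inactive_transport_warning",    # hypothetical name
]


def primary_hint(hints: Iterable[str], active_transport_healthy: bool) -> Optional[str]:
    """Pick the single hint to show first, or None when nothing should be primary."""
    candidates = [h for h in hints if h in HINT_PRIORITY]
    if active_transport_healthy:
        # Inactive transport warnings stay secondary while playback is healthy.
        candidates = [h for h in candidates if h != "inactive_transport_warning"]
    if not candidates:
        return None
    return min(candidates, key=HINT_PRIORITY.index)
```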

Decision 4: transport policy is contextual

There is no single best browser transport for every situation.

Context	Preferred order	Rationale
Web grid/passive dashboard	MSE -> HLS -> JSMpeg	Avoid opening WebRTC for every tile. Prefer smooth web playback, keep stable fallback.
Web fullscreen	MSE -> HLS -> JSMpeg	Fullscreen usually needs quality and stability more than sub-second latency.
Web PTZ or explicit low latency	WebRTC -> MSE -> HLS -> JSMpeg	Low latency is valuable when the user is controlling the camera.
Home Assistant ingress	HLS -> MSE -> JSMpeg	Ingress should stay stable and proxy-friendly. Direct browser WebRTC is blocked by default.
Home Assistant entity/Cloud	Native Home Assistant camera contract	Home Assistant chooses playback from its camera entity model.
App/mobile/PiP	HLS -> JSMpeg	HLS is predictable; WebRTC must be requested explicitly; MSE support depends on the app wrapper.
Fixed debug page	User-selected transport only	Diagnostics must not hide failures behind automatic switching.
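
The table can be sketched as a lookup from context to preferred order, with the debug page handled separately so that it never falls back. The context keys and the `pick_transport` helper below are hypothetical names. The Home Assistant entity/Cloud row is left out on purpose, because Home Assistant chooses playback in that context.

```python
from typing import Dict, List, Optional, Set

# Preferred order per context, copied from the table above.
TRANSPORT_ORDER: Dict[str, List[str]] = {
    "web_grid": ["mse", "hls", "jsmpeg"],
    "web_fullscreen": ["mse", "hls", "jsmpeg"],
    "web_ptz": ["webrtc", "mse", "hls", "jsmpeg"],
    "ha_ingress": ["hls", "mse", "jsmpeg"],
    "app_mobile": ["hls", "jsmpeg"],
}


def pick_transport(context: str, available: Set[str], fixed: Optional[str] = None) -> Optional[str]:
    """Return the first available transport for a context, or None if none fits."""
    if context == "debug":
        # Fixed debug pages test exactly one transport and never switch.
        return fixed if fixed in available else None
    for transport in TRANSPORT_ORDER[context]:
        if transport in available:
            return transport
    return None
```

For example, `pick_transport("web_grid", {"hls", "jsmpeg"})` returns `"hls"`, because MSE is unavailable and HLS is next in the passive dashboard order.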

RTSP is not a browser transport. It remains the internal and ecosystem contract for Home Assistant Core, VLC/ffplay, Frigate/dev, go2rtc, and diagnostics.

Decision 5: HLS is the stable baseline

HLS is the compatibility baseline for:

unknown browser/network conditions;
Home Assistant ingress;
app/mobile playback;
fallback after MSE/WebRTC failure;
diagnostic confirmation that a stream is viewable.

HLS health requires more than a signed URL:

the playlist responds;
media sequence advances;
the tail segment is retrievable;
the selected runtime frame remains fresh.
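
The four conditions can be combined into one check over two consecutive playlist fetches. This is a rough sketch that assumes a sliding-window live playlist, where `#EXT-X-MEDIA-SEQUENCE` increases as old segments are dropped. The helper names, the freshness threshold, and the idea that the caller fetches the tail segment and passes the result as `tail_ok` are assumptions.

```python
import re
from typing import Optional


def media_sequence(playlist: str) -> Optional[int]:
    """Extract the EXT-X-MEDIA-SEQUENCE value from an HLS media playlist."""
    match = re.search(r"^#EXT-X-MEDIA-SEQUENCE:(\d+)", playlist, re.MULTILINE)
    return int(match.group(1)) if match else None


def tail_segment(playlist: str) -> Optional[str]:
    """Return the URI of the last segment listed in the playlist."""
    lines = [line.strip() for line in playlist.splitlines()]
    uris = [line for line in lines if line and not line.startswith("#")]
    return uris[-1] if uris else None


def hls_healthy(prev_playlist: Optional[str], curr_playlist: Optional[str],
                tail_ok: bool, frame_age_s: float, max_frame_age_s: float = 5.0) -> bool:
    """True only when the playlist responds, advances, has a retrievable tail, and frames are fresh."""
    if prev_playlist is None or curr_playlist is None:
        return False  # the playlist did not respond
    prev_seq = media_sequence(prev_playlist)
    curr_seq = media_sequence(curr_playlist)
    if prev_seq is None or curr_seq is None or curr_seq <= prev_seq:
        return False  # media sequence is missing or not advancing
    return tail_ok and frame_age_s <= max_frame_age_s
```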

Initial buffer stalls can be part of warmup. If playback recovers and the stream has fresh frames, the initial stall should not remain the primary user error.

Decision 6: MSE is a demand-started web transport

MSE is the preferred passive web transport when it is available and compatible. It is implemented through go2rtc, but go2rtc does not own Toposync's camera domain.

Rules:

go2rtc consumes internal MediaMTX RTSP paths;
go2rtc must not connect directly to cameras;
the browser connects to Toposync signed WebSocket proxy URLs, not go2rtc directly;
MSE output URLs may be returned when the sidecar is startable, even if the go2rtc process is currently stopped;
opening the signed MSE WebSocket may start or update go2rtc on demand;
no MSE viewer means no MSE-specific work should be required.

MSE is synthetic. It is derived from a real backing output and should not be persisted as TransmissionOutput(protocol="mse").

Reasons:

the user should not pay sidecar cost before a viewer requests it;
a stopped sidecar is a normal idle state, not a broken state;
Toposync remains responsible for auth, playback policy, demand, and health.

Decision 7: WebRTC is contextual low latency

WebRTC/WHEP is valuable for PTZ, autotrack, and explicit low-latency inspection. It is not the default for every live tile.

Reasons:

WebRTC depends on ICE candidates, UDP reachability, NAT, and port mapping;
Home Assistant add-on ingress makes direct browser WebRTC especially fragile;
starting WebRTC for every grid tile increases cost and failure noise;
HLS or MSE can be healthy while WebRTC correctly reports a low-latency networking warning.

Generated outputs should include WebRTC for zoom/PTZ publications or when transport_policy.enable_webrtc=true. WebRTC warnings become primary only when the user chose WebRTC, requested low latency, or no stable active transport is available.
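
Both rules in the paragraph above reduce to small boolean predicates. The sketch below uses hypothetical function and parameter names; only the `zoom` role and `transport_policy.enable_webrtc` come from this page.

```python
def should_generate_webrtc(role: str, is_ptz: bool = False, enable_webrtc: bool = False) -> bool:
    """WebRTC outputs are generated for zoom/PTZ publications or by explicit policy."""
    return role == "zoom" or is_ptz or enable_webrtc


def webrtc_warning_is_primary(user_chose_webrtc: bool,
                              low_latency_requested: bool,
                              stable_transport_active: bool) -> bool:
    """A WebRTC warning leads only when WebRTC matters to what the user is doing."""
    return user_chose_webrtc or low_latency_requested or not stable_transport_active
```

With these predicates, a `main` publication on a passive dashboard gets no WebRTC output, and a WebRTC networking warning stays secondary while HLS or MSE is playing.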

Decision 8: JSMpeg is the final visual fallback

JSMpeg exists so the user can still see something when better transports fail. It is not a quality path.

Rules:

WebSocket MPEG-TS with MPEG-1 video;
no audio;
low resolution and low FPS;
each active session owns its FFmpeg process;
the source is the selected Toposync runtime frame or an explicit placeholder;
the encoder stops when the WebSocket closes.

JSMpeg should not connect to cameras directly and should not be persisted as TransmissionOutput(protocol="jsmpeg").

Decision 9: media work must be demand-scoped

Expensive media work should only exist while it serves a real session or active publication requirement.

Demand includes:

playback session id;
transport;
output id;
live-view/transmission;
lease/heartbeat time to live.

Examples:

dashboard tiles renew demand while mounted;
debug pages renew demand for the fixed selected transport only;
Home Assistant entity playback renews demand while stream_source(), still-image requests, or WebRTC offer handling are active;
MSE starts go2rtc only when a signed MSE WebSocket opens;
JSMpeg starts FFmpeg only while a WebSocket session exists.

This rule prevents one viewer or one transport from accidentally keeping an unrelated camera source, output, or variant hot.
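
A lease registry keyed by the demand fields listed above is one way to implement this rule. The following is a minimal in-memory sketch: the `DemandKey` and `DemandRegistry` names, the 15-second default TTL, and the injectable clock are assumptions for illustration, not the actual Toposync API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class DemandKey:
    """One viewer's demand for one output over one transport."""
    session_id: str
    transport: str
    output_id: str
    transmission_id: str


class DemandRegistry:
    def __init__(self, ttl_s: float = 15.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires: Dict[DemandKey, float] = {}

    def renew(self, key: DemandKey) -> None:
        """Heartbeat: extend the lease for this exact session/transport/output."""
        self._expires[key] = self._clock() + self._ttl_s

    def release(self, key: DemandKey) -> None:
        """Explicit close, e.g. when a WebSocket session ends."""
        self._expires.pop(key, None)

    def active_outputs(self) -> Set[str]:
        """Drop expired leases and return the output ids that still have demand."""
        now = self._clock()
        self._expires = {k: t for k, t in self._expires.items() if t > now}
        return {k.output_id for k in self._expires}

    def is_hot(self, output_id: str) -> bool:
        return output_id in self.active_outputs()
```

Because each lease names its own output and transport, a dashboard tile that renews demand for one variant cannot keep a different variant or camera source hot.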

Decision 10: Home Assistant Cloud uses native entities

The Toposync UI inside Home Assistant ingress and native Home Assistant camera entities are different playback surfaces.

Ingress/sidebar:

is the Toposync web UI;
remains HLS-first;
can use MSE through the Toposync proxy when available;
can use JSMpeg as a visual fallback;
does not rely on direct browser WebRTC by default.

Native Home Assistant integration:

exports published Toposync live views as camera.* entities;
uses internal Toposync/MediaMTX RTSP for stream_source();
uses Toposync still endpoints for thumbnails;
never exposes direct camera credentials or direct camera RTSP URLs;
keeps native WebRTC offer handling opt-in until validated for the target network and Home Assistant Cloud path.

Decision:

Toposync publication -> Home Assistant camera entity -> Home Assistant stream component -> Home Assistant UI / Cloud

Do not treat a direct WebRTC player inside a Toposync ingress iframe as the normal Home Assistant Cloud strategy.

Decision 11: spatial video is a separate extension concern

Spatial video combines two domains:

camera mapping and composition geometry;
live-view playback and media texture lifecycle.

It belongs in the spatial_video extension. It should consume existing composition, camera, PTZ, and streaming APIs instead of pushing spatial video rules into the core or the streaming extension.

Current rules:

only mapped cameras with active live-view publications are projected;
2D and 3D spatial views share projection, PTZ, stream texture, clipping, and marker logic where possible;
video is projected above the floor/areas and below walls/objects where depth allows it;
overlapping camera projections are allowed for now;
z-fighting must be prevented through geometry offsets, material settings, and render ordering.

Decision 12: calibration is global transform plus local refinement

Camera mapping should keep a simple global base while allowing local correction.

The calibration model is:

four corner pairs define the base projection;
global move, rotate, and corner dragging adjust the whole view;
internal refinement points apply local deformation;
each PTZ view can have its own calibration and refinement points.

The local refinement model is part of calibration, not just a visual effect. It must affect spatial projection and camera-to-world mapping used by pipelines.

Lens correction is intentionally not a separate user workflow yet. Future lens models can be added incrementally, but the current user path is manual spatial refinement.

Decision 13: PTZ mapping can synthesize poses, with warnings

PTZ cameras rarely stay exactly on a calibrated preset. Spatial video may use synthetic poses when the current PTZ state is between or near calibrated views.

Resolution states:

State	Meaning
`matched`	Current pose matches a calibrated view.
`interpolated`	Current pose is inside the calibrated envelope.
`extrapolated`	Current pose is slightly outside the envelope, close enough that a conservative estimate is still acceptable.
`nearest_reference`	Current pose is too far; render nearest view with a strong warning.
`single_reference`	Only one view exists; render it with a strong warning.
`fallback`	Pose data is incomplete but a visual fallback exists.
`unmatched`	No usable projection data exists.

Transport errors have higher visual priority than pose-quality warnings. A bad mapping warning should not look like a broken stream.
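
The resolution states can be illustrated with a deliberately simplified model. In this sketch a pose is only `(pan, tilt)` in degrees, the calibrated envelope is the bounding box of the calibrated views, and both thresholds are invented values. Zoom, pan wraparound, the real envelope shape, and the precedence between `fallback` and `single_reference` are all assumptions that the actual resolver may handle differently.

```python
from math import hypot
from typing import List, Optional, Tuple

Pose = Tuple[float, float]  # (pan_deg, tilt_deg); zoom is ignored in this sketch


def resolve_pose_state(pose: Optional[Pose], views: List[Pose],
                       match_tol: float = 1.0, extrapolate_margin: float = 5.0) -> str:
    """Map the current PTZ pose to one of the resolution states in the table above."""
    if not views:
        return "unmatched"
    if pose is None:
        return "fallback"
    if len(views) == 1:
        return "single_reference"
    pan, tilt = pose
    if min(hypot(pan - p, tilt - t) for p, t in views) <= match_tol:
        return "matched"
    pans = [p for p, _ in views]
    tilts = [t for _, t in views]
    # Per-axis distance outside the bounding box; zero when inside.
    dx = max(min(pans) - pan, 0.0, pan - max(pans))
    dy = max(min(tilts) - tilt, 0.0, tilt - max(tilts))
    outside = hypot(dx, dy)
    if outside == 0.0:
        return "interpolated"
    if outside <= extrapolate_margin:
        return "extrapolated"
    return "nearest_reference"
```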

Decision 14: area clipping is geometric, not media correction

Spatial video can clip a camera projection to one area from the same composition. The clipping happens when generating projection geometry, not in a per-frame shader or framebuffer path.

Reasons:

predictable performance;
stable UV interpolation;
simpler interaction with 2D and 3D views;
no extra GPU pass per active stream.

Area clipping is a spatial crop. It should not be used to fix letterboxing, camera aspect mismatch, or transport padding.

Decision 15: content rect is media metadata

The calibration snapshot can differ from the streamed video if an output uses resize_mode="contain" and adds black padding. The fix is metadata, not user recalibration.

Playback output URLs include:

{
  "content_rect": { "x": 0, "y": 0, "width": 1, "height": 1 }
}

content_rect is the useful video rectangle, normalized in output texture coordinates. Spatial video remaps UVs to this rectangle before projection.

Rules:

no user setting should be required for normal letterbox correction;
the calculation should use the same contain math as streaming resize;
black-border detection is only a defensive fallback when metadata is missing;
area clipping remains separate and can still make the projected video look cropped.
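
The contain math and the UV remap can be sketched as follows. This assumes that the padding is centered and that `content_rect` is normalized to the output size, as described above. The function names are hypothetical.

```python
from typing import Dict, Tuple


def contain_content_rect(src_w: int, src_h: int, out_w: int, out_h: int) -> Dict[str, float]:
    """Normalized rectangle occupied by the source inside a 'contain' output with centered padding."""
    scale = min(out_w / src_w, out_h / src_h)
    width = (src_w * scale) / out_w
    height = (src_h * scale) / out_h
    return {"x": (1.0 - width) / 2.0, "y": (1.0 - height) / 2.0,
            "width": width, "height": height}


def remap_uv(u: float, v: float, rect: Dict[str, float]) -> Tuple[float, float]:
    """Map a UV coordinate in calibration-snapshot space into output texture space."""
    return rect["x"] + u * rect["width"], rect["y"] + v * rect["height"]
```

For a 640x480 source placed in a 1280x720 output, the rectangle is `x=0.125, y=0.0, width=0.75, height=1.0`: the video is pillarboxed, and UVs are squeezed horizontally so that the projection ignores the black bars.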

Decision 16: debug views are fixed-transport tools

The stream debug route exists to validate one transport against one stream or variant. It must not silently switch transport.

Expected behavior:

opening transport=hls tests HLS only;
opening transport=mse tests MSE only;
opening transport=webrtc tests WebRTC only;
opening transport=jsmpeg tests JSMpeg only.

Some transports are expected to fail in some environments. That failure is the point of the tool.

Debug output should include:

API events;
demand/heartbeat events;
playback events;
transport events;
probe results;
first-frame or visual validation status when possible.

Decision 17: Home Assistant ingress paths are part of the contract

Any browser-visible route, link, API URL, WebSocket URL, EventSource URL, extension asset URL, or file URL must work under Home Assistant ingress and other non-root deployments.

Rules:

use the host/router/base-path helpers instead of hardcoded root paths;
compare logical routes only after accounting for the public base path;
extension bundles must be rebuilt when extension UI source changes;
diagnostics links must preserve the ingress prefix.

A feature that works at http://localhost:5173/ but breaks inside Home Assistant ingress is incomplete.
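
The first two rules can be sketched as a pair of pure helpers. The real host/router/base-path helpers are not shown on this page, so `public_url` and `logical_route` are hypothetical stand-ins, and the ingress prefix in the usage note is a placeholder.

```python
def _normalize_base(base_path: str) -> str:
    """Return '' for root deployments, otherwise '/prefix' without a trailing slash."""
    trimmed = base_path.strip("/")
    return "/" + trimmed if trimmed else ""


def public_url(base_path: str, logical_path: str) -> str:
    """Build a browser-visible path from a logical route and the public base path."""
    return _normalize_base(base_path) + "/" + logical_path.lstrip("/")


def logical_route(base_path: str, public_path: str) -> str:
    """Strip the public base path so routes can be compared logically."""
    base = _normalize_base(base_path)
    if base and (public_path == base or public_path.startswith(base + "/")):
        return public_path[len(base):] or "/"
    return public_path
```

With a base path such as `/api/hassio_ingress/<token>/`, `public_url(base, "/streams/debug")` keeps the ingress prefix, and `logical_route(base, ...)` recovers `/streams/debug` for route comparison. At the root, both helpers leave paths unchanged.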

Decision 18: validation must inspect media when media visibility is the claim

Metadata-only validation is not enough when the change claims that video is visible, aligned, cropped, or stable.

Preferred validation:

targeted unit tests for policy, tokens, resize math, geometry, clipping, and PTZ pose resolution;
browser validation for HLS/MSE/WebRTC/JSMpeg when relevant;
frame extraction or screenshots/contact sheets for transport fixes;
visual inspection for spatial projection and content_rect changes;
Home Assistant ingress checks whenever routes, WebSockets, or assets change.

Use the smallest reliable test set for the change, but verify the layer that actually failed.

Rejected alternatives

Use go2rtc as the main media engine

Rejected for now. go2rtc is useful for MSE/WebRTC browser playback, but MediaMTX remains the main publication/distribution engine. Toposync should own camera ingestion, publication reconciliation, auth, demand, and health.

Make WebRTC the default for all web playback

Rejected. WebRTC is excellent for low latency, but it is fragile across NAT, Home Assistant ingress, add-on port mapping, and multi-tile dashboards.

Persist MSE and JSMpeg as real transmission outputs

Rejected. Both are synthetic browser transports derived from real backing outputs or runtime frames. Persisting them as first-class outputs would blur engine ownership and make reconciliation more complex.

Ask users to manually fix letterbox padding

Rejected. The padding is introduced by media resizing, so the correction belongs to media metadata. Manual calibration should stay focused on real spatial mapping.

Put spatial video policy in the core

Rejected. Core should provide generic composition, plugin, route, event, and pipeline primitives. Projection policy belongs to the spatial video extension.

When to update this page

Update this page when a change:

changes the default transport order;
changes when WebRTC, MSE, or JSMpeg are considered available;
changes camera publication or manual pipeline publication semantics;
changes liveness classification or primary user hint priority;
changes Home Assistant Cloud or ingress behavior;
changes spatial video projection, PTZ pose synthesis, clipping, or media UV rules;
moves behavior across the core/extension boundary.
