tappty — design document
How tappty is structured and why. This is the architecture companion to the
README (the usage guide); it is for someone modifying the toolkit and
wanting the full picture — contracts, data shapes, the threading model, and the reasoning
behind each part. The dated history is in CHANGELOG.md; the remaining open
work (publish to PyPI; verify Windows) is noted in §11.
1. The one idea
A terminal program's output should flow through a single pipeline where every consumer is equal: the screen renderer, an out-of-process logger, and an automated driver are all just clients of the same observe/control contract. Get that right and a human and a bot can watch — and take turns driving — the exact same session, with no special-casing.
So tappty is split into four decoupled stages, each ignorant of the others' nature:
```
 a program                                                          consumers (all equal)
┌─────────┐   bytes    ┌──────────┐   grid   ┌─────────┐  observe   ┌──────────────┐
│ Source  │──────────▶ │ Terminal │ ───────▶ │ Session │ ─────────▶ │ curses_ui    │
│ (pty /  │            │ (glass)  │          │ (taps + │            │ pygame_ui    │
│ engine) │ ◀──────────│          │ ◀─────── │ control)│ ◀───────── │ bus clients  │
└─────────┘   input    └──────────┘  write   └─────────┘  control   │ compositor   │
                                                                    └──────────────┘
```
- Source produces output and accepts input — it does not know what a screen is.
- Terminal models the glass — it does not know where bytes come from or who draws it.
- Session fans output to observers and routes input back — it does not know if a consumer is a window, a socket, or an AI.
- Renderers / bus clients consume the contract — they do not know each other exist.
Two consequences run through everything below:
- Fixed-size model. The Terminal is a fixed grid (default 80×24). The hosted program stays sealed in its own dimensions; making the real window bigger/smaller is a render-side concern (a viewport), never a resize the program sees.
- Bytes on the wire, characters on the glass. A byte source's raw bytes travel losslessly to stream observers, while the screen shows those bytes decoded to characters. The Session owns that one decode (see §2.3); the terminal backends stay encoding-agnostic.
2. The parts
2.1 Source — source.py
A Source is "something that produces terminal output and consumes input." The interface is
tiny — three callbacks supplied at start(), plus send_input() and stop():
```python
class Source:
    encoding = None    # wire encoding of raw output, or None if it already emits text
    returncode = None  # child exit status after on_exit (None if N/A)
    error = None       # exception that ended the program (None if clean)

    def start(self, on_output, on_wait, on_exit): ...  # begin producing on a thread
    def send_input(self, text): ...                    # feed input to the program
    def stop(self): ...                                # ask it to end
```
- on_output(text): the program emitted output (pre-render; see the bytes/text note below).
- on_wait(): the program is blocked waiting for input ("your turn"). Only in-process runners fire this; a pty/pipe has no readline boundary, so an observer reads the grid.
- on_exit(): the program ended (always fired exactly once, from a finally).
The three class attributes are the contract the Session reads: encoding tells it whether
to decode (a byte source sets it, e.g. "utf-8"; a text source leaves it None),
returncode carries the child's exit status, and error carries an exception so a blocking
caller can re-raise it. Eight implementations ship:
- PtySource (POSIX). Hosts an external program on a real pseudo-terminal: pty.openpty() followed by subprocess.Popen(..., start_new_session=True, close_fds=True), with the child's stdin/stdout/stderr wired to the slave fd and a TIOCSWINSZ ioctl for the size. The open→spawn section is wrapped so a failed spawn (e.g. command not found) closes both fds and re-raises instead of leaking the pty. Output is forwarded as a byte-transparent latin-1 str (lossless: a stream observer can .encode("latin-1") to recover exact bytes); the Session decodes it for the screen by encoding (default UTF-8). send_input encodes keystrokes with the same encoding and writes to the master fd. on_wait is not fired.
- EngineSource (any OS). Wraps an in-process runner(emit, readline) callable on a thread. emit is on_output; the first time the runner calls readline, the source fires on_wait() and blocks on an input queue until send_input supplies a line, giving in-process programs a clean turn boundary a pty can't. It is a text source (encoding None). A runner exception is captured into error. stop() pushes a sentinel that makes a blocked readline() raise an internal _StopRunner (a BaseException, so a runner's except Exception can't swallow it) and unwind cleanly, so Session.stop() can stop a runner that is waiting for input. (A runner busy elsewhere, in compute or sleep, can't be force-stopped; its thread is a daemon and won't block process exit.)
- CastSource (any OS). Replays a recorded asciinema .cast session: the "recorded session" producer. A text source. It emits the recorded output events with their original timing (speed multiplier; idle_time_limit caps long pauses; loop repeats), so a recording streams through the exact same pipeline a live program would, which also makes a render reproducible. stop() is prompt even mid-pause: the inter-event wait is on a threading.Event that stop() sets. It sizes itself from the recording header (.width/.height) so the caller can size the Terminal first. Input is ignored. Formats: v2 (newline-delimited JSON: a header {"version":2,"width":..,"height":..} then [time, code, data] events, replaying only "o" output events) and compact v1 ({"version":1,"width","height","stdout":[[delay,data],...]}). Untrusted-input bounds: dimensions clamped to MAX_CAST_DIM (1000), v2 line reads capped at MAX_CAST_LINE (1 MiB), and the unstreamable v1 whole-file json.load refused above MAX_CAST_FILE (16 MiB).
- TtyrecSource / AnsSource / ThreeASource (any OS). Three more replay/art producers sharing a _ReplaySource base (the timed play loop): .ttyrec (NetHack-format binary records, a byte source), .ans ANSI/BBS art (CP437 + an optional SAUCE trailer), and .3a animated ASCII art. replay_source(path) dispatches by extension. Untrusted-input bounds: .ttyrec caps each record at MAX_TTYREC_CHUNK; .ans/.3a are loaded whole, so each is refused above MAX_ART_FILE (16 MiB); dimensions are clamped where the format carries them (SAUCE). A parse error in any replay source surfaces via Source.error (re-raised by run_blocking), not a silent exit. (render_video additionally caps the materialized events by both count and cumulative bytes, plus the rendered duration, so a hostile recording can't OOM or balloon the frame count.)
- PipeSource (any OS). Hosts an external program over plain pipes (subprocess with stdin/stdout, stderr→stdout, bufsize=0), with no pty. The "non-pty Source" (--no-pty), byte-transparent like PtySource. Caveat: with no tty the child detects it isn't interactive, so many programs block-buffer output and skip prompts/raw mode; it suits cooperative, line-oriented programs. Cross-platform and dependency-free.
- ConPtySource (Windows). Hosts a program on a Windows pseudo-console (ConPTY) via pywinpty (the win extra). The Windows counterpart to PtySource. ConPTY emits ANSI/VT100+ and pywinpty returns already-decoded str, so it is a text source and pairs with PyteTerminal (not the VT52 model). Written against the documented PtyProcess API but not yet exercised on real Windows, so it is provisional (finishing it is open work; see §11).
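The v2 cast handling described above (header line, then events, with clamped dimensions and capped line reads) could look roughly like the following. This is a minimal sketch, not tappty's code: the function names `clamp_dim` and `read_cast_v2` are hypothetical, and only the constant names and values come from the text.

```python
import io
import json

MAX_CAST_DIM = 1000      # clamp for header width/height (value from the text)
MAX_CAST_LINE = 1 << 20  # 1 MiB cap per v2 line (value from the text)


def clamp_dim(value, default):
    """Clamp a header dimension into 1..MAX_CAST_DIM; fall back on junk."""
    if not isinstance(value, int) or isinstance(value, bool):
        return default
    return max(1, min(value, MAX_CAST_DIM))


def read_cast_v2(fp):
    """Parse a v2 text stream into (width, height, [(time, data), ...]).

    Only "o" (output) events are kept, as in the description above.
    """
    def next_line():
        # readline(n) returns at most n characters, so a line longer than
        # the cap shows up as a chunk of MAX_CAST_LINE + 1 characters.
        line = fp.readline(MAX_CAST_LINE + 1)
        if len(line) > MAX_CAST_LINE:
            raise ValueError("cast line exceeds MAX_CAST_LINE")
        return line

    header = json.loads(next_line())
    if not isinstance(header, dict) or header.get("version") != 2:
        raise ValueError("not an asciinema v2 recording")
    width = clamp_dim(header.get("width"), 80)
    height = clamp_dim(header.get("height"), 24)

    events = []
    while True:
        line = next_line()
        if not line:          # EOF
            break
        if not line.strip():  # tolerate blank lines
            continue
        when, code, data = json.loads(line)
        if code == "o":
            events.append((float(when), data))
    return width, height, events


sample = io.StringIO(
    '{"version": 2, "width": 5000, "height": 24}\n'
    '[0.5, "o", "hello"]\n'
    '[0.7, "i", "x"]\n'
    '[1.0, "o", " world"]\n'
)
width, height, events = read_cast_v2(sample)
```

Here the hostile width of 5000 is clamped to 1000 and the `"i"` input event is skipped. A real replay source would stream the events with their timing rather than collecting them into a list.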
Shared reader loop. The three subprocess/pty sources (PtySource, PipeSource,
ConPtySource) all run the same daemon-thread loop, so it lives once in Source._pump:
pull chunks from a read_one() closure until it returns "" (EOF), forward each to
on_output, then in a finally reap the child's exit status (into returncode) and fire
on_exit. Each source supplies only a small read_one() (its read call, EOF handling, and
whether to decode) and the wait() form. EngineSource and CastSource have genuinely
different loops (turn-based queue; timed replay) and keep their own.
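The shared reader loop can be sketched as below. It is an illustration under assumptions, not the actual `Source._pump`: the free functions `pump` and `start_pump` and the `reap` callback are hypothetical stand-ins for the method and the per-source `wait()` form.

```python
import threading


def pump(read_one, on_output, on_exit, reap):
    """Forward chunks until read_one() returns "" (EOF), then reap and exit."""
    try:
        while True:
            chunk = read_one()
            if chunk == "":
                break
            on_output(chunk)
    finally:
        # Runs on EOF and on any exception, so on_exit fires exactly once.
        try:
            reap()
        finally:
            on_exit()


def start_pump(read_one, on_output, on_exit, reap):
    """Run the loop on a daemon thread, as the text describes."""
    thread = threading.Thread(
        target=pump, args=(read_one, on_output, on_exit, reap), daemon=True
    )
    thread.start()
    return thread


# Demo: a fake byte source that hands out chunks, then EOF.
chunks = iter([b"caf\xc3", b"\xa9\n", b""])


def read_one():
    # Byte-transparent transport: each byte maps to one latin-1 character,
    # and b"" becomes "" (the EOF marker).
    return next(chunks).decode("latin-1")


collected, order = [], []
thread = start_pump(
    read_one,
    collected.append,
    on_exit=lambda: order.append("exit"),
    reap=lambda: order.append("reap"),
)
thread.join()
recovered = "".join(collected).encode("latin-1")
```

After the thread finishes, `recovered` holds the exact original bytes, which is the lossless property the latin-1 transport is meant to give stream observers.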
Adding a new byte producer (a telnet stream, an SSH channel) means implementing this one tiny
interface; nothing else changes — as CastSource, PipeSource, and ConPtySource (each
added later with no other module touched) show.
2.2 Terminal backends — terminal.py, pyte_terminal.py
The Terminal models the glass. Two backends ship behind one duck-typed read interface that a Session and the renderers rely on:
```text
cols, rows            # fixed dimensions (ints)
cx, cy                # cursor column/row
write(text)           # the hosted program's output goes here
snapshot()            # whole screen as one "\n"-joined string
rows_text()           # list of row strings
view_rows(offset=0)   # `rows` lines scrolled back `offset` into history (0 = live)
cells(offset=0)       # same window as styled `style.Cell`s (char + fg/bg/bold/reverse)
max_scroll()          # how many scrolled-off lines are available
clear()               # blank the grid, home the cursor
```
cells() is the colored parallel to view_rows(): each cell carries SGR attributes, which the
GUI renderers draw (see §2.5). The VT52 Terminal reports every cell with the default style
(it has no color); PyteTerminal reports pyte's per-cell fg/bg/bold/italic/underline/
strike/blink/reverse. The shared
style module (Cell, the ANSI palette, rgb()/resolve()/runs()) maps "default" to the
renderer's phosphor color, so uncolored text stays green and color shows only where a program
asks — the green-phosphor identity survives the addition of color.
Both are thread-safe (an RLock): the program thread writes while a render thread reads.
There is no shared base class or Protocol — with only two implementations the implicit
contract above (documented here) is lighter than an ABC, and the codebase is otherwise
annotation-free.
- Terminal (VT52 spirit, zero deps). A fixed grid. Printable text advances the cursor with wrap + scroll. Control chars: CR, LF (scroll at the bottom), BS, FF (clear), TAB (8-column stops). VT52 escapes honored: ESC H (home), ESC J (erase to end of screen), ESC K (erase to end of line), ESC Y row col (direct cursor address, bytes offset by 32), and ESC A/B/C/D (cursor up/down/right/left, bounds-clamped). It keeps scrollback (lines that scrolled off the top, the hardcopy "paper roll") purely as a viewing aid (max_scroll/view_rows); the program never sees it. Right for plain/legacy programs that speak VT52; wrong for anything that speaks modern ANSI.
- PyteTerminal (full ANSI/VT100+, the ansi extra). Wraps the pyte library behind the same read interface, so it drops in wherever a Terminal goes (Session(PyteTerminal()), tapterm --ansi) with no change to Session or renderers. It uses pyte.HistoryScreen + pyte.Stream, so it gets color/cursor-addressing/line-and-char edits and scrollback (read non-mutatingly from history.top, so the program keeps writing to the live screen while a renderer views older lines). It is encoding-agnostic: it renders whatever characters the Session hands it (Unicode included). pyte is imported lazily (LGPLv3, fine as a separately-installed optional backend). This is the "b-full" backend the design always anticipated and the prerequisite for hosting a Windows ConPTY (which emits VT100+).
Backend selection is the caller's (Session(PyteTerminal(...))) or the CLI's (--ansi); on
Windows the CLI auto-enables ANSI for the ConPTY path (§4). Both validate cols/rows >= 1 at
construction, so a 0-sized grid can't be built and crash on the first write.
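To make the VT52 direct cursor address concrete, here is a deliberately tiny grid that understands only ESC H and ESC Y. It is a sketch under assumptions: `MiniGrid` is a hypothetical class, it omits wrap, scroll, the other escapes and locking, and it does not handle an escape split across two writes.

```python
ESC = "\x1b"


class MiniGrid:
    """A tiny fixed grid: printable text, ESC H (home), ESC Y row col."""

    def __init__(self, cols=80, rows=24):
        # Validate up front so a 0-sized grid can't crash on the first write.
        if cols < 1 or rows < 1:
            raise ValueError("cols and rows must be >= 1")
        self.cols, self.rows = cols, rows
        self.cx = self.cy = 0
        self.grid = [[" "] * cols for _ in range(rows)]

    def write(self, text):
        i = 0
        while i < len(text):
            ch = text[i]
            nxt = text[i + 1:i + 2]
            if ch == ESC and nxt == "H":
                self.cx = self.cy = 0
                i += 2
            elif ch == ESC and nxt == "Y" and i + 3 < len(text):
                # Row and column arrive as single characters offset by 32.
                row = ord(text[i + 2]) - 32
                col = ord(text[i + 3]) - 32
                self.cy = max(0, min(row, self.rows - 1))
                self.cx = max(0, min(col, self.cols - 1))
                i += 4
            else:
                # Simplification: no wrap or scroll, the cursor just
                # sticks at the last column.
                self.grid[self.cy][self.cx] = ch
                self.cx = min(self.cx + 1, self.cols - 1)
                i += 1

    def rows_text(self):
        return ["".join(row) for row in self.grid]


grid = MiniGrid(cols=10, rows=3)
# "ab" at home, then jump to row 1, column 4 and write "hi".
grid.write("ab" + ESC + "Y" + chr(32 + 1) + chr(32 + 4) + "hi")
```

The offset of 32 keeps both coordinates in the printable range, which is why the address can travel as two ordinary characters.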
2.3 Session — session.py
The hub. It holds a Terminal and a Source, wires the source's three callbacks to the Terminal and the taps, and exposes the observe/control contract every client speaks.
Observe taps (subscribe to taste; each returns the callback so it can be removed with the
matching off_*):
- on_stream(cb(text)): tap 1, raw program output, pre-render, temporal. For a byte source this is byte-lossless (a latin-1 transport); it is the program's exact bytes.
- on_frame(cb()): tap 2, the grid changed. Call snapshot() to read it. The grid is the output decoded to characters.
- on_event(cb(name, info)): tap 3, events. WAIT (blocked on input), BELL, CLOSED, DRIVER {who} (the stick changed hands), ERROR {where, error} (a program/runner failure, or an observer callback that raised).
Bytes vs characters (the decode). The two output taps are deliberately different views:
a stream observer sees the program's exact bytes; the screen (the grid, snapshot(), a
renderer, the bus FRAME) is those bytes decoded to characters. The Session owns the one
decode — an incremental decoder (so a multibyte char split across reads is handled),
created in start() from self.source.encoding (UTF-8 by default for byte sources; None
for text sources, which are passed through). On _exit the decoder is flushed
(decode(b"", final=True)) so a stream ending on a partial multibyte sequence still renders
its final � (U+FFFD, the replacement character). encoding="latin-1" makes the screen byte-transparent too.
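The decode step can be illustrated with the standard library's incremental decoders. This is a sketch, not the Session's code: the class name `ScreenDecoder` is hypothetical, and `errors="replace"` is an assumption inferred from the replacement character mentioned above.

```python
import codecs


class ScreenDecoder:
    """Turn the latin-1 wire transport into characters for the grid."""

    def __init__(self, encoding="utf-8"):
        # encoding=None models a text source: pass everything through.
        self._decoder = (
            codecs.getincrementaldecoder(encoding)(errors="replace")
            if encoding
            else None
        )

    def feed(self, wire_text):
        if self._decoder is None:
            return wire_text
        # Recover the exact bytes, then decode incrementally so a
        # multibyte character split across reads is held until complete.
        return self._decoder.decode(wire_text.encode("latin-1"))

    def flush(self):
        """Call once at exit: emits a replacement for a dangling sequence."""
        if self._decoder is None:
            return ""
        return self._decoder.decode(b"", final=True)


dec = ScreenDecoder("utf-8")
first = dec.feed(b"caf\xc3".decode("latin-1"))  # "é" is split across reads
second = dec.feed(b"\xa9".decode("latin-1"))
partial = dec.feed(b"\xe2\x82".decode("latin-1"))  # truncated 3-byte sequence
tail = dec.flush()
```

The first read yields `"caf"` and the second completes `"é"`; the truncated sequence produces nothing until `flush()`, which then emits the replacement character.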
snapshot() shape — the tap-2 / bus-FRAME payload, a plain dict:
```text
{"rows": [str, ...],           # plain text per row (for text consumers / loggers)
 "cells": [[run, ...], ...],   # the styled form: per-row run-length runs, color + attributes
 "cx": int, "cy": int, "cols": int, "rows_n": int}
```
cells is style.encode_row per row — the same run encoding the web renderer uses ([col, text, fg_hex, bg_hex, bold, italic, underline, strike, blink]), so a remote bus client (e.g. a
compositor BusBacking panel) draws full color, not just text. (It is a dict rather than a
typed object because it crosses the bus's JSON boundary, where a type wouldn't survive.)
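A run encoding of the shape listed above could be produced along these lines. This is a hedged sketch rather than `style.encode_row` itself: the `Cell` tuple here is a simplified stand-in, colors are plain strings (`"default"` or a hex value), and every character is assumed to occupy one column.

```python
import json
from collections import namedtuple
from itertools import groupby

# Simplified stand-in for a styled cell; only `char` is required.
Cell = namedtuple(
    "Cell",
    "char fg bg bold italic underline strike blink",
    defaults=("default", "default", False, False, False, False, False),
)


def encode_row(cells):
    """Collapse a row into [col, text, fg, bg, bold, italic, underline, strike, blink] runs."""
    runs, col = [], 0
    # groupby merges only adjacent cells, which is what a run-length
    # encoding needs; the key is everything except the character.
    for style, group in groupby(cells, key=lambda cell: cell[1:]):
        text = "".join(cell.char for cell in group)
        runs.append([col, text, *style])
        col += len(text)
    return runs


row = [Cell("h"), Cell("i"), Cell("!", fg="#ff0000", bold=True)]
runs = encode_row(row)
wire = json.dumps(runs)  # plain lists and scalars survive the JSON boundary
```

Two plain cells collapse into one run starting at column 0, and the red bold cell becomes a second run starting at column 2.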
Observer isolation. Output and frame observers are dispatched through _fanout, which
catches a misbehaving callback, emits an ERROR event as a breadcrumb, and keeps going — one
bad client can't kill the output path for everyone. Event observers are dispatched
defensively too (and never via _fanout, so an ERROR for a failing observer can't recurse).
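The isolation rule can be sketched in a few lines. The function names `fanout` and `emit_event` and the shape of the error info are illustrative assumptions, not the Session's private API.

```python
def emit_event(event_observers, name, info):
    """Deliver an event defensively; never report failures via ERROR."""
    for cb in list(event_observers):
        try:
            cb(name, info)
        except Exception:
            # Swallowed on purpose: raising an ERROR event for a failing
            # event observer would call that same observer again.
            pass


def fanout(observers, event_observers, *args):
    """Call every output/frame observer; one failure can't stop the rest."""
    for cb in list(observers):  # copy: a callback may unsubscribe itself
        try:
            cb(*args)
        except Exception as exc:
            where = getattr(cb, "__name__", repr(cb))
            emit_event(event_observers, "ERROR", {"where": where, "error": exc})


def bad(text):
    raise RuntimeError("boom")


seen, events = [], []
fanout(
    [bad, seen.append],
    [lambda name, info: events.append((name, info["where"]))],
    "hello",
)
```

The failing observer leaves an ERROR breadcrumb, and the observer after it still receives `"hello"`.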
Control.
- send_input(text, by=None): inject input. by=None is trusted/internal; a named controller's input is applied only while it holds the stick. Returns whether it was applied.
- feed_key(ch, by="local", auto_take=True) / feed_text: interactive keystrokes with local echo + line assembly, sent on Enter; backspace edits the buffer. auto_take means typing implicitly grabs the stick (the local human preempts). Local echo goes through _echo_local, which writes to the grid and fans out a frame (so a remote renderer sees typed characters immediately) but not to the stream tap (local echo isn't program output).
- send_key(data, by="local", auto_take=True): the raw counterpart for full-screen TUIs. It sends data (a printable char, or a VT sequence from tappty.keys) straight to the program, with no echo and no line buffer (the program redraws itself). Same stick gating. A renderer in raw mode (raw_keys) translates each keystroke and calls this (see §2.5, §9).
- echo(text): show injected text on the screen and to observers (so a watcher sees what a remote controller "typed"); routed through the same protected fan-out.
The talking stick (control arbitration). Exactly one controller "drives" (holds the keyboard) at a time:
- claim_control(name, role="ai"): register a controller; the first to claim becomes the driver. has_controller(name) reports registration; has_control(name) reports driving.
- take(name): grab the stick, courtesy-gated. A human/interactive controller can preempt anyone; an ai can take only a free stick or one held by another non-human.
- release(name) / drop_controller(name): give it up / deregister (auto-releases if held).
This is what makes shared control safe — the line buffer is never raced because only the driver's keys register.
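The courtesy gate reads naturally as a small state machine. The sketch below is an illustration of the rules as stated, not the Session's implementation: the `Stick` class is hypothetical, and return values are an assumption.

```python
import threading

HUMAN_ROLES = {"human", "interactive"}


class Stick:
    """One driver at a time; humans preempt, AIs wait their turn."""

    def __init__(self):
        self._lock = threading.Lock()
        self._roles = {}   # controller name -> role
        self.driver = None

    def claim_control(self, name, role="ai"):
        with self._lock:
            self._roles[name] = role
            if self.driver is None:  # the first to claim becomes the driver
                self.driver = name
            return self.driver == name

    def take(self, name):
        with self._lock:
            if name not in self._roles:
                return False
            holder = self.driver
            if holder is None or holder == name:
                self.driver = name
                return True
            # A human may preempt anyone; an AI only another non-human.
            if (self._roles[name] in HUMAN_ROLES
                    or self._roles.get(holder) not in HUMAN_ROLES):
                self.driver = name
                return True
            return False

    def release(self, name):
        with self._lock:
            if self.driver != name:
                return False
            self.driver = None
            return True

    def has_control(self, name):
        with self._lock:
            return self.driver == name


stick = Stick()
bot_first = stick.claim_control("bot")                  # free stick: bot drives
alice_claim = stick.claim_control("alice", role="human")
alice_take = stick.take("alice")                        # human preempts the AI
bot_retake = stick.take("bot")                          # AI can't preempt a human
stick.release("alice")
bot_after_release = stick.take("bot")                   # free again
```

Holding one lock across the check and the assignment is what makes a hand-off atomic in this sketch.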
Lifecycle. start() builds the decoder and starts the source. run_in_thread(runner)
wraps a bare runner as an EngineSource and starts (non-blocking). run_blocking(runner)
starts and joins the source thread, then re-raises source.error if the program failed.
stop() stops the source and briefly joins its thread — the owning teardown path (a
renderer that started the program calls it on exit; a non-owning view does not).
2.4 The bus — bus.py
The taps and control are in-process. BusServer exposes the same contract over a
Unix-domain socket or TCP, and BusClient is the other end — so a session running in one
process can be observed and driven from another (a logger, a remote renderer, an automated
client). One server = one session, N clients.
Wire format. Newline-delimited text frames: VERB[ payload]\n. The payload is JSON for
most verbs, or rest-of-line literal text for LINE/CMD. Frames are read with a size cap
(MAX_FRAME, 256 KiB) on both ends; an oversized frame drops the connection.
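Framing of this kind can be sketched as a pair of functions. This is an assumption-laden illustration: `encode_frame` and `decode_frame` are hypothetical names, UTF-8 on the wire is assumed, and the size check shown here raises instead of dropping a connection.

```python
import json

MAX_FRAME = 256 * 1024           # size cap from the text
LITERAL_VERBS = {"LINE", "CMD"}  # payload is rest-of-line text, not JSON


def encode_frame(verb, payload=None):
    """Build one VERB[ payload]\\n frame as bytes."""
    if payload is None:
        body = verb
    elif verb in LITERAL_VERBS:
        if not isinstance(payload, str) or "\n" in payload:
            # An embedded newline would smuggle in a second frame.
            raise ValueError("literal payload must be a str without newlines")
        body = f"{verb} {payload}"
    else:
        # json.dumps escapes control characters, so the frame stays one line.
        body = f"{verb} {json.dumps(payload)}"
    data = (body + "\n").encode("utf-8")
    if len(data) > MAX_FRAME:
        raise ValueError("frame exceeds MAX_FRAME")
    return data


def decode_frame(line):
    """Split one received line into (verb, payload)."""
    if len(line) > MAX_FRAME:
        raise ValueError("frame exceeds MAX_FRAME")
    text = line.decode("utf-8").rstrip("\n")
    verb, sep, rest = text.partition(" ")
    if not sep:
        return verb, None
    return verb, (rest if verb in LITERAL_VERBS else json.loads(rest))


key_frame = encode_frame("KEY", "\x03")          # Ctrl-C survives as JSON
line_frame = encode_frame("LINE", "look north")
```

JSON-encoding the `KEY` payload is what lets control characters, including a literal newline, cross a newline-delimited protocol intact.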
Protocol.
| client → server | meaning | reply |
|---|---|---|
| HELLO <json> | identify {role, name, token?} | OK {name, driver} or DENIED |
| SNAP | request the current grid | FRAME <snapshot> |
| INFO | session info | INFO <snapshot + done, driver, waiting> |
| SUB | subscribe to pushed OUT/FRAME/EVENT | OK |
| LINE <text> | inject a line (needs the stick) | none, or DENIED |
| CMD <text> | send a line, capture output to the next prompt | RESP <json> or DENIED |
| KEY <json-string> | inject raw keystrokes (JSON-encoded so ctrl chars survive) | none, or DENIED |
| TAKE / RELEASE | grab / drop the talking stick | OK/DENIED {driver} |

| server → client (pushed, if subscribed) | meaning |
|---|---|
| OUT <json-string> | a raw output chunk (tap 1, exact bytes) |
| FRAME <snapshot> | the grid changed (tap 2, decoded screen) |
| EVENT <json> | {name: WAIT / BELL / CLOSED / DRIVER / ERROR, ...} (tap 3) |
So the bytes/characters split carries over the wire: OUT is raw bytes, FRAME/SNAP is the
decoded screen — as both plain-text rows and styled cells (color + attributes), so a remote
client renders in full color. The protocol is identical over either transport.
Server internals. A daemon accept thread spawns one daemon serve thread per connection.
Per-connection state is a small @dataclass _Conn(name, lock, sub, role, claimed, authed);
verbs dispatch through a {verb: handler} table (HELLO is handled specially, before the
auth gate; unknown verbs are ignored — forward-compatible). The send lock per connection
serializes concurrent pushes.
CMD is the synchronous primitive an automated driver needs: send a line and get back
exactly its output up to the program's next prompt. Each in-flight CMD is a
@dataclass _Capture(buf, ev, size, truncated, completed, cancelled). Output is accumulated
into buf (byte-bounded by MAX_CAPTURE, 1 MiB; truncated flags drops). The capture's
event is set when a WAIT/CLOSED arrives (completed=True) or when stop() shuts the
server down (cancelled=True, but only on a capture that hasn't completed). The reply's
timeout flag is not (completed and not cancelled), so a client can distinguish "reached the
prompt" (clean) from "timed out mid-command" and from "interrupted by shutdown". BusClient.cmd
raises TimeoutError rather than returning partial output as if it were complete.
Transports & addressing. _resolve(addr): a (host, port) tuple → TCP (works anywhere,
incl. Windows where AF_UNIX is absent); anything else → a Unix-socket path (and a clear
error if the platform has no AF_UNIX). See §8 for the security posture (owner-only socket,
loopback-only TCP, optional token).
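The addressing rule is small enough to show whole. This is a sketch of the described behavior, not the actual `_resolve`: the return shape (address family plus address) is an assumption, and IPv6 tuples are ignored.

```python
import socket


def resolve(addr):
    """Map a bus address to (address family, socket address)."""
    if isinstance(addr, tuple) and len(addr) == 2:
        host, port = addr
        return socket.AF_INET, (str(host), int(port))
    if not hasattr(socket, "AF_UNIX"):
        # Some platforms lack AF_UNIX; say so rather than fail obscurely.
        raise OSError(
            "Unix-domain sockets are unavailable on this platform; "
            "pass a (host, port) tuple for TCP instead"
        )
    return socket.AF_UNIX, str(addr)


family, address = resolve(("127.0.0.1", 7000))
```

A caller would then create `socket.socket(family, socket.SOCK_STREAM)` and bind or connect to `address`, so the rest of the bus code never branches on the transport.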
Lifecycle. start() registers the three session taps, binds/listens, and starts
accepting. stop() is the mirror and is restart-safe: it detaches the taps, wakes any pending
CMD captures, drops all client connections (closing sockets and releasing any stick they
held), closes and clears the listener, and unlinks the Unix socket path. A second start()
while running is a no-op; start()/stop() cycles are clean.
BusClient. A background reader thread puts every (verb, payload) on inbox. It is
single-consumer: wait_for(verb) drains the inbox until a match, discarding intervening
messages — for one request/reply at a time before subscribing, not concurrent callers. A
subscriber must drain inbox (it is unbounded). send() is a low-level string-frame API that
rejects non-str payloads and embedded newlines (which would inject frames).
2.5 Renderers — curses_ui.py (CUI), pygame_ui.py / arcade_ui.py (GUI)
Each is a Session client exposing run(session, runner, title=…): start the hosted program,
then loop — read the grid, draw it, forward keystrokes. All are owning renderers: they call
session.stop() on exit (in a finally), so closing the window stops the hosted program. Each
honors session.raw_keys: in line mode it feeds keys via feed_key; in raw mode it translates
its native key events (arrows/function/Ctrl) through tappty.keys and sends them raw via send_key.
- curses_ui draws a viewport into the fixed model: the whole 80×24 when the real terminal is big enough, a cursor-following sub-rectangle when it's smaller, plus a status line. The geometry is a pure, unit-tested function, viewport(model_w, model_h, screen_w, screen_h, cx, cy, status=1) → (ox, oy, vw, vh), so resize never touches the model. It reads cells() too, so it draws SGR color via curses color pairs (pure-mapped to indices, allocated lazily, capped at COLOR_PAIRS) where the terminal supports it, degrading to the phosphor/default foreground otherwise. Input maps Enter/Backspace/printable ASCII to the Session; arrows/function keys are ignored; Ctrl-] force-quits.
- pygame_ui draws the grid in a monospace font with a blinking block cursor, reading cells() so each glyph takes its full SGR style: color (fg/bg, reverse), bold/italic/underline/strikethrough via the font flags, and a blink phase, with "default" resolving to phosphor green, so uncolored output looks exactly as before. Glyphs are rendered lazily and cached by (char, fg, bold, italic, underline, strike); a non-default background is filled per cell. Scrollback is mouse-wheel / PageUp-PageDown; typing snaps back to live. Optional per-second text + PNG snapshots (and F12 on demand) let an automated observer watch the same session; max_seconds is a hard loop cap for scripting/tests, exit_when_done closes when the program ends. fps is validated >= 1.
- arcade_ui is the same renderer on the arcade (pyglet/OpenGL) stack: the same run(...) signature, color, and green-phosphor look as pygame_ui, sharing nothing but the Session contract (the proof that a renderer is just an adapter). Each row is drawn as style.runs(), i.e. maximal runs of same-colored cells, one pooled arcade.Text each (a monospace font keeps the columns aligned); the cursor and the scrollback bar are primitives. Same color / scrollback / snapshots / F12 / exit_when_done / max_seconds / fps as pygame. arcade is imported lazily (the arcade.Window subclass is built on first run() via a cached factory, since a class can't subclass arcade.Window until arcade is imported), and pyglet's audio driver is forced to silent before that import (an absent sound server otherwise stalls it ~55 s). It needs a real GL context (a display), where the pure-software pygame path does not.
- web_ui renders the session in a browser tab: the same run(...) contract, over HTTP + a WebSocket (the web extra). A stdlib http.server serves one inlined HTML/JS page; websockets (its synchronous server, no asyncio) carries the live connection. Per client, one handler thread runs a poll loop: recv keystrokes (short timeout) and push the latest frame when the grid is dirty, so only that thread sends and the source thread just flips a flag. Frames are style.runs() per row (RLE, hex colors) and the browser is a thin canvas painter, not an emulator, exactly like the other renderers (tappty's model already emulated). Keystrokes arrive as logical keys (a char, or "up"/"enter"/"ctrl-c"); the server translates them via tappty.keys and routes to send_key/feed_key, so byte-mapping stays in one place. Each client claim_controls its own name (the talking stick arbitrates several browsers). Loopback-bound with an optional token (§8); it's a control plane, no TLS.
An embedded-widget or arcade-cabinet renderer would be the same shape; nothing else needs to know.
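The viewport geometry mentioned for curses_ui lends itself to a pure function. Only the signature and return shape below come from the text; the scrolling policy (scroll just far enough to keep the cursor on the trailing edge) is an assumption made for illustration.

```python
def viewport(model_w, model_h, screen_w, screen_h, cx, cy, status=1):
    """Return (ox, oy, vw, vh): the model sub-rectangle to draw.

    `status` rows are reserved at the bottom of the screen.
    """
    vw = max(0, min(model_w, screen_w))
    vh = max(0, min(model_h, screen_h - status))

    def origin(cursor, view, model):
        if view <= 0:
            return 0
        # Assumed policy: follow the cursor, clamped so the window
        # never runs past either edge of the model.
        return max(0, min(cursor - view + 1, model - view))

    return origin(cx, vw, model_w), origin(cy, vh, model_h), vw, vh


whole = viewport(80, 24, 120, 40, cx=10, cy=5)    # screen bigger than model
corner = viewport(80, 24, 40, 13, cx=79, cy=23)   # small screen, cursor bottom-right
home = viewport(80, 24, 40, 13, cx=0, cy=0)       # small screen, cursor at home
```

Because the function takes only numbers and returns only numbers, it can be unit-tested without curses, and a resize changes what is drawn without touching the model.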
2.6 The compositor — compositor.py
Tiles several panels in one pygame window. A TerminalPanel draws over a pluggable
backing, and both backings present the same tiny interface — grid() (returns the snapshot
dict), feed_key, has_stick, toggle_stick, focus, close — so a panel doesn't care
where its bytes come from; local and remote sessions tile together:
- SessionBacking: an in-process Session. A non-owning view: close() is a no-op (the session may outlive the panel or be driven elsewhere). The local operator types only while it explicitly holds the stick (toggle_stick, bound to F2); focus() never grabs control.
- BusBacking: a remote session over the bus socket. Subscribes for FRAME snapshots and forwards keystrokes as KEY; tracks the remote driver and adopts the server-assigned (possibly uniquified) name from the HELLO OK; takes an optional token. It paces queued frames a few per tick instead of jumping to the newest, so a remote program's output scrolls in like a live terminal.
Rendering: draw_terminal paints each tile from the grid's styled cells runs in full SGR
color (the same per-cell color + bold/italic/underline/strike + blink the standalone GUI
renderers draw), so a remote BusBacking panel looks like the local session, color and all. A
shared DrawCtx caches a per-size glyph atlas (keyed by char + color + attributes) and the fit
font size, keyed on (tile size, grid size) so a non-80×24 cast/terminal fits correctly. Each tile supports mouse
pan + zoom (wheel zooms the font, left-drag pans, right-click resets to fit); the default
is the largest font at which the whole grid fits the tile. Keys route to the focused tile (the
talking stick, per tile); Tab cycles focus, Esc quits, fps is validated >= 1.
3. Concurrency & threading model
tappty is multithreaded but the rules are small and deliberate:
- One source thread runs the program. _pump's reader (pty/pipe/ConPTY), EngineSource's runner, or CastSource's replay each run on a single daemon thread and call Session._output/_wait/_exit. Because _output is called only from that one thread, the incremental decoder is accessed single-threaded, so no lock is needed there.
- Terminal writes are serialized. _output, _echo_local, and echo take Session._lock around term.write, and the Terminal additionally locks itself (an RLock). Reads (rows_text/snapshot/view_rows) take the Terminal lock. So the grid is always read and written consistently across the source thread, a renderer's main thread, and bus serve threads.
- The bus has its own lock. BusServer._lock guards _conns and _captures; each connection has a per-send lock. Serve threads call session methods (snapshot, send_input, take/release, feed_key, echo).
- The talking-stick state is locked. _controllers/driver mutations (claim_control/take/release/drop_controller) and the input gate (send_input/feed_key/send_key check who holds the stick and write under the same Session._lock) are serialized, so a hand-off is atomic: a key is never delivered with a different controller recorded as driver, and the bus's check-and-claim of a controller name is one locked step. Still a low-contention path (one driver at a time, few controllers), not a high-throughput concurrent structure.
- All worker threads are daemons. A renderer/run_blocking joins with a short timeout, but a stuck source thread never blocks process exit.
4. The tapterm program — cli.py
A thin front-end: build a Source, host it in a Session with a Terminal backend, hand the
Session to a renderer. The pieces it wires:
- Mode (mutually exclusive): --cui (curses), --gui (pygame), --arcade (arcade/OpenGL), --web (browser over a websocket), --headless (run to completion, print the final screen; for scripting/CI). With no flag it picks GUI only when pygame is importable and a display is available (_display_available: native on Windows/macOS; DISPLAY/WAYLAND_DISPLAY/SDL_VIDEODRIVER on other POSIX), else CUI, so plain tapterm over SSH/cron falls back to CUI instead of failing in pygame.
- Terminal backend (_make_terminal): the VT52 Terminal, or PyteTerminal for --ansi (errors clearly if pyte is missing). ANSI is auto-enabled on the Windows ConPTY path and for an interactive session (the regular-terminal default), since both want VT100+.
- Source (_make_source): PipeSource for --no-pty, ConPtySource on os.name == "nt", else PtySource; or, for --play, one of the replay sources via replay_source(path) (.cast/.ttyrec/.ans/.3a, with --speed/--loop).
- Other flags: --cols/--rows (or -geometry), --cooked (line-oriented instrument mode) and the xterm-style -e/-T/-cd/-hold, --raw, --record/--render, --snapshot, --exit-when-done, --title. With no command it hosts $SHELL as a real terminal.
- Exit code: --headless returns the child's exit status (source.returncode or 0), so it is honest in CI rather than always 0.
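The default-mode decision described above can be sketched as two pure functions. The names `display_available` and `pick_mode` and the injectable `platform`/`env` parameters are assumptions added here to keep the logic testable; the real helper is `_display_available` in cli.py.

```python
import os
import sys

DISPLAY_VARS = ("DISPLAY", "WAYLAND_DISPLAY", "SDL_VIDEODRIVER")


def display_available(platform=None, env=None):
    """Guess whether a GUI window can open on this machine."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    # Windows and macOS have a native display; no environment hint needed.
    if platform.startswith("win") or platform == "darwin":
        return True
    # Other POSIX: require one of the usual display hints to be set.
    return any(env.get(name) for name in DISPLAY_VARS)


def pick_mode(pygame_importable, platform=None, env=None):
    """Default renderer when no mode flag is given: GUI if possible, else CUI."""
    if pygame_importable and display_available(platform, env):
        return "gui"
    return "cui"


over_ssh = pick_mode(True, platform="linux", env={})
on_desktop = pick_mode(True, platform="linux", env={"DISPLAY": ":0"})
no_pygame = pick_mode(False, platform="darwin", env={})
```

A caller could compute `pygame_importable` with `importlib.util.find_spec("pygame") is not None`, which checks for the package without importing it.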
5. Module map
| Module | Role | Deps |
|---|---|---|
| terminal.py | the fixed-size VT52 character grid model | none |
| pyte_terminal.py | PyteTerminal: full-ANSI/VT100+ backend, drop-in for Terminal | pyte (deferred; ansi extra) |
| style.py | Cell + the ANSI palette / rgb/resolve/runs shared by cells() and the GUI renderers | none |
| keys.py | VT/xterm key sequences (KEYS, ctrl) for raw-mode TUI input, shared by the renderers | none |
| source.py | Source base (+ _pump) and the 8 sources (pty / engine / cast / ttyrec / ans / 3a / pipe / ConPTY) | stdlib pty/subprocess/json; pywinpty (deferred; win extra) |
| recorder.py | Recorder (record a session to .cast/.ttyrec) + export_ansi/export_3a screen exports | none |
| video.py | render_video: a recording → .mp4/.webm/.gif | pygame + pyte (deferred); ffmpeg (video extra) |
| session.py | Session: observe taps, control, talking stick, the bytes↔chars decode | terminal, source |
| bus.py | BusServer / BusClient: the contract over a unix socket or TCP | stdlib socket/hmac |
| compositor.py | multi-panel window + SessionBacking/BusBacking | pygame (deferred), curses_ui, bus |
| curses_ui.py | CUI renderer + the pure viewport() | stdlib curses (deferred) |
| pygame_ui.py | GUI renderer | pygame (deferred) |
| arcade_ui.py | GUI renderer (arcade/pyglet/OpenGL twin of pygame_ui) | arcade (deferred; gl extra) |
| web_ui.py | browser renderer over HTTP + a WebSocket | stdlib http.server; websockets (deferred; web extra) |
| cli.py | the tapterm program | session, terminal, source |
| __init__.py | public API | - |
Optional deps are deferred — pygame, arcade, websockets, curses, pyte, and
pywinpty are imported inside the functions/constructors that need them, never at module top.
So import tappty works with none of them installed (verified under a bare interpreter), and
tapterm --cui / --headless need no display. The extras: sdl = pygame-ce, gl = arcade
(arcade_ui), web = websockets (web_ui), video = imageio-ffmpeg (render_video without a
system ffmpeg), ansi = pyte (PyteTerminal), win = pywinpty + windows-curses (ConPtySource
and the curses CUI on Windows; both Windows-marked), dev = pytest + ruff.
6. What is not here (the boundary)
tappty is the generic instrument; application-specific behavior lives on top of its seams, not in the core:
- A custom panel (e.g. an app-specific dashboard view) is just another compositor panel kind. run() dispatches panels by their .kind, so the compositor stays fully generic.
- An application or bot driver is just an EngineSource (or a bus client) wrapping that program's logic.
So a consumer can layer either on top without the core depending on it. The library ships the generic seams — Source, Terminal, Session, the bus, the renderers, the compositor — and the specific things plug into them.
7. Testing
The suite (the GUI and ANSI tests skip cleanly without their optional deps) exercises the model and the contract through real paths, not mocks:
- Model: test_term (VT52 cursor/scroll, the escapes, scrollback bounds), test_pyte_terminal (SGR/cursor-address/erase/Unicode/scrollback; skips without pyte).
- Session: test_session_bus, test_talking_stick, test_session_echo (taps, control, the stick, local-echo frames).
- Bus: test_bus_socket (round-trip over a real socket + lifecycle: tap-unsubscribe, client drop, restart, capture-wake), test_bus_tcp (TCP transport), test_bus_cmd (synchronous CMD), test_bus_security (token auth, loopback-only bind, newline injection, safe socket unlink, capture cap, malformed/non-string HELLO).
- Sources: test_pty_source (a real subprocess on a pty; POSIX-skip on Windows), test_pipe_source (plain pipes + that ConPtySource is import-guarded), test_source_encoding (the bytes/characters split, the latin-1 knob, partial-multibyte flush at EOF), test_cast_source (v1/v2 replay, timing, clamps, stop()).
- Error paths: test_error_handling (child exit-code propagation, observer-failure isolation, runner-error re-raise, CMD timeout, pty spawn cleanup, fps/dimension validation, headless display default, stopping a blocked engine runner).
- Pure math: test_curses_viewport, test_compositor_view.
- Renderers: test_compositor_backings (the panel-backing data contract); test_gui_smoke drives the real pygame_ui.run and the compositor to completion under the SDL dummy driver (no display), so every blit/draw/flip executes deterministically, via a CastSource replay. It runs wherever pygame is installed and skips cleanly where it isn't, so it never breaks the pygame-free path. (Caveat: that clean skip cuts both ways. Dropping pygame-ce from CI would silently turn this test green-by-skipping, not red.)
What is still by-eye only is pixel-level fidelity (does it look right) — the smoke test proves
the draw path runs and the right grid data reaches it, not that the phosphor is pretty. For the
by-eye pass: under WSLg (DISPLAY/WAYLAND_DISPLAY set, SDL auto-picks x11) both renderers
draw correctly, tapterm --cast rec.cast --gui is the easiest reproducible visual check, and
--snapshot writes a reviewable PNG (harmless EGL/MESA warnings on stderr are failed
hardware-GL probing — SDL falls back to software). The Windows ConPtySource path is unverified
(no Windows runner; finishing it is open work — see §11).
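One way to close the green-by-skipping gap noted in the renderer caveat is a small guard that CI runs before the suite. The following is a sketch under stated assumptions, not something the project ships: the REQUIRE_OPTIONAL_DEPS variable and the function name are hypothetical.

```python
import importlib.util
import os


def missing_required(modules, environ=None):
    """Return the top-level modules that are absent, but only when CI demands them.

    Assumption: a hypothetical REQUIRE_OPTIONAL_DEPS=1 variable marks the CI
    lanes where the optional dependencies must be installed.
    """
    env = os.environ if environ is None else environ
    if env.get("REQUIRE_OPTIONAL_DEPS") != "1":
        return []  # local runs may skip freely
    # find_spec returns None for a top-level module that cannot be found.
    return [m for m in modules if importlib.util.find_spec(m) is None]
```

A CI step could call this with the modules the GUI and ANSI tests rely on and fail the job when the list is non-empty, so a dropped dependency shows up red rather than as a quiet skip.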
8. Security / trust model
The bus is a terminal control plane: a connected client can read the screen and, holding
the talking stick, inject input — i.e. terminal read/write as the tappty user. The web_ui
renderer is the same kind of plane over HTTP + a WebSocket, and takes the same posture (see the
web_ui item in the list below). Both are trusted-local, not a boundary against a hostile network. The defenses:
- Unix socket (default-safe): the socket file is 0600, so only the owner UID can connect; that file mode is the actual connect gate, the auth. When tappty creates the parent directory it also makes it 0700 (defense-in-depth); if you point the path at an existing shared directory its permissions are left untouched, so the 0600 socket file is what protects you there.
- TCP: bound to loopback only; binding a non-loopback host raises unless allow_remote=True is passed explicitly. Loopback isn't user-isolated, so on a shared box set a token.
- token (optional): a non-empty string shared secret presented in HELLO (constant-time compared via hmac; an empty token is rejected at construction, and a non-string token can't match); a client without it is denied and dropped. It's a casual gate sent in the clear, not transport security; tunnel over SSH/TLS for an untrusted network.
- Input validation: HELLO must be a JSON object (invalid JSON or a non-object is denied, not silently accepted as an anonymous observer); KEY must decode to a JSON string; BusClient.send rejects newline-injected frames.
- DoS bounds: protocol frames are read with a size cap (MAX_FRAME) on both ends; CMD captures are byte-bounded (MAX_CAPTURE, with a truncated flag).
- Untrusted .cast files: width/height are clamped, v2 line reads are byte-bounded, and the unstreamable v1 whole-file load is refused above MAX_CAST_FILE, so a malicious recording can't drive a huge grid allocation or load an unbounded file.
- web_ui (the browser renderer): binds loopback only by default; an optional non-empty token (a WebSocket query param, constant-time compared via hmac) gates connections. Same cleartext-gate caveat as the bus token: it's not transport security; put it behind an SSH tunnel / a TLS proxy for an untrusted network. No TLS of its own.
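The loopback guard and the token gate from the list above are small enough to sketch together. This is an illustrative reading of those two rules, not the bus's real implementation: check_bind_host and TokenGate are hypothetical names, and treating an unparseable host as remote is an assumption.

```python
import hmac
import ipaddress


def check_bind_host(host: str, allow_remote: bool = False) -> str:
    """Refuse a non-loopback bind unless the caller opts in explicitly."""
    if host == "localhost":
        return host
    try:
        loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        loopback = False  # assumption: an unparseable name counts as remote
    if not loopback and not allow_remote:
        raise ValueError(f"refusing non-loopback bind to {host!r}")
    return host


class TokenGate:
    """Optional shared-secret check, compared in constant time."""

    def __init__(self, token=None):
        # None means "no token configured"; anything else must be a
        # non-empty string, so an empty token is rejected up front.
        if token is not None and not (isinstance(token, str) and token):
            raise ValueError("token must be a non-empty string")
        self._token = token

    def accepts(self, presented) -> bool:
        if self._token is None:
            return True  # no token configured: the gate is open
        if not isinstance(presented, str):
            return False  # a non-string value can never match
        # Compare as bytes: compare_digest only accepts ASCII-only str.
        return hmac.compare_digest(
            self._token.encode("utf-8", "surrogatepass"),
            presented.encode("utf-8", "surrogatepass"),
        )
```

hmac.compare_digest avoids leaking how many leading characters matched, but as the list says, the token still travels in the clear, so this is a gate and not transport security.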
Not in scope as bugs: the subprocess path launches argv with shell=False (no shell
injection), and --snapshot writes exactly where the user asked (user-directed output, not a
privilege boundary). Full transport auth (TLS/mTLS) is intentionally out of scope for this
toolkit; the recommendation is a private Unix socket, or loopback+token behind an SSH tunnel.
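The frame size cap from the DoS-bounds item can be illustrated with a bounded line read. This sketch assumes newline-delimited frames (which the newline-injection check suggests, but the text does not state outright); read_frame is a hypothetical helper and the 256 KiB default mirrors the figure given in §9.

```python
import io

MAX_FRAME = 256 * 1024  # illustrative default; §9 mentions a 256 KiB cap


def read_frame(stream, max_frame: int = MAX_FRAME):
    """Read one newline-terminated frame, refusing anything over the cap.

    Returns the frame without its newline, or None at a clean EOF.
    """
    # Ask for one byte more than the cap: a full-size frame plus its newline
    # fits exactly, while anything longer comes back without a newline.
    line = stream.readline(max_frame + 1)
    if not line:
        return None
    if line.endswith(b"\n"):
        return line[:-1]
    if len(line) > max_frame:
        raise ValueError("frame exceeds size cap")
    raise EOFError("stream ended mid-frame")
```

The point of the bounded readline is that a peer that never sends a newline cannot make the reader buffer without limit; the read stops at the cap and the connection can be dropped.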
9. Known limitations (deliberate, not bugs)
Conscious scope choices, recorded so they aren't mistaken for defects:
- Full SGR color + attributes, everywhere, including over the bus. The cells() API exposes per-cell SGR (fg/bg, bold, italic, underline, strikethrough, blink, reverse) and all four renderers draw them: the GUI backends (pygame_ui, arcade_ui, web_ui) via the font (bold/italic weight) plus drawn underline/strike rules, a blink phase, and bg fills; and the curses CUI via color pairs + A_BOLD/A_ITALIC/A_UNDERLINE/A_BLINK/A_REVERSE where the terminal supports them (256/truecolor approximates to the nearest ANSI-16; a colorless terminal falls back to its default foreground). Bold also brightens a named color (a deliberate extra, on top of the heavier weight). Not representable: SGR faint (2), rapid-blink (6), and conceal (8), which pyte doesn't model; and curses has no strikethrough attribute, so strike is dropped in the CUI only. The bus carries color too: snapshot()/FRAME includes styled cells, so a remote BusBacking panel renders in full color, not just text (MAX_FRAME was raised to 256 KiB to fit a styled frame).
- Wide glyphs yes, grapheme clusters no. Wide characters (CJK and single-code-point emoji: 👍 🔥 ✅) render at their true two columns: the full-ANSI PyteTerminal lays each one out as the glyph plus an empty continuation cell, so the grid stays rectangular and the GUI renderers (which place text by column) land neighbors correctly. The curses CUI drops that continuation cell (style.char_width + _continuations) so ncurses' own two-column advance for the wide glyph lines up instead of doubling it; it sets the locale so ncursesw is in multibyte mode (a non-UTF-8 locale degrades to single-width, never corruption). Out of reach: grapheme clusters. ZWJ emoji families (👨👩👧), flags (🇺🇸), and skin-tone modifiers (👋🏽) are multiple code points meant to fuse into one glyph; pyte splits or collapses them upstream (the family becomes just 👨, the flag two letter-boxes), before tappty ever sees a cell, so there is nothing here to widen. Supporting them would mean a grapheme-segmenting text path that overrides pyte's per-cell processing, which is deliberately not done.
- Two input modes. The Session itself defaults to line-oriented (raw_keys=False, the toolkit's heritage): renderers forward Enter/Backspace/printable text with local echo + a line buffer, sending on Enter, and arrows/function keys are ignored, which is right for in-process line-readers (EngineSource). (tapterm, though, flips an interactive session to raw by default, the regular-terminal behavior, and exposes the line mode as --cooked.) Raw mode (Session.raw_keys, tapterm --raw) instead forwards every keystroke straight to the program with no echo or buffering, translating special keys to VT sequences (tappty.keys), so a full-screen TUI (vim, htop) on a pty works, the program handling its own echo and redraw. The CUI additionally switches to curses.raw() so Ctrl-C/Z/\ reach the program. Normal cursor-key sequences only; DECCKM application-cursor mode is a future refinement.
- Frame fan-out is per-chunk. A subscribed bus client gets a full screen snapshot on every output chunk. Fine at the default 80×24; a busy remote dashboard would want frame coalescing / rate-limiting (send the latest at a bounded tick, drop stale frames).
- BusClient is single-consumer. wait_for() drains the inbox until a matching verb, discarding others; it's for one request/reply at a time before subscribing, not concurrent callers or overlapping requests. A subscriber must drain inbox (it is unbounded).
- EngineSource.stop() can't interrupt a busy runner. It unblocks a runner waiting in readline(), but a runner in a compute/sleep loop can't be force-stopped (you can't safely interrupt arbitrary in-process Python). Its thread is a daemon, so it never blocks exit.
- Windows is provisional. The platform-bound surface is small and isolated to the Source seam, which is proof of the §1 claim. Only PtySource (POSIX pty/termios/fcntl) and the Unix-socket bus transport are POSIX-specific; the core (Terminal/Session/talking-stick), EngineSource, CastSource, PipeSource, the renderers, and the TCP bus are already cross-platform. So Windows hosting needs only a Windows Source: ConPtySource (ConPTY via pywinpty, the win extra) exists and cli.py selects it, but it has never run on real Windows, so it's provisional until exercised. (The stdlib also lacks curses on Windows, so the CUI needs windows-curses, now bundled in the win extra; curses_ui itself is already portable, since all color setup is guarded with a fallback.) Finishing Windows is the remaining open work: exercise it on a real box (pip install -e '.[ansi,win]', then tapterm --ansi -- cmd/powershell and --cui), confirming the pywinpty details coded from docs (does .read() raise EOFError at child exit, is dimensions row-major (rows, cols), does .write() want str, what does .wait() return for the exit status), then add a windows-latest CI lane (the pty tests already skipif(os.name=="nt"); a few POSIX-shell tests would need guarding), broaden the POSIX-only Operating System classifiers, and flip this and the README's Windows wording from provisional to verified.
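The frame coalescing that the fan-out item says a busy dashboard would want ("send the latest at a bounded tick, drop stale frames") could look roughly like this. It is a sketch of the general technique only; tappty does not ship it, and FrameCoalescer and its method names are hypothetical.

```python
import threading


class FrameCoalescer:
    """Keep only the newest frame; hand it out at most once per tick."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = None
        self._dirty = False
        self.dropped = 0  # how many unsent frames were overwritten

    def offer(self, frame):
        """Called on every output chunk; overwrites any unsent frame."""
        with self._lock:
            if self._dirty:
                self.dropped += 1  # the previous frame went stale unsent
            self._latest = frame
            self._dirty = True

    def take(self):
        """Called on the send tick; returns the newest frame, or None."""
        with self._lock:
            if not self._dirty:
                return None
            self._dirty = False
            return self._latest
```

A sender thread would call take() on a fixed interval and transmit only when it returns a frame, so a burst of output chunks costs one snapshot per tick instead of one per chunk. This is safe for a grid tap because each frame is a full snapshot; it would not be appropriate for the lossless raw stream.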
10. Provenance — how this design was found
tappty was not designed in the abstract; it was discovered by building a real instrumented terminal against a hard target: many consumers — a human at the screen, an AI, a logger, a dashboard — all needing to watch and take turns driving the same live sessions. Driving those consumers ad hoc, by hand, was the problem; the idea was to externalize that once so every consumer attaches the same way. The shorthand that framed it: "expect + tmux + asciinema, for AIs" — a programmable, observable, shareable terminal. §1's "every consumer is equal" is the distilled result; this section records the reasoning that produced it, for anyone changing the contracts.
Build it against a real need first, generalize later — deliberately, to avoid
pre-generalizing a tool nobody had used yet. The rule was to keep the seams clean so only the
in-process EngineSource was ever application-coupled; the bus, the Terminal/grid, the pty
source, and the talking stick were generic from the start. (What stays generic is §6.)
The three load-bearing decisions (each chosen for a reason, not for symmetry):
- One observe/control contract, with pluggable transport. Build the interface once; let in-process renderers use a direct call and out-of-process clients use a socket, with the same semantics either way. The three taps are deliberately different views, not redundant: the raw stream is lossless and temporal (it catches transient or scrolled-away output, bells, exact ordering; for loggers and replay); the grid is spatial and rendered once then shared, so N clients don't each run an emulator; events carry the turn boundary (WAIT = "your turn"). An automated driver was already implicitly using the grid + events; formalizing the contract just added the raw-stream tap and a name.
- Scope is a Source seam. Everything downstream of the source (Terminal, the taps, control, arbitration, renderers) is identical no matter where the bytes come from, so a new producer is the only thing a new use-case adds. The full-ANSI grid (PyteTerminal) was deliberately deferred behind this same seam until a non-VT52 program actually needed it, rather than built speculatively up front.
- Arbitration is a talking stick: a single driver token with you-privileged preemption. A human can always take the stick from an AI, and an AI can never preempt a human, so a runaway bot can't lock the operator out ("watch it fly, grab the stick, hand it back"). The stick auto-releases on disconnect/death, the same liveness rule that keeps a session from getting stuck uncontrollable.
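The talking-stick rule in the last item reduces to a few comparisons. The sketch below is an interpretation of that rule, not the Session's actual arbitration code: the class and method names are hypothetical, and the choice that one human does not preempt another is an assumption the text does not settle.

```python
class TalkingStick:
    """Single driver token with human-privileged preemption."""

    def __init__(self):
        self.holder = None       # client id, or None when the stick is free
        self._is_human = False

    def request(self, client_id: str, human: bool) -> bool:
        """Try to take the stick; return True if client_id now holds it."""
        if self.holder is None or self.holder == client_id:
            self.holder, self._is_human = client_id, human
            return True
        if human and not self._is_human:
            # A human may always take the stick from an AI.
            self.holder, self._is_human = client_id, True
            return True
        # An AI never preempts anyone. Assumption: humans do not preempt
        # each other either.
        return False

    def release(self, client_id: str) -> None:
        """Give the stick back; a non-holder's release is ignored."""
        if self.holder == client_id:
            self.holder, self._is_human = None, False

    def on_disconnect(self, client_id: str) -> None:
        """Disconnect or death releases the stick, so it cannot get stuck."""
        self.release(client_id)
```

Routing disconnects through the same release path is what gives the liveness property: there is no state in which a departed client still holds the token.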
Dogfooding shaped the contract. Several parts of the protocol exist because using it exposed the need, not because they were specified up front:
- Injected-input echo (Session.echo, the bus LINE echo): when the game was driven through the bus in a watched window, a spectator couldn't see the AI's commands, because injected input isn't echoed like a local keypress. So the driver echoes a command (when it holds the stick) before sending it.
- The synchronous CMD primitive: every automated driver was hand-rolling the same "send a line, drain output until the next WAIT" loop, so it became one verb.
- An INFO waiting flag: a late-attaching controller couldn't tell it was already its turn (the first WAIT fired before it connected).
- The explicit stick toggle (the compositor's F2): an earlier "last-to-act-wins" rule let a stray click or keystroke silently flip control, which the operator disliked; typing now never grabs the stick.
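The loop that CMD replaced ("send a line, drain output until the next WAIT") is worth seeing once, since it explains why the verb exists. This is a sketch of that hand-rolled loop under assumptions: the (verb, payload) event tuples, the OUT verb name, and the default limits are hypothetical; only WAIT, the byte bound, and the truncated flag come from the text.

```python
import queue


def cmd(send_line, events, line, timeout=5.0, max_capture=65536):
    """Send one line, then collect output until the next WAIT event.

    events is a queue.Queue of (verb, payload) tuples. The timeout applies
    to each wait for an event; queue.Empty propagates if none arrives.
    """
    send_line(line)
    captured, size, truncated = [], 0, False
    while True:
        verb, payload = events.get(timeout=timeout)
        if verb == "WAIT":
            # The program is asking for input again: the reply is complete.
            return {"output": b"".join(captured), "truncated": truncated}
        if verb == "OUT" and not truncated:
            data = payload.encode("utf-8")
            room = max_capture - size
            if len(data) > room:
                # Byte-bound the capture and flag that output was cut.
                data, truncated = data[:room], True
            size += len(data)
            captured.append(data)
```

Every driver needs exactly this, including the bound and the timeout, which is the argument for making it one protocol verb instead of client-side boilerplate.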
A scaling lesson shaped the compositor. Running the session engines inside the render
process starved the GIL and slowed frames. The fix is why the compositor talks to backings
rather than hosting sessions itself: real sessions run in separate processes, and the window
is a bus client that only draws FRAME snapshots and forwards input (BusBacking) — no
interpreting in the render loop. SessionBacking is the in-process special case of the same
shape.
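The "backings" shape can be expressed as one small interface with two implementations. The sketch below only illustrates that idea: the Backing protocol, its method names, and the stand-in classes are hypothetical, and the real SessionBacking and BusBacking may expose a different surface.

```python
from typing import Callable, List, Protocol


class Backing(Protocol):
    """What a panel needs from whatever sits behind it (hypothetical shape)."""

    def rows(self) -> List[str]: ...
    def send_input(self, text: str) -> None: ...


class InProcessBacking:
    """Plays the SessionBacking role: wraps a live in-process object."""

    def __init__(self, session):
        self._session = session

    def rows(self) -> List[str]:
        return self._session.snapshot()

    def send_input(self, text: str) -> None:
        self._session.write(text)


class RemoteBacking:
    """Plays the BusBacking role: caches the last frame, forwards input."""

    def __init__(self, send: Callable[[str], None]):
        self._send = send
        self._rows: List[str] = []

    def on_frame(self, rows: List[str]) -> None:
        self._rows = list(rows)  # the newest snapshot replaces the old one

    def rows(self) -> List[str]:
        return list(self._rows)

    def send_input(self, text: str) -> None:
        self._send(text)


def draw_panel(backing: Backing) -> str:
    """The render loop only reads rows; it never interprets program output."""
    return "\n".join(backing.rows())
```

With this split, the expensive work (running the program and emulating its screen) lives wherever the session lives, and the window's per-frame cost is reading a cached snapshot.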
Rendering avoids "curses-in-curses." The grid model does two separable jobs: emulate-to-observe (a headless screen an automated client reads — always safe) and emulate-to-display (render for a human). Keeping them separate, plus sealing the program in a fixed 80×24 model with a render-side viewport (§1), is what lets a renderer resize freely without the hosted program ever seeing it.
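A render-side viewport over a fixed grid comes down to choosing which slice of the model to show along each axis. The function below is a sketch of one possible policy (pan just enough to keep the cursor visible); it is an assumption for illustration, and tappty's own viewport math may scale or pan differently.

```python
def viewport_origin(grid: int, window: int, cursor: int, origin: int = 0) -> int:
    """Return the first visible row (or column) along one axis.

    grid is the fixed model size, window the number of cells the real
    window can show, cursor the cursor position, origin the previous value.
    """
    if window >= grid:
        return 0  # everything fits; nothing to pan
    if cursor < origin:
        origin = cursor  # pan back so the cursor is the first cell
    elif cursor >= origin + window:
        origin = cursor - window + 1  # pan forward so it is the last cell
    # Never show cells outside the grid.
    return max(0, min(origin, grid - window))
```

The hosted program is not involved anywhere in this calculation: the grid size is a constant input, and only the renderer's view of it moves.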
11. Open work
What's left before (and just after) the first release:
- Publish to PyPI. Both distributions (tappty and the tapterm alias) build and are twine check-clean; the remaining step is the upload itself (twine upload dist/*).
- Verify Windows. The ConPTY host (ConPtySource, the win extra) and the windows-curses CUI are implemented against the documented APIs but not yet exercised on a real Windows box (there's no Windows CI runner), so treat Windows as provisional. Pair --ansi with the ConPTY path (it emits VT100+); --no-pty (PipeSource) + the TCP bus are the cross-platform fallback.