Export Format Specification v0.9.
This document describes the structure of the export bundle a TellSophie account holder receives when they exercise the on-demand export right. Version 0.9 is provisional: the schema is locked, but the implementation has not yet been exercised. Version 1.0 will be published alongside the iOS app launch.
We publish this before launch because the credibility of the commitment on our Trust page depends on it: by following this spec, any future tool, service, or self-hosted player can read TellSophie collections without our cooperation.
Bundle structure.
The export is a single .zip file. Top-level layout:
- manifest.json: required. Bundle metadata, schema version, child ID, timezone, and a content listing.
- recordings/: directory of media files, one per recording. Filenames match the id field in the per-recording sidecar.
- recordings/<id>.json: sidecar JSON for each media file. Stored alongside the media (not in a separate sidecars/ directory) so a tool can match a sidecar to its media by base filename.
- transcripts/: directory of plaintext transcripts, one per recording that has a transcript. Same id-based filename convention.
- README.txt: human-readable note explaining what the bundle is and how to read it.
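As an unofficial illustration of the layout, the sketch below (Python standard library only) opens a bundle and pairs each media file under recordings/ with its sidecar by base filename. The function names pair_recordings and unpaired are hypothetical, and the sketch assumes that every file under recordings/ without a .json extension is a media file.

```python
import io
import posixpath
import zipfile


def pair_recordings(zip_bytes: bytes) -> dict[str, dict[str, str]]:
    """Group entries under recordings/ by base filename (the recording id).

    Returns {id: {"media": path, "sidecar": path}}; a key is absent when the
    bundle has no file in that role for the id.
    """
    pairs: dict[str, dict[str, str]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            # Skip anything outside recordings/ and explicit directory entries.
            if not name.startswith("recordings/") or name.endswith("/"):
                continue
            base, ext = posixpath.splitext(posixpath.basename(name))
            # Assumption: .json is a sidecar; any other extension is media.
            role = "sidecar" if ext == ".json" else "media"
            pairs.setdefault(base, {})[role] = name
    return pairs


def unpaired(pairs: dict[str, dict[str, str]]) -> list[str]:
    """Ids that are missing either their media file or their sidecar."""
    return sorted(rid for rid, roles in pairs.items() if len(roles) != 2)
```

A tool might report unpaired ids rather than fail outright, since this spec does not say how a reader should treat an incomplete bundle.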
manifest.json.
Top-level fields, all required unless noted:
- schema_version: string. The export-format spec version this bundle conforms to. v0.9 emits "0.9".
- bundle_id: string. ULID. Unique per export.
- exported_at: string. ISO 8601 UTC timestamp.
- child_display_name: string. The display name as configured by the account holder at export time.
- child_id: string. ULID. Stable across exports.
- timezone: string. IANA tz database name (e.g. America/Los_Angeles).
- recording_count: integer. The number of recordings in the bundle.
- recordings: array of objects with id, created_at, duration_ms, format, contributor_display_name, and prompt_id fields. The full per-recording detail lives in the sidecar JSON; this array is a fast index for tools that want to enumerate recordings without opening each sidecar.
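A minimal validation sketch for manifest.json, for a tool that wants to reject a malformed bundle before reading any media. check_manifest, REQUIRED_FIELDS, and INDEX_FIELDS are hypothetical names. The check that recording_count equals the length of the recordings array is an assumption drawn from the field names, not a rule this spec states.

```python
import json

# Top-level manifest fields and their JSON types; every field is required in v0.9.
REQUIRED_FIELDS = {
    "schema_version": str,
    "bundle_id": str,
    "exported_at": str,
    "child_display_name": str,
    "child_id": str,
    "timezone": str,
    "recording_count": int,
    "recordings": list,
}
# Fields every entry of the manifest's recordings fast index must carry.
INDEX_FIELDS = ("id", "created_at", "duration_ms", "format",
                "contributor_display_name", "prompt_id")


def check_manifest(text: str) -> list[str]:
    """Return human-readable problems found in a manifest.json document (empty if none)."""
    manifest = json.loads(text)
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        value = manifest.get(field)
        if field not in manifest:
            problems.append(f"missing {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{field} should be {expected.__name__}")
    recordings = manifest.get("recordings")
    if isinstance(recordings, list):
        for i, entry in enumerate(recordings):
            if not isinstance(entry, dict):
                problems.append(f"recordings[{i}] is not an object")
                continue
            missing = [f for f in INDEX_FIELDS if f not in entry]
            if missing:
                problems.append(f"recordings[{i}] missing {', '.join(missing)}")
        # Assumption: recording_count should equal the number of index entries.
        count = manifest.get("recording_count")
        if isinstance(count, int) and count != len(recordings):
            problems.append("recording_count does not match len(recordings)")
    return problems
```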
Per-recording sidecar.
recordings/<id>.json carries:
- id: string. Matches the base name of the media file.
- created_at: string. ISO 8601 UTC.
- duration_ms: integer. Recording duration in milliseconds.
- format: string. One of audio/m4a, video/mp4. Audio uses AAC at 128 kbps; video uses H.264 at the device’s native resolution, capped at 1080p.
- contributor_display_name: string.
- contributor_id: string. ULID. Stable across the contributor’s recordings.
- prompt_id: string. The prompt the recording responded to. Stable IDs are published in a separate prompt catalog whose version increments alongside the schema.
- prompt_text: string. The prompt text shown to the contributor at recording time. Captured here for archival, even though prompt_id is the stable reference.
- transcript_path: string or null. Path to the transcript, relative to the bundle root; null when the recording has no transcript.
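A sketch of loading a sidecar into a typed record. It ignores keys it does not recognize (so a later sidecar carrying extra fields still loads), refuses a transcript_path that would escape the bundle root, and derives the media path from id and format. The Sidecar class, load_sidecar, and the format-to-extension table are assumptions for illustration; the extension mapping follows the .m4a and .mp4 containers listed under Standards used.

```python
import json
import posixpath
from dataclasses import dataclass, fields
from typing import Optional

# Assumed mapping from the sidecar "format" value to the media file extension.
FORMAT_EXTENSIONS = {"audio/m4a": ".m4a", "video/mp4": ".mp4"}


@dataclass(frozen=True)
class Sidecar:
    id: str
    created_at: str
    duration_ms: int
    format: str
    contributor_display_name: str
    contributor_id: str
    prompt_id: str
    prompt_text: str
    transcript_path: Optional[str]  # None when the recording has no transcript

    @property
    def media_path(self) -> str:
        """Bundle-relative path of the media file that sits beside this sidecar."""
        return f"recordings/{self.id}{FORMAT_EXTENSIONS[self.format]}"


def load_sidecar(text: str) -> Sidecar:
    """Parse recordings/<id>.json, ignoring keys this reader does not know about."""
    data = json.loads(text)
    known = {f.name for f in fields(Sidecar)}
    # Raises KeyError if any field this reader expects is missing.
    sidecar = Sidecar(**{name: data[name] for name in known})
    if sidecar.format not in FORMAT_EXTENSIONS:
        raise ValueError(f"unexpected format: {sidecar.format!r}")
    if sidecar.transcript_path is not None:
        # transcript_path is relative to the bundle root; refuse paths that escape it.
        norm = posixpath.normpath(sidecar.transcript_path)
        if norm.startswith("/") or norm == ".." or norm.startswith("../"):
            raise ValueError(f"transcript_path escapes the bundle: {sidecar.transcript_path!r}")
    return sidecar
```

Rejecting escaping paths is a defensive choice for any reader that extracts files to disk; the spec itself only says the path is relative to the bundle root.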
Standards used.
- Container format: ZIP per APPNOTE.txt 6.3.x.
- Audio encoding: AAC-LC, 128 kbps, 44.1 kHz, in an MPEG-4 container (.m4a).
- Video encoding: H.264 Baseline profile up to 1080p, with an AAC audio track, in an MPEG-4 container (.mp4).
- Transcripts: UTF-8 plaintext, no timestamps in v0.9 (timestamped transcripts are targeted for v1.0, alongside the iOS app launch).
- IDs: ULID per the ULID spec.
- Timestamps: ISO 8601 UTC with millisecond precision.
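The ID and timestamp conventions can be checked mechanically. This sketch validates canonical ULIDs (26 uppercase Crockford base32 characters), decodes the 48-bit millisecond timestamp carried in a ULID's first 10 characters, and parses millisecond-precision UTC timestamps. It assumes timestamps end in Z or +00:00, for example 2024-05-01T12:34:56.789Z; this spec does not say which UTC designator is emitted. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# Canonical ULID: 26 Crockford base32 characters (no I, L, O, U). The first is
# at most 7 because a ULID is 128 bits and 26 characters carry 130.
ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")
# Millisecond-precision UTC; accepting both Z and +00:00 is an assumption.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}(Z|\+00:00)")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ulid_time(ulid: str) -> datetime:
    """Return the creation time encoded in a ULID's first 10 characters."""
    if not ULID_RE.fullmatch(ulid):
        raise ValueError(f"not a canonical ULID: {ulid!r}")
    ms = 0
    for ch in ulid[:10]:
        ms = ms * 32 + CROCKFORD.index(ch)
    return EPOCH + timedelta(milliseconds=ms)


def parse_timestamp(value: str) -> datetime:
    """Parse e.g. 2024-05-01T12:34:56.789Z into a timezone-aware UTC datetime."""
    if not TIMESTAMP_RE.fullmatch(value):
        raise ValueError(f"not a millisecond-precision UTC timestamp: {value!r}")
    # datetime.fromisoformat only accepts a trailing Z from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

Decoding the ULID time is a convenience for sanity checks (a bundle_id should not predate its child_id, for instance); the spec does not require readers to do it.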
What changes from v0.9 to v1.0.
Version 1.0 will lock these additions:
- A transcript_segments array on the sidecar, with per-utterance timestamps.
- A signature.txt file at the bundle root carrying a detached cryptographic signature over manifest.json, so a future tool can verify that the bundle was emitted by TellSophie infrastructure (a provenance check, separate from verifying the integrity of the bundle’s files).
- A prompts.json catalog snapshot embedded in the bundle, so a future tool does not need network access to resolve prompt_id references.
The schema fields above will not change between v0.9 and v1.0. Additions only.
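Because the v0.9 to v1.0 changes are additive, a reader written against v0.9 can stay forward compatible by ignoring keys and files it does not recognize. The sketch below reports which of the planned v1.0 root files are present without interpreting them. The accepted schema_version set and the names describe_bundle and transcript_segments are assumptions, and since the signature scheme is not yet specified, nothing is verified.

```python
import io
import json
import zipfile

# Hypothetical: schema versions this reader accepts. v1.0 is described as
# additive, so a v0.9 reader only needs to tolerate extra keys and files.
KNOWN_VERSIONS = {"0.9", "1.0"}


def describe_bundle(zip_bytes: bytes) -> dict:
    """Report the schema version and which planned v1.0 root files are present."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        names = set(zf.namelist())
    version = manifest["schema_version"]
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return {
        "version": version,
        # Presence only: the signature format is not specified yet.
        "has_signature": "signature.txt" in names,
        "has_prompt_catalog": "prompts.json" in names,
    }


def transcript_segments(sidecar: dict) -> list:
    """Treat a missing transcript_segments key (a v0.9 sidecar) as no segments."""
    return sidecar.get("transcript_segments", [])
```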
Questions.
If you are building a tool against this spec and find an ambiguity, email privacy@tellsophie.com. We will publish clarifications on this page.