Part 50: Pictures, Videos, Documents

Every website Sentinel built was synthetic. The text was generated. The CSS was generated. The images, if there were any, were placeholder boxes or inline SVGs. The system could create surprisingly functional web applications, but they existed in an artificial bubble — no photographs, no video, no documents from the real world. If you asked for a portfolio site with your photos, the best it could do was describe where the photos would go.

The attachment pipeline changes this. It’s a system for receiving binary files from any messaging channel — photos from a phone, videos from a camera, PDFs from a scanner — and making them available as materials the planner can work with. Send a message with a photo attached, and Sentinel can embed that actual photo in a website it’s building. Send a video, and it can wire it into an HTML5 video player. The gap between “build me a website” and “build me a website with these specific images and this video” is now closed.

The pipeline is channel-agnostic. Each messaging channel (Telegram, email, and now Matrix) has its own way of representing attachments. Telegram sends file IDs that must be fetched through its Bot API. Email carries attachments as parts of a MIME multipart body, typically base64-encoded. But the differences stop at the channel adapter: each adapter extracts the raw bytes and metadata, then passes them to a shared AttachmentIngester that handles everything from there.
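To make the adapter boundary concrete, here is a minimal sketch of what that hand-off might look like. Only the AttachmentIngester name comes from Sentinel; the RawAttachment shape, the two adapter functions, and the injected fetch callable are hypothetical stand-ins for illustration.

```python
import base64
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RawAttachment:
    """Channel-neutral record handed to the ingester (hypothetical shape)."""
    data: bytes
    filename: str
    content_type: str
    channel: str


def from_telegram(file_id: str, filename: str, content_type: str,
                  fetch: Callable[[str], bytes]) -> RawAttachment:
    # Telegram delivers only a file ID; `fetch` stands in for the download call.
    return RawAttachment(fetch(file_id), filename, content_type, "telegram")


def from_email_part(b64_payload: str, filename: str, content_type: str) -> RawAttachment:
    # Email parts usually arrive base64-encoded inside a MIME multipart body.
    return RawAttachment(base64.b64decode(b64_payload), filename, content_type, "email")


class AttachmentIngester:
    """Stand-in for the shared ingester: it never sees channel-specific details."""

    def __init__(self) -> None:
        self.received: list[RawAttachment] = []

    def ingest(self, attachment: RawAttachment) -> None:
        # Validation, sanitisation and storage would happen here.
        self.received.append(attachment)
```

The point of the shape is that adding a new channel means writing one more small adapter function; nothing downstream of ingest() has to change.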

The ingester’s first job is validation. A strict MIME type allowlist controls what gets through: common image formats (JPEG, PNG, GIF, WebP), video (MP4, WebM), audio (MP3, WAV, OGG), and documents (PDF, plain text). SVG is explicitly excluded because it can contain embedded JavaScript, which makes it a script injection vector. A 50 MB size limit catches oversized files before they consume disk space. Anything that doesn’t match the allowlist is logged and rejected.
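The validation step can be sketched roughly as follows. The exact MIME strings, the function name, the logger name, and the reading of the limit as 50 MiB are assumptions made for the example, not Sentinel’s actual values.

```python
import logging

logger = logging.getLogger("attachments")  # hypothetical logger name

# Assumed MIME strings for the formats listed above; SVG is deliberately absent.
ALLOWED_MIME_TYPES = frozenset({
    "image/jpeg", "image/png", "image/gif", "image/webp",
    "video/mp4", "video/webm",
    "audio/mpeg", "audio/wav", "audio/ogg",
    "application/pdf", "text/plain",
})
MAX_SIZE_BYTES = 50 * 1024 * 1024  # assumption: the limit is 50 MiB


def validate_attachment(content_type: str, size: int) -> bool:
    """Return True only if the type is on the allowlist and the size is in bounds."""
    # Drop parameters such as "; charset=utf-8" and normalise case before matching.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in ALLOWED_MIME_TYPES:
        logger.warning("rejected attachment: type %r is not on the allowlist", mime)
        return False
    if size > MAX_SIZE_BYTES:
        logger.warning("rejected attachment: %d bytes exceeds the size limit", size)
        return False
    return True
```

An allowlist fails closed: a format nobody thought about, such as image/svg+xml, is rejected by default rather than accepted by accident.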

Filename sanitisation is more involved than it might seem. Attachments arrive with filenames set by the sender’s device, which means they can contain path traversal sequences (../../etc/passwd), null bytes, Unicode oddities, or just be absurdly long. The sanitiser strips directory components, replaces unsafe characters, truncates to a reasonable length, and ensures the result is a clean, flat filename. No filename from an external source is ever used as-is for filesystem operations.
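A sanitiser along these lines covers the cases listed above. This is a sketch under assumptions: the 100-character limit, the permitted character set, and the fallback name are illustrative choices, not the values Sentinel uses.

```python
import re
import unicodedata

MAX_FILENAME_LENGTH = 100  # hypothetical limit


def sanitise_filename(raw: str, fallback: str = "attachment") -> str:
    """Reduce an untrusted filename to a flat, conservative one."""
    # Normalise first, so look-alike characters (e.g. a fullwidth slash)
    # become their plain forms before path components are stripped.
    name = unicodedata.normalize("NFKC", raw).replace("\x00", "")
    # Keep only the final path component, treating both separators alike.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace anything outside a small safe character set.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Leading dots would yield hidden files or the special names "." and "..".
    name = name.lstrip(".")
    if not name:
        return fallback
    # Truncate the stem but keep a (bounded) extension.
    stem, dot, ext = name.rpartition(".")
    if dot and stem:
        ext = ext[:16]
        return f"{stem[:MAX_FILENAME_LENGTH - len(ext) - 1]}.{ext}"
    return name[:MAX_FILENAME_LENGTH]
```

For example, sanitise_filename("../../etc/passwd") yields "passwd", and sanitise_filename("holiday photo.jpg") yields "holiday_photo.jpg".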

Files are stored in a user-scoped directory structure: {workspace}/{user_id}/media/inbox/{attachment_id}/{filename}. The attachment ID is a UUID, so uploads with identical filenames never collide, whether they come from two different users or from the same user twice. The user-scoping aligns with the multi-user workspace isolation model: each user’s media lives in its own directory tree, and Row-Level Security in Postgres ensures the metadata is similarly isolated.
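As a rough illustration of that layout, the sketch below builds the path and writes the file. The function name is hypothetical, it assumes user_id is a trusted internal identifier, and the containment check is an extra safeguard added for the example rather than something the text above describes.

```python
import uuid
from pathlib import Path


def write_to_inbox(workspace: Path, user_id: str, filename: str, data: bytes) -> Path:
    """Store bytes at {workspace}/{user_id}/media/inbox/{attachment_id}/{filename}."""
    attachment_id = str(uuid.uuid4())
    attachment_dir = workspace / user_id / "media" / "inbox" / attachment_id
    target = attachment_dir / filename
    # Belt and braces: even a sanitised filename must land directly inside
    # the attachment directory once the path is resolved.
    if target.resolve().parent != attachment_dir.resolve():
        raise ValueError(f"filename escapes attachment directory: {filename!r}")
    # A fresh UUID means the directory should never already exist.
    attachment_dir.mkdir(parents=True, exist_ok=False)
    target.write_bytes(data)
    return target
```

Because every upload gets its own UUID directory, the filename itself never has to be unique.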

The MediaStore handles persistence. Metadata (content type, size, SHA-256 hash, original filename, channel source) goes into Postgres with RLS scoping by user ID. The store has two backends: Postgres, and an in-memory dictionary used as a fallback for testing. The content hash makes duplicate uploads detectable, though the current implementation stores every upload independently.
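The in-memory variant might look something like the sketch below. The class, field, and method names are hypothetical; the per-user filtering only imitates what Row-Level Security enforces in the Postgres backend.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaRecord:
    attachment_id: str
    user_id: str
    content_type: str
    size: int
    sha256: str
    original_filename: str
    channel: str


class InMemoryMediaStore:
    """Dictionary-backed stand-in for the Postgres store, for tests only."""

    def __init__(self) -> None:
        self._records: dict[str, MediaRecord] = {}

    def add(self, attachment_id: str, user_id: str, data: bytes,
            content_type: str, original_filename: str, channel: str) -> MediaRecord:
        record = MediaRecord(
            attachment_id, user_id, content_type, len(data),
            hashlib.sha256(data).hexdigest(), original_filename, channel,
        )
        # Every upload is stored independently, even if the hash already exists.
        self._records[attachment_id] = record
        return record

    def list_for_user(self, user_id: str) -> list[MediaRecord]:
        # Imitates RLS: a user only ever sees their own rows.
        return [r for r in self._records.values() if r.user_id == user_id]

    def find_by_hash(self, user_id: str, sha256: str) -> list[MediaRecord]:
        # Detection only: callers learn about duplicates, nothing is merged.
        return [r for r in self.list_for_user(user_id) if r.sha256 == sha256]
```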

When attachments arrive with a message, their metadata is formatted into an [ATTACHMENTS] block in the planner’s context. The planner sees the file paths, content types, and original filenames. When it builds a plan that involves creating a website, it can reference those paths directly — an <img> tag pointing to the uploaded photo, a <video> element pointing to the uploaded MP4. The files are already on disk in the workspace; the planner just needs to know where they are.
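The exact layout of the [ATTACHMENTS] block isn’t shown here, so the following is only a guess at its general shape: a header line followed by one line per file carrying the three fields the planner sees. The function name and the line format are assumptions.

```python
def format_attachments_block(attachments: list[dict[str, str]]) -> str:
    """Render attachment metadata as a text block for the planner's context."""
    if not attachments:
        return ""
    lines = ["[ATTACHMENTS]"]
    for item in attachments:
        # Collapse whitespace so each attachment stays on exactly one line.
        fields = {
            key: " ".join(str(item[key]).split())
            for key in ("path", "content_type", "original_filename")
        }
        lines.append(
            '- {path} ({content_type}, originally "{original_filename}")'.format(**fields)
        )
    return "\n".join(lines)
```

With one uploaded photo, the planner would see two lines: the header, then something like - ws/u1/media/inbox/123/beach.jpg (image/jpeg, originally "IMG 0042.jpg"), which is enough to write the src attribute of an <img> tag.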

The email channel was built alongside the attachment pipeline. It’s a receive-only channel: an IMAP poller that checks for new emails on a configurable interval, extracts attachments, and feeds them through the ingester. Text-only emails are ignored; the existing email tools handle interactive email operations. The email channel is specifically for “send Sentinel a photo by email and it becomes available in the workspace.” A sender allowlist ensures that mail from unknown addresses is never processed.
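Here is a sketch of what happens to a single fetched message, using Python’s standard email package. The IMAP polling loop is left out, the function name is hypothetical, and the allowlist is assumed to hold lowercase addresses.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr


def extract_attachments(raw_email: bytes,
                        allowed_senders: set[str]) -> list[tuple[str, str, bytes]]:
    """Return (filename, content_type, data) for each attachment, or [] if none apply."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    # Reduce "Alice <alice@example.com>" to the bare address before checking.
    _, sender = parseaddr(str(msg.get("From", "")))
    if sender.lower() not in allowed_senders:
        return []
    found = []
    # Text-only messages have no attachment parts, so this loop simply yields nothing.
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if data is None:  # e.g. a nested multipart part
            continue
        found.append((part.get_filename() or "", part.get_content_type(), data))
    return found
```

Each tuple returned here would then go through the same ingester as attachments from any other channel, including validation and filename sanitisation.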

During testing, the real impact became clear. Sending a photo and a short video clip, then asking Sentinel to build a simple showcase page, produced a working website with the actual photo embedded and the actual video playing — within a couple of minutes. No manual file copying, no path configuration, no asset pipeline setup. The attachments arrived, the planner saw them, the code referenced them, and they worked.

This is a milestone that’s easy to understate. Sentinel went from generating purely synthetic content to being able to incorporate real-world materials — photographs, video footage, documents, audio. The websites it builds can now contain actual content, not just LLM-generated text and placeholder elements. It’s the difference between a tool that demonstrates what a website could look like and a tool that builds the website you actually want, with your actual content.