# Lab 3 - Pipe & Filter and Master-Worker

## Context
StreamForge is a SaaS startup building a platform that allows content creators — YouTubers, e-learning producers, and media companies — to upload, process, and distribute their video content at scale. The platform must handle hundreds of gigabytes of video uploads per day, apply complex multi-step processing pipelines to each file, and make the processed content available online within strict time limits. The engineering team has agreed on a high-level vision: processing pipelines must be modular and composable, compute resources must scale elastically with demand, and the system must tolerate worker failures without losing progress. Your task is to design the architecture that makes this vision a reality.
## Functional Requirements

### Ingestion
- A creator can upload a video file (up to 50 GB) from a web browser or via a REST API.
- Upon upload, the system performs an immediate pre-validation: supported format check, file integrity verification, and detection of prohibited content via an external moderation service.
- The creator receives real-time progress notifications throughout the processing pipeline: current stage, percentage complete, and any errors encountered.
- A creator can cancel an in-progress job at any point.
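To make the pre-validation requirement concrete, here is a minimal, purely illustrative sketch (not a prescribed solution): three independent checks whose errors are collected rather than failing fast, so the creator gets all problems in one notification. The function and field names are assumptions, and the call to the external moderation service is deliberately omitted.

```python
from dataclasses import dataclass, field

SUPPORTED_FORMATS = {"mp4", "mov", "mkv", "webm"}  # illustrative set, not normative
MAX_UPLOAD_BYTES = 50 * 1024**3  # 50 GB upload limit from the requirements


@dataclass
class ValidationResult:
    ok: bool = True
    errors: list = field(default_factory=list)


def pre_validate(filename: str, size_bytes: int, checksum_ok: bool) -> ValidationResult:
    """Run the ingestion checks; collect every error instead of stopping at the first.

    The external moderation-service check is omitted from this sketch.
    """
    result = ValidationResult()
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in SUPPORTED_FORMATS:
        result.errors.append(f"unsupported format: {ext}")
    if size_bytes > MAX_UPLOAD_BYTES:
        result.errors.append("file exceeds 50 GB limit")
    if not checksum_ok:
        result.errors.append("integrity check failed")
    result.ok = not result.errors
    return result
```

Collecting errors (instead of rejecting on the first one) matters for the real-time notification requirement above: one round trip tells the creator everything that must be fixed.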
### Processing Pipeline
Each uploaded video must pass through the following stages in order:
- Format validation — verify codec compatibility and file structure
- Metadata extraction — extract duration, resolution, frame rate, audio tracks, etc.
- Transcoding — encode to multiple resolutions: 240p, 480p, 720p, 1080p, 4K (available resolutions depend on the creator's subscription plan)
- Thumbnail generation — automatically extract representative frames
- Automatic subtitle extraction — speech-to-text processing to generate subtitle tracks
- Search indexing — index title, description, transcript, and metadata for full-text search
- CDN publication — push processed assets to the global content delivery network
Consider also the following constraints:
- Certain stages are optional depending on the creator's subscription plan (e.g., 4K transcoding and subtitle extraction are Premium-only features).
- A stage failure must not block independent downstream stages — only stages that depend on the failed one are suspended.
- Creators with a Custom plan can define their own pipeline: select output resolutions, enable or disable stages, and set processing priorities.
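The three constraints above (plan-dependent stages, independent failure of non-dependent stages, custom pipelines) hint at declaring the pipeline as data rather than code. The sketch below is one possible shape, with assumed stage names and an assumed `plans` gating mechanism; it is a starting point for your design, not the expected answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One filter in the pipeline, declared as data."""
    name: str
    depends_on: tuple = ()                          # stages this one waits for
    plans: tuple = ("free", "premium", "custom")    # plans that include this stage


# Declaring the pipeline this way lets a new stage be added without
# modifying existing stage code (the extensibility requirement below).
PIPELINE = [
    Stage("validate"),
    Stage("extract_metadata", depends_on=("validate",)),
    Stage("transcode", depends_on=("extract_metadata",)),
    Stage("thumbnails", depends_on=("extract_metadata",)),
    Stage("subtitles", depends_on=("extract_metadata",), plans=("premium", "custom")),
    Stage("index_search", depends_on=("extract_metadata",)),
    Stage("publish_cdn", depends_on=("transcode", "thumbnails")),
]


def stages_for_plan(plan: str) -> list:
    """Return the stages enabled for a given subscription plan."""
    return [s for s in PIPELINE if plan in s.plans]
```

Note how `depends_on` also encodes the failure-isolation constraint: if `subtitles` fails, `thumbnails` and `transcode` can still run because they do not depend on it. A full design would also prune dependencies on stages disabled by the plan.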
### Distribution
- Processed videos are published to a global CDN and served with adaptive bitrate streaming (quality adjusts to viewer bandwidth).
- A creator can schedule a publication date and time for a processed video.
- Creators can replace an existing video with a new version without changing its public URL.
### Monitoring & Administration
- A real-time dashboard allows administrators to monitor the status of all active processing jobs: stage, progress, assigned worker, estimated completion time.
- Administrators can pause, resume, or cancel any job.
- Automated alerts are triggered when a job exceeds a configurable processing time threshold.
- Usage metrics are collected per creator: number of videos processed, total compute time consumed, storage used.
## Non-Functional Requirements & Constraints
- Compute intensity: Transcoding a one-hour 4K video on a single machine can take several hours. The system must parallelize this work across multiple workers.
- Throughput: The platform must be able to process several hundred videos concurrently without performance degradation.
- Resilience: If a worker crashes mid-processing, the job must resume from the last checkpoint — not from the beginning. No compute work should be lost.
- Priority: Premium creators must have their jobs processed before Free-tier creators. The scheduling strategy must reflect this contractual commitment.
- Cost efficiency: Compute resources must be allocated dynamically according to load. Idle workers during off-peak hours must be automatically released.
- Extensibility: Adding a new processing stage (e.g., automatic chapter detection, scene recognition) must not require modifying existing pipeline stages.
- Latency: The maximum acceptable delay between upload completion and online availability is 30 minutes for a standard HD video.
- This is StreamForge's first production system. The team expects frequent changes to pipeline stages and processing parameters as the product matures.
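As a concrete illustration of the priority requirement, here is a minimal scheduling sketch: a single heap that orders jobs by tier first, then by submission time within a tier. The tier weights and function names are assumptions, and the sketch deliberately ignores cross-tier starvation (a real strategy would add an aging term so old Free jobs eventually run even under sustained Premium load).

```python
import heapq
import itertools

TIER_PRIORITY = {"premium": 0, "free": 1}  # lower value sorts (and runs) first
_counter = itertools.count()               # FIFO tie-breaker for identical timestamps


def make_queue() -> list:
    return []


def enqueue(queue: list, job_id: str, tier: str, submitted_at: float) -> None:
    # Premium before free; within a tier, oldest submission first.
    heapq.heappush(queue, (TIER_PRIORITY[tier], submitted_at, next(_counter), job_id))


def next_job(queue: list) -> str:
    """Pop the highest-priority job; the master hands it to an idle worker."""
    return heapq.heappop(queue)[-1]
```

In a Master-Worker deployment this queue lives with the master; idle workers pull from it, which is also where an autoscaler would look (queue depth and wait time) to decide when to allocate or release workers.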
## Architecture Diagram Requirements
You are required to represent your architecture at multiple levels of abstraction:
- High-level architecture diagram: show all modules, storage units, communication protocols, middleware, queues, external services, and the CDN. Every element must be labeled. Include a legend.
- Pipeline detail diagram: show the sequence of filters, the data flowing between them, and how workers are allocated to each filter.
- Worker lifecycle diagram: illustrate how a job is assigned, executed, checkpointed, and recovered after a worker failure.
You may use any architectural notation (UML, C4 Model, ArchiMate, or informal box-and-arrow diagrams) as long as every element is explicitly labeled and a legend is provided. A bare diagram, UML included, is not sufficient without these labels.
## Questions
- Choose and justify the appropriate architecture for StreamForge. Explain precisely how the Pipe & Filter and Master-Worker patterns fit together in your solution, and what problem each one solves.
- Define each filter in your pipeline: its responsibility, its input contract, its output contract, and the conditions under which it can fail independently of the others.
- Propose and justify a job scheduling strategy that enforces priority between Free and Premium creators, while maximizing overall throughput.
- Explain your resilience strategy in detail: how are checkpoints implemented, how does the system detect a worker failure, and how is the job reassigned and resumed?
- How does your architecture support dynamic scaling of compute resources? What triggers the allocation or release of workers?
- Propose the key technologies for your solution (job scheduler, message broker, storage, CDN, transcoding engine) and justify each choice in relation to the architectural requirements.
## Design Note
A creator uploads a 2-hour 4K film at 2 AM. Three workers are currently active. Walk through what happens from the moment the file lands on the server to the moment it is available online, including what happens when one of the three workers crashes halfway through transcoding.