
Cluster-Wide Scheduler

Difficulty: Advanced
Team Size: 3-5 people
Time: ~40-50 hours
Demo-ready by: Step 5
Prerequisites: Node.js or Go, Docker, distributed systems concepts
Built by: Kubernetes scheduler, Nomad, Mesos, YARN

Skills you'll earn: Bin-packing algorithms, constraint solving, health checks, affinity/anti-affinity rules, cluster state management

Start by assigning a task to a node from a list. End with a scheduler that places workloads across a cluster respecting constraints, affinity rules, and health checks.

Step 1: Assign a task to a node (~2-3 hours)

The simplest scheduler: pick a node, run the thing.

  • Maintain a list of nodes (hardcoded for now) with their capacity (CPU, memory)
  • Accept a task request: { image, cpu, memory }
  • Pick the first node with enough free resources
  • Call the Docker API on that node to start the container
  • Update the node's available resources

You now have: Static task placement.
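The loop above can be sketched in a few lines of Go. This is a minimal first-fit placer under assumed types (`Node`, `Task`, resource units in millicores/MiB are illustrative, not prescribed by the guide); the actual Docker API call is left as a comment since it needs a real client.

```go
package main

import "fmt"

// Node tracks a worker's currently free resources.
type Node struct {
	Name             string
	FreeCPU, FreeMem int // millicores, MiB (illustrative units)
}

// Task mirrors the request shape from this step: { image, cpu, memory }.
type Task struct {
	Image    string
	CPU, Mem int
}

// firstFit returns the first node with enough free resources, or nil.
func firstFit(nodes []*Node, t Task) *Node {
	for _, n := range nodes {
		if n.FreeCPU >= t.CPU && n.FreeMem >= t.Mem {
			return n
		}
	}
	return nil
}

func main() {
	nodes := []*Node{
		{Name: "node-a", FreeCPU: 500, FreeMem: 512},
		{Name: "node-b", FreeCPU: 4000, FreeMem: 8192},
	}
	t := Task{Image: "nginx:latest", CPU: 1000, Mem: 1024}
	if n := firstFit(nodes, t); n != nil {
		// This is where you'd call the Docker Engine API on n
		// (create + start the container for t.Image).
		n.FreeCPU -= t.CPU
		n.FreeMem -= t.Mem
		fmt.Println(n.Name, n.FreeCPU, n.FreeMem) // node-b 3000 7168
	}
}
```

Note the last bullet is already in the sketch: free resources are decremented at placement time, not when the container reports back.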

Step 2: Node registration and heartbeats (~4-5 hours)

Hardcoded node lists break when nodes join or leave.

  • Nodes register themselves with the scheduler on startup (HTTP POST with their capacity)
  • Nodes send heartbeats every 10 seconds
  • If no heartbeat arrives for 30 seconds, mark the node as unhealthy
  • Remove nodes that have been unhealthy for 5 minutes

You now have: Dynamic node discovery.

Step 3: Scheduling strategies (~4-5 hours)

First-fit is naive. Different workloads need different placement.

  • Bin-packing: fill nodes as tightly as possible (minimize active nodes)
  • Spread: distribute tasks evenly across nodes (maximize resilience)
  • Let the user pick a strategy per task or set a cluster default
  • Score each node and pick the best one, not just the first fit

You now have: Intelligent placement.
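Bin-packing and spread are two scoring functions over the same search. A sketch under assumed types (CPU-only for brevity; the function names are illustrative):

```go
package main

import "fmt"

type Node struct {
	Name         string
	CPU, FreeCPU int // total and free millicores
}

// binpackScore prefers the node that would end up most full after
// placement, so already-busy nodes fill first and idle nodes can drain.
func binpackScore(n Node, cpu int) float64 {
	usedAfter := n.CPU - (n.FreeCPU - cpu)
	return float64(usedAfter) / float64(n.CPU)
}

// spreadScore is the inverse: prefer the emptiest node.
func spreadScore(n Node, cpu int) float64 {
	return 1 - binpackScore(n, cpu)
}

// best scores every feasible node and returns the highest scorer,
// not just the first fit.
func best(nodes []Node, cpu int, score func(Node, int) float64) (winner Node) {
	top := -1.0
	for _, n := range nodes {
		if n.FreeCPU < cpu {
			continue // infeasible nodes are filtered before scoring
		}
		if s := score(n, cpu); s > top {
			top, winner = s, n
		}
	}
	return winner
}

func main() {
	nodes := []Node{
		{Name: "busy", CPU: 4000, FreeCPU: 1000},
		{Name: "idle", CPU: 4000, FreeCPU: 3500},
	}
	fmt.Println(best(nodes, 500, binpackScore).Name) // busy
	fmt.Println(best(nodes, 500, spreadScore).Name)  // idle
}
```

Passing the strategy as a function value is what makes "pick a strategy per task" cheap: the per-task setting just selects which scorer to hand to `best`.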

Step 4: Constraints and affinity (~4-5 hours)

Some tasks must run on specific nodes or avoid others.

  • Node labels: gpu=true, zone=us-east, disk=ssd
  • Task constraints: requires: gpu=true (hard constraint — must match)
  • Affinity: prefers: zone=us-east (soft constraint — prefer but don't require)
  • Anti-affinity: avoidColocating: service=database (don't place two databases on the same node)

You now have: Constraint-based scheduling.
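Hard constraints filter nodes; soft constraints only adjust their score. A sketch of that split (label maps as in the bullets; function names are illustrative, and anti-affinity would add a similar check against the tasks already on each node):

```go
package main

import "fmt"

type Node struct {
	Name   string
	Labels map[string]string
}

// feasible enforces hard constraints: every required label must match,
// or the node is excluded outright.
func feasible(n Node, requires map[string]string) bool {
	for k, v := range requires {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

// affinityScore counts satisfied soft preferences; it never excludes
// a node, it only makes preferred nodes win ties.
func affinityScore(n Node, prefers map[string]string) int {
	score := 0
	for k, v := range prefers {
		if n.Labels[k] == v {
			score++
		}
	}
	return score
}

func main() {
	gpu := Node{Name: "gpu-1", Labels: map[string]string{"gpu": "true", "zone": "us-west"}}
	east := Node{Name: "cpu-1", Labels: map[string]string{"zone": "us-east"}}

	requires := map[string]string{"gpu": "true"}
	prefers := map[string]string{"zone": "us-east"}

	fmt.Println(feasible(gpu, requires), feasible(east, requires))         // true false
	fmt.Println(affinityScore(gpu, prefers), affinityScore(east, prefers)) // 0 1
}
```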

Step 5: Health checks and rescheduling (~4-5 hours)

A container can report "running" while the app inside it has crashed.

  • Define health checks per task: HTTP endpoint, TCP port, or command
  • Scheduler polls health checks every N seconds
  • If a task fails its health check, restart it on the same node
  • If the node itself dies, reschedule all its tasks to healthy nodes

You now have: Self-healing workloads.

Step 6: Resource reclamation and preemption (~4-5 hours)

  • Reclaim resources from completed or failed tasks immediately
  • Priority levels: critical tasks can preempt low-priority ones when the cluster is full
  • Preempted tasks are re-queued and rescheduled when resources free up

You now have: Priority-based preemption.
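The hard part of preemption is victim selection. One simple policy, sketched below with illustrative types: evict lower-priority tasks in ascending priority order until enough CPU is freed, and refuse entirely if even that wouldn't be enough (so nothing is killed for no gain).

```go
package main

import (
	"fmt"
	"sort"
)

type Task struct {
	Name     string
	Priority int // higher = more important
	CPU      int
}

// victims picks lower-priority tasks to evict, cheapest-priority first,
// until `need` CPU is freed. Returns nil if preemption cannot help.
func victims(running []Task, incoming Task, need int) []Task {
	var candidates []Task
	for _, t := range running {
		if t.Priority < incoming.Priority {
			candidates = append(candidates, t)
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Priority < candidates[j].Priority
	})
	freed := 0
	var out []Task
	for _, t := range candidates {
		if freed >= need {
			break
		}
		out = append(out, t)
		freed += t.CPU
	}
	if freed < need {
		return nil // don't evict anything if it still wouldn't fit
	}
	return out
}

func main() {
	running := []Task{
		{Name: "batch", Priority: 1, CPU: 1000},
		{Name: "web", Priority: 5, CPU: 500},
	}
	critical := Task{Name: "db", Priority: 9, CPU: 800}
	for _, v := range victims(running, critical, 800) {
		fmt.Println(v.Name) // batch
	}
}
```

Evicted tasks go back on the queue, which is why the re-queue bullet above matters: preemption without re-queueing silently loses work.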

Step 7: API and dashboard (~4-5 hours)

  • REST API: submit tasks, list running tasks, get node status, drain a node
  • Dashboard: cluster overview, per-node utilization, task placement map
  • Drain mode: mark a node for maintenance, migrate its tasks elsewhere

You now have: An operable cluster.
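Drain is two small pieces of scheduler state logic: a drained node stops receiving new work, and its current tasks are handed back for rescheduling. A sketch with illustrative types:

```go
package main

import "fmt"

type Node struct {
	Name     string
	Draining bool
	Tasks    []string // IDs of tasks placed here
}

// drain marks a node for maintenance and returns its tasks so the
// caller can put them back on the scheduling queue.
func drain(n *Node) []string {
	n.Draining = true
	evicted := n.Tasks
	n.Tasks = nil
	return evicted
}

// schedulable filters out draining nodes before any placement pass.
func schedulable(nodes []*Node) []*Node {
	var out []*Node
	for _, n := range nodes {
		if !n.Draining {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	a := &Node{Name: "a", Tasks: []string{"web", "db"}}
	b := &Node{Name: "b"}
	requeue := drain(a)
	fmt.Println(requeue)                         // [web db]
	fmt.Println(len(schedulable([]*Node{a, b}))) // 1
}
```

The REST handler for "drain a node" then just calls `drain` and pushes the returned IDs onto the same queue Step 5's rescheduler already consumes.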

Where to go from here

  • Multi-region scheduling (cross-datacenter placement)
  • Resource over-commit with burst limits
  • Job scheduling (one-off tasks vs. long-running services)
  • Gang scheduling (all-or-nothing placement for multi-container tasks)
  • Integration with a service mesh for traffic management