Skip to content

Gesture-Based Home Automation

DifficultyAdvanced
Team Size2-3 people
Time~30-35 hours
Demo-ready byStep 4
PrerequisitesPython, MQTT basics, Raspberry Pi, basic electronics
Built byGoogle Nest, Amazon Alexa, Home Assistant, Apple HomeKit

Skills you'll earn: MQTT protocol, gesture recognition, IoT device control, relay circuits, real-time video processing

Start by toggling an LED with a hand gesture. End with a system where gestures detected by a camera control lights, fans, and appliances across your home via MQTT.

(Assumes basic Python and electronics knowledge. Includes learning MQTT, ESP32 programming, and building a web dashboard.)

Step 1: Toggle an LED with a gesture (~2-3 hours)

Prove the concept end to end before adding complexity.

  • Detect a hand gesture using a webcam and MediaPipe (open palm = on, fist = off)
  • Send a command over serial to an Arduino/ESP32
  • The microcontroller toggles an LED based on the command

You now have: Gesture to hardware, working.

Step 2: Add MQTT as the communication layer (~3-4 hours)

Serial works for one device on one machine. It doesn't scale to a house.

  • Set up an MQTT broker (Mosquitto)
  • The gesture detection script publishes to topics like home/living-room/light
  • The ESP32 subscribes to its topic and acts on messages
  • Test with mosquitto_pub and mosquitto_sub before wiring in gesture detection

You now have: Decoupled communication.

Step 3: Build a device registry (~3-4 hours)

Hardcoding topic names and device types is fragile.

  • Define a device model: id, name, room, type (switch/dimmer/sensor), topic
  • Store the registry in a JSON file or SQLite database
  • Gesture controller reads the registry to know what it can control
  • Add a simple CLI to register and list devices

You now have: A managed device inventory.

Step 4: Map gestures to device actions (~4-5 hours)

One gesture per device doesn't scale when you have ten devices.

  • Add a "select device" gesture (point and hold at a screen region mapped to a room)
  • Once a device is selected, gestures control it: open palm = on, fist = off, swipe = dim
  • Show a HUD overlay on the camera feed: selected device, current state
  • Confirm actions with a brief highlight on the HUD

You now have: Multi-device gesture control.

Step 5: Sensor feedback and state sync (~4-5 hours)

You send "turn on" but have no idea if the device actually turned on.

  • Devices publish their state back to MQTT (home/living-room/light/state)
  • Gesture controller subscribes to state topics and updates the HUD
  • Handle conflicts: if someone flips a physical switch, the state updates

You now have: Two-way state synchronization.

Step 6: Build a web dashboard (~3-4 hours)

  • Web UI showing all devices grouped by room
  • Toggle switches, sliders for dimmers, sensor readouts
  • Real-time updates via WebSocket connected to the MQTT broker

Step 7: Automation rules (~3-4 hours)

  • Define rules: "if motion sensor triggers after 10pm, turn on hallway light"
  • Simple rule engine: trigger (device + condition) → action (device + command)
  • Rules stored in the registry alongside devices

Useful Resources

Where to go from here

  • Voice control alongside gestures (STT → intent → MQTT)
  • Scene presets ("movie mode" dims lights and turns on the projector)
  • Energy monitoring and usage dashboards
  • Integration with commercial smart home devices (Zigbee, Z-Wave bridges)
  • Mobile app for remote control outside the home network