Building Robot Minds

[Cover image: a small wheeled robot at the center of a glowing circuit-board landscape with holographic layers]

A practical guide to building LLM-powered physical agents. Accompanying the LLMos project.


The Question This Book Answers

What happens when you give a robot a language model as its brain — not as a chatbot bolted onto a rover, but as the actual kernel that perceives, reasons, plans, and acts in the physical world?

This book answers that question by building the entire system: from the philosophical thesis through the TypeScript implementation to an 8cm cube robot driving across a desk under LLM control. Every code snippet comes from the actual codebase. Every test count reflects real output. Every architecture diagram reflects real module boundaries.

You do not need a robot to follow along. The entire navigation stack runs in simulation with a mock LLM. No GPU, no API key, no hardware.

npm install && npx tsx scripts/run-navigation.ts --all --verbose
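
For a concrete picture of what "mock LLM" means here, the sketch below shows the shape of the idea: a planner that returns canned actions instead of calling a model. The interface and action payloads are illustrative, not the actual LLMos API; the five action names are the system's real action types (see Key Numbers below).

```typescript
// Illustrative sketch only: a drop-in planner that needs no GPU,
// no API key, and no network. Payload fields are assumptions.
type Action =
  | { type: "MOVE_TO"; x: number; y: number }
  | { type: "EXPLORE" }
  | { type: "ROTATE_TO"; headingDeg: number }
  | { type: "FOLLOW_WALL"; side: "left" | "right" }
  | { type: "STOP" };

interface Planner {
  decide(observation: string): Promise<Action>;
}

// The mock: deterministic, instant, and dependency-free.
const mockPlanner: Planner = {
  async decide(_observation: string): Promise<Action> {
    return { type: "EXPLORE" };
  },
};
```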

Choose Your Path

“I want to understand the philosophy.” Start with Chapter 1: The Thesis and Chapter 2: Two Brains.

“I want to see the navigation system.” Jump to Chapter 3: The World Model and Chapter 4: The Navigation Loop.

“I want to run something right now.” Go to Chapter 13: Getting Started — clone to running demo in 5 commands.

“I have hardware on my desk.” Read Chapter 7: The HAL, then Chapter 15: V1 Hardware Deployment.

“I want to understand how the robot learns.” Start with Chapter 8: Agents and Skills, then Chapter 11: The Evolution Engine.

“I want to see where this is going — LLM as compiler.” Read Chapter 16: The Neural Compiler — the path from JSON to native machine code.


The Book

The core of this book is organized in three parts — like the layers of a mind. A prologue sets the thesis, a north-star chapter points ahead, and practical appendices close it out.

Prologue: The Thesis

| # | Chapter | The Question It Answers |
|---|---------|-------------------------|
| 1 | The Thesis: LLM as Kernel | Why put a language model in charge of a robot? |
| 2 | Two Brains: Development and Runtime | Why does LLMos need two different LLMs? |

Book I: The Body (The Somatic Layer)

How the robot touches the physical world — its eyes, muscles, and nervous system.

| # | Chapter | The Question It Answers |
|---|---------|-------------------------|
| 6 | Eyes: Camera to Grid | How does a camera frame become an occupancy grid cell? |
| 7 | The Nervous System: One Interface, Two Worlds | How does the same code drive a simulation and a real robot? (sketched below) |
| 15 | The Chassis: From Code to Robot | How do you assemble, calibrate, and deploy the V1 Stepper Cube? |
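
The idea behind chapter 7's question, in a minimal sketch with hypothetical names: navigation code depends only on an interface, and either a simulator or the real firmware bridge implements it.

```typescript
// Hypothetical names; the real HAL has five subsystems (see Key Numbers).
interface Locomotion {
  drive(leftSpeed: number, rightSpeed: number): Promise<void>;
  stop(): Promise<void>;
}

// Simulation: updates a virtual pose, no hardware involved.
class SimLocomotion implements Locomotion {
  async drive(left: number, right: number) { /* integrate simulated pose */ }
  async stop() { /* zero simulated velocity */ }
}

// Hardware: forwards the same calls to the ESP32 stepper controller.
class Esp32Locomotion implements Locomotion {
  async drive(left: number, right: number) { /* send stepper commands */ }
  async stop() { /* halt both steppers */ }
}
```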

Book II: The Brain (The Cognitive Layer)

How the robot thinks — spatial memory, the decision loop, instinct vs. planning, and the ability to predict what it hasn’t seen yet.

| # | Chapter | The Question It Answers |
|---|---------|-------------------------|
| 3 | Spatial Memory: How a Robot Thinks About Space | How does a 50x50 grid represent an entire room? |
| 4 | The Thought Cycle: 13 Steps from Sensor to Motor | What happens in a single navigation cycle? |
| 5 | Instinct vs. Deliberation | When should the robot react instantly vs. think carefully? |
| 9 | Imagination: Thinking Before Acting | How does the robot predict unseen space? |

Book III: The Soul (The Evolutionary Layer)

How the robot learns, cooperates, and evolves — its identity as a markdown file, its social behavior in a fleet, and the mutation engine that makes it better.

| # | Chapter | The Question It Answers |
|---|---------|-------------------------|
| 8 | Identity: Agents, Skills, and the Markdown OS | What does it mean for a program to be a markdown file? |
| 10 | Social Behavior: Multiple Robots, One World | How do robots share what they know? |
| 11 | Dreams: The Evolution Engine | How does a robot improve while it sleeps? |

The North Star: LLM as Compiler

| # | Chapter | The Question It Answers |
|----|---------|-------------------------|
| 16 | The Neural Compiler: From JSON to Machine Code | How does an LLM learn to speak directly to silicon? |

The three epochs: JSON strings (now) → 6-byte bytecode (next) → native Xtensa machine code (goal).
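
Chapter 16 works through the details; as a rough illustration of why the bytecode epoch matters, here is one way the same MOVE_TO command could shrink from a JSON string to six bytes. The frame layout below is invented for this sketch, not the actual encoding.

```typescript
// Epoch 1 (now): the action travels as a JSON string, dozens of bytes.
const json = JSON.stringify({ type: "MOVE_TO", x: 1.2, y: 0.4 });

// Epoch 2 (next): a hypothetical 6-byte frame, invented here for
// illustration: [opcode][x: int16, cm][y: int16, cm][xor checksum].
function encodeMoveTo(xMeters: number, yMeters: number): Uint8Array {
  const frame = new Uint8Array(6);
  const view = new DataView(frame.buffer);
  view.setUint8(0, 0x01);                      // opcode for MOVE_TO
  view.setInt16(1, Math.round(xMeters * 100)); // x in centimeters
  view.setInt16(3, Math.round(yMeters * 100)); // y in centimeters
  view.setUint8(5, frame.slice(0, 5).reduce((a, b) => a ^ b, 0));
  return frame;
}

console.log(json.length, "bytes as JSON vs", encodeMoveTo(1.2, 0.4).length, "as bytecode");
// Epoch 3 (goal): the LLM emits native Xtensa machine code directly.
```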


Appendices: Practice

| # | Chapter | What You Get |
|----|---------|--------------|
| 12 | 346+ Tests: Proving the System Works | The testing philosophy — how every layer is validated without hardware |
| 13 | Getting Started: Your First 10 Minutes | Clone to running demo in 5 commands |
| 14 | What’s Next: From Research to Reality | The roadmap from simulation to physical deployment |

Key Numbers

| Metric | Value |
|--------|-------|
| Tests | 346+ across 21 suites |
| Navigation criteria (live LLM) | 6/6 passed |
| World model grid | 50x50 cells at 10cm resolution |
| Navigation cycle time | ~1.4s/cycle (cloud), ~200ms (local) |
| Action types | 5 (MOVE_TO, EXPLORE, ROTATE_TO, FOLLOW_WALL, STOP) |
| HAL subsystems | 5 (locomotion, vision, manipulation, communication, safety) |
| Test arenas | 4 predefined environments |
| V1 robot | 8cm cube, 6cm wheels, ~4.7 cm/s max |
| V1 motor precision | 4096 steps/revolution |
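
To make the world-model row concrete: 50x50 cells at 10cm per cell covers a 5m x 5m area and fits in 2.5 KB. A minimal sketch, with illustrative cell states and helper names rather than the actual LLMos API:

```typescript
const GRID_SIZE = 50;   // cells per side (from the table above)
const CELL_M = 0.1;     // 10 cm per cell -> a 5 m x 5 m world

enum Cell { Unknown = 0, Free = 1, Occupied = 2 }

// A flat Int8Array keeps the whole map in 2,500 bytes, all Unknown (0).
const grid = new Int8Array(GRID_SIZE * GRID_SIZE);

// World coordinates in meters -> flat grid index, clamped to bounds.
function cellIndex(x: number, y: number): number {
  const col = Math.min(GRID_SIZE - 1, Math.max(0, Math.floor(x / CELL_M)));
  const row = Math.min(GRID_SIZE - 1, Math.max(0, Math.floor(y / CELL_M)));
  return row * GRID_SIZE + col;
}

grid[cellIndex(1.25, 0.3)] = Cell.Occupied; // obstacle seen at (1.25 m, 0.30 m)
```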

Prerequisites

To read the book and run simulations:

  • Node.js 18+ and npm
  • TypeScript familiarity
  • Git

To use real LLM inference (optional):

  • OpenRouter API key (see the sketch below), or
  • GPU with 8GB+ VRAM for local Qwen3-VL-8B
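
The OpenRouter route is the lighter lift because it exposes an OpenAI-compatible endpoint, so a single key swaps real inference in for the mock planner. A hedged sketch of a raw call follows; the model slug is an assumption, and how LLMos actually configures inference is covered in the chapters.

```typescript
// Illustrative only: a direct call to OpenRouter's OpenAI-compatible
// chat endpoint. Requires OPENROUTER_API_KEY in the environment.
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "qwen/qwen3-vl-8b-instruct", // hypothetical slug
    messages: [{ role: "user", content: "You are a robot. Plan one action." }],
  }),
});
const data = await res.json();
console.log(data.choices?.[0]?.message?.content);
```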

To build the physical robot (Chapter 15):

  • ESP32-S3-DevKitC-1 (motor controller)
  • ESP32-CAM AI-Thinker (camera)
  • 2x 28BYJ-48 steppers + 2x ULN2003 drivers
  • 8cm 3D-printed chassis
  • 5V 2A USB-C power supply

About

This is a living document. As the codebase evolves, the book is updated to match. Every file path is relative to the repository root — when a chapter says lib/runtime/navigation-loop.ts, you can open that file and see the exact code being discussed.

Built by Evolving Agents Labs.


Start reading: Chapter 1 – The Thesis: LLM as Kernel