Building Robot Minds

A practical guide to building LLM-powered physical agents. Accompanying the LLMos project.
The Question This Book Answers
What happens when you give a robot a language model as its brain — not as a chatbot bolted onto a rover, but as the actual kernel that perceives, reasons, plans, and acts in the physical world?
This book answers that question by building the entire system: from the philosophical thesis through the TypeScript implementation to an 8cm cube robot driving across a desk under LLM control. Every code snippet comes from the actual codebase. Every test count reflects real output. Every architecture diagram reflects real module boundaries.
You do not need a robot to follow along. The entire navigation stack runs in simulation with a mock LLM. No GPU, no API key, no hardware.
npm install && npx tsx scripts/run-navigation.ts --all --verbose
Choose Your Path
“I want to understand the philosophy.” Start with Chapter 1: The Thesis and Chapter 2: Two Brains.
“I want to see the navigation system.” Jump to Chapter 3: The World Model and Chapter 4: The Navigation Loop.
“I want to run something right now.” Go to Chapter 13: Getting Started — clone to running demo in 5 commands.
“I have hardware on my desk.” Read Chapter 7: The HAL, then Chapter 15: V1 Hardware Deployment.
“I want to understand how the robot learns.” Start with Chapter 8: Agents and Skills, then Chapter 11: The Evolution Engine.
“I want to see where this is going — LLM as compiler.” Read Chapter 16: The Neural Compiler — the path from JSON to native machine code.
The Book
This book is organized in three parts — like the layers of a mind.
Prologue: The Thesis
| # | Chapter | The Question It Answers |
|---|---|---|
| 1 | The Thesis: LLM as Kernel | Why put a language model in charge of a robot? |
| 2 | Two Brains: Development and Runtime | Why does LLMos need two different LLMs? |
Book I: The Body (The Somatic Layer)
How the robot touches the physical world — its eyes, muscles, and nervous system.
| # | Chapter | The Question It Answers |
|---|---|---|
| 6 | Eyes: Camera to Grid | How does a camera frame become an occupancy grid cell? |
| 7 | The Nervous System: One Interface, Two Worlds | How does the same code drive a simulation and a real robot? |
| 15 | The Chassis: From Code to Robot | How do you assemble, calibrate, and deploy the V1 Stepper Cube? |
Book II: The Brain (The Cognitive Layer)
How the robot thinks — spatial memory, the decision loop, instinct vs. planning, and the ability to predict what it hasn’t seen yet.
| # | Chapter | The Question It Answers |
|---|---|---|
| 3 | Spatial Memory: How a Robot Thinks About Space | How does a 50x50 grid represent an entire room? |
| 4 | The Thought Cycle: 13 Steps from Sensor to Motor | What happens in a single navigation cycle? |
| 5 | Instinct vs. Deliberation | When should the robot react instantly vs. think carefully? |
| 9 | Imagination: Thinking Before Acting | How does the robot predict unseen space? |
Book III: The Soul (The Evolutionary Layer)
How the robot learns, cooperates, and evolves — its identity as a markdown file, its social behavior in a fleet, and the mutation engine that makes it better.
| # | Chapter | The Question It Answers |
|---|---|---|
| 8 | Identity: Agents, Skills, and the Markdown OS | What does it mean for a program to be a markdown file? |
| 10 | Social Behavior: Multiple Robots, One World | How do robots share what they know? |
| 11 | Dreams: The Evolution Engine | How does a robot improve while it sleeps? |
The North Star: LLM as Compiler
| # | Chapter | The Question It Answers |
|---|---|---|
| 16 | The Neural Compiler: From JSON to Machine Code | How does an LLM learn to speak directly to silicon? |
The three epochs: JSON strings (now) → 6-byte bytecode (next) → native Xtensa machine code (goal).
Appendices: Practice
| # | Chapter | What You Get |
|---|---|---|
| 12 | 346+ Tests: Proving the System Works | The testing philosophy — how every layer is validated without hardware |
| 13 | Getting Started: Your First 10 Minutes | Clone to running demo in 5 commands |
| 14 | What’s Next: From Research to Reality | The roadmap from simulation to physical deployment |
Key Numbers
| Metric | Value |
|---|---|
| Tests | 346+ across 21 suites |
| Navigation criteria (live LLM) | 6/6 passed |
| World model grid | 50x50 cells at 10cm resolution |
| Navigation cycle time | ~1.4s/cycle (cloud), ~200ms (local) |
| Action types | 5 (MOVE_TO, EXPLORE, ROTATE_TO, FOLLOW_WALL, STOP) |
| HAL subsystems | 5 (locomotion, vision, manipulation, communication, safety) |
| Test arenas | 4 predefined environments |
| V1 Robot | 8cm cube, 6cm wheels, ~4.7 cm/s max |
| V1 Motor precision | 4096 steps/revolution |
Prerequisites
To read the book and run simulations:
- Node.js 18+ and npm
- TypeScript familiarity
- Git
To use real LLM inference (optional):
- OpenRouter API key, or
- GPU with 8GB+ VRAM for local Qwen3-VL-8B
To build the physical robot (Chapter 15):
- ESP32-S3-DevKitC-1 (motor controller)
- ESP32-CAM AI-Thinker (camera)
- 2x 28BYJ-48 steppers + 2x ULN2003 drivers
- 8cm 3D-printed chassis
- 5V 2A USB-C power supply
About
This is a living document. As the codebase evolves, the book is updated to match.
Every file path is relative to the repository root — when a chapter says
lib/runtime/navigation-loop.ts, you can open that file and see the exact code
being discussed.
Built by Evolving Agents Labs.
Start reading: Chapter 1 – The Thesis: LLM as Kernel