
GridWorld — Start here

This is the tutorial to read first. By the end of it, you’ll have:

  • A fresh Vite + TypeScript project with IgnitionAI installed.
  • A GridWorldEnv class that implements the TrainingEnv interface from scratch.
  • A DQN agent training live in your browser and finding the shortest path from the top-left to the bottom-right of a 7×7 grid.
  • A visual indicator so you can see the agent’s position updating in real time.

Estimated time: 25–35 minutes on a machine that already has Node installed.

Prerequisites

  • Node.js 20 or later. Check with node --version.
  • npm (comes with Node) or pnpm.
  • A text editor. VS Code is what the screenshots use, but anything works.
  • Zero prior RL knowledge. We’ll explain what matters as we go.

You do not need TensorFlow, Python, CUDA, a GPU, or any prior ML experience.

Step 1 — Create a fresh Vite project

Open a terminal, pick a directory, and run:

```sh
npm create vite@latest gridworld-rl -- --template vanilla-ts
cd gridworld-rl
npm install
```

Vite gives you a minimal TypeScript project with a dev server. Now install the two IgnitionAI packages:

npm install @ignitionai/core @ignitionai/backend-tfjs

What to observe: package.json should now list both @ignitionai/core and @ignitionai/backend-tfjs under dependencies. If the install failed, you probably have Node < 20 — upgrade and retry.

Why this step exists: Vite is the fastest way to get a TypeScript project with hot reload in the browser. IgnitionAI’s TF.js backend needs a bundler that can serve ES modules, which Vite handles out of the box.

Step 2 — Write the GridWorldEnv class

Create a new file src/gridworld-env.ts and paste this in:

src/gridworld-env.ts
```typescript
import type { TrainingEnv } from '@ignitionai/core'

export class GridWorldEnv implements TrainingEnv {
  // Four actions: up, right, down, left
  actions = ['up', 'right', 'down', 'left']

  // Agent starts at the top-left, target at the bottom-right
  agentRow = 0
  agentCol = 0
  readonly targetRow: number
  readonly targetCol: number
  readonly gridSize: number

  // How many steps the agent has taken in the current episode
  stepCount = 0
  private readonly maxSteps = 100

  constructor(gridSize = 7) {
    this.gridSize = gridSize
    this.targetRow = gridSize - 1
    this.targetCol = gridSize - 1
  }

  // Return the state the agent sees, normalized to [0, 1]
  observe(): number[] {
    const max = this.gridSize - 1
    return [
      this.agentRow / max,
      this.agentCol / max,
      this.targetRow / max,
      this.targetCol / max,
    ]
  }

  // Apply an action to move the agent
  step(action: number | number[]): void {
    const a = typeof action === 'number' ? action : action[0]
    switch (a) {
      case 0: this.agentRow = Math.max(0, this.agentRow - 1); break // up
      case 1: this.agentCol = Math.min(this.gridSize - 1, this.agentCol + 1); break // right
      case 2: this.agentRow = Math.min(this.gridSize - 1, this.agentRow + 1); break // down
      case 3: this.agentCol = Math.max(0, this.agentCol - 1); break // left
    }
    this.stepCount++
  }

  // +10 for reaching the target, -0.1 per step otherwise (dense reward)
  reward(): number {
    if (this.agentRow === this.targetRow && this.agentCol === this.targetCol) return 10
    return -0.1
  }

  // Episode ends on success or timeout
  done(): boolean {
    if (this.agentRow === this.targetRow && this.agentCol === this.targetCol) return true
    return this.stepCount >= this.maxSteps
  }

  // Back to the top-left, fresh step counter
  reset(): void {
    this.agentRow = 0
    this.agentCol = 0
    this.stepCount = 0
  }
}
```

What to observe: This file is about 60 lines. That’s your entire game world. No training code, no neural network, no hyperparameters. Just a description of how the grid behaves.

Why this step exists: The TrainingEnv interface is the contract between your world and IgnitionAI’s training loop. The framework doesn’t care what’s in your step() method — it could be a grid, a physics simulation, a chess board, a custom renderer. As long as observe() returns numbers and done() eventually returns true, you’re good.
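Those two conditions are cheap to check before you train anything. The sketch below feeds random actions to an environment and reports the first problem it finds. `EnvToCheck` and `checkEnv` are hypothetical names, not IgnitionAI APIs: the interface only mirrors the members this tutorial's GridWorldEnv exposes, and it assumes step() receives an action index.

```typescript
// Members this tutorial's GridWorldEnv exposes (an assumption; not IgnitionAI's TrainingEnv type).
interface EnvToCheck {
  actions: string[]
  observe(): number[]
  step(action: number): void
  reward(): number
  done(): boolean
  reset(): void
}

// Run a few random-action episodes. Returns a description of the first problem found, or null.
function checkEnv(env: EnvToCheck, episodes = 5, stepLimit = 10000): string | null {
  if (env.actions.length === 0) return 'actions is empty'
  env.reset()
  const obsSize = env.observe().length
  for (let ep = 0; ep < episodes; ep++) {
    env.reset()
    for (let steps = 0; !env.done(); steps++) {
      if (steps >= stepLimit) return `episode ${ep}: done() still false after ${stepLimit} steps`
      env.step(Math.floor(Math.random() * env.actions.length))
      const obs = env.observe()
      if (obs.length !== obsSize) return `episode ${ep}: observation length changed from ${obsSize} to ${obs.length}`
      if (!obs.every((x) => Number.isFinite(x))) return `episode ${ep}: non-finite value in observation`
      if (!Number.isFinite(env.reward())) return `episode ${ep}: non-finite reward`
    }
  }
  return null
}

// Deliberately broken example: the goal check targets cell 10, but positions only go 0..9,
// and there is no step limit, so done() can never become true.
let pos = 0
const buggy: EnvToCheck = {
  actions: ['left', 'right'],
  observe: () => [pos / 9],
  step: (a) => {
    pos = Math.min(9, Math.max(0, pos + (a === 1 ? 1 : -1)))
  },
  reward: () => (pos === 10 ? 10 : -0.1),
  done: () => pos === 10,
  reset: () => {
    pos = 0
  },
}

console.log(checkEnv(buggy)) // episode 0: done() still false after 10000 steps
```

The broken example fails on purpose: with an unreachable goal and no step limit, an episode never ends, which is exactly what GridWorldEnv's maxSteps timeout prevents.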

Design decisions in this file:

  • Normalized observations (dividing by max). Neural networks learn faster when inputs are in [0, 1] or [-1, 1]. This is the #1 rule from the Core Concepts page.
  • Dense reward (-0.1 per step instead of just +10 at the end). Pure “goal-only” rewards are painfully slow for DQN to learn from. Penalizing every step creates a gradient pointing toward “reach the goal quickly.”
  • Timeout via maxSteps. Without this, a wandering agent could run forever in the worst case. 100 steps is plenty for a 7×7 grid where the optimal path is 12 moves.
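A little arithmetic shows why the dense reward points the right way. Because reward() is read after each move, the move that lands on the target earns +10 and every earlier move earns -0.1. The hypothetical helper below totals one episode without discounting (DQN also applies a discount factor, which this sketch ignores):

```typescript
// Undiscounted return of one GridWorldEnv episode under its reward rules:
// -0.1 for each move that does not land on the target, +10 for the move that does.
// stepsToGoal === null means the target was never reached and the episode timed out.
function episodeReturn(stepsToGoal: number | null, maxSteps = 100): number {
  if (stepsToGoal === null) return -0.1 * maxSteps
  return -0.1 * (stepsToGoal - 1) + 10
}

const gridSize = 7
const optimalMoves = 2 * (gridSize - 1) // 6 down + 6 right

console.log(optimalMoves) // 12
console.log(episodeReturn(optimalMoves)) // about 8.9
console.log(episodeReturn(40)) // about 6.1
console.log(episodeReturn(null)) // -10 (timeout)
```

Each extra move costs 0.1, so shorter paths always score higher and any success beats a timeout. With a goal-only reward, a 12-move path and a 90-move path would both total +10 undiscounted; only the discount factor would separate them.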

Step 3 — Wire up the training loop

Open src/main.ts (Vite created this for you) and replace its contents with:

src/main.ts
```typescript
import { IgnitionEnvTFJS } from '@ignitionai/backend-tfjs'
import { GridWorldEnv } from './gridworld-env'

// Create the world and the trainer
const world = new GridWorldEnv(7)
const env = new IgnitionEnvTFJS(world)

// Start training with DQN and sensible defaults
env.train('dqn')

// Turbo mode: 10× faster than real time so we see results sooner
env.setSpeed(10)

// Log progress every 500 ms
setInterval(() => {
  console.log(
    `Step ${env.stepCount}`,
    `Agent @ (${world.agentRow}, ${world.agentCol})`,
  )
}, 500)
```

What to observe: Six lines of actual code (plus the logging). No neural network shape, no hyperparameters, no training loop: env.train('dqn') handles all of that.
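To make "handles all of that" concrete, here is a rough sketch of the episode loop a trainer drives through these methods. It is an illustration, not IgnitionAI's implementation: a real DQN loop also picks actions from the network, stores transitions, and trains between steps. `EnvLike`, `LineWorldEnv`, and `runEpisode` are hypothetical names.

```typescript
// Hypothetical mirror of the members GridWorldEnv implements (not IgnitionAI's TrainingEnv).
interface EnvLike {
  actions: string[]
  observe(): number[]
  step(action: number): void
  reward(): number
  done(): boolean
  reset(): void
}

// A 1D corridor in the same style as GridWorldEnv: start at cell 0, goal at the last cell.
class LineWorldEnv implements EnvLike {
  actions = ['left', 'right']
  position = 0
  stepCount = 0
  readonly length: number
  private readonly maxSteps = 20

  constructor(length = 5) {
    this.length = length
  }

  observe(): number[] {
    return [this.position / (this.length - 1)]
  }

  step(action: number): void {
    const delta = action === 1 ? 1 : -1
    this.position = Math.min(this.length - 1, Math.max(0, this.position + delta))
    this.stepCount++
  }

  reward(): number {
    return this.position === this.length - 1 ? 10 : -0.1
  }

  done(): boolean {
    return this.position === this.length - 1 || this.stepCount >= this.maxSteps
  }

  reset(): void {
    this.position = 0
    this.stepCount = 0
  }
}

// One episode: reset, then observe -> choose action -> step -> read reward until done().
// A learning agent would also record (obs, action, reward, nextObs) here and update itself.
function runEpisode(env: EnvLike, pickAction: (obs: number[]) => number): { steps: number; totalReward: number } {
  env.reset()
  let steps = 0
  let totalReward = 0
  while (!env.done()) {
    env.step(pickAction(env.observe()))
    totalReward += env.reward()
    steps++
  }
  return { steps, totalReward }
}

// Always moving right reaches the goal in 4 moves: three -0.1 penalties plus +10, about 9.7.
console.log(runEpisode(new LineWorldEnv(5), () => 1))
```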

Why this step exists: The framework’s “zero config” promise is visible here. IgnitionEnvTFJS inspected world.observe() to deduce the network’s input size (4 floats), read world.actions.length for the output size (4 actions), and built a small DQN agent with the defaults from the DQN page. You never touched any of that.
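The shape inference described above boils down to two measurements. This is a hypothetical sketch of the idea, not IgnitionEnvTFJS's actual code; `ShapeSource` and `inferNetworkShape` are invented names.

```typescript
// Only the two members needed to size a network (hypothetical, for illustration).
interface ShapeSource {
  actions: string[]
  observe(): number[]
}

// Input layer width = length of one observation; output layer = one Q-value per action.
function inferNetworkShape(world: ShapeSource): { inputSize: number; outputSize: number } {
  const obs = world.observe()
  if (obs.length === 0) throw new Error('observe() must return at least one number')
  if (world.actions.length === 0) throw new Error('actions must not be empty')
  return { inputSize: obs.length, outputSize: world.actions.length }
}

// A stand-in with GridWorldEnv's layout: [agentRow, agentCol, targetRow, targetCol], normalized.
const gridLike: ShapeSource = {
  actions: ['up', 'right', 'down', 'left'],
  observe: () => [0, 0, 1, 1],
}

console.log(inferNetworkShape(gridLike)) // { inputSize: 4, outputSize: 4 }
```

One practical consequence: observe() must return the same number of values on every call, because the network's input layer is fixed once it has been built.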

Step 4 — Run it

Start the Vite dev server:

npm run dev

Open the URL it prints (usually http://localhost:5173) and pop the devtools console. You’ll see log lines like:

```
Step 50 Agent @ (2, 3)
Step 100 Agent @ (5, 1)
Step 150 Agent @ (6, 6)
Step 200 Agent @ (0, 0)
Step 250 Agent @ (4, 6)
...
```

What to observe:

  • For the first few hundred steps, the agent is effectively random — epsilon-greedy starts at 100% random. The agent position jumps around unpredictably.
  • Somewhere around step 1000–2000 (a few seconds at setSpeed(10)), the agent starts consistently reaching the goal at (6, 6).
  • After that, the goal-reach events become rhythmic — the agent is finding near-optimal paths.
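The early randomness comes from epsilon-greedy exploration: with probability epsilon the agent takes a random action, otherwise the action its network currently rates highest, and epsilon shrinks as training goes on. The sketch below shows the mechanism with a linear decay; the schedule shape and the values 1.0, 0.05, and 2000 steps are illustrative assumptions, not IgnitionAI's DQN defaults.

```typescript
// Linear epsilon decay from `start` to `end` over `decaySteps` steps, then held at `end`.
// The default values are illustrative, not IgnitionAI's DQN defaults.
function epsilonAt(step: number, start = 1.0, end = 0.05, decaySteps = 2000): number {
  if (step >= decaySteps) return end
  return start + (end - start) * (step / decaySteps)
}

// Epsilon-greedy: a uniformly random action with probability epsilon, otherwise the argmax.
function chooseAction(qValues: number[], epsilon: number, random: () => number = Math.random): number {
  if (random() < epsilon) return Math.floor(random() * qValues.length)
  let best = 0
  for (let i = 1; i < qValues.length; i++) {
    if (qValues[i] > qValues[best]) best = i
  }
  return best
}

console.log(epsilonAt(0)) // 1
console.log(epsilonAt(1000)) // about 0.525
console.log(epsilonAt(5000)) // 0.05
console.log(chooseAction([0.1, 0.9, 0.3, 0.2], 0)) // 1 (pure exploitation picks the highest Q-value)
```

As epsilon falls, more moves come from the learned Q-values, which is why the logged positions stop jumping around and the goal-reach events settle into a rhythm.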

Why this step exists: Watching a trained agent is the payoff. You just built a custom RL environment and trained a neural network to solve it without writing any ML code. That’s the whole pitch.

Step 5 — Add a visual grid (optional but satisfying)

The console logs are fine, but a live grid is much more fun. Add a <canvas> to index.html:

index.html (body)
<canvas id="grid" width="350" height="350" style="border: 1px solid #334155"></canvas>

Then update src/main.ts to draw the grid every animation frame:

src/main.ts (additions)
```typescript
const canvas = document.getElementById('grid') as HTMLCanvasElement
const ctx = canvas.getContext('2d')!
const cellSize = canvas.width / world.gridSize

function draw() {
  // background
  ctx.fillStyle = '#0f172a'
  ctx.fillRect(0, 0, canvas.width, canvas.height)

  // grid lines
  ctx.strokeStyle = '#1e293b'
  for (let i = 0; i <= world.gridSize; i++) {
    ctx.beginPath()
    ctx.moveTo(i * cellSize, 0)
    ctx.lineTo(i * cellSize, canvas.height)
    ctx.stroke()
    ctx.beginPath()
    ctx.moveTo(0, i * cellSize)
    ctx.lineTo(canvas.width, i * cellSize)
    ctx.stroke()
  }

  // target
  ctx.fillStyle = '#A5B4FC'
  ctx.fillRect(world.targetCol * cellSize + 4, world.targetRow * cellSize + 4, cellSize - 8, cellSize - 8)

  // agent
  ctx.fillStyle = '#6366F1'
  ctx.beginPath()
  ctx.arc(
    world.agentCol * cellSize + cellSize / 2,
    world.agentRow * cellSize + cellSize / 2,
    cellSize / 3,
    0,
    Math.PI * 2,
  )
  ctx.fill()

  requestAnimationFrame(draw)
}

draw()
```

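What to observe: A 7×7 grid with a pale blue square at the bottom-right (the target) and an indigo dot that jumps around randomly at first, then starts heading toward the target, and eventually settles into near-optimal paths. Since the agent can only move up, right, down, or left, every shortest path is some mix of 6 right and 6 down moves (12 in total), so a good run can look like an L or a staircase.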

Why this step exists: This is the “training loop vs render loop” split from the R3F page in action. The draw() function runs on requestAnimationFrame and reads world.agentRow / world.agentCol at its own pace. The training loop runs on setTimeout and mutates those same fields at its own pace. They don’t interfere.

What you just built

A small but complete reinforcement learning setup:

  • A custom TrainingEnv with dense reward shaping and step-count timeout.
  • A DQN agent training with IgnitionAI’s defaults.
  • A live visualization of the agent’s behavior as it learns.
  • A concrete feel for what “decoupled training and render loops” means in practice.

Everything in this tutorial scales up to harder problems. If you swap GridWorldEnv for a MountainCarEnv or a custom physics sim, the rest of the code barely changes — env.train('dqn') still works.

Next steps

  • Try a different algorithm. Change env.train('dqn') to env.train('qtable'). On a 7×7 grid, tabular Q-learning converges almost instantly. Then try env.train('ppo') and watch it take longer — PPO is overkill here, and that’s the lesson. See Algorithms for which is which.

  • Break the reward. Make reward() return 10 when the goal is reached and 0 on every other step (that is, remove the -0.1 per-step penalty). Retry. You’ll see DQN struggle: this is what a “sparse reward” looks like, and it’s one of the most common reasons agents fail to learn.

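  • Make it harder. Change new GridWorldEnv(7) to new GridWorldEnv(15) or new GridWorldEnv(20). You may need to raise maxSteps proportionally (it is a private constant inside GridWorldEnv, so edit it there) and give the network more capacity (env.train('dqn', { hiddenLayers: [64, 64] })).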

  • Write your own env. Anything you can describe in those five methods, IgnitionAI can train an agent on. Read How it works → core for the full interface reference, then React Three Fiber if you want to put your env in a 3D scene.

  • Check the other tutorials. More are coming — see the Tutorials index for what’s on the roadmap.
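For the “Try a different algorithm” suggestion, it helps to see why a table is enough here: the 7×7 grid has only 49 agent positions, so every Q-value fits in a 49×4 table. Below is a self-contained sketch of plain tabular Q-learning on the same moves and rewards as GridWorldEnv. It is not IgnitionAI’s 'qtable' implementation; the learning rate, discount, exploration rate, episode count, and the seeded mulberry32 generator are illustrative choices.

```typescript
// Tabular Q-learning on GridWorldEnv's dynamics: 7×7 grid, start (0, 0), goal (6, 6),
// actions 0=up 1=right 2=down 3=left, +10 for reaching the goal, -0.1 for any other move.
const N = 7
const MAX_STEPS = 100
const ALPHA = 0.5 // learning rate
const GAMMA = 0.95 // discount factor
const EPSILON = 0.2 // constant exploration rate
const EPISODES = 3000

// Small seeded PRNG (mulberry32) so every run explores the same way.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Same moves as GridWorldEnv.step(): walls keep the agent in place.
function move(row: number, col: number, a: number): [number, number] {
  if (a === 0) return [Math.max(0, row - 1), col]
  if (a === 1) return [row, Math.min(N - 1, col + 1)]
  if (a === 2) return [Math.min(N - 1, row + 1), col]
  return [row, Math.max(0, col - 1)]
}

const isGoal = (row: number, col: number): boolean => row === N - 1 && col === N - 1
// Index of the largest value; ties go to the lowest index.
const argmax = (values: number[]): number => values.indexOf(Math.max(...values))

// q[row * N + col][action], all initialised to 0.
const q: number[][] = Array.from({ length: N * N }, () => [0, 0, 0, 0])
const rand = mulberry32(42)

for (let ep = 0; ep < EPISODES; ep++) {
  let row = 0
  let col = 0
  for (let t = 0; t < MAX_STEPS; t++) {
    const s = row * N + col
    const a = rand() < EPSILON ? Math.floor(rand() * 4) : argmax(q[s])
    const [nr, nc] = move(row, col, a)
    const reward = isGoal(nr, nc) ? 10 : -0.1
    // Reaching the goal ends the episode, so there is no future value to add.
    const target = isGoal(nr, nc) ? reward : reward + GAMMA * Math.max(...q[nr * N + nc])
    q[s][a] += ALPHA * (target - q[s][a])
    row = nr
    col = nc
    if (isGoal(row, col)) break
  }
}

// Follow the greedy policy from the start; return the move count, or null if it never arrives.
function greedyPathLength(): number | null {
  let row = 0
  let col = 0
  for (let t = 1; t <= MAX_STEPS; t++) {
    const [nr, nc] = move(row, col, argmax(q[row * N + col]))
    row = nr
    col = nc
    if (isGoal(row, col)) return t
  }
  return null
}

console.log('greedy path length:', greedyPathLength())
```

If the table has converged, the greedy path should take 12 moves. Any such path is optimal, since every shortest route is some ordering of 6 right and 6 down moves; which one the table settles on depends on tie-breaking and the seed.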


