GridWorld — Start here
This is the tutorial to read first. By the end of it, you’ll have:
- A fresh Vite + TypeScript project with IgnitionAI installed.
- A GridWorldEnv class that implements the TrainingEnv interface from scratch.
- A DQN agent training live in your browser and finding the shortest path from the top-left to the bottom-right of a 7×7 grid.
- A visual indicator so you can see the agent’s position updating in real time.
Estimated time: 25–35 minutes on a machine that already has Node installed.
Prerequisites
- Node.js 20 or later. Check with node --version.
- npm (comes with Node) or pnpm.
- A text editor. VS Code is what the screenshots use, but anything works.
- Zero prior RL knowledge. We’ll explain what matters as we go.
You do not need TensorFlow, Python, CUDA, a GPU, or any prior ML experience.
Step 1 — Create a fresh Vite project
Open a terminal, pick a directory, and run:
```sh
npm create vite@latest gridworld-rl -- --template vanilla-ts
cd gridworld-rl
npm install
```
Vite gives you a minimal TypeScript project with a dev server. Now install the two IgnitionAI packages:
```sh
npm install @ignitionai/core @ignitionai/backend-tfjs
```
What to observe: package.json should now list both @ignitionai/core and @ignitionai/backend-tfjs under dependencies. If the install failed, you probably have Node older than 20; upgrade and retry.
Why this step exists: Vite is the fastest way to get a TypeScript project with hot reload in the browser. IgnitionAI’s TF.js backend needs a bundler that can serve ES modules, which Vite handles out of the box.
Step 2 — Write the GridWorldEnv class
Create a new file src/gridworld-env.ts and paste this in:
```ts
import type { TrainingEnv } from '@ignitionai/core'

export class GridWorldEnv implements TrainingEnv {
  // Four actions: up, right, down, left
  actions = ['up', 'right', 'down', 'left']

  // Agent starts at the top-left, target at the bottom-right
  agentRow = 0
  agentCol = 0
  readonly targetRow: number
  readonly targetCol: number
  readonly gridSize: number

  // How many steps the agent has taken in the current episode
  stepCount = 0
  private readonly maxSteps = 100

  constructor(gridSize = 7) {
    this.gridSize = gridSize
    this.targetRow = gridSize - 1
    this.targetCol = gridSize - 1
  }

  // Return the state the agent sees, normalized to [0, 1]
  observe(): number[] {
    const max = this.gridSize - 1
    return [
      this.agentRow / max,
      this.agentCol / max,
      this.targetRow / max,
      this.targetCol / max,
    ]
  }

  // Apply an action to move the agent
  step(action: number | number[]): void {
    const a = typeof action === 'number' ? action : action[0]
    switch (a) {
      case 0: this.agentRow = Math.max(0, this.agentRow - 1); break // up
      case 1: this.agentCol = Math.min(this.gridSize - 1, this.agentCol + 1); break // right
      case 2: this.agentRow = Math.min(this.gridSize - 1, this.agentRow + 1); break // down
      case 3: this.agentCol = Math.max(0, this.agentCol - 1); break // left
    }
    this.stepCount++
  }

  // +10 for reaching the target, -0.1 per step otherwise (dense reward)
  reward(): number {
    if (this.agentRow === this.targetRow && this.agentCol === this.targetCol) return 10
    return -0.1
  }

  // Episode ends on success or timeout
  done(): boolean {
    if (this.agentRow === this.targetRow && this.agentCol === this.targetCol) return true
    return this.stepCount >= this.maxSteps
  }

  // Back to the top-left, fresh step counter
  reset(): void {
    this.agentRow = 0
    this.agentCol = 0
    this.stepCount = 0
  }
}
```
What to observe: This file is about 60 lines. That’s your entire game world. No training code, no neural network, no hyperparameters. Just a description of how the grid behaves.
Why this step exists: The TrainingEnv interface is the contract between your world and IgnitionAI’s training loop. The framework doesn’t care what’s in your step() method — it could be a grid, a physics simulation, a chess board, a custom renderer. As long as observe() returns numbers and done() eventually returns true, you’re good.
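To make the contract concrete, here is a minimal, framework-free sketch of the cycle a trainer drives. It calls reset(), then repeats observe(), step(), reward() and done() until the episode ends. Three names are hypothetical:
- EnvContract is a local stand-in inferred from the methods used in this tutorial. The real TrainingEnv type in @ignitionai/core may differ.
- CorridorEnv is a made-up 1-D world.
- runEpisode is an illustration, not IgnitionAI's training loop.

Notice that runEpisode never looks inside step().

```typescript
// Hypothetical stand-in for the TrainingEnv contract (assumption: the real
// @ignitionai/core interface may have more or differently typed members).
interface EnvContract {
  actions: string[]
  observe(): number[]
  step(action: number | number[]): void
  reward(): number
  done(): boolean
  reset(): void
}

// A made-up world that is not a grid: a 1-D corridor of `length` cells.
// Action 0 moves left, action 1 moves right; the goal is the last cell.
class CorridorEnv implements EnvContract {
  actions = ['left', 'right']
  pos = 0
  steps = 0
  readonly length: number
  readonly maxSteps: number

  constructor(length = 5, maxSteps = 50) {
    this.length = length
    this.maxSteps = maxSteps
  }

  observe(): number[] {
    return [this.pos / (this.length - 1)]
  }

  step(action: number | number[]): void {
    const a = typeof action === 'number' ? action : action[0]
    this.pos = a === 1 ? Math.min(this.length - 1, this.pos + 1) : Math.max(0, this.pos - 1)
    this.steps++
  }

  reward(): number {
    return this.pos === this.length - 1 ? 1 : -0.01
  }

  done(): boolean {
    return this.pos === this.length - 1 || this.steps >= this.maxSteps
  }

  reset(): void {
    this.pos = 0
    this.steps = 0
  }
}

// Drive one episode with a caller-supplied policy (a sketch, not IgnitionAI's loop).
function runEpisode(
  env: EnvContract,
  policy: (obs: number[]) => number,
): { steps: number; totalReward: number } {
  env.reset()
  let steps = 0
  let totalReward = 0
  while (!env.done()) {
    env.step(policy(env.observe()))
    totalReward += env.reward()
    steps++
  }
  return { steps, totalReward }
}

const alwaysRight = runEpisode(new CorridorEnv(5), () => 1)
console.log(alwaysRight.steps) // 4
```

Passing a GridWorldEnv instead would need no change to runEpisode. The only change is a policy that returns 0 to 3.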
Design decisions in this file:
- Normalized observations (dividing by max). Neural networks learn faster when inputs are in [0, 1] or [-1, 1]. This is the #1 rule from the Core Concepts page.
- Dense reward (-0.1 per step instead of just +10 at the end). Pure “goal-only” rewards are painfully slow for DQN to learn from. Penalizing every step creates a gradient pointing toward “reach the goal quickly.”
- Timeout via maxSteps. Without this, a wandering agent could run forever in the worst case. 100 steps is plenty for a 7×7 grid where the optimal path is 12 moves.
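The reward numbers are easy to sanity-check by hand. This assumes the trainer reads reward() once after every step(), which is how the dense-reward comment reads but is an assumption here. Under that assumption, undiscounted:
- A 12-move optimal episode earns 11 × (-0.1) + 10 = 8.9.
- A run that times out after 100 steps earns 100 × (-0.1) = -10.
- Even a success on step 100 earns 99 × (-0.1) + 10 = 0.1.

So any success beats any timeout, and shorter successes beat longer ones. The hypothetical playEpisode function below replays the GridWorldEnv rules without the framework to confirm this.

```typescript
type EpisodeResult = { steps: number; totalReward: number; reachedGoal: boolean }

// Hypothetical pure-function replica of GridWorldEnv's rules, for checking the
// reward arithmetic only. Actions: 0 = up, 1 = right, 2 = down, 3 = left.
function playEpisode(actions: number[], gridSize = 7, maxSteps = 100): EpisodeResult {
  const last = gridSize - 1
  let row = 0
  let col = 0
  let steps = 0
  let totalReward = 0
  for (const a of actions) {
    if (a === 0) row = Math.max(0, row - 1) // up
    else if (a === 1) col = Math.min(last, col + 1) // right
    else if (a === 2) row = Math.min(last, row + 1) // down
    else if (a === 3) col = Math.max(0, col - 1) // left
    steps++
    const atGoal = row === last && col === last
    totalReward += atGoal ? 10 : -0.1 // same dense reward as reward()
    if (atGoal || steps >= maxSteps) return { steps, totalReward, reachedGoal: atGoal }
  }
  // Ran out of scripted actions before the episode ended
  return { steps, totalReward, reachedGoal: false }
}

// Optimal route: six moves down, then six moves right
const optimal = playEpisode([...new Array<number>(6).fill(2), ...new Array<number>(6).fill(1)])
// Pressing "up" forever never leaves row 0, so the 100-step timeout ends the episode
const stuck = playEpisode(new Array<number>(200).fill(0))

console.log(optimal.steps, optimal.reachedGoal) // 12 true
console.log(stuck.steps, stuck.reachedGoal) // 100 false
```

For bigger grids, note that the optimal path on an n×n grid is 2(n - 1) moves. That is 28 moves for n = 15, which is worth keeping in mind when you scale maxSteps.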
Step 3 — Wire up the training loop
Open src/main.ts (Vite created this for you) and replace its contents with:
```ts
import { IgnitionEnvTFJS } from '@ignitionai/backend-tfjs'
import { GridWorldEnv } from './gridworld-env'

// Create the world and the trainer
const world = new GridWorldEnv(7)
const env = new IgnitionEnvTFJS(world)

// Start training with DQN and sensible defaults
env.train('dqn')

// Turbo mode: 10× faster than real time, so we see results sooner
env.setSpeed(10)

// Log progress every 500 ms
setInterval(() => {
  console.log(
    `Step ${env.stepCount}`,
    `Agent @ (${world.agentRow}, ${world.agentCol})`,
  )
}, 500)
```
What to observe: Six lines of actual code (plus the logging). No neural network shape, no hyperparameters, no training loop: env.train('dqn') handles all of that.
Why this step exists: The framework’s “zero config” promise is visible here. IgnitionEnvTFJS inspects world.observe() to deduce the network’s input size (4 floats), reads world.actions.length for the output size (4 actions), and builds a small DQN agent with the defaults from the DQN page. You never touch any of that.
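Here is a sketch of the kind of inference described above. It assumes only that the env exposes observe() and actions. inferNetworkShape and ShapeSource are hypothetical names for illustration; this tutorial does not show IgnitionAI's actual internals. For DQN the output layer holds one Q-value per discrete action, which is why actions.length sets the output size.

```typescript
// Hypothetical local types, not the @ignitionai/core definitions
interface ShapeSource {
  actions: string[]
  observe(): number[]
}

interface NetworkShape {
  inputSize: number
  outputSize: number
}

function inferNetworkShape(env: ShapeSource): NetworkShape {
  const obs = env.observe()
  if (obs.length === 0) throw new Error('observe() must return at least one number')
  if (env.actions.length === 0) throw new Error('actions must not be empty')
  // One input per observation value, one Q-value output per discrete action
  return { inputSize: obs.length, outputSize: env.actions.length }
}

// Stub with the same layout as GridWorldEnv at its start state
const gridLike: ShapeSource = {
  actions: ['up', 'right', 'down', 'left'],
  observe: () => [0, 0, 1, 1],
}

console.log(inferNetworkShape(gridLike)) // { inputSize: 4, outputSize: 4 }
```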
Step 4 — Run it
Start the Vite dev server:
```sh
npm run dev
```
Open the URL it prints (usually http://localhost:5173) and bring up the browser’s devtools console. You’ll see log lines like these (the exact step numbers depend on how fast training runs):
```text
Step 50 Agent @ (2, 3)
Step 100 Agent @ (5, 1)
Step 150 Agent @ (6, 6)
Step 200 Agent @ (0, 0)
Step 250 Agent @ (4, 6)
...
```
What to observe:
- For the first few hundred steps, the agent is effectively random — epsilon-greedy starts at 100% random. The agent position jumps around unpredictably.
- Somewhere around step 1000–2000 (a few seconds at setSpeed(10)), the agent starts consistently reaching the goal at (6, 6).
- After that, the goal-reach events become rhythmic: the agent is finding near-optimal paths.
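Epsilon-greedy action selection is what makes the early steps look random. With probability epsilon the agent picks a random action; otherwise it picks the action with the highest predicted Q-value. Epsilon shrinks as training goes on. The linear schedule below and its values (1.0 down to 0.05 over 1000 steps) are illustrative assumptions, not IgnitionAI's DQN defaults; the DQN page is the reference for those.

```typescript
// Illustrative linear decay (start/end/decaySteps are assumptions for this sketch)
function epsilonAt(step: number, start = 1.0, end = 0.05, decaySteps = 1000): number {
  if (step <= 0) return start
  if (step >= decaySteps) return end
  return start + (end - start) * (step / decaySteps)
}

// Epsilon-greedy: explore with probability epsilon, otherwise exploit argmax Q
function selectAction(qValues: number[], epsilon: number, rand: () => number = Math.random): number {
  if (rand() < epsilon) {
    return Math.floor(rand() * qValues.length) // uniform random action
  }
  let best = 0
  for (let i = 1; i < qValues.length; i++) {
    if (qValues[i] > qValues[best]) best = i
  }
  return best
}

console.log(epsilonAt(0)) // 1
console.log(epsilonAt(5000)) // 0.05
console.log(selectAction([0.1, 0.9, 0.3, 0.2], 0)) // 1
```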
Why this step exists: Watching a trained agent is the payoff. You just built a custom RL environment and trained a neural network to solve it without writing any ML code. That’s the whole pitch.
Step 5 — Add a visual grid (optional but satisfying)
The console logs are fine, but a live grid is much more fun. Add a <canvas> to index.html:
```html
<canvas id="grid" width="350" height="350" style="border: 1px solid #334155"></canvas>
```
Then append the following to src/main.ts (it reuses the world constant defined there) to draw the grid on every animation frame:
```ts
const canvas = document.getElementById('grid') as HTMLCanvasElement
const ctx = canvas.getContext('2d')!
const cellSize = canvas.width / world.gridSize

function draw() {
  ctx.fillStyle = '#0f172a'
  ctx.fillRect(0, 0, canvas.width, canvas.height)

  // grid lines
  ctx.strokeStyle = '#1e293b'
  for (let i = 0; i <= world.gridSize; i++) {
    ctx.beginPath()
    ctx.moveTo(i * cellSize, 0)
    ctx.lineTo(i * cellSize, canvas.height)
    ctx.stroke()
    ctx.beginPath()
    ctx.moveTo(0, i * cellSize)
    ctx.lineTo(canvas.width, i * cellSize)
    ctx.stroke()
  }

  // target
  ctx.fillStyle = '#A5B4FC'
  ctx.fillRect(world.targetCol * cellSize + 4, world.targetRow * cellSize + 4, cellSize - 8, cellSize - 8)

  // agent
  ctx.fillStyle = '#6366F1'
  ctx.beginPath()
  ctx.arc(
    world.agentCol * cellSize + cellSize / 2,
    world.agentRow * cellSize + cellSize / 2,
    cellSize / 3,
    0,
    Math.PI * 2,
  )
  ctx.fill()

  requestAnimationFrame(draw)
}

draw()
```
What to observe: A 7×7 grid with a pale blue square at the bottom-right (the target) and an indigo dot. The dot jumps around randomly at first, then starts drifting down and to the right toward the target, and soon settles into near-optimal paths. Because the agent can only move up, right, down or left, every shortest path is some mix of 6 moves down and 6 moves right, whether it looks like a staircase or an L.
Why this step exists: This is the “training loop vs render loop” split from the R3F page in action. The draw() function runs on requestAnimationFrame and reads world.agentRow / world.agentCol at its own pace. The training loop runs on setTimeout and mutates those same fields at its own pace. They don’t interfere.
What you just built
A small but complete reinforcement learning setup:
- A custom TrainingEnv with dense reward shaping and a step-count timeout.
- A DQN agent training with IgnitionAI’s defaults.
- A live visualization of the agent’s behavior as it learns.
- A concrete feel for what “decoupled training and render loops” means in practice.
Everything in this tutorial scales up to harder problems. If you swap GridWorldEnv for a MountainCarEnv or a custom physics sim, the rest of the code barely changes — env.train('dqn') still works.
Next steps
- Try a different algorithm. Change env.train('dqn') to env.train('qtable'). On a 7×7 grid, tabular Q-learning converges almost instantly. Then try env.train('ppo') and watch it take longer: PPO is overkill here, and that’s the lesson. See Algorithms for which is which.
- Break the reward. Change reward() to return 10 only when the goal is reached and 0 otherwise (remove the -0.1 per step). Retry. You’ll see DQN struggle: this is what “sparse reward” looks like, and it’s the single biggest reason agents fail to learn.
- Make it harder. Bump the grid size to 15 or 20. You may need to raise maxSteps proportionally and give the network more capacity (env.train('dqn', { hiddenLayers: [64, 64] })).
- Write your own env. Anything you can describe in those five methods, IgnitionAI can train an agent on. Read How it works → core for the full interface reference, then React Three Fiber if you want to put your env in a 3D scene.
- Check the other tutorials. More are coming; see the Tutorials index for what’s on the roadmap.
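The first suggestion says tabular Q-learning converges almost instantly on a 7×7 grid. The reason is size: with 49 cells and 4 actions, the entire Q-function is a 49 × 4 table. Below is a self-contained sketch that re-implements the GridWorldEnv rules and trains such a table. It is not IgnitionAI's 'qtable' algorithm. The following are all illustrative assumptions:
- the alpha, gamma, epsilon and episode-count values;
- the seeded mulberry32 PRNG, used so runs are reproducible.

```typescript
// Sketch only: not IgnitionAI's 'qtable' implementation. Grid rules mirror GridWorldEnv.
const SIZE = 7
const MAX_STEPS = 100
const N_ACTIONS = 4 // 0 = up, 1 = right, 2 = down, 3 = left

// mulberry32: a small seeded PRNG returning floats in [0, 1)
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Same movement as GridWorldEnv.step(): moving into a wall leaves the agent in place
function move(row: number, col: number, a: number): [number, number] {
  const last = SIZE - 1
  if (a === 0) return [Math.max(0, row - 1), col]
  if (a === 1) return [row, Math.min(last, col + 1)]
  if (a === 2) return [Math.min(last, row + 1), col]
  return [row, Math.max(0, col - 1)]
}

function argmax(values: number[]): number {
  let best = 0
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i
  }
  return best
}

function trainQTable(episodes = 2000, alpha = 0.5, gamma = 0.95, epsilon = 0.2, seed = 1): number[][] {
  const rand = mulberry32(seed)
  // One row of 4 Q-values per cell, indexed by row * SIZE + col, all starting at 0
  const q: number[][] = Array.from({ length: SIZE * SIZE }, () => new Array<number>(N_ACTIONS).fill(0))
  for (let ep = 0; ep < episodes; ep++) {
    let row = 0
    let col = 0
    for (let t = 0; t < MAX_STEPS; t++) {
      const s = row * SIZE + col
      // Epsilon-greedy behaviour policy
      const a = rand() < epsilon ? Math.floor(rand() * N_ACTIONS) : argmax(q[s])
      const [nr, nc] = move(row, col, a)
      const atGoal = nr === SIZE - 1 && nc === SIZE - 1
      const r = atGoal ? 10 : -0.1 // same dense reward as GridWorldEnv
      // Q-learning target: the goal is terminal, so it gets no bootstrapped future value
      const target = atGoal ? r : r + gamma * Math.max(...q[nr * SIZE + nc])
      q[s][a] += alpha * (target - q[s][a])
      row = nr
      col = nc
      if (atGoal) break
    }
  }
  return q
}

// Follow the greedy policy from (0, 0); return moves to the goal, or -1 on timeout
function greedyPathLength(q: number[][]): number {
  let row = 0
  let col = 0
  for (let t = 1; t <= MAX_STEPS; t++) {
    const [nr, nc] = move(row, col, argmax(q[row * SIZE + col]))
    row = nr
    col = nc
    if (row === SIZE - 1 && col === SIZE - 1) return t
  }
  return -1
}

const q = trainQTable()
console.log('greedy path length:', greedyPathLength(q))
```

One design choice to note: when an episode hits the 100-step timeout, the last update still bootstraps from the next state. That is deliberate, because running out of time is a limit of the training setup, not a terminal state of the grid itself.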