Quickstart
By the end of this page, you will have trained a reinforcement-learning agent to balance a pole in the browser. The entire setup is 7 lines of code. You will not configure a neural network, pick hyperparameters, or write a training loop.
Install
npm install @ignitionai/core @ignitionai/backend-tfjs @ignitionai/environments
You need:
- Node.js 20 or later
- A bundler that can serve ES modules (Vite, Next.js, Webpack 5, etc.)
Train a CartPole agent
Create src/main.ts and paste this in exactly:
import { IgnitionEnvTFJS } from '@ignitionai/backend-tfjs'
import { CartPoleEnv } from '@ignitionai/environments'

const env = new IgnitionEnvTFJS(new CartPoleEnv())

env.train('dqn') // Zero config. It just works.
// env.infer() // Switch to inference after training.
// env.setSpeed(50) // Turbo training: 50× faster.
Seven lines. That’s the entire training setup. No neural network code, no hyperparameter tuning, no config files.
Save and load your agent
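Training takes time. Don’t throw it away. Add two lines to persist the model (IndexedDBProvider is imported from the @ignitionai/storage package, so install that too if it is not already in your project):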
import { IndexedDBProvider } from '@ignitionai/storage'
const env = new IgnitionEnvTFJS(new CartPoleEnv())
env.train('dqn', { storageProvider: new IndexedDBProvider() })
// ... training converges ...
await env.save('cartpole-best', { meanReward: 195.4 })
// Later — on page reload, or in another session:
await env.load('cartpole-best')
env.infer() // Watch the trained agent play
save() stores the model weights and the agent’s internal state (epsilon, step counter, best reward). load() restores everything. No manual bookkeeping.
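To make the idea of a checkpoint concrete, here is a minimal sketch of bundling weights and agent state into one JSON-serializable record. The Checkpoint shape and the serializeCheckpoint / deserializeCheckpoint helpers are hypothetical names used for illustration only; they are not IgnitionAI's storage format, and the fields simply mirror the state listed above.

```typescript
// Hypothetical checkpoint shape: NOT IgnitionAI's actual storage format.
interface Checkpoint {
  weights: Float32Array[] // one flat array per layer (assumed layout)
  epsilon: number
  stepCount: number
  bestReward: number
}

// JSON.stringify turns a typed array into a keyed object, so convert to plain arrays first.
function serializeCheckpoint(cp: Checkpoint): string {
  return JSON.stringify({
    epsilon: cp.epsilon,
    stepCount: cp.stepCount,
    bestReward: cp.bestReward,
    weights: cp.weights.map((w) => Array.from(w)),
  })
}

// Rebuild the typed arrays on the way back in.
function deserializeCheckpoint(json: string): Checkpoint {
  const raw = JSON.parse(json) as {
    weights: number[][]
    epsilon: number
    stepCount: number
    bestReward: number
  }
  return {
    epsilon: raw.epsilon,
    stepCount: raw.stepCount,
    bestReward: raw.bestReward,
    weights: raw.weights.map((w) => Float32Array.from(w)),
  }
}

const restoredCheckpoint = deserializeCheckpoint(
  serializeCheckpoint({
    weights: [new Float32Array([0.5, -0.25]), new Float32Array([1])],
    epsilon: 0.01,
    stepCount: 12000,
    bestReward: 195.4,
  }),
)
console.log(restoredCheckpoint.epsilon, restoredCheckpoint.weights.length)
```

Keeping the exploration state next to the weights is what lets a reloaded agent resume where it stopped instead of restarting exploration from scratch.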
Open the page in your browser, open the devtools console, and you’ll see reward logs scrolling as the agent learns. Within a few dozen seconds on most machines, the pole will start staying up.
That’s it. If you want to understand how to write your own TrainingEnv from scratch (instead of using the built-in CartPoleEnv), jump to the GridWorld tutorial — it walks through the full 5-method interface.
What just happened
Let’s map each line of the example to a concept you’ll see explained in more depth elsewhere in the docs.
Line 1 — import { IgnitionEnvTFJS } from '@ignitionai/backend-tfjs'
IgnitionEnvTFJS is the training environment that runs on TensorFlow.js. It owns the training loop, the agent instance, and the TF.js backend selection. You’ll see its internals in How it works → backend-tfjs.
Line 2 — import { CartPoleEnv } from '@ignitionai/environments'
CartPoleEnv is a built-in TrainingEnv that describes the classic cart-pole world: a cart that can move left or right, a pole balanced on top, Euler-integrated physics, and a done condition when the pole tips past 12° or the cart leaves the track. The @ignitionai/environments package also ships GridWorldEnv and MountainCarEnv. The TrainingEnv interface is what you’ll implement for your own worlds — see How it works → core.
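For intuition, the physics described above can be sketched as a single Euler step plus a termination check. This is not CartPoleEnv's source: the constants below are the values commonly used for the classic cart-pole benchmark (including a track limit of 2.4 units), and whether IgnitionAI uses the same numbers is an assumption. Only the 12° limit comes from this page.

```typescript
interface CartPoleState {
  x: number // cart position
  xDot: number // cart velocity
  theta: number // pole angle in radians, 0 = upright
  thetaDot: number // pole angular velocity
}

// Classic benchmark constants (assumed, not read from CartPoleEnv).
const GRAVITY = 9.8
const MASS_CART = 1.0
const MASS_POLE = 0.1
const TOTAL_MASS = MASS_CART + MASS_POLE
const HALF_POLE_LENGTH = 0.5
const FORCE_MAGNITUDE = 10.0
const TIME_STEP = 0.02 // seconds per step
const THETA_LIMIT = (12 * Math.PI) / 180 // 12 degrees
const X_LIMIT = 2.4 // assumed track half-width

// One explicit Euler step: action 0 pushes left, action 1 pushes right.
function eulerStep(s: CartPoleState, action: 0 | 1): CartPoleState {
  const force = action === 1 ? FORCE_MAGNITUDE : -FORCE_MAGNITUDE
  const cos = Math.cos(s.theta)
  const sin = Math.sin(s.theta)
  const temp =
    (force + MASS_POLE * HALF_POLE_LENGTH * s.thetaDot * s.thetaDot * sin) / TOTAL_MASS
  const thetaAcc =
    (GRAVITY * sin - cos * temp) /
    (HALF_POLE_LENGTH * (4 / 3 - (MASS_POLE * cos * cos) / TOTAL_MASS))
  const xAcc = temp - (MASS_POLE * HALF_POLE_LENGTH * thetaAcc * cos) / TOTAL_MASS
  return {
    x: s.x + TIME_STEP * s.xDot,
    xDot: s.xDot + TIME_STEP * xAcc,
    theta: s.theta + TIME_STEP * s.thetaDot,
    thetaDot: s.thetaDot + TIME_STEP * thetaAcc,
  }
}

// Episode ends when the pole tips too far or the cart leaves the track.
function isDone(s: CartPoleState): boolean {
  return Math.abs(s.theta) > THETA_LIMIT || Math.abs(s.x) > X_LIMIT
}

// A policy that always pushes right loses the pole quickly.
let demoState: CartPoleState = { x: 0, xDot: 0, theta: 0, thetaDot: 0 }
let demoSteps = 0
while (!isDone(demoState) && demoSteps < 1000) {
  demoState = eulerStep(demoState, 1)
  demoSteps++
}
console.log(`episode ended after ${demoSteps} steps`)
```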
Line 4 — new IgnitionEnvTFJS(new CartPoleEnv())
You instantiate your env and wrap it in IgnitionEnvTFJS. At this point, the framework calls the wrapped environment’s observe() once, inspects the length of the returned array, and deduces the neural network’s input size. It also reads the environment’s actions.length to deduce the output size. You never touch inputSize or actionSize explicitly; this is what “zero config” means in practice.
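The size deduction described above can be sketched in a few lines. SizedEnv and deduceSizes are hypothetical names, not part of the IgnitionAI API, and the four-value observation and string action labels are assumptions chosen to resemble a cart-pole world.

```typescript
// Hypothetical slice of an environment: only the two members the text mentions.
interface SizedEnv {
  actions: readonly string[]
  observe(): number[]
}

// Input size comes from one observation; output size from the action list.
function deduceSizes(env: SizedEnv): { inputSize: number; actionSize: number } {
  return {
    inputSize: env.observe().length,
    actionSize: env.actions.length,
  }
}

const toyCartPole: SizedEnv = {
  actions: ['left', 'right'],
  // cart position, cart velocity, pole angle, pole angular velocity (assumed layout)
  observe: () => [0, 0, 0, 0],
}

const deducedSizes = deduceSizes(toyCartPole)
console.log(deducedSizes)
```

The practical consequence is that observe() should return an array of the same length on every call, since the network's input layer is sized from the first one.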
Line 6 — env.train('dqn')
This starts training with the Deep Q-Network algorithm and IgnitionAI’s default hyperparameters. Under the hood, the framework:
- Builds a small MLP matching the deduced input/output sizes (hidden layers [24, 24] by default).
- Allocates a replay buffer (10 000 experiences).
- Initializes epsilon-greedy exploration (epsilon = 1.0, decaying 0.5% per episode to a floor of 0.01).
- Kicks off a setTimeout-yielding training loop at 50 ms per step so the browser stays responsive.
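Two of the defaults above are easy to picture in code: the epsilon schedule and the fixed-size replay buffer. The sketch below assumes the decay is multiplicative (each episode keeps 99.5% of the previous epsilon) and that the buffer overwrites its oldest entry when full; neither detail is confirmed by this page, and epsilonAfter and ReplayBuffer are illustrative names rather than IgnitionAI exports.

```typescript
const EPSILON_START = 1.0
const EPSILON_FLOOR = 0.01
const DECAY_PER_EPISODE = 0.005 // 0.5%

// Assumed multiplicative decay, clamped at the floor.
function epsilonAfter(episodes: number): number {
  const decayed = EPSILON_START * Math.pow(1 - DECAY_PER_EPISODE, episodes)
  return Math.max(EPSILON_FLOOR, decayed)
}

// Fixed-capacity ring buffer: once full, each push overwrites the oldest entry.
class ReplayBuffer<T> {
  private readonly capacity: number
  private readonly items: T[] = []
  private writeIndex = 0

  constructor(capacity: number) {
    this.capacity = capacity
  }

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item)
    } else {
      this.items[this.writeIndex] = item
    }
    this.writeIndex = (this.writeIndex + 1) % this.capacity
  }

  get size(): number {
    return this.items.length
  }

  // Uniform sampling with replacement.
  sample(count: number): T[] {
    const batch: T[] = []
    for (let i = 0; i < count && this.items.length > 0; i++) {
      batch.push(this.items[Math.floor(Math.random() * this.items.length)])
    }
    return batch
  }
}

const demoBuffer = new ReplayBuffer<number>(10000)
for (let i = 0; i < 12000; i++) demoBuffer.push(i)
console.log(demoBuffer.size, epsilonAfter(100))
```

Under the multiplicative assumption, epsilon reaches its floor after a little over 900 episodes, so most early episodes are dominated by random exploration.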
You can read the full set of defaults on the DQN page. If you want PPO instead, it’s one word: env.train('ppo').
Line 7 — env.infer() (commented out)
Once your agent has converged, switch from training mode to inference mode. The epsilon exploration drops to zero, gradients stop flowing, and the policy plays deterministically. Read more in How it works → core.
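The difference between the two modes comes down to how an action is picked from the network's Q-values. The following is a generic epsilon-greedy sketch, not IgnitionAI's internals; selectAction and argmax are illustrative helper names, and the random source is injectable only so the behavior is easy to check.

```typescript
// Index of the largest value; ties resolve to the earliest index.
function argmax(values: number[]): number {
  let best = 0
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i
  }
  return best
}

// With probability epsilon pick a random action, otherwise the greedy one.
function selectAction(
  qValues: number[],
  epsilon: number,
  random: () => number = Math.random,
): number {
  if (random() < epsilon) {
    return Math.floor(random() * qValues.length)
  }
  return argmax(qValues)
}

const qValues = [0.1, 0.9]
console.log('training-style pick:', selectAction(qValues, 0.5))
console.log('inference-style pick:', selectAction(qValues, 0))
```

With epsilon at zero the random branch can never fire, because Math.random() returns values in [0, 1), so the same observation always yields the same action.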
Line 8 — env.setSpeed(50) (commented out)
By default the training loop yields every 50 ms so the browser can render your scene at 60 fps. setSpeed(50) accelerates training by running 50 steps per tick and dropping the interval to 1 ms. Visuals get choppy but training integrity is preserved. You’ll want this during dev — drop back to setSpeed(1) when you want to watch the agent play in real time.
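The trade-off between responsiveness and throughput can be sketched as a loop that runs a batch of steps and then yields to the browser. runYieldingLoop is a hypothetical helper, not the framework's loop; the schedule parameter is an assumption added so the timer can be swapped out, and how setSpeed maps other values to batch size and interval is not specified on this page.

```typescript
type Schedule = (callback: () => void, delayMs: number) => void

// Runs `totalSteps` calls to `step`, `stepsPerTick` at a time,
// yielding for `intervalMs` between batches so rendering can happen.
function runYieldingLoop(
  step: () => void,
  totalSteps: number,
  stepsPerTick: number,
  intervalMs: number,
  schedule: Schedule = (callback, delayMs) => {
    setTimeout(callback, delayMs)
  },
): void {
  let completed = 0
  const tick = (): void => {
    const batch = Math.min(stepsPerTick, totalSteps - completed)
    for (let i = 0; i < batch; i++) step()
    completed += batch
    if (completed < totalSteps) schedule(tick, intervalMs)
  }
  schedule(tick, intervalMs)
}
```

For example, runYieldingLoop(trainOneStep, 1000, 1, 50) mirrors the default pacing described above, while runYieldingLoop(trainOneStep, 1000, 50, 1) mirrors setSpeed(50); trainOneStep stands in for whatever a single training step does.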
Where to go next
You have three reasonable paths from here:
- Understand what env.train('dqn') actually did. → Algorithms. The DQN, PPO, and Q-Table pages are deliberately verbose and explain the math without demanding a PhD.
- Write your own environment. → How it works → core walks through the TrainingEnv interface with a worked example.
- Follow a complete tutorial from scratch. → Tutorials → GridWorld is a step-by-step build of a custom env.