CartPole 3D — React Three Fiber

This tutorial bridges RL and 3D rendering. You already know how to train a cart-pole agent from the Quickstart, and you have read the R3F page, so you know the training loop is decoupled from the render loop. Here you connect those two threads into a working 3D demo you can ship.

Estimated time: 35–45 minutes (longer than the 2D tutorials because you set up an R3F project).

Prerequisites

You’ve done the GridWorld and Quickstart tutorials.
You’ve read the React Three Fiber page so you understand the training-loop/render-loop split.
You’re comfortable with React hooks (useEffect, useRef) and basic R3F (<Canvas>, useFrame).

Step 1 — Scaffold a Vite React project


npm create vite@latest cartpole-3d -- --template react-ts
cd cartpole-3d
npm install
npm install @ignitionai/core @ignitionai/backend-tfjs @react-three/fiber @react-three/drei three
npm install -D @types/three

What to observe: package.json should now list @ignitionai/core, @ignitionai/backend-tfjs, @react-three/fiber, @react-three/drei, and three under dependencies, and @types/three under devDependencies.

Why this step exists: @react-three/fiber is the React renderer for Three.js, @react-three/drei ships helpers we’ll use for camera controls and shadows, and three is the underlying engine. The Vite template gives us hot reload and TypeScript out of the box.

Step 2 — Add the CartPole env

Create src/cartpole-env.ts and paste the full CartPoleEnv from the Quickstart. This is unchanged from before.

What to observe: nothing yet — you’re preparing for Step 3.

Why this step exists: R3F is the rendering layer. The env is still pure TypeScript, framework-agnostic, and doesn’t import anything from React. This separation is the whole point of the “decoupled loops” pattern.

Step 3 — The scene components

Create src/CartPoleScene.tsx:

src/CartPoleScene.tsx


import { useRef } from 'react'
import { useFrame } from '@react-three/fiber'
import * as THREE from 'three'
import type { CartPoleEnv } from './cartpole-env'
 
// Render the cart as a box and update its x each frame
export function Cart({ env }: { env: CartPoleEnv }) {
  const ref = useRef<THREE.Mesh>(null!)
  useFrame(() => {
    // Read env state, update mesh position
    ref.current.position.x = (env as any).x
  })
  return (
    <mesh ref={ref} position={[0, 0.1, 0]} castShadow>
      <boxGeometry args={[0.5, 0.2, 0.3]} />
      <meshStandardMaterial color="#6366F1" metalness={0.7} roughness={0.2} />
    </mesh>
  )
}
 
// Render the pole as a thin cylinder attached to the cart
export function Pole({ env }: { env: CartPoleEnv }) {
  const ref = useRef<THREE.Group>(null!)
  useFrame(() => {
    const e = env as any
    // Position matches cart; rotation is theta (pole angle)
    ref.current.position.x = e.x
    ref.current.position.y = 0.2
    ref.current.rotation.z = -e.theta
  })
  return (
    <group ref={ref}>
      {/* pivot at the base — geometry extends up */}
      <mesh position={[0, 0.5, 0]} castShadow>
        <cylinderGeometry args={[0.03, 0.03, 1, 16]} />
        <meshStandardMaterial color="#A5B4FC" metalness={0.5} roughness={0.3} />
      </mesh>
    </group>
  )
}
 
export function Ground() {
  return (
    <mesh rotation={[-Math.PI / 2, 0, 0]} position={[0, 0, 0]} receiveShadow>
      <planeGeometry args={[10, 10]} />
      <meshStandardMaterial color="#0f172a" />
    </mesh>
  )
}

What to observe: three pure components that know nothing about training — they just read env.x and env.theta on every frame and position their meshes. The as any cast is there because our CartPoleEnv marks those fields private; in production you’d expose them via a state getter.

Why this step exists: this is the view layer. It reads the env’s state at render-time and updates the meshes. It never mutates the env. That’s the render loop.
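
The mapping from env state to mesh transforms can also be pulled out into a plain function, which makes the view layer testable without a canvas. The sketch below is illustrative only: poseFromState, CartPoleView, and Pose are hypothetical names, not part of IgnitionAI or R3F, and the constants mirror the ones used in the Cart and Pole components.

```typescript
// Hypothetical helper for illustration; not part of IgnitionAI or R3F.
interface CartPoleView {
  x: number      // cart position along the track
  theta: number  // pole angle in radians, 0 = upright
}

interface Pose {
  cartPosition: [number, number, number]
  polePivot: [number, number, number]
  poleRotationZ: number
}

// Pure mapping from env state to scene transforms; it never mutates the env.
function poseFromState(env: CartPoleView): Pose {
  return {
    cartPosition: [env.x, 0.1, 0], // same y as the <mesh> in Cart
    polePivot: [env.x, 0.2, 0],    // pivot sits on top of the 0.2-tall cart
    poleRotationZ: -env.theta,     // same sign convention as Pole
  }
}

const pose = poseFromState({ x: 0.5, theta: 0.1 })
```

Inside a useFrame callback you would copy these values onto the refs, which is what the components already do inline.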

Step 4 — Expose the env state

Your CartPoleEnv has private x and private theta. The scene needs read access. Adjust cartpole-env.ts to make the five state fields public (or expose them via a state getter):

src/cartpole-env.ts (change private → public)


export class CartPoleEnv implements TrainingEnv {
  actions = ['push_left', 'push_right']
  x = 0
  xDot = 0
  theta = 0
  thetaDot = 0
  stepCount = 0
  // ... everything else unchanged
}

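You can now remove the as any casts from the scene components: (env as any).x in Cart becomes env.x, and in Pole you can drop the const e = env as any line and read env.x and env.theta directly.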

Why this step exists: trade-off time. Keeping fields private is cleaner inside the env class. Exposing them as public lets other parts of your app (like a 3D renderer) read the state cheaply. For tutorial code, public wins; for library code, you’d prefer a read-only getter.
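
For comparison, a read-only getter could look like the sketch below. It is a minimal illustration, not the Quickstart's class: CartPoleEnvSketch, CartPoleState, and the nudge method are hypothetical, and only the five field names come from the tutorial.

```typescript
// Hypothetical shape: field names follow the tutorial, everything else is assumed.
interface CartPoleState {
  readonly x: number
  readonly xDot: number
  readonly theta: number
  readonly thetaDot: number
  readonly stepCount: number
}

class CartPoleEnvSketch {
  private x = 0
  private xDot = 0
  private theta = 0
  private thetaDot = 0
  private stepCount = 0

  // Returns a fresh snapshot, so callers cannot write back into the env.
  get state(): CartPoleState {
    return {
      x: this.x,
      xDot: this.xDot,
      theta: this.theta,
      thetaDot: this.thetaDot,
      stepCount: this.stepCount,
    }
  }

  // Stand-in for the real physics step, only here so the sketch runs.
  nudge(dx: number, dTheta: number): void {
    this.x += dx
    this.theta += dTheta
    this.stepCount += 1
  }
}

const sketchEnv = new CartPoleEnvSketch()
sketchEnv.nudge(0.25, 0.5)
const snapshot = sketchEnv.state // { x: 0.25, theta: 0.5, stepCount: 1, ... }
```

The getter allocates one small object per read. That cost is usually negligible, but it is one reason public fields are the simpler choice for tutorial code that reads state on every frame.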

Step 5 — Mount the scene and start training

Replace src/App.tsx:

src/App.tsx


import { useEffect, useRef } from 'react'
import { Canvas } from '@react-three/fiber'
import { OrbitControls, Environment, ContactShadows } from '@react-three/drei'
import { IgnitionEnvTFJS } from '@ignitionai/backend-tfjs'
import { CartPoleEnv } from './cartpole-env'
import { Cart, Pole, Ground } from './CartPoleScene'
 
export default function App() {
  const envRef = useRef<CartPoleEnv>(new CartPoleEnv())
  const trainerRef = useRef<IgnitionEnvTFJS | null>(null)
 
  useEffect(() => {
    const trainer = new IgnitionEnvTFJS(envRef.current)
    trainer.train('dqn')
    trainer.setSpeed(10)   // 10x — turbo but still watchable
    trainerRef.current = trainer
    return () => trainer.stop()
  }, [])
 
  return (
    <div style={{ width: '100vw', height: '100vh' }}>
      <Canvas
        shadows
        camera={{ position: [2, 1.5, 3], fov: 50 }}
      >
        <ambientLight intensity={0.3} />
        <directionalLight
          position={[3, 4, 2]}
          intensity={1}
          castShadow
          shadow-mapSize-width={1024}
          shadow-mapSize-height={1024}
        />
        <Cart env={envRef.current} />
        <Pole env={envRef.current} />
        <Ground />
        <ContactShadows position={[0, 0.01, 0]} opacity={0.4} scale={5} blur={2} />
        <Environment preset="sunset" />
        <OrbitControls />
      </Canvas>
    </div>
  )
}

What to observe: Run npm run dev and open the URL Vite prints. You should see a cart at the origin with the pole standing upright, a soft contact shadow below it, and sunset-tinted lighting. After a few seconds of DQN training, the cart starts moving back and forth to keep the pole upright. Drag with the mouse to orbit the camera.

Why this step exists: this is the full integration. The useEffect owns the trainer lifecycle: it starts training on mount and stops it on unmount. The <Canvas> renders the scene on every animation frame (typically 60 fps). The two loops communicate exclusively through envRef.current, a stable object held in a React ref.
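
The cleanup function matters in development as well. React's StrictMode, which the Vite React template typically enables in src/main.tsx, mounts a component, runs its effect cleanup, and then mounts it again. The sketch below models that sequence without React to show why the effect must return a function that stops the trainer. FakeTrainer, trainerEffect, and strictModeMount are hypothetical stand-ins, not IgnitionAI or React APIs.

```typescript
// Hypothetical stand-in for a trainer; not the IgnitionAI API.
class FakeTrainer {
  static running = 0 // how many trainers are currently active
  private active = false

  start(): void {
    if (this.active) return
    this.active = true
    FakeTrainer.running += 1
  }

  stop(): void {
    if (!this.active) return
    this.active = false
    FakeTrainer.running -= 1
  }
}

type Cleanup = () => void

// Same shape as the useEffect body: start work, return a function that undoes it.
function trainerEffect(): Cleanup {
  const trainer = new FakeTrainer()
  trainer.start()
  return () => trainer.stop()
}

// Models StrictMode's development-only sequence: mount, cleanup, mount again.
function strictModeMount(effect: () => Cleanup): Cleanup {
  const firstCleanup = effect()
  firstCleanup()
  return effect()
}

const unmount = strictModeMount(trainerEffect)
const runningWhileMounted = FakeTrainer.running // 1: the first trainer was stopped
unmount()
const runningAfterUnmount = FakeTrainer.running // 0
```

Without the returned cleanup, the double mount would leave two trainers stepping the same env.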

Step 6 — Understand what you just built

Where the training loop lives: in useEffect, inside trainer.train('dqn'). That call sets up a setTimeout loop that runs entirely outside React’s render cycle. React doesn’t re-render when the agent takes a step. Training happens in the “background” relative to rendering.

Where the render loop lives: in the useFrame callbacks inside Cart and Pole. Those fire on every animation frame (typically 60 fps, matching the display refresh rate) and read x and theta from the env object held in envRef.current. They never mutate the env. They just observe and render.

Why this doesn’t race: JavaScript is single-threaded. The setTimeout callback and the requestAnimationFrame callback can’t run simultaneously. One always finishes before the other starts. The env state is consistent at any instant a callback reads it.
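
The run-to-completion guarantee can be modelled with a toy task queue. This is a simplified sketch, not how the browser or IgnitionAI actually schedules work: toyEnv, trainStep, renderFrame, and the queue are hypothetical. Each task runs from start to finish before the next one begins, so a reader never observes a half-applied step.

```typescript
// Shared state, standing in for the env that both loops touch.
const toyEnv = { x: 0, stepCount: 0 }

let tornReads = 0 // frames that saw an inconsistent env
let frames = 0

// Writer: updates two fields that must stay in sync (x === stepCount * 0.5).
function trainStep(): void {
  toyEnv.x += 0.5
  toyEnv.stepCount += 1
}

// Reader: checks the invariant, the way a useFrame callback reads the env.
function renderFrame(): void {
  frames += 1
  if (toyEnv.x !== toyEnv.stepCount * 0.5) tornReads += 1
}

// Toy event loop: tasks are dequeued one at a time and never interleave.
const queue: Array<() => void> = []
for (let i = 0; i < 100; i++) {
  queue.push(trainStep, trainStep, trainStep) // several training steps...
  queue.push(renderFrame)                     // ...then one rendered frame
}
for (const task of queue) task()
```

The guarantee covers a single callback only. If a step were split across an await, a frame could run between the two halves and see intermediate state.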

Why setSpeed(10) is the right default: at 10×, the cart is visibly moving fast — you can see the training progress in real time instead of waiting. At 50×, it’s a blur. At 1×, it takes minutes to converge and the visual is barely changing.

Step 7 — Add an inference button (optional)

Once training converges, you’ll want to test the trained policy. Add a button that swaps to inference mode:

src/App.tsx (additions)


// Inside the App component's return, before the closing </div>:
<button
  onClick={() => trainerRef.current?.infer()}
  style={{
    position: 'absolute', top: 20, left: 20,
    padding: '10px 20px', background: '#6366F1', color: 'white',
    border: 'none', borderRadius: 8, cursor: 'pointer',
  }}
>
  Switch to inference
</button>

Click it after ~1 minute of training. The agent should switch from “flailing + exploring” to “smooth, deterministic balancing.”
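
That change in behavior is what you would expect if training uses epsilon-greedy exploration and inference always takes the highest-valued action. Whether IgnitionAI's DQN works exactly this way is an assumption here. The sketch below shows the general idea with hypothetical helper names and made-up Q-values.

```typescript
// Generic illustration; not IgnitionAI's implementation.

// Index of the largest Q-value: the deterministic choice used at inference.
function greedyAction(qValues: number[]): number {
  let best = 0
  for (let i = 1; i < qValues.length; i++) {
    if (qValues[i] > qValues[best]) best = i
  }
  return best
}

// With probability epsilon pick a random action, otherwise act greedily.
// `rand` is injectable so the behavior can be checked deterministically.
function epsilonGreedyAction(
  qValues: number[],
  epsilon: number,
  rand: () => number = Math.random,
): number {
  if (rand() < epsilon) {
    return Math.floor(rand() * qValues.length)
  }
  return greedyAction(qValues)
}

const q = [0.2, 0.9] // made-up values for ['push_left', 'push_right']
const atInference = greedyAction(q) // 1: always the best-valued action
const exploring = epsilonGreedyAction(q, 0.5, () => 0.1) // 0: random branch taken
```

With epsilon at 0 the two functions agree, which is the "smooth, deterministic" regime described above.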

What you just built

A full R3F + IgnitionAI integration from a blank Vite project.
A pattern you can lift directly into any R3F project: env in a useRef, trainer in a useEffect with cleanup, meshes reading state in useFrame.
Concrete intuition for why the two loops don’t interfere.

This same pattern scales up to arbitrarily complex scenes. The Car Circuit tutorial takes it further — physics-driven track, chase camera, minimap — but the core loop structure is identical to what you just wrote.

Next steps

Car Circuit tutorial — the next step up: physics, camera controls, and the “hero demo” experience.
Export to Unity via ONNX — once you have a trained policy in the browser, deploy it to a Unity Sentis project.
R3F page — revisit the training-loop/render-loop deep dive with your new hands-on context.
