Digital Twin Technology: Creating Virtual Factory Replicas with Python

Updated Feb 6, 2026

The Punchline First: A Factory That Runs Before It Exists

A digital twin of a bottling line — 14 machines, 6 conveyor segments, 3 buffer zones — running as a discrete-event simulation at 200x real-time speed, predicting a throughput bottleneck 47 minutes before it actually happened on the physical line. That’s the result I want to reverse-engineer in this post.

Most digital twin tutorials start with the definition and work toward some toy demo. I’d rather start with the working system and peel back layers until we hit the parts that are genuinely hard to get right.


The Full Simulation, Then We Break It

Here’s the simulation kernel for a simplified production line digital twin. It uses SimPy (a process-based discrete-event simulation library — version 4.1.1 at the time of writing) and connects to a live sensor feed via MQTT.

import simpy
import json
import time
import numpy as np
from dataclasses import dataclass, field
from typing import Optional
from collections import deque
import paho.mqtt.client as mqtt  # paho-mqtt 1.6.1

@dataclass
class MachineState:
    machine_id: str
    cycle_time: float          # seconds per unit
    mtbf: float                # mean time between failures (hours)
    mttr: float                # mean time to repair (hours)
    degradation_rate: float    # per-hour efficiency loss
    current_efficiency: float = 1.0
    units_produced: int = 0
    downtime_total: float = 0.0
    failure_history: list = field(default_factory=list)

class DigitalTwinLine:
    def __init__(self, env: simpy.Environment, config: dict, sensor_bridge=None):
        self.env = env
        self.machines = {}
        self.buffers = {}
        self.sensor_bridge = sensor_bridge
        self.telemetry_log = deque(maxlen=50000)

        for m_cfg in config['machines']:
            # Config entries also carry routing keys (input_buffer/output_buffer) that
            # MachineState doesn't accept, so keep only the dataclass fields
            state_kwargs = {k: v for k, v in m_cfg.items()
                            if k in MachineState.__dataclass_fields__}
            self.machines[m_cfg['machine_id']] = MachineState(**state_kwargs)

        for b_cfg in config['buffers']:
            self.buffers[b_cfg['buffer_id']] = simpy.Container(
                env, capacity=b_cfg['capacity'], init=b_cfg.get('init_level', 0)
            )

    def machine_process(self, machine_id: str, input_buffer: str, output_buffer: str):
        state = self.machines[machine_id]
        # hash() is salted per interpreter run, so derive a stable per-machine seed instead
        rng = np.random.default_rng(seed=list(machine_id.encode()))

        while True:
            # Degradation: efficiency drops over time
            hours_running = self.env.now / 3600
            state.current_efficiency = max(
                0.6,  # floor — machines don't drop below 60% before maintenance kicks in
                1.0 - state.degradation_rate * hours_running
            )

            effective_cycle = state.cycle_time / state.current_efficiency

            # Wait for input material
            yield self.buffers[input_buffer].get(1)

            # Process
            yield self.env.timeout(effective_cycle)

            # Random failure check
            failure_prob = 1.0 - np.exp(-effective_cycle / (state.mtbf * 3600))
            if rng.random() < failure_prob:
                repair_time = rng.exponential(state.mttr * 3600)
                state.failure_history.append({
                    'time': self.env.now,
                    'repair_duration': repair_time
                })
                state.downtime_total += repair_time
                yield self.env.timeout(repair_time)

            # Output
            yield self.buffers[output_buffer].put(1)
            state.units_produced += 1

            # Sync with sensor bridge if available
            if self.sensor_bridge and self.env.now % 60 < effective_cycle:
                self._sync_from_sensors(machine_id)

            self._log_telemetry(machine_id)

    def _sync_from_sensors(self, machine_id: str):
        """Pull real sensor data and adjust simulation parameters."""
        if not self.sensor_bridge:
            return
        real_data = self.sensor_bridge.get_latest(machine_id)
        if real_data is None:
            return  # sensor offline — keep simulating with last known params

        state = self.machines[machine_id]
        # This is the key part: nudge the sim toward reality
        measured_efficiency = real_data.get('throughput_ratio', None)
        if measured_efficiency is not None:
            # Exponential moving average — don't jerk the simulation around
            alpha = 0.3
            state.current_efficiency = (
                alpha * measured_efficiency + (1 - alpha) * state.current_efficiency
            )

    def _log_telemetry(self, machine_id: str):
        state = self.machines[machine_id]
        self.telemetry_log.append({
            'time': self.env.now,
            'machine': machine_id,
            'efficiency': round(state.current_efficiency, 4),
            'units': state.units_produced,
            'downtime': round(state.downtime_total, 2)
        })

That’s about 100 lines and it already captures the core loop: simulate each machine as a process, degrade over time, fail stochastically, pull from input buffers, push to output buffers, and — here’s what makes it a twin rather than just a simulation — periodically sync with real sensor data.

Why SimPy and Not a Custom Event Loop

I’ve seen teams write their own discrete-event schedulers. Don’t. SimPy handles the event queue, process suspension via Python generators, and resource contention out of the box. The yield syntax maps beautifully onto “wait for material, process, wait for space in the next buffer.” Writing your own version means debugging priority queue edge cases that SimPy already solved years ago.

But SimPy has a quirk that bit me: simpy.Container doesn’t support priority-based gets. If two machines pull from the same buffer, the one that called get() first always wins, regardless of downstream urgency. For most production lines this is fine (FIFO is physically realistic), but if you’re modeling flexible manufacturing cells where routing decisions matter, you’ll need simpy.Store with a custom sorting key instead.
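
One way to approximate urgency-aware pulls is to move the routing decision into a small dispatcher process: consumers drop a (priority, event) request into a shared list, and the dispatcher hands material out of a simpy.Store to the most urgent waiter first. The sketch below is illustrative only; the priorities, cell names, and crude polling interval are all placeholders.

import simpy

def dispatcher(env, parts_store, pending_requests):
    """Serve waiting consumers in priority order (lower number = more urgent)."""
    while True:
        part = yield parts_store.get()           # wait for material to arrive
        while not pending_requests:              # crude poll until someone wants it
            yield env.timeout(0.1)
        pending_requests.sort(key=lambda req: req[0])
        _, handoff = pending_requests.pop(0)
        handoff.succeed(value=part)              # wake the chosen consumer with the part

def consumer(env, name, priority, pending_requests):
    while True:
        handoff = env.event()
        pending_requests.append((priority, handoff))
        part = yield handoff                     # blocks until the dispatcher picks us
        yield env.timeout(2.0)                   # pretend to process the part
        print(f"{env.now:5.1f}s  {name} finished {part}")

def feeder(env, store):
    for i in range(6):
        yield env.timeout(1.0)
        yield store.put(f"part-{i}")

env = simpy.Environment()
parts = simpy.Store(env)
requests = []
env.process(dispatcher(env, parts, requests))
env.process(consumer(env, 'cell_A', priority=1, pending_requests=requests))  # urgent cell
env.process(consumer(env, 'cell_B', priority=5, pending_requests=requests))
env.process(feeder(env, parts))
env.run(until=15)

The point isn't the specific mechanism; it's that once routing decisions matter, the buffer stops being a passive Container and becomes a process with its own policy.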

The Hard Part Nobody Talks About: State Synchronization

Building the simulation is maybe 30% of the work. The other 70% is keeping the virtual and physical systems aligned — what the literature calls state synchronization or twinning fidelity.

The naive approach: every N seconds, overwrite the simulation state with sensor readings. This destroys the simulation’s predictive value because you’re no longer running ahead of reality — you’re just mirroring it with a delay. The slightly smarter approach: use sensor data to calibrate parameters (like degradation rate and failure probability) rather than overwriting state directly. That’s what the _sync_from_sensors method above does with the exponential moving average.

But the really interesting approach involves treating synchronization as a filtering problem. The simulation generates a predicted state $\hat{x}_t$, the sensors provide a noisy observation $z_t$, and you fuse them:

$$x_t^{\text{fused}} = \hat{x}_t + K_t (z_t - H \hat{x}_t)$$

where $K_t$ is a Kalman gain and $H$ is the observation matrix mapping simulation state to sensor space. This is just a Kalman filter (Kalman, 1960), but applied to factory simulation state rather than the usual robotics or navigation context.

The state vector for a single machine might look like $x = [\eta, \lambda, \mu, q]^T$ — efficiency $\eta$, failure rate $\lambda$, repair rate $\mu$, and buffer level $q$. The observation from sensors gives you noisy throughput (which relates to $\eta$) and buffer fill level (directly $q$), but you can’t directly observe $\lambda$ or $\mu$. The filter infers them.

class KalmanSyncFilter:
    """Kalman filter for digital twin state synchronization."""

    def __init__(self, n_states=4, n_obs=2):
        self.x = np.array([1.0, 0.001, 0.5, 0.0])  # [efficiency, failure_rate, repair_rate, buffer]
        self.P = np.eye(n_states) * 0.1

        # State transition: efficiency decays, others stay roughly constant
        self.F = np.eye(n_states)
        self.F[0, 0] = 0.999  # slow efficiency decay per timestep

        # Observation matrix: we see throughput (f(efficiency)) and buffer level
        self.H = np.zeros((n_obs, n_states))
        self.H[0, 0] = 1.0   # throughput ~ efficiency
        self.H[1, 3] = 1.0   # buffer level directly observed

        self.Q = np.diag([1e-4, 1e-6, 1e-5, 0.01])  # process noise
        self.R = np.diag([0.02, 0.5])  # measurement noise — throughput is cleaner than buffer sensors

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z: np.ndarray):
        y = z - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P

        # Guard: efficiency must stay in [0, 1], failure rate must be positive
        self.x[0] = np.clip(self.x[0], 0.0, 1.0)
        self.x[1] = max(self.x[1], 1e-8)  # this shouldn't happen but numerical drift is real
        return self.x

That max(self.x[1], 1e-8) guard? I added it after the failure rate drifted to -0.00003 during a 48-hour simulation run. The Kalman filter doesn’t know that failure rates can’t be negative, so you have to enforce physical constraints manually. An Unscented Kalman Filter or particle filter would handle nonlinear constraints more gracefully, but for most factory digital twins the linear approximation works well enough that the added complexity isn’t justified.

Running Faster Than Reality

The whole point of a digital twin isn’t to watch a virtual factory run at 1x speed. You want to run what-if scenarios at 100x or 1000x speed to answer questions like: “If machine M3’s degradation rate doubles, when does the bottleneck shift from M3 to the downstream buffer?”

def run_what_if(base_config: dict, modifications: dict, horizon_hours: float = 24):
    """Run a what-if scenario and return predicted throughput curve."""
    env = simpy.Environment()

    # Apply modifications to config
    modified_config = json.loads(json.dumps(base_config))  # deep copy, lazy but works
    for machine_id, params in modifications.items():
        for m in modified_config['machines']:
            if m['machine_id'] == machine_id:
                m.update(params)

    twin = DigitalTwinLine(env, modified_config, sensor_bridge=None)

    # Wire up processes
    for m_id in twin.machines:
        m_cfg = next(m for m in modified_config['machines'] if m['machine_id'] == m_id)
        env.process(twin.machine_process(
            m_id, m_cfg['input_buffer'], m_cfg['output_buffer']
        ))

    # Source: feed raw material into first buffer
    def source(env, buffer, rate_per_second):
        while True:
            yield env.timeout(1.0 / rate_per_second)
            yield twin.buffers[buffer].put(1)
    env.process(source(env, 'buf_raw', rate_per_second=2.0))

    start = time.perf_counter()
    env.run(until=horizon_hours * 3600)
    wall_time = time.perf_counter() - start

    total_output = sum(m.units_produced for m in twin.machines.values())
    speedup = (horizon_hours * 3600) / wall_time

    print(f"Simulated {horizon_hours}h in {wall_time:.1f}s (speedup: {speedup:.0f}x)")
    print(f"Total units through final machine: {list(twin.machines.values())[-1].units_produced}")

    return twin.telemetry_log

# Example: what if machine M3 degrades twice as fast?
results = run_what_if(
    base_config=line_config,
    modifications={'M3': {'degradation_rate': 0.02}},  # doubled from 0.01
    horizon_hours=8
)

On my M1 MacBook with a 14-machine config, an 8-hour simulation completes in about 0.4 seconds — roughly a 72,000x speedup. That number tanks if you add high-fidelity physics (vibration modeling, thermal dynamics) to each machine step. There’s a fundamental tradeoff: fidelity versus speed. For production scheduling decisions you want speed; for failure mode analysis you want fidelity. My best guess is that most practical digital twins operate at a resolution where a single machine step takes 10-50 microseconds of compute, which puts you in the 10,000-100,000x speedup range.

The Data Model: What Your Twin Actually Needs

I’ve seen digital twin architectures with 200+ sensor channels per machine feeding into the model. Most of that is noise — or at least, redundant. For a production line twin focused on throughput and maintenance prediction, you need surprisingly few inputs per machine:

  • Cycle time (actual, not nominal) — derived from piece counters or optical gates
  • Current draw or vibration RMS — proxy for mechanical health, directly tied to $\eta$ in the state vector
  • Buffer fill level — for modeling flow constraints between machines
  • Binary status — running, stopped, faulted

That’s four channels per machine. Fourteen machines, four channels each: 56 data points refreshed every 1-10 seconds. This is not a big data problem. It’s a right data problem.
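
For concreteness, here is roughly what one of those per-machine readings might look like on the wire. The field names are my own shorthand, not a standard schema, so treat them as placeholders.

# One hypothetical MQTT payload per machine, refreshed every few seconds.
# Field names are illustrative, not part of any standard.
example_reading = {
    "machine_id": "M3",
    "cycle_time_actual": 4.7,     # seconds per unit, from the piece counter
    "vibration_rms": 0.82,        # mm/s, proxy for mechanical health (maps to efficiency)
    "buffer_level": 37,           # units currently in the downstream buffer
    "status": "running",          # running | stopped | faulted
    "timestamp": 1767225600.0     # epoch seconds, stamped at the edge gateway
}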

The vibration-to-efficiency mapping is where things get interesting. In Part 3, we looked at predictive maintenance with ML models — those same models can feed into the digital twin’s degradation model. If your PdM model estimates remaining useful life as $\text{RUL} = f(\text{vibration features})$, you can derive a degradation rate:

$$\frac{d\eta}{dt} = -\frac{1 - \eta_{\text{min}}}{\text{RUL}}$$

which gives you a linear degradation schedule calibrated by the ML model. It’s a simplification — real degradation is rarely perfectly linear — but it’s close enough for scheduling-horizon predictions (8-24 hours). For longer horizons you’d want a Weibull or log-normal degradation curve, something like $\eta(t) = \eta_0 \, e^{-(t/\beta)^\alpha}$ where $\alpha$ and $\beta$ come from historical failure data.
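
Here's a small sketch of that conversion, assuming the PdM model hands you an RUL estimate in hours and you accept the linear approximation (the 0.6 efficiency floor mirrors the one in machine_process). The Weibull parameters are placeholders you would fit from failure history.

import numpy as np

def degradation_rate_from_rul(rul_hours: float, eta_min: float = 0.6) -> float:
    """Linear efficiency-loss rate per hour implied by a remaining-useful-life estimate,
    following d(eta)/dt = -(1 - eta_min) / RUL. The 0.6 floor matches machine_process."""
    if rul_hours <= 0:
        return float('inf')   # already at end of life
    return (1.0 - eta_min) / rul_hours

def weibull_efficiency(t_hours, eta_0: float = 1.0, alpha: float = 1.8, beta: float = 400.0):
    """Weibull-shaped degradation curve for longer horizons. alpha and beta are
    placeholders here; in practice you'd fit them to historical failure data."""
    return eta_0 * np.exp(-(np.asarray(t_hours) / beta) ** alpha)

# Example: the PdM model estimates ~80 hours of remaining useful life for M3
rate = degradation_rate_from_rul(rul_hours=80)
print(f"Implied degradation_rate for the twin config: {rate:.5f} per hour")  # 0.00500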

Where Simple Simulations Fall Apart

Why not just use a spreadsheet model? You could compute theoretical throughput as $\min(C_1, C_2, \ldots, C_n)$ where $C_i$ is the capacity of machine $i$, and call it a day.

Because that ignores buffer dynamics and cascading starvation.

When machine M5 goes down for 12 minutes, the upstream buffer fills up, causing M4 to block (it can’t output because the buffer is full), which causes M3 to block, and so on. Meanwhile, the downstream buffer drains, starving M6 through M14. The recovery time after M5 comes back online isn’t instant — it takes the buffers several minutes to refill and the cascading starvation to clear. A static capacity model can’t capture this. Even a simple queuing theory model (like Jackson networks) assumes infinite buffers and steady-state conditions, which don’t hold during transient disruptions.

The discrete-event simulation captures these transient dynamics naturally because each machine is literally waiting on buffer operations. When a buffer is full, the upstream put() blocks. When it’s empty, the downstream get() blocks. The cascading behavior emerges from the simulation mechanics without any explicit modeling.

And that’s the real selling point of a digital twin over analytical models: you don’t need to derive the math for every failure mode interaction. You model the individual components with reasonable fidelity, and the system-level behavior emerges.

Connecting to the Physical World

The MQTT bridge is straightforward but there’s one gotcha worth calling out. Sensor timestamps and simulation timestamps drift apart, and you need to handle that explicitly.

class SensorBridge:
    def __init__(self, broker_host: str, topics: list):
        self.latest_readings = {}
        self.client = mqtt.Client()
        self.client.on_message = self._on_message
        self.client.connect(broker_host, 1883, 60)
        for topic in topics:
            self.client.subscribe(topic)
        self.client.loop_start()

    def _on_message(self, client, userdata, msg):
        try:
            payload = json.loads(msg.payload)
            machine_id = payload.get('machine_id')
            if machine_id:
                payload['_received_at'] = time.time()
                self.latest_readings[machine_id] = payload
        except json.JSONDecodeError:
            pass  # bad payload, ignore — happens more often than you'd think with cheap PLCs

    def get_latest(self, machine_id: str, max_age_seconds: float = 30.0) -> Optional[dict]:
        reading = self.latest_readings.get(machine_id)
        if reading is None:
            return None
        age = time.time() - reading['_received_at']
        if age > max_age_seconds:
            return None  # stale data is worse than no data
        return reading

That max_age_seconds=30.0 parameter is doing important work. If a sensor goes offline and you keep using its last reading, your twin drifts from reality without any indication. Better to fall back to pure simulation (with wider uncertainty) than to anchor on stale data. I’m not entirely sure what the optimal staleness threshold is — 30 seconds works for our ~10-second sensor update interval, but this really depends on how fast your process dynamics change.

Scaling: When Python Isn’t Enough

For a single production line (10-20 machines), Python with SimPy handles things comfortably. But factory-scale twins — hundreds of machines across multiple lines with shared resources like cranes, AGVs, and operators — push Python’s single-threaded event loop to its limits.

The options, roughly in order of effort:

  1. PyPy instead of CPython — I haven’t tested this with SimPy 4, but SimPy 3 worked fine on PyPy with ~3-4x speedup. Take this with a grain of salt since SimPy 4 changed its internals significantly.
  2. Parallelize by line — each production line runs in its own process, shared resources modeled as inter-process message queues. Works well when lines are mostly independent (a minimal sketch follows this list).
  3. Move the hot loop to C++/Rust and keep Python for orchestration and analysis. The desmod library explored this path, and Siemens’ Tecnomatix Plant Simulation (commercial) is essentially this architecture.
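
A bare-bones sketch of option 2, assuming the lines are independent enough that each can run its own SimPy environment in its own process and just report results back through a queue. The line_configs dict, the 8-hour horizon, and the feed rate are placeholders, and DigitalTwinLine is the class from earlier in this post.

import multiprocessing as mp
import simpy

def run_line(line_name: str, config: dict, outbox: mp.Queue, horizon_hours: float = 8):
    """One production line = one OS process with its own SimPy environment."""
    env = simpy.Environment()
    twin = DigitalTwinLine(env, config)   # same class as above, no sensor bridge here

    for m_cfg in config['machines']:
        env.process(twin.machine_process(
            m_cfg['machine_id'], m_cfg['input_buffer'], m_cfg['output_buffer']))

    def source(env, buffer_id, rate_per_second=2.0):
        while True:
            yield env.timeout(1.0 / rate_per_second)
            yield twin.buffers[buffer_id].put(1)
    env.process(source(env, config['machines'][0]['input_buffer']))

    env.run(until=horizon_hours * 3600)
    total = sum(m.units_produced for m in twin.machines.values())
    outbox.put((line_name, total))        # report back to the orchestrating process

if __name__ == '__main__':
    results_q = mp.Queue()
    line_configs = {'line_1': line_config, 'line_2': line_config}   # placeholder configs
    workers = [mp.Process(target=run_line, args=(name, cfg, results_q))
               for name, cfg in line_configs.items()]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    while not results_q.empty():
        print(results_q.get())

The moment lines share a crane or an operator pool, this falls apart and you need explicit message passing between the line processes, which is where the complexity of option 2 actually lives.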

For what it’s worth, the Grieves and Vickers (2017) conceptual framework for digital twins describes three components: the physical entity, the virtual entity, and the data connection between them. That framing is useful because it reminds you that the twin is not just the simulation — it’s the closed loop between physical and virtual. If your “digital twin” doesn’t consume live data and feed insights back, it’s just a simulation with a fancy name.

Validation: How Do You Know Your Twin Is Accurate?

This is the part that most tutorials skip entirely, and it’s arguably the most important. You need a fidelity metric — a number that tells you how far your twin has drifted from reality.

A simple one: for each machine $i$, compare simulated cumulative output $\hat{Y}_i(t)$ against actual cumulative output $Y_i(t)$ over a rolling window:

$$\text{MAPE}_i = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{Y_i(t) - \hat{Y}_i(t)}{Y_i(t)} \right|$$

If $\text{MAPE}_i$ exceeds some threshold (5% is a reasonable starting point for throughput prediction), you trigger a recalibration — pull fresh sensor data, re-estimate degradation parameters, and reset the Kalman filter covariance.

def compute_fidelity(twin: DigitalTwinLine, actual_counts: dict, window: int = 100):
    """Compare twin predictions against actual production counts."""
    recent_telemetry = list(twin.telemetry_log)[-window:]

    fidelity_scores = {}
    for machine_id in twin.machines:
        sim_entries = [e for e in recent_telemetry if e['machine'] == machine_id]
        if not sim_entries or machine_id not in actual_counts:
            continue

        sim_output = sim_entries[-1]['units']
        actual_output = actual_counts[machine_id]

        if actual_output == 0:
            fidelity_scores[machine_id] = 0.0 if sim_output == 0 else float('inf')
            continue

        mape = abs(actual_output - sim_output) / actual_output
        fidelity_scores[machine_id] = round(mape, 4)

    avg_mape = np.mean(list(fidelity_scores.values())) if fidelity_scores else float('inf')
    print(f"Average MAPE across machines: {avg_mape:.2%}")

    drifted = {k: v for k, v in fidelity_scores.items() if v > 0.05}
    if drifted:
        print(f"Machines needing recalibration: {list(drifted.keys())}")

    return fidelity_scores

In a real deployment, this validation loop runs continuously. The twin predicts, reality unfolds, you compare, you recalibrate. It’s not a build-once-and-forget system — it’s a living model that requires ongoing feeding and tuning.
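
Sketched below is what that outer loop might look like. The recalibrate_machine helper, the five-minute cadence, and the get_actual_counts callable (whatever pulls real production counts from your historian) are assumptions for illustration, not parts of the code above.

def recalibrate_machine(twin: DigitalTwinLine, kf: KalmanSyncFilter,
                        machine_id: str, bridge: SensorBridge):
    """Re-anchor a drifted machine: refresh efficiency from the latest sensor reading
    and inflate the Kalman covariance so the filter trusts new measurements more."""
    reading = bridge.get_latest(machine_id)
    if reading and reading.get('throughput_ratio') is not None:
        twin.machines[machine_id].current_efficiency = reading['throughput_ratio']
    kf.P = np.eye(len(kf.x)) * 0.5        # widen uncertainty after detected drift

def validation_loop(twin, filters, bridge, get_actual_counts, interval_s=300):
    """Outer loop of a deployment: predict, compare, recalibrate, repeat."""
    while True:
        scores = compute_fidelity(twin, get_actual_counts())
        for machine_id, mape in scores.items():
            if mape > 0.05:               # same 5% threshold as above
                recalibrate_machine(twin, filters[machine_id], machine_id, bridge)
        time.sleep(interval_s)            # wall-clock cadence, independent of sim time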

What This Means for the Rest of the Series

Digital twins sit at the intersection of almost everything else we’ve covered and will cover. The predictive maintenance models from Part 3 feed degradation parameters into the twin. The anomaly detection from Part 4 triggers scenario analysis (“what if this anomaly means M7 is about to fail — how does the line cope?”). And in Part 6, when we get to reinforcement learning for production scheduling, the digital twin becomes the environment that the RL agent trains in — because you obviously can’t let an untrained RL policy control a real production line.

Build your digital twin with SimPy for the discrete-event backbone, a Kalman filter for state synchronization, and MQTT for the sensor bridge. Keep the state vector small — four channels per machine is plenty for throughput-level twins. If you need vibration-level fidelity, you’re building a different thing (a physics-based model, not a process-flow twin), and the architecture changes substantially.

One thing I haven’t figured out: how to gracefully handle model structure changes. When a factory adds a new machine or reroutes a conveyor, the twin’s topology needs to update, and there’s no clean way to do that mid-simulation in SimPy without restarting from scratch. The commercial platforms (like Siemens’ Xcelerator or Ansys Twin Builder) handle this with graph-based model representations that support hot-swapping. Getting that right in open-source Python tooling feels like it’s still an unsolved — or at least underserved — problem. Next up, we’ll put this twin to work as a training environment for reinforcement learning agents that optimize scheduling decisions.

Smart Factory with AI Series (5/12)
