What Data Do You Need for Traffic Simulation and How to Collect It?

David Bennett
Dec 24, 2025

Mobility teams rarely fail because they lack software. They fail because they feed a model the wrong story about how a road actually behaves. Good traffic simulation starts with the right inputs, collected with intent, cleaned with discipline, and tied back to decisions that matter. That is where simulation becomes a planning instrument, not a pretty animation.

At Mimic Mobility, we approach simulation like a production pipeline: real streets, real schedules, real driver behavior, real operational constraints. Our work in virtual environments and testing, including our 3D simulations practice, repeatedly shows the same pattern. The data you choose determines whether your outputs are actionable.

This guide breaks down the core traffic model inputs you need for traffic simulation, why each one matters, and practical ways to collect them without drowning your team in sensors, spreadsheets, or mismatched timestamps.

Define the job your traffic simulation must do
The core data categories you need (and what “good” looks like)
How to collect traffic data in the real world
Traffic data types vs collection methods for faster model build
Applications In Mobility
Benefits
Considerations For Mobility Teams
Future Outlook
Conclusion
FAQs

Before you start buying hardware or licensing feeds, align your pipeline across mapping, sensing, and validation. The fastest teams treat collection and QA as a product, using a consistent stack of sensing and capture tools like those described in our tech practice.

Define the job your traffic simulation must do

A model that is “accurate” in one context can be misleading in another. The first dataset you need is not a file. It is a clear definition of what the model must answer.

Decision frame: Are you testing signal changes, lane repurposing, new curb rules, or emergency detours?
Resolution target: Do you need lane-level detail at an intersection, or corridor-level performance across a district?
Time horizon: Are you modeling peak 15-minute surges, all-day patterns, or seasonal shifts?
Mode mix: Will you include buses, freight, pedestrians, cyclists, micromobility, and ride-hail?
Outputs that matter: Queue spillback, bus delay, curb conflicts, near-miss risk proxies, or reliability impacts?

In-vehicle and roadside interfaces are also trending toward multimodal interaction, where touch, voice, and gesture patterns all shape driver attention and behavior. That affects how you interpret behavior signals inside a traffic simulation.

The core data categories you need (and what “good” looks like)

Think of traffic model inputs as five layers. You can start lean, but each missing layer forces you to “invent” reality with assumptions.

1) Network and geometry

This is your physical stage. If it is wrong, everything downstream is distorted.

Alignment: Roadway network data must match the real map, including ramps, slip lanes, connectors, and frontage roads.
Lane logic: Capture lane counts, merges, tapers, bus lanes, HOV rules, and where queues actually form.
Turn constraints: Encode prohibited turns, time-based turn bans, and truck restrictions that reshape routing.
Grade and curvature: Critical for heavy vehicles, speed choice, and sightline-limited behavior.
Curb operations: Pick-up, drop-off, delivery bays, taxi stands, and loading rules. These often create the real bottlenecks.

Collection options that work well here include as-built drawings, city GIS, and targeted corridor capture using lidar or photogrammetry for complex nodes.

2) Demand and route choice

This is the volume and intent of movement. Demand is not just “cars per hour”. It is who is going where, when, and why.

Entry volumes: Use traffic counts at screenlines and gateways to anchor totals.
Turning structure: Use turning movement counts at key intersections to preserve directionality.
Mode layer: Capture multimodal demand so buses and pedestrians are not treated as noise.
Trip purpose proxies: Freight peaks, school surges, event load-ins, airport banks, and shift changes.

If you only have counts but not directional structure, your model will often calibrate by “forcing” the wrong routes.
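
As a rough illustration of what "directional structure" means in practice, the sketch below converts turning movement counts into per-approach turning shares, which is the form many simulators expect at a node. The function name, the dictionary layout, and the counts themselves are illustrative assumptions, not values from a real survey.

```python
from collections import defaultdict

def turning_proportions(counts):
    """Convert turning movement counts into per-approach turning shares.

    `counts` maps (approach, movement) -> vehicles observed in the period.
    """
    approach_totals = defaultdict(int)
    for (approach, _movement), vehicles in counts.items():
        approach_totals[approach] += vehicles

    shares = {}
    for (approach, movement), vehicles in counts.items():
        total = approach_totals[approach]
        # Guard against empty approaches so a dead detector does not crash the run.
        shares[(approach, movement)] = vehicles / total if total else 0.0
    return shares

# Hypothetical peak-hour counts for one intersection (illustrative numbers only).
observed = {
    ("northbound", "left"): 120,
    ("northbound", "through"): 600,
    ("northbound", "right"): 80,
    ("eastbound", "left"): 50,
    ("eastbound", "through"): 150,
}
shares = turning_proportions(observed)
print(round(shares[("northbound", "left")], 3))  # 0.15
```

Two approaches with the same entry volume can produce very different shares, which is why entry counts alone leave routing underdetermined.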

3) Control and operations

Controls are the choreography. Two networks with the same geometry behave differently with different timing and policies.

Signal plans: Signal timing includes phases, splits, offsets, cycle lengths, and time-of-day plans.
Priority logic: Transit signal priority rules and emergency preemption sequences.
Ramp metering: Rates, queues, and override logic.
Variable speed and lane control: Where posted speed is not the actual operating rule.
Work zone rules: Temporary lane closures, reduced speeds, and dynamic merges.

For intersections, signal timing is often the highest-leverage input because it sets how much green time each movement receives, which determines approach capacity and queue dynamics.
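
The leverage of signal timing can be made concrete with the standard capacity relation for a signalized lane group: capacity equals saturation flow multiplied by the ratio of effective green to cycle length. The sketch below is a minimal version of that arithmetic. The saturation flow of 1,800 veh/h, the green times, and the demand volume are assumed example values, not measurements.

```python
def lane_group_capacity(saturation_flow_vph, effective_green_s, cycle_length_s):
    """Capacity (veh/h) = saturation flow x effective green / cycle length."""
    if cycle_length_s <= 0:
        raise ValueError("cycle length must be positive")
    if not 0 <= effective_green_s <= cycle_length_s:
        raise ValueError("effective green must be between 0 and the cycle length")
    return saturation_flow_vph * effective_green_s / cycle_length_s

def volume_to_capacity(volume_vph, capacity_vph):
    """Degree of saturation; values above 1.0 mean the queue grows each cycle."""
    return volume_vph / capacity_vph

# Assumed example: same approach, same demand, two candidate splits.
demand_vph = 650
base = lane_group_capacity(1800, 30, 90)     # 600.0 veh/h
retimed = lane_group_capacity(1800, 36, 90)  # 720.0 veh/h

print(round(volume_to_capacity(demand_vph, base), 2))     # 1.08
print(round(volume_to_capacity(demand_vph, retimed), 2))  # 0.9
```

In this example, six extra seconds of green move the approach from oversaturated to under capacity, with no change to geometry or demand.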

4) Behavior, compliance, and interactions

This is where models become believable. Behavior is also where many teams overfit.

Speed choice: Build speed profiles from observed data, not posted limits.
Gap acceptance: Particularly important for unprotected turns and roundabouts.
Lane changing: Merge aggressiveness, weaving, and late lane changes near exits.
Compliance: Red light running risk, bus lane encroachment, yielding behavior at crossings.

Probe trajectories paired with detector inputs can support estimation of turning and queue states around signals, which is useful when direct counts are incomplete. (MDPI)
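
To make "speed profiles from observed data" concrete, here is a minimal sketch that summarizes observed speeds by hour of day using the median. The input layout, a list of (hour, speed) pairs, and the sample values are assumptions for illustration. A production pipeline would typically also segment by link and day type.

```python
from collections import defaultdict
from statistics import median

def speed_profile(observations):
    """Median observed speed per hour of day.

    `observations` is an iterable of (hour_of_day, speed_kph) pairs.
    """
    by_hour = defaultdict(list)
    for hour, speed in observations:
        by_hour[hour].append(speed)
    # Sort by hour so the profile reads in time order.
    return {hour: median(speeds) for hour, speeds in sorted(by_hour.items())}

# Hypothetical probe samples on one link (illustrative values only).
samples = [(8, 31.0), (8, 27.5), (8, 35.0), (14, 52.0), (14, 48.0)]
profile = speed_profile(samples)
print(profile)  # {8: 31.0, 14: 50.0}
```

The median is used here because it is less sensitive to a few stopped or speeding vehicles than the mean. That is a design choice of this sketch, not a requirement.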

5) Context signals

These datasets explain the “why” behind anomalies.

Disruptions: Incident data for crashes, breakdowns, stalled vehicles, and lane-blocking events.
Weather: Weather data to explain speed drops, increased headways, and mode shifts.
Events: Stadium egress, concerts, conventions, strikes, or rail disruptions.
Policy changes: New curb rules, enforcement periods, and tolling changes.

Context is the difference between a model that matches a day and a model that matches reality.

How to collect traffic data in the real world

Collection is a design problem. The goal is coverage with traceability, not “more sensors”.

Build a collection plan around confidence, not convenience

Anchor points: Pick a small set of locations where you will always collect high quality truth data.
Expansion points: Use cheaper sources to fill gaps between anchors.
Validation windows: Define the exact days and times your model should reproduce.
Change log: Track roadway changes so your roadway network data stays current.

Practical collection methods by dataset

Counts and flows:
- Loops and radar: Strong for continuous traffic counts at fixed points.
- Video analytics: Using computer vision for flexible coverage and richer classification.
- Manual counts: Best for targeted turning movement counts when you need auditability.
Trajectories and travel times:
- Fleet telemetry: Probe vehicle data from delivery fleets, buses, taxis, or maintenance vehicles.
- Smartphone GPS: Useful when privacy and consent are handled carefully.
- Connected vehicle feeds: When available, V2X-style signals can add timing and event context.
Signals and control:
- Controller logs: Pull signal timing directly from controllers and central systems.
- Field verification: Short site visits catch drift between configured and actual behavior.
Incidents and disruptions:
- Operations logs: Dispatch records, TMC feeds, service patrol logs.
- Crowd reports: Only if you can validate. Incident data needs confidence scores.
Network geometry:
- GIS and as-builts: Fast baseline.
- Survey capture: Lidar scans for complex interchanges, terminals, and curb layouts.
- Ground truth drives: Use GNSS-tagged video for quick verification.

Make “mergeable time” a first-class requirement

Most simulation projects lose weeks to mismatched clocks. Fix that early.

Clock discipline: Normalize everything to a single timezone and timestamp format.
Sampling clarity: Document whether the data is per second, per minute, or per 15 minutes.
Map alignment: Use consistent map versions and stable link IDs.
Quality tags: Every dataset needs a confidence rating and known limitations.

This is where data governance stops being paperwork and starts being velocity.
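
One way to enforce clock discipline is to reject timestamps without an explicit UTC offset, convert everything to UTC, and floor to a shared bin size before any merge. The sketch below assumes ISO 8601 input strings and a bin size that divides evenly into 60 minutes. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone

def to_utc_bin(timestamp_iso, bin_minutes=15):
    """Parse an ISO 8601 timestamp with a UTC offset and floor it to a UTC bin."""
    if 60 % bin_minutes != 0:
        raise ValueError("bin_minutes must divide evenly into 60")
    parsed = datetime.fromisoformat(timestamp_iso)
    if parsed.tzinfo is None:
        # Refuse naive timestamps rather than guessing the timezone.
        raise ValueError(f"timestamp has no UTC offset: {timestamp_iso!r}")
    utc = parsed.astimezone(timezone.utc)
    floored_minute = utc.minute - (utc.minute % bin_minutes)
    return utc.replace(minute=floored_minute, second=0, microsecond=0)

# A detector logging in local time (UTC-5) and a probe feed already in UTC.
detector = to_utc_bin("2025-03-10T08:07:30-05:00")
probe = to_utc_bin("2025-03-10T13:14:59+00:00")

print(detector.isoformat())  # 2025-03-10T13:00:00+00:00
print(detector == probe)     # True
```

Failing loudly on naive timestamps is deliberate: a silently assumed timezone is one of the ways two feeds end up an hour apart without anyone noticing.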

Privacy and ethics are not optional

Mobility data often touches individuals, even when anonymized. Treat it like a core design constraint.

Minimization: Collect only what your traffic simulation needs.
Aggregation: Use spatial and temporal aggregation where possible.
Retention: Set deletion windows and access policies.
Contract clarity: Ensure rights to use, model, and share derived outputs.

That is privacy by design in practice.
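
Aggregation can be sketched as counting trips per zone and hour, then suppressing any cell with too few trips to publish safely. The threshold of five below is an assumed placeholder, not a regulatory value, and the zone names are hypothetical. The right threshold depends on your legal and contractual context.

```python
from collections import Counter

def aggregate_trips(trips, min_count=5):
    """Count trips per (zone, hour) cell and drop cells below `min_count`.

    `trips` is an iterable of (origin_zone, hour_of_day) pairs with no
    device or user identifiers attached.
    """
    cells = Counter((zone, hour) for zone, hour in trips)
    # Small cells are dropped entirely so rare trips cannot be singled out.
    return {cell: n for cell, n in cells.items() if n >= min_count}

# Hypothetical trip records (illustrative only).
trips = [("zone_a", 8)] * 12 + [("zone_b", 8)] * 2
result = aggregate_trips(trips)
print(result)  # {('zone_a', 8): 12}
```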

Traffic data types vs collection methods for faster model build

Data type	What it supports in traffic simulation	Typical collection methods	Common pitfalls	“Good enough” target
Roadway network data	Topology, lane logic, routing	GIS, as-builts, lidar, field verification	Outdated lane rules, missing connectors	Correct lanes and turn rules at hotspots
Traffic counts	Flow calibration at screenlines	Loops, radar, video, manual	Overcounting, sensor dropouts	Stable totals by 15-min bins
Turning movement counts	Directional realism at nodes	Manual, video with computer vision	Misclassified turns, short samples	Peak hour with confidence bounds
Signal timing	Queues, delay, progression	Controller logs, central systems	Plans differ from reality	Verified time-of-day plans
Probe vehicle data	Speeds, routes, travel time reliability	Fleets, smartphones, connected vehicles	Bias toward certain users	Enough coverage on key links
Incident data	Non-recurrent congestion modeling	TMC feeds, ops logs, patrols	Missing minor events	Major lane-blocking events captured
Weather data	Speed and headway shifts	Public feeds, roadway sensors	Microclimate mismatch	Condition tags for modeled periods

A strong pipeline also treats data fusion as a continuous activity, not a one-time ETL task.
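
The "sensor dropouts" pitfall in the table above can be checked mechanically before calibration. The sketch below lists the 15-minute bins that should exist in a collection window but have no record. The function name and the sample window are assumptions for illustration.

```python
from datetime import datetime, timedelta

def missing_bins(bin_starts, start, end, bin_minutes=15):
    """Return expected bin start times in [start, end) that have no record."""
    seen = set(bin_starts)
    step = timedelta(minutes=bin_minutes)
    gaps = []
    current = start
    while current < end:
        if current not in seen:
            gaps.append(current)
        current += step
    return gaps

# Hypothetical detector that skipped one bin in a one-hour window.
start = datetime(2025, 3, 10, 7, 0)
received = [start + timedelta(minutes=15 * i) for i in (0, 1, 3)]
gaps = missing_bins(received, start, start + timedelta(hours=1))
print(gaps)  # [datetime.datetime(2025, 3, 10, 7, 30)]
```

A gap list like this is a natural input to the quality tags described earlier, since it records exactly where a dataset's totals should not be trusted.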

Applications In Mobility

When collected well, these traffic model inputs do more than predict queues. They let teams rehearse changes before they hit the street.

Intersection redesign: Use turning movement counts and signal timing to test protected turns, pedestrian phases, and queue spillback.

Corridor reliability: Combine probe vehicle data with traffic counts to measure and then improve travel time reliability across peak periods.

Transit hubs and terminals: Model curb churn, taxi staging, and platform access, then pair results with passenger guidance using AI avatars for wayfinding and service messaging.

Fleet training scenarios: Turn real routes and incidents into driver coaching and simulation exercises, building on approaches like those discussed in virtual driving simulation for fleets.

Navigation and disruption comms: Feed incident data and weather data into routing and traveler information workflows, aligned with systems like those described in AI navigation assistant integration.
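
For the corridor reliability case above, two widely used measures are the planning time index (95th percentile travel time divided by free-flow travel time) and the buffer index (the extra time beyond the average needed to arrive on time 95 percent of the time, as a share of the average). The sketch below computes both from a list of observed travel times. The percentile helper uses simple linear interpolation, and the sample runs and free-flow time are assumed values.

```python
from statistics import mean

def percentile(values, pct):
    """Linear-interpolation percentile for pct in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def reliability_metrics(travel_times_min, free_flow_min):
    """Planning time index and buffer index from observed travel times."""
    p95 = percentile(travel_times_min, 95)
    avg = mean(travel_times_min)
    return {
        "planning_time_index": p95 / free_flow_min,
        "buffer_index": (p95 - avg) / avg,
    }

# Hypothetical peak-period corridor runs in minutes (illustrative only).
runs = [10, 11, 12, 12, 13, 14, 15, 18, 20, 25]
metrics = reliability_metrics(runs, free_flow_min=10)
print({name: round(value, 3) for name, value in metrics.items()})
# {'planning_time_index': 2.275, 'buffer_index': 0.517}
```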

Benefits

The payoff is not just smoother charts. It is faster decision cycles with fewer surprises after rollout.

Clarity: Better roadway network data reduces arguments over what is actually being modeled.
Credibility: Grounded traffic counts make results defensible in stakeholder reviews.
Precision: Turning movement counts protect you from “right total, wrong direction” errors.
Responsiveness: Fresh incident data improves scenario planning for operations teams.
Efficiency: A disciplined data fusion pipeline cuts rework and calibration churn.
Trust: Strong data governance keeps partners aligned on what can be used and shared.

Considerations For Mobility Teams

A model is only as useful as its deployment reality. These are the friction points worth designing for upfront.

Scope discipline: Define where lane-level detail is needed, and where aggregate demand is enough.
Integration readiness: Ensure signal timing exports and mapping formats are stable and repeatable.
Bias controls: Understand who your probe vehicle data represents, and where it under-represents.
Validation plan: Set acceptance tests tied to queues, speeds, and travel time reliability, not just “overall fit”.
Operational handoff: Package assumptions, QA logs, and changes as part of data governance, not separate documentation.
Consent and safeguards: Implement privacy by design so collection can scale without reputational risk.
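
Acceptance tests work best when they are computed per location rather than as one overall score. The article does not prescribe a statistic. As one common option for the volume side of validation, the sketch below uses the GEH statistic, which compares modeled and observed hourly flows. The threshold of 5 is a frequently cited rule of thumb, treated here as an assumption, and the location names and volumes are hypothetical. Queue, speed, and reliability checks would sit alongside it.

```python
from math import sqrt

def geh(modeled_vph, observed_vph):
    """GEH statistic for one location, using hourly volumes."""
    total = modeled_vph + observed_vph
    if total == 0:
        return 0.0
    return sqrt(2 * (modeled_vph - observed_vph) ** 2 / total)

def acceptance_report(pairs, threshold=5.0):
    """Per-location GEH scores and the share of locations below `threshold`.

    `pairs` maps location name -> (modeled_vph, observed_vph).
    """
    scores = {name: geh(m, c) for name, (m, c) in pairs.items()}
    passing = sum(1 for score in scores.values() if score < threshold)
    return scores, passing / len(scores)

# Hypothetical calibration results (illustrative only).
pairs = {
    "gateway_north": (1000, 900),
    "main_and_5th_left": (400, 600),
}
scores, pass_share = acceptance_report(pairs)
print({name: round(score, 2) for name, score in scores.items()})
# {'gateway_north': 3.24, 'main_and_5th_left': 8.94}
print(pass_share)  # 0.5
```

Reporting the failing locations by name, not just the pass share, keeps the "right total, wrong direction" problem visible during review.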

Future Outlook

The next phase of traffic simulation will feel less like a static model and more like a living instrument. Cities and operators are pushing toward digital twin workflows where simulations are synchronized with real-time feeds and updated continuously, rather than rebuilt for each study.

As sensing gets richer, data fusion will increasingly blend infrastructure detectors, computer vision, and V2X signals into a coherent operational picture. That same picture can also power human-facing interfaces. In-vehicle assistants, station kiosks, and control rooms will share the same ground truth, with a calm, multilingual layer for disruption messaging and emergency guidance. For mobility teams training for high-pressure moments, immersive scenario rehearsal becomes part of readiness, similar to what we explore in VR emergency response training.

Conclusion

Collecting data for traffic simulation is not a shopping list. It is a narrative build. You are reconstructing how people actually move, hesitate, merge, wait, and reroute when the system is stressed. The best models respect the street’s messy reality while staying structured enough to test change safely.

When your traffic model inputs are traceable, calibrated, and governed, simulation becomes a reliable place to make decisions. It can support infrastructure planning, operational resilience, fleet training, and passenger communication without relying on guesswork. Mimic Mobility sits at that intersection of real-time behavior, immersive simulation, and human-centered mobility interfaces, so the outputs are not just technical. They are deployable.

FAQs

What is the minimum data needed for traffic simulation?

At a minimum, you need roadway network data, baseline traffic counts, and some representation of control, like signal timing, where intersections matter. Without those, calibration becomes guesswork.

How do I choose between traffic counts and probe vehicle data?

Use traffic counts to anchor total volumes at fixed points. Use probe vehicle data to understand speeds, routes, and travel time reliability. Most strong models use both, then reconcile differences through data fusion.

How often should I update signal timing datasets?

Update whenever time-of-day plans change, construction modifies phases, or performance issues emerge. Even small offset changes can materially alter queues in traffic simulation.

How do I collect turning movement counts without a big field team?

Target the highest leverage intersections and use short, high-quality collection windows. Video with computer vision can scale, but you still need spot audits to validate classification.

What role does incident data play in planning models?

Incident data helps you model non-recurrent congestion and operational response. It is especially useful for scenario testing around closures, breakdowns, and event days.

How do I keep a digital twin from drifting away from reality?

Treat updates as continuous operations. Maintain versioned roadway network data, timestamped signal timing, and quality-scored feeds. Strong data governance is what keeps drift visible and correctable.

Does privacy by design limit what I can model?

It limits what you store and how you share it, but it does not block useful modeling. Aggregation, anonymization, and clear retention rules usually preserve what traffic simulation needs while reducing risk.

What is the biggest hidden risk in data fusion?

Time alignment. When timestamps, map versions, or sampling intervals are inconsistent, you can create a convincing but wrong “truth”. Build merge rules early and audit them often.