Slide1
I Shot You First!
Gameplay Networking in Halo: Reach
Slide2
Who am I?
David Aldridge, Lead Networking Engineer at Bungie
Spent three years working on Halo: Reach networking
I’ve been making games for a while
Slide3
What is Halo: Reach?
[video]
Slide4
Talk Takeaways
A proven architecture for scalable gameplay networking
How to design solid networking for your game mechanics
How to measure and optimize your networking
Slide5
What is this talk NOT about?
Halo’s Campaign or Firefight networking
Sockets/low level networking
High level networking
Matchmaking
Rating & ranking systems
Creating and curating an online ecosystem
Slide6
Bungie’s Gameplay Networking Architecture
Slide7
What is gameplay networking?
Communicating sufficient information to maintain a perceptually shared reality, while minimizing both bandwidth use and perceived violations of the integrity of the simulation (artifacts)
OR: Technology to help multiple players sustain the belief that they are playing a fun game together
Slide8
Common simplifying approaches
1. Lockstep (a.k.a. deterministic, input-passing)
Common for games with a strict split between input and simulation (e.g. RTS), so input latency issues can be bypassed
Also common for ports of classic games (avoids game alterations)
2. Reliable transport protocols (TCP or homegrown)
Requires high bandwidth or simple networked state
TCP requires high latency tolerance
3. Send all networked state as a single blob (atomically)
E.g. Quake 3 model
Works very well as long as the total networked state is not too large
Slide9
Halo has to solve the hard problem
Highly competitive multiplayer action game
16 players, vehicles, hundreds of replicated objects
No dedicated servers
Game is expected to work regardless of connection quality
For N players, O(N²) data needs to be networked
Slide10
We can’t network everything!
Slide11
TRIBES points the way
“The TRIBES Engine Networking Model”, Frohnmayer and Gift, GDC 1999
A host/client model, resilient to cheating
Protocols for semi-reliable data delivery
Supports persistent state and transient events
Highly scalable to match available bandwidth
Slide12
Three Key Terms
Slide13
Term: Replication
The communication of state or events to a remote peer
“Replicating an object” means causing it to be created and updated on a remote peer
A “replicated object” is one whose state is kept approximately in sync between peers
Our replication systems are the Application Layer of our network stack
Slide14
Term: Authority
Permission to update the persistent state of an object
E.g. in Reach, the game host peer is authoritative over dealing damage
Slide15
Term: Prediction
Extrapolating the current properties of an entity based on historical authoritative data and local guesses about the future
A predicted object is one which the local peer does not have full control over – this is the opposite of an authoritative object
Slide16
Slide17
Bungie’s Networking Stack
Layer: Purpose
Game: Runs the game
Game Interface: Extract and apply replicated data
Prioritization: Rate the priority of all possible replication options
Replication: Protocols with various reliability guarantees
Channel Manager: Flow and congestion control
Transport: Send & receive on sockets
Slide18
Let’s talk about gameplay
Layer: Purpose
Game: Runs the game
Game Interface: Extract and apply replicated data
Prioritization: Rate the priority of all possible replication options
Replication: Protocols with various reliability guarantees
Channel Manager: Flow and congestion control
Transport: Send & receive on sockets
Slide19
Replication Protocol: State Data
Guaranteed eventual delivery of most current state, host→client only
Object position
Object health
Territory capture timer
~150 more properties
Slide20
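The state-data protocol above promises eventual delivery of the most current value, not of every intermediate value. A minimal sketch of one way to get that behavior follows: the host remembers, per client, which property versions have been acknowledged and keeps resending only the latest value until it is confirmed. The class and method names (`StateChannel`, `set`, `build_update`, `ack`) are hypothetical and are not taken from Bungie's code.

```python
class StateChannel:
    """Host-side view of one client's replicated state (hypothetical sketch)."""

    def __init__(self):
        self.values = {}    # property name -> latest authoritative value
        self.version = {}   # property name -> version number of that value
        self.acked = {}     # property name -> newest version the client confirmed

    def set(self, name, value):
        # Only the newest value matters; older values are never queued.
        self.values[name] = value
        self.version[name] = self.version.get(name, 0) + 1

    def build_update(self):
        # Everything the client has not confirmed yet, at its current value.
        return {
            name: (self.version[name], self.values[name])
            for name in self.values
            if self.acked.get(name, 0) < self.version[name]
        }

    def ack(self, update):
        # Called when the client confirms that it received `update`.
        for name, (version, _value) in update.items():
            self.acked[name] = max(self.acked.get(name, 0), version)


channel = StateChannel()
channel.set("health", 100)
channel.set("health", 75)                 # overwrites; 100 is never sent
lost = channel.build_update()             # pretend this packet is dropped
channel.set("position", (1.0, 2.0, 0.5))
resent = channel.build_update()           # still carries health, plus position
channel.ack(resent)
```

Because a dropped update is simply superseded by the next one, nothing needs to be retransmitted verbatim, which is what leaves room for the prioritization described later in the talk.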
Replication Protocol: Events
Unreliable notifications of transient occurrences, host→client and client→host
Please fire my weapon
This weapon was fired
Projectile detonated
~50 more events
Slide21
Replication Protocol: Control data
High-frequency, best-effort transmission of rapidly-updated data extracted from player control inputs, host→client and client→host
Current analog stick values for all players (host→client)
Current position of client’s own biped (client→host)
~15 more properties
Slide22
Replication: The Big Picture
Client → Host
Control Data
“My biped is now at position x”
Events
“I just fired my primary weapon”
“I’d like to get into this warthog”
Slide23
Replication: The Big Picture
Host → Client
Control Data
“This biped is now trying to strafe left”
State Data
“This object is now in position X”
“This warthog now has a broken windshield”
“All these broken warthog chunks now exist”
Events
“This weapon just fired”
“This warthog just took damage at this point”
Slide24
Replication is never fully reliable
Unreliability enables aggressive prioritization, which lets us handle the richness of our simulation
Flow control layer decides when to send a packet, and what size it should be
Replication writes data into the packet until full
There is always more data than will fit, so we write high-priority data first
Slide25
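The "write high-priority data first until the packet is full" step above can be sketched as a greedy fill. This is a simplified illustration: the function name `fill_packet`, the tuple layout, and the sizes are assumptions, and a real implementation writes bits into a stream instead of collecting labels.

```python
def fill_packet(candidates, packet_size_bytes):
    """Greedily write the highest-priority items that fit into one packet.

    candidates: list of (priority, size_bytes, label) tuples.
    Returns (labels written, labels left to compete for a later packet).
    """
    written, deferred = [], []
    remaining = packet_size_bytes
    for priority, size, label in sorted(candidates, key=lambda c: c[0], reverse=True):
        if size <= remaining:
            written.append(label)
            remaining -= size
        else:
            deferred.append(label)   # competes again when the next packet is built
    return written, deferred


candidates = [
    (0.50, 40, "enemy biped position"),
    (0.22, 30, "warthog position"),
    (0.19, 50, "distant grenade"),
    (0.90, 20, "weapon fired event"),
]
written, deferred = fill_packet(candidates, packet_size_bytes=100)
```

The packet size is an input here because, as the slide says, the flow control layer decides it; replication only decides what goes in.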
Prioritization
Priority is based on client view and simulation state
Priority is calculated separately per-object per-client
Distance/direction is the core metric
Size & speed affect priority
Shooting & damage apply appropriate boosts
Lots of special cases (e.g. thrown grenades)
Slide26
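To make the per-object, per-client calculation above concrete, here is a rough sketch of a distance/direction relevance score with gameplay boosts layered on top. The talk does not give Reach's actual formula, so every constant, weight, and function name below is invented purely for illustration.

```python
import math


def relevance(viewer_pos, viewer_forward, obj_pos, max_distance=100.0):
    """Score in [0, 1]: nearer objects and objects in front of the viewer score higher.

    viewer_forward must be a unit vector. All constants are made up.
    """
    offset = [o - v for o, v in zip(obj_pos, viewer_pos)]
    distance = math.sqrt(sum(c * c for c in offset))
    if distance == 0.0:
        return 1.0
    distance_term = max(0.0, 1.0 - distance / max_distance)
    # Cosine of the angle between view direction and object: 1 ahead, -1 behind.
    facing = sum(f * c for f, c in zip(viewer_forward, offset)) / distance
    direction_term = 0.5 + 0.5 * facing            # remap [-1, 1] to [0, 1]
    return distance_term * (0.5 + 0.5 * direction_term)


def priority(base_relevance, is_shooting=False, recently_damaged=False):
    """Apply hypothetical gameplay boosts, clamped to 1.0."""
    boost = 1.0
    if is_shooting:
        boost *= 1.5
    if recently_damaged:
        boost *= 1.25
    return min(1.0, base_relevance * boost)


viewer = (0.0, 0.0, 0.0)
forward = (1.0, 0.0, 0.0)
ahead = relevance(viewer, forward, (10.0, 0.0, 0.0))
behind = relevance(viewer, forward, (-10.0, 0.0, 0.0))
```

The host would evaluate something like this once per object for each client, using that client's viewpoint, so the same object can be urgent for one player and nearly irrelevant for another.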
Prioritization example
Slide27
Prioritization example
Slide28
Prioritization example
Legend: Final priority / relevance / desired update period (ms)
0.50/1.00/0
0.22/0.97/127
Slide29
Prioritization example
Legend: Final priority / relevance / desired update period (ms)
0.19/0.73/339
Slide30
Designing for Networking Quality
Slide31
Throwing a grenade
[video]
Slide32
Single-box grenade throw
Controller
Single peer simulation
Player presses left trigger
Grenade throw animation begins
Release frame is reached, grenade object is detached from hand, aimed, and launched
Throw animation delay
Slide33
Client grenade throw – attempt #1
Send grenade throw request to host
Throw grenade locally when host confirms
Slide34
Client grenade throw – attempt #1
Client
Host
I’d like to throw a grenade
Grenade throw animation begins
Release frame is reached, throw grenade
Throw animation delay
Button press
Release frame is reached
Throw animation delay
Create grenade object
Start your throw animation
Throw animation starts
One-way latency, client to host
Here’s the lag!
Slide35
Client grenade throw – attempt #2
Throw a grenade locally.
Ask host to also throw a grenade.
Slide36
Client grenade throw – attempt #2
Client
Host
I’ve begun a grenade throw
Grenade throw animation begins
Release frame is reached, throw grenade
Throw animation delay
Button press, grenade throw animation begins
Release frame is reached, throw grenade
Throw animation delay
Where is the lag?
There isn’t any!
Slide37
Client grenade throw - actual
Predict throw animation
But do not predict grenade release – wait for host
Grenades in flight are always real, and the host is authoritative over them
Where is the lag?
Slide38
Client grenade throw - actual
Client
Host
I’ve begun a grenade throw
Grenade throw animation begins
Release frame is reached, delete grenade
Throw animation delay
Button press, grenade throw animation begins
Release frame is reached, delete grenade, aim throw
Throw animation delay
Please create a grenade aimed at X
Create grenade aimed at X, grenade appears
Grenade appears
Create grenade object, pos/vel
Here’s the lag!
Slide39
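The three grenade designs above differ only in where the round trip shows up for the throwing client. The sketch below is one literal reading of the sequence diagrams, assuming symmetric latency and ignoring frame quantization; the function name and approach labels are hypothetical.

```python
def client_timeline(approach, rtt_ms, throw_delay_ms):
    """When (ms after the button press) the throwing client sees each step.

    approach: "wait_for_host" (attempt #1), "predict_all" (attempt #2),
              or "predict_animation" (the shipped design).
    """
    if approach == "wait_for_host":
        animation_start = rtt_ms                    # request up, confirmation back
        grenade_visible = rtt_ms + throw_delay_ms
    elif approach == "predict_all":
        animation_start = 0
        grenade_visible = throw_delay_ms            # but this grenade is not authoritative
    elif approach == "predict_animation":
        animation_start = 0
        grenade_visible = throw_delay_ms + rtt_ms   # create request up, grenade object back
    else:
        raise ValueError(approach)
    return {"animation_start": animation_start, "grenade_visible": grenade_visible}


attempt_1 = client_timeline("wait_for_host", rtt_ms=100, throw_delay_ms=200)
attempt_2 = client_timeline("predict_all", rtt_ms=100, throw_delay_ms=200)
shipped = client_timeline("predict_animation", rtt_ms=100, throw_delay_ms=200)
```

Under these assumptions the shipped design shows the grenade no earlier than attempt #1 does, but the button press gets an immediate response, so the round trip is hidden at the release frame instead of at the start of the throw.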
Results!
[video]
Slide40
Trickier gameplay examples
Slide41
Armor Lock
[video]
Slide42
Armor Lock as a sequence diagram
Controller
Single peer simulation
Player presses equipment button
Intro animation begins
Intro completes, invulnerability begins
3 frames
Player releases equipment button
Invulnerability ends
Slide43
Armor Lock networking, v1
All animations & FX predicted by clients
This feels very responsive, no visible lag
But where is the lag?
Slide44
V1 sequence diagram
Client
Host
I’ve activated my armor lock
Intro animation begins
Intro animation completes, player is invulnerable
3 frame delay
Button press, intro animation begins
Intro animation completes, player appears invulnerable
3 frame delay
Grenade explodes
Hey, this grenade just blew up, and you took damage
WTF I was armor locked!
Where is the lag?
Slide45
Armor Lock, v2
Animation controlled by client…
…but wait for host to tell you to show yourself as invincible
Where did we move the lag to?
Slide46
V2 sequence diagram
Client
Host
I’ve activated my armor lock
Intro animation begins
Intro animation completes, player is invulnerable
3 frame delay
Button press, intro animation begins
Intro animation completes, no shield yet
3 frame delay
Grenade explodes
Grenade exploded, you’re damaged
WTF, why does my armor lock not work properly?
You’re invulnerable now, turn on the shield fx
Here’s the lag!
Slide47
Armor Lock, v3 – one last tweak
Client
Host
I’ve activated my armor lock
Intro animation begins
Invulnerability begins
(3-RTT) frame delay
Button press, intro animation begins
Intro animation completes, no shield yet
3 frame delay
Grenade explodes
Grenade exploded, but you’re fine
You’re invulnerable now, turn on the shield fx
Intro animation ends
:-)
Slide48
What just happened?
Did we just cheat lag? Where did it go?
Slide49
Armor Lock, v3
Client
Host
I’ve activated my armor lock
Intro animation begins
Invulnerability begins
(3-RTT) frame delay
Button press, intro animation begins
Intro animation completes, no shield yet
3 frame delay
Grenade explodes
Grenade exploded, but you’re fine
You’re invulnerable now, turn on the shield fx
Intro animation ends
:-)
Slide50
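The "(3-RTT) frame delay" label in the v3 diagram can be checked with a little arithmetic. Under one reading, the host shortens its own wait by the round-trip time, so its "you're invulnerable now" message reaches the client just as the client's 3-frame intro animation finishes. The sketch assumes symmetric latency measured in frames; the function names are hypothetical.

```python
def host_invulnerability_delay_frames(intro_frames, rtt_frames):
    """Frames the host waits after receiving the activation before granting invulnerability."""
    return max(0.0, intro_frames - rtt_frames)


def client_shield_time_frames(intro_frames, rtt_frames):
    """Frame (counted from the button press) at which the client is told to show the shield."""
    one_way = rtt_frames / 2.0    # assumes symmetric latency
    host_grants_at = one_way + host_invulnerability_delay_frames(intro_frames, rtt_frames)
    return host_grants_at + one_way


no_latency = client_shield_time_frames(intro_frames=3, rtt_frames=0)
typical = client_shield_time_frames(intro_frames=3, rtt_frames=2)
too_slow = client_shield_time_frames(intro_frames=3, rtt_frames=4)
```

As long as the round trip fits inside the intro animation, the shield appears on the same frame as in a single-box game; once the round trip exceeds the animation length, the delay clamps to zero and the remaining lag becomes visible again.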
Results!
[video]
Slide51
Example #3: Assassinations
[video]
Slide52
Assassinations
2 bipeds are happily running along
Suddenly, we need to force them to perform a joint, synchronized animation
Slide53
Assassinations, v1
Local prediction of participant positions & orientations
Worked great in in-house playtests & take-homes
Failed in the wilds of the public beta
Slide54
Assassinations, v1 - issues
[videos]
Slide55
Assassinations, v1 - issues
Animation didn’t always fit in the predicted positions on client machines
On completion, must resolve discrepancies for survivors
Slide56
Assassinations, v2 - shipping
All peers (including participants) obey host strictly
No discrepancies on exit!
Visual-only object state is interpolated on the way in to the animation
Slide57
Results!
[video]
Slide58
4 rules of gameplay networking
Which parts of your gameplay need to be adjudicated by a single authority?
Always ask: Where am I hiding the lag?
Don’t be afraid to change game mechanics to improve networking
Reserve time to iterate
Slide59
Measuring and Optimizing
Slide60
Networking is a magnet for entropy
Invisible system with ever-growing complexity
Optimizations obscure original intent of systems
May appear to work, but have lots of soft failures and inefficiencies
Halo 3 games with 16 players were often laggy
Let’s optimize!
Slide61
Optimization is dangerous
Easy to find an “obvious” architectural optimization, gain 1% efficiency, and introduce a week’s worth of bugs
Just like CPU, don’t optimize without good data!
“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
Slide62
Inspection tools are the key!
Deep inspection and analysis tools will help you identify the best optimizations
Think about the kind of tools you use for CPU performance optimization
Slide63
Tool: Profilers
We built profilers to track bandwidth use and priority calculation results
Slide64
Profiler demo
[video]
Slide65
Tool: Films
Deterministic playback of gameplay sessions
Extraordinarily useful for debugging gameplay…
…but have never been very useful for network debugging
Network systems are idle during film playback
Slide66
Leveraging Films
Splice the network profiler data into the films
For the first time, we could analyze network performance after the fact
Slide67
Tool: Playtests
Network perf playtests, once a month during production
Simulate adverse network conditions with traffic shaping tools
Slide68
Tool: Playtests
How can we measure success in these playtests?
Allow players to report lag with a controller button!
Afterwards, investigate perceived lag events
Will also find confusing game mechanics!
Slide69
Culmination!
[video]
Slide70
Inspection of Halo 3 revealed…
50% positions/velocities/orientations
20% player control data
20% weapon firing, bullets, damage
10% other
Woohoo, let’s optimize the heavy hitters!
Slide71
This was a false start
Hard to further optimize the encoding of positions, velocities, and orientations
Like seeing your math functions in your CPU profiles
Need to optimize at a higher level
Slide72
Good Optimizations in Reach
Slide73
Reducing always-on bandwidth use
Host->client control replication accounted for 22% of all host upstream on Halo 3
Removed data that was duplicated in object state data
Removed data that clients didn’t need to know
Optimized some encoding (details in slide notes)
Reduced bandwidth use by 60% (14% overall)
Slide74
Fixing a prioritization bug
Problem: Idle grenades rolling around on the ground had incredibly high network priority
The cause was traced back… to a bugfix at the end of Halo 3! “Equipment” was given a huge priority boost
Fix: only apply priority boost to active equipment
Slide75
Changing game mechanics
Halo 3 used a constant artificial friction on items
Problem: Very slow descent on hills
Optimization: Fake friction!
Slide76
Ragdoll networking
Ragdolls are difficult and costly to network well
Hey, why do we have to network ragdolls?
Slide77
Shock
Slide78
Skepticism
Slide79
Consideration
Slide80
Ragdoll networking
Ragdolls are difficult and costly to network well
Hey, why do we have to network ragdolls?
2 challenges
Ragdolls block bullets
Humping
2 fixes
Allow bullets and grenades to penetrate ragdolls freely
Sync initial state of ragdoll
Slide81
Smoothing out bursts of bandwidth
Problem: With high-ROF weapons, bullets were networked optimally, but not the damage they caused!
Fix: Allow client prediction of some damage effects
Problem: Periodic update of game statistics data was taking priority over gameplay traffic (on a protocol below replication)
Fix: Limit statistics data to <= 10% of each packet
Problem: Low-priority objects getting updates in perfect sync
Fix: Limit objects that can take “panic” priority to N per packet
Slide82
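The last two fixes above are both per-packet caps, which can be sketched in a few lines. The function name `plan_packet` and its parameters are hypothetical; the 10% share comes from the slide, while the value of N is not stated in the talk and is left as a parameter.

```python
def plan_packet(packet_bytes, stats_bytes_pending, panic_objects,
                max_panic_per_packet, stats_percent=10):
    """Reserve at most `stats_percent` of the packet for statistics and cap panic updates.

    panic_objects: object ids currently requesting "panic" priority, most urgent first.
    Returns (bytes of statistics to send now, panic ids admitted, panic ids deferred).
    """
    stats_budget = packet_bytes * stats_percent // 100
    stats_now = min(stats_bytes_pending, stats_budget)
    admitted = panic_objects[:max_panic_per_packet]
    deferred = panic_objects[max_panic_per_packet:]   # spread over later packets
    return stats_now, admitted, deferred


stats_now, admitted, deferred = plan_packet(
    packet_bytes=500,
    stats_bytes_pending=2000,
    panic_objects=["a", "b", "c", "d", "e"],
    max_panic_per_packet=2,
)
```

Both caps trade a little extra delay for low-urgency data against a steadier bandwidth profile, which is the point of the slide's title.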
3 rules of network optimization
Measure twice, cut once - use tools to guide your optimizations
Don’t focus on encoding & compression – look at the big picture
Make friends with your game mechanics designers and coders
Slide83
Tidbits and The Future
Slide84
Numbers from Reach
250 kbit/s: Minimum total upstream for the host of a solid 16 player game
675 kbit/s: Maximum total upstream bandwidth use from a single peer
45 kbit/s: Maximum bandwidth sent to one client from a host
1 kbit/s: Host upstream required to replicate one biped to one client at combat quality
10 Hz: Minimum packet rate for solid gameplay
100 ms/200 ms: Maximum latency for close-quarters gameplay for tournament/casual
133 ms/300 ms: Maximum latency for ranged gameplay for tournament/casual
Slide85
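A back-of-envelope check shows how the figures above relate to each other. The per-client and per-biped numbers come from the slide; the assumption that each client receives every biped except its own is mine and is not stated in the talk.

```python
PLAYERS = 16
CLIENTS = PLAYERS - 1            # the host does not send to itself

MAX_PER_CLIENT_KBPS = 45         # from the slide
BIPED_COST_KBPS = 1              # one biped to one client at combat quality

# A host sending the per-client maximum to every client.
max_host_upstream = CLIENTS * MAX_PER_CLIENT_KBPS

# Assumption: each client is sent every biped except its own.
bipeds_only_upstream = CLIENTS * (PLAYERS - 1) * BIPED_COST_KBPS
```

The first product reproduces the 675 kbit/s maximum exactly, which suggests that figure describes a host serving 15 clients. The second comes to 225 kbit/s, so under this assumption bipeds alone would consume most of the 250 kbit/s minimum host upstream.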
Related best practices
Flow & congestion control
Connection quality records & smart host selection
Host migration - adding this late is hard
A multiplayer beta or demo
Regular internal playtests, with traffic shaping
Full-time network testers, early and late
Slide86
More Resources
“Recreating The LAN Party Online”, Butcher & House, GDC 2005
“The TRIBES Engine Networking Model”, Frohnmayer & Gift, GDC 1999
Play Reach!
Slide87
Acknowledgements
Many people toiled to make Halo: Reach play as well as it does online, especially these guys
Slide88
Kings Among Men
Nick Gerrone, Lead Network Tester
Paul Lewellen, Network Engineer
Slide89
Additional Kings
Jon Cable, Sandbox Engineer
Luke Timmins, Lead of Networking and UI
Slide90
What’s next for Bungie?
Usability improvements to replication
Reducing boilerplate code
Extension of replication protocols to support one-off, low-bandwidth, complex use cases
I just want to network a state machine, I don’t want to get a PhD in replication
Slide91
What’s really next for Bungie?
Slide92
Questions?
daldridge@bungie.com
www.bungie.net/careers
we’re hiring!
Slide93
Bonus slides
The talk proper was already too long
Slide94
Basics of encoding
For rare things, and by default: write raw bits
For common things: limit range as much as possible, write only necessary bits (bitstream)
For floats: quantize to fixed point
For positions and vectors: Do lots of work to compress these – limit domains, limit precision, think about temporal coherence, use Google
Slide95
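Two of the encoding rules above, "write only necessary bits" and "quantize to fixed point", can be shown in a few lines. This is a generic sketch of the technique, not Bungie's encoder; the helper names and the 8-bit health example are assumptions.

```python
def bits_for_range(lo, hi):
    """Bits needed to write an integer known to lie in [lo, hi]."""
    return (hi - lo).bit_length()


def quantize(value, lo, hi, bits):
    """Map a float in [lo, hi] to an unsigned integer code with `bits` bits."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)


def dequantize(code, lo, hi, bits):
    """Recover an approximation of the original float from its code."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)


health_code = quantize(0.75, 0.0, 1.0, bits=8)
health_back = dequantize(health_code, 0.0, 1.0, bits=8)
```

A value known to lie in 0..100 needs 7 bits instead of a raw 32, and a normalized float sent as 8 bits is off by at most half a quantization step, which is usually invisible for something like a health bar.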
Packet rate vs. size
Maximize packet rate to minimize latency
Maximize packet size to maximize throughput
Goals in direct tension…
Ideally, maximize packet rate by default, but lower it as needed when simulation becomes too rich
Slide96
Problem: Networking new mechanics is hard with our replication systems
This is somewhat intentional!
Ease of use is dangerous
Lots of safeguards ensure careful thought (but add implementation time)
We still get quick-and-dirty prototype networking that needs to be rewritten late, but we try to minimize the amount of it
Slide97
Example of a bad optimization
“Let’s classify all our networked object indices into contiguous buckets by object type so we can use fewer bits to refer to an object if the type is known on both ends, which is common”
Saved 1% of bandwidth - awesome
Cost over 30 hours of debugging/support over the course of the project
Slide98
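For readers wondering where the saving in the bucketed-index idea above would come from, the sketch below compares the bits needed for a global object index with the bits needed for an index inside a per-type bucket. The table size and bucket sizes are invented for illustration; the talk gives only the 1% result and the debugging cost.

```python
def index_bits(count):
    """Bits needed to address `count` distinct slots."""
    return max(count - 1, 0).bit_length()


TOTAL_OBJECTS = 1024    # hypothetical size of the networked object table
BUCKETS = {"biped": 16, "vehicle": 32, "projectile": 256, "other": 720}

global_bits = index_bits(TOTAL_OBJECTS)
bucket_bits = {kind: index_bits(size) for kind, size in BUCKETS.items()}
```

With these made-up sizes a biped reference shrinks from 10 bits to 4, but only when both ends already know the type, and object references are a small slice of each packet. That is consistent with the slide's verdict: a small gain that did not justify the added complexity.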
What is “Lag”?
Perceived delay or inconsistency
Caused by latency
Caused by bandwidth limitation
Caused by packet loss
Sometimes caused by game mechanics
Slide99
Glitches
Glitch: Colloquially, a series of events that break or appear to break the rules or perceived rules of the game
There are 4 important classes of glitches
Perceived as wrong / real break of real rule
Perceived as wrong / real rule, but not a real break
Not perceived as wrong / real break of a real rule
Perceived breakage of a perceived rule
Slide100
Melee “Glitches”
Conceptually melee is very simple
In practice it’s not; we had to make post-ship fixes to it in Halo 2/3
Example: In Reach public beta, client melee strikes were sometimes (rarely) ignored by the host
Slide101
That’s all there is
There isn’t any more