Engineering a Safer World - PowerPoint Presentation

Uploaded On 2017-08-30



Presentation Transcript


Engineering a Safer World

Nancy Leveson, MIT

Outline

Accident Causation in Complex Systems: STAMP
New Analysis Methods
  Hazard Analysis
  Accident Analysis
  Security Analysis
Does it Work? Evaluations

Why We Need a New Approach to Safety

Traditional safety engineering approaches were developed for relatively simple electro-mechanical systems.
Accidents in complex, software-intensive systems are changing their nature.
The role of humans in systems is changing.
We need new ways to deal with safety in complex systems.

“Without changing our patterns of thought, we will not be able to solve the problems we created with our current patterns of thought.”

Albert Einstein

Accident Causality Models

Underlie all our efforts to engineer for safety
Explain why accidents occur
Determine the way we prevent and investigate accidents
You may not be aware you are using one, but you are
Impose patterns on accidents

“All models are wrong, some models are useful” (George Box)

Traditional Ways to Cope with Complexity

Analytic Reduction
Statistics

Analytic Reduction

Divide the system into distinct parts for analysis:
  Physical aspects → separate physical components
  Behavior → events over time
Examine the parts separately.
Assumes such separation does not distort the phenomenon:
  Each component or subsystem operates independently
  Analysis results are not distorted when components are considered separately
  Components act the same when examined singly as when playing their part in the whole
  Events are not subject to feedback loops and non-linear interactions

Chain-of-Events Accident Causality Model

Explains accidents in terms of multiple events, sequenced as a forward chain over time
  Simple, direct relationship between events in the chain
  Events almost always involve a component failure, human error, or an energy-related event
Forms the basis for most safety engineering and reliability engineering analysis (e.g., FTA, PRA, FMECA, Event Trees) and design (e.g., redundancy, overdesign, safety margins)

Domino “Chain of events” Model

Event-based (DC-10):
Cargo door fails → causes → Floor collapses → causes → Hydraulics fail → causes → Airplane crashes

© Copyright John Thomas 2013
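The domino chain above can be written down directly. This minimal Python sketch assumes that each event deterministically causes the next one. It shows the reasoning the chain-of-events model encourages: prevent any single event (with a barrier, redundancy, or a safety margin) and everything downstream disappears. The names `Event`, `DC10_CHAIN`, and `propagate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str


# The DC-10 chain from the slide, in causal order.
DC10_CHAIN = [
    Event("Cargo door fails"),
    Event("Floor collapses"),
    Event("Hydraulics fail"),
    Event("Airplane crashes"),
]


def propagate(chain, blocked=frozenset()):
    """Return the events that occur, assuming each event causes the next one.

    An event whose name is in `blocked` is prevented, so neither it nor any
    later event in the chain occurs.
    """
    occurred = []
    for event in chain:
        if event.name in blocked:
            break  # a barrier breaks the chain at this link
        occurred.append(event)
    return occurred


print([e.name for e in propagate(DC10_CHAIN, frozenset({"Floor collapses"}))])
```

Everything this model can express is a link in one causal line. That is why the following slides turn to accidents in which no link fails at all.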

Chain-of-events example

Accident with No Component Failures

Mars Polar Lander
The spacecraft had to slow down to land safely, using Martian gravity, a parachute, and descent engines (controlled by software).
The software knew the spacecraft had landed because of sensitive sensors on the landing legs, and it cut off the engines when it determined that the spacecraft had landed.
But the sensors generated “noise” (false signals) when the landing legs were deployed during descent.
The software was not supposed to be operating at that time, but the software engineers decided to start it early to even out the load on the processor.
The software concluded that the spacecraft had landed and shut down the descent engines.
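The failure mechanism can be reduced to a few lines. In this simplified Python sketch, the touchdown monitor latches any leg-sensor signal once it starts running, and the engine cutoff acts on that latch only near the surface. All altitudes are illustrative, and `ENABLE_ALTITUDE_M` and `engine_cutoff_altitude` are hypothetical names. Every piece behaves exactly as specified; the hazard comes from starting the monitor while the legs are still deploying.

```python
from typing import List, Optional, Tuple

# Illustrative value: the cutoff logic acts on a latched touchdown indication
# only once the lander is below this altitude.
ENABLE_ALTITUDE_M = 40.0

# One (altitude in metres, leg-sensor reading) pair per control cycle.
TIMELINE: List[Tuple[float, bool]] = [
    (1500.0, False),
    (1400.0, True),   # legs deploy: transient "noise" looks like touchdown
    (1000.0, False),
    (500.0, False),
    (40.0, False),
    (0.0, True),      # real touchdown
]


def engine_cutoff_altitude(timeline, monitoring_starts_at: int) -> Optional[float]:
    """Return the altitude at which the software shuts down the engines."""
    touchdown_latched = False
    for step, (altitude_m, leg_signal) in enumerate(timeline):
        # Once running, the monitor latches any leg signal it sees.
        if step >= monitoring_starts_at and leg_signal:
            touchdown_latched = True
        # Near the surface, the cutoff trusts the latch.
        if touchdown_latched and altitude_m <= ENABLE_ALTITUDE_M:
            return altitude_m
    return None


print(engine_cutoff_altitude(TIMELINE, monitoring_starts_at=0))  # monitor started early
print(engine_cutoff_altitude(TIMELINE, monitoring_starts_at=2))  # started after leg deployment
```

With `monitoring_starts_at=0`, the transient at step 1 is latched and the engines stop at 40 m. If the monitor starts after leg deployment, the engines run until the real touchdown.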

Types of Accidents

Component Failure Accidents
  Single or multiple component failures
  Usually assume random failure
Component Interaction Accidents
  Arise in interactions among components
  Related to interactive and dynamic complexity
  Behavior can no longer be:
    Planned
    Understood
    Anticipated
    Guarded against
  Exacerbated by the introduction of computers and software

Confusing Safety and Reliability

(Figure: two overlapping sets, “Scenarios involving failures” and “Unsafe scenarios”.)
A: Unreliable but not unsafe
B: Unsafe but not unreliable
C: Unreliable and unsafe

Preventing Component or Functional Failures is NOT Enough
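The Venn diagram treats reliability and safety as two independent properties of a scenario. This Python sketch classifies scenarios into regions A, B, and C. It then shows that removing every scenario involving a failure still leaves region B: the unsafe-but-reliable cases, such as the Mars Polar Lander. The `Scenario` type, the helper names, and the first example scenario are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    involves_failure: bool  # unreliable: some component or function fails
    unsafe: bool            # leads to a hazard


def region(s: Scenario) -> str:
    """Place a scenario in the slide's Venn diagram."""
    if s.involves_failure and s.unsafe:
        return "C: unreliable and unsafe"
    if s.involves_failure:
        return "A: unreliable but not unsafe"
    if s.unsafe:
        return "B: unsafe but not unreliable"
    return "outside both sets"


SCENARIOS = [
    Scenario("Component fails and system shuts down safely", True, False),  # hypothetical
    Scenario("Mars Polar Lander: spurious touchdown signal", False, True),
    Scenario("DC-10: cargo door fails, floor collapses", True, True),
]


def unsafe_after_eliminating_failures(scenarios):
    """Unsafe scenarios that remain even if every failure were prevented."""
    return [s.name for s in scenarios if s.unsafe and not s.involves_failure]


for s in SCENARIOS:
    print(region(s), "-", s.name)
```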

Analytic Reduction does not Handle

Component interaction accidents
Systemic factors (affecting all components and barriers)
Software
Human behavior (in a non-superficial way)
System design errors
Indirect or non-linear interactions and complexity
Migration of systems toward greater risk over time


Summary

The world of engineering is changing.

If safety engineering does not change with it, it will become more and more irrelevant.

Trying to shoehorn new technology and new levels of complexity into old methods does not work.

Systems Theory

Developed for systems that are:
  Too complex for complete analysis
    Separation into (interacting) subsystems distorts the results
    The most important properties are emergent
  Too organized for statistics
    Too much underlying structure that distorts the statistics
Developed for biology (von Bertalanffy) and engineering (Norbert Wiener)
Basis of system engineering and “System Safety”

Systems Theory (2)

Focuses on systems taken as a whole, not on parts taken separately
Emergent properties
  Some properties can only be treated adequately in their entirety, taking into account all social and technical aspects
  “The whole is greater than the sum of the parts”
  These properties derive from relationships among the parts of the system: how they interact and fit together
Two pairs of ideas
  Hierarchy and emergence
  Communication and control

Emergent properties

(arise from complex interactions)

Process

Process components interact in direct and indirect ways

Safety is an emergent property

(Figure: a Controller issues Control Actions to a Process and receives Feedback from it.)
Controller: controlling emergent properties (e.g., enforcing safety constraints)
Process: individual component behavior and component interactions; process components interact in direct and indirect ways

(Figure: a Controller issues Control Actions to a Process and receives Feedback from it.)
Controller: controlling emergent properties (e.g., enforcing safety constraints)
Process: individual component behavior and component interactions; process components interact in direct and indirect ways
Air Traffic Control example: Safety, Throughput
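The controller/process loop can be sketched for the air traffic control example. In this minimal Python sketch, separation between two aircraft stands in for an emergent property: it depends on the speeds of both aircraft, not on either one alone. The controller sees separation only through feedback, and it acts on the trailing aircraft. The thresholds, speeds, and names (`control_action`, `simulate`) are hypothetical illustrations, not values from the slides.

```python
MIN_SEPARATION_NM = 10.0  # hypothetical minimum separation
BUFFER_NM = 5.0           # hypothetical margin at which the controller acts
SPEED_STEP_KT = 20.0      # hypothetical speed reduction per control action


def control_action(separation_feedback_nm: float) -> float:
    """Controller: slow the trailing aircraft when feedback shows separation nearing the minimum."""
    if separation_feedback_nm < MIN_SEPARATION_NM + BUFFER_NM:
        return -SPEED_STEP_KT
    return 0.0


def simulate(lead_kt, trail_kt, separation_nm, steps, dt_h=0.1, use_feedback=True):
    """Return separation after each step; it emerges from both aircraft's speeds."""
    history = []
    for _ in range(steps):
        if use_feedback:
            trail_kt += control_action(separation_nm)
        separation_nm += (lead_kt - trail_kt) * dt_h
        history.append(separation_nm)
    return history


closed_loop = simulate(450.0, 480.0, 20.0, steps=6)
open_loop = simulate(450.0, 480.0, 20.0, steps=6, use_feedback=False)
print(round(min(closed_loop), 1), round(min(open_loop), 1))
```

Dropping the feedback path (`use_feedback=False`) lets separation erode below the minimum even though neither aircraft has "failed". The speed reductions also illustrate the tension between safety and throughput named on the slide.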

Controls/Controllers Enforce Safety Constraints

Power must never be on when the access door is open
Two aircraft must not violate minimum separation
Aircraft must maintain sufficient lift to remain airborne
Public health system must prevent exposure of the public to contaminated water and food products
Pressure in an offshore well must be controlled
Runway incursions and operations on wrong runways or taxiways must be prevented

A Broad View of “Control”

Component failures and unsafe interactions may be “controlled” through:
  Design (e.g., redundancy, interlocks, fail-safe design)
  Process
    Manufacturing processes and procedures
    Maintenance processes
    Operations
  Social controls
    Governmental or regulatory
    Culture
    Insurance
    Law and the courts
    Individual self-interest (incentive structure)
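An interlock is the simplest case of control through design. Here it enforces the constraint "Power must never be on when access door open" from the previous slide. This Python sketch uses a hypothetical `PowerInterlock` class. Opening the door always cuts power first, and power requests are refused while the door is open, so the constraint holds no matter in which order operators act.

```python
class PowerInterlock:
    """Design-level control for: power must never be on when the access door is open."""

    def __init__(self) -> None:
        self.door_open = False
        self.power_on = False

    def open_door(self) -> None:
        self.power_on = False  # interlock: power is cut before the door opens
        self.door_open = True

    def close_door(self) -> None:
        self.door_open = False

    def request_power(self) -> bool:
        """Turn power on unless the door is open; return whether power was turned on."""
        if self.door_open:
            return False
        self.power_on = True
        return True

    def constraint_holds(self) -> bool:
        return not (self.power_on and self.door_open)


unit = PowerInterlock()
unit.request_power()
unit.open_door()
print(unit.power_on, unit.request_power(), unit.constraint_holds())  # False False True
```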

There may be multiple controllers, processes, and levels of control (with various types of communication between them).
Each controller enforces specific constraints, which together enforce the system-level constraints (emergent properties).
(Figure: several controllers at multiple levels, above Physical Process 1 and Physical Process 2.)

Role of Process Models in Control

(Figure: a Controller, containing a Control Algorithm and a Process Model, sends Control Actions to the Controlled Process and receives Feedback from it.)

Controllers use a process model to determine control actions.
Accidents often occur when the process model is incorrect.
How could this happen?

Four types of unsafe control actions:
1. Control commands required for safety are not given
2. Unsafe ones are given
3. Potentially safe commands are given too early or too late
4. Control stops too soon or is applied too long
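The four types can be used as a checklist that is applied to every control action in every context. This Python sketch returns which type, if any, a given control action falls into. The names `UCAType` and `classify` are hypothetical, and the boolean context flags are inputs an analyst would have to supply.

```python
from enum import Enum
from typing import Optional


class UCAType(Enum):
    NOT_PROVIDED = "control command required for safety not given"
    UNSAFE_PROVIDED = "unsafe command given"
    WRONG_TIMING = "potentially safe command given too early or too late"
    WRONG_DURATION = "control stopped too soon or applied too long"


def classify(required: bool, provided: bool, safe_in_context: bool,
             on_time: bool = True, duration_ok: bool = True) -> Optional[UCAType]:
    """Return the unsafe-control-action type for one action in one context, or None."""
    if required and not provided:
        return UCAType.NOT_PROVIDED
    if provided and not safe_in_context:
        return UCAType.UNSAFE_PROVIDED
    if provided and not on_time:
        return UCAType.WRONG_TIMING
    if provided and not duration_ok:
        return UCAType.WRONG_DURATION
    return None  # no unsafe control action in this context


print(classify(required=True, provided=True, safe_in_context=True, on_time=False))
```

The checks run in the order the slide lists the types, so a command that is both unsafe and late is reported only as unsafe. A real analysis would record every applicable type.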

(Leveson, 2003); (Leveson, 2011)

Example Safety Control Structure

Potential Accidents and Hazards for Aircraft

Accidents (Losses):
  Aircraft mid-air collision
  Uncontrolled collision with terrain
  Aircraft collision with something on the ground
  Passenger injury due to wake turbulence or unsafe movement of aircraft
Hazards:
  Controlled aircraft violate minimum separation standards.
  Aircraft enters an unsafe atmospheric region.
  Loss of controlled flight or airframe integrity.
  Aircraft enters unsafe attitude (excessive turbulence or pitch/roll/yaw that causes passenger injury but not necessarily aircraft loss).
  Aircraft enters restricted airspace without permission.
  …

In-Trail Procedure (ITP)

Enables aircraft to achieve flight level (FL) changes on a more frequent basis.
Designed for oceanic and remote airspace not covered by radar.
Permits climb and descent using new reduced longitudinal separation standards.
Potential benefits:
  Reduced fuel burn and CO2 emissions via more opportunities to reach the optimum FL or an FL with more favorable winds
  Increased safety via more opportunities to leave a turbulent FL
But standard separation requirements are not met during the maneuver.

Example High-Level Control Structure for ITP

CONSTRAINTS:

Enforce minimum separation
Maximize throughput
Minimize fuel burn

STAMP: System-Theoretic Accident Model and Processes

A new accident causality model based on Systems Theory (vs. Reliability Theory)

STAMP: Safety as a Control Problem

Safety is an emergent property that arises when system components interact with each other within a larger environment.
  A set of constraints related to the behavior of system components (physical, human, social) enforces that property.
  Accidents occur when interactions violate those constraints (a lack of appropriate constraints on the interactions).
The goal is to control the behavior of the components and of the system as a whole to ensure that safety constraints are enforced in the operating system.

Safety as a Dynamic Control Problem

Examples

O-ring did not control propellant gas release by sealing gap in field joint of Challenger Space Shuttle

Software did not adequately control descent speed of Mars Polar Lander

At Texas City, did not control the level of liquids in the ISOM tower;

In Deepwater Horizon, did not control the pressure in the well;

Financial system did not adequately control the use of financial instruments

Safety as a Dynamic Control Problem

Events are the result of inadequate control
  Result from lack of enforcement of safety constraints in system design and operations
Systems are dynamic processes that are continually changing and adapting to achieve their goals
A change in emphasis:
  “prevent failures” → “enforce safety constraints on system behavior”

Changes to Analysis Goals

Hazard analysis: Ways that safety constraints might not be enforced (vs. chains of failure events leading to an accident)
Accident analysis (investigation): Why the control structure was not adequate to prevent the loss (vs. what failures led to the loss and who was responsible)
Security analysis: Potential weaknesses in security controls (vs. threat analysis)

Systems Thinking

STAMP: Theoretical Causality Model

(Figure: methods and tools built on the STAMP causality model.)
Accident/Event Analysis: CAST
Hazard Analysis: STPA
Security Analysis: STPA-Sec
System Engineering (e.g., Specification, Safety-Guided Design, Design Principles)
Specification Tools: SpecTRM
Risk Management
Operations
Management Principles/Organizational Design
Identifying Leading Indicators
Organizational/Cultural Risk Analysis
Regulation
(The figure groups these into Tools and Processes.)

Outline

Accident Causation in Complex Systems: STAMP
New Analysis Methods
  Hazard Analysis
  Accident Analysis
  Security Analysis
Does it Work? Evaluations

STPA: System-Theoretic Process Analysis

Integrated into system engineering
  Can be used from the beginning of a project
  Safety-guided design
Guidance for evaluation and test
Incident/accident analysis
Works also on social and organizational aspects of systems
Generates system and component safety requirements (constraints)
Identifies flaws in system design and scenarios leading to violation of a safety requirement (i.e., a hazard)

ITP Procedure – Step by Step

Flight Crew:
1. Check that ITP criteria are met.
2. If ITP is possible, request ATC clearance via CPDLC using ITP phraseology.

Air Traffic Controller:
3. Check that there are no blocking aircraft other than the Reference Aircraft in the ITP request.
4. Check that the ITP request is applicable (i.e., a standard request is not sufficient) and compliant with ITP phraseology.
5. Check that ITP criteria are met.
6. If all checks are positive, issue ITP clearance via CPDLC.

Flight Crew:
7. When ITP clearance is received, check that ITP criteria are still met.
8. If ITP criteria are still met, accept ITP clearance via CPDLC.
9. Execute ITP clearance without delay.
10. Report when established at cleared FL.
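The ten steps form a handshake: the crew and the controller each check the criteria before the next step can proceed. This Python sketch collapses the checks into three booleans (crew check, ATC checks, crew recheck). The names `ItpState` and `run_itp` are hypothetical. Any failed check simply ends the procedure, which glosses over real coordination and contingency procedures.

```python
from enum import Enum, auto
from typing import List


class ItpState(Enum):
    IDLE = auto()
    REQUESTED = auto()    # crew requested clearance via CPDLC
    CLEARED = auto()      # ATC issued ITP clearance
    ACCEPTED = auto()     # crew accepted clearance
    ESTABLISHED = auto()  # executed and reported at cleared FL
    ABANDONED = auto()    # some check failed; ITP not performed


def run_itp(crew_criteria_ok: bool, atc_checks_ok: bool, crew_recheck_ok: bool) -> List[ItpState]:
    """Walk the crew/ATC handshake and return the states visited."""
    states = [ItpState.IDLE]
    if not crew_criteria_ok:  # crew's initial criteria check
        return states + [ItpState.ABANDONED]
    states.append(ItpState.REQUESTED)
    if not atc_checks_ok:     # blocking aircraft, applicability/phraseology, criteria
        return states + [ItpState.ABANDONED]
    states.append(ItpState.CLEARED)
    if not crew_recheck_ok:   # criteria may have changed since the request
        return states + [ItpState.ABANDONED]
    states.append(ItpState.ACCEPTED)
    states.append(ItpState.ESTABLISHED)
    return states


print([s.name for s in run_itp(True, True, False)])
```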

Involves multiple aircraft, crew, communications (ADS-B, GPS), and ATC

High-Level Control Structure for ITP

Pilot Responsibilities and Process Model

Responsibilities:
  Assess whether ITP is appropriate
  Check if ITP criteria are met
  Request ITP
  Receive ITP approval
  Recheck criteria
  Execute flight level change
  Confirm new flight level to ATC
Process Model:
  Own ship climb/descend capability
  ADS-B data for nearby aircraft (velocity, position, orientation)
  ITP criteria (speed, distance, relative altitude, similar track, data quality)
  State of ITP request/approval
  etc.

Step 1 for Pilot

Control loop: Pilot → Execute ITP maneuver → Aircraft; Aircraft → A/C status, position, etc. → Pilot

Step 1 table columns: Control Action | Not providing causes hazard | Providing causes hazard | Too early/too late, wrong order | Stopped too soon/applied too long
Control action analyzed: Execute ITP Maneuver

Potentially Hazardous Control Actions by the Flight Crew (FC = flight crew)

Control action: Execute ITP
  Not providing causes hazard: (none listed)
  Providing causes hazard: ITP executed when not approved; ITP executed when ITP criteria are not satisfied; ITP executed with incorrect climb rate, final altitude, etc.
  Wrong timing/order causes hazard: ITP executed too soon before approval; ITP executed too late after reassessment
  Stopped too soon/applied too long: ITP aircraft levels off above requested FL; ITP aircraft levels off below requested FL

Control action: Abnormal Termination of ITP
  Not providing causes hazard: FC continues with maneuver in dangerous situation
  Providing causes hazard: FC aborts unnecessarily; FC does not follow regional contingency procedures while aborting
  Wrong timing/order causes hazard: (none listed)
  Stopped too soon/applied too long: (none listed)
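A Step 1 table is a grid of control actions against the four categories. Storing it as data makes it easy to list the empty cells, so that each "no hazard" cell becomes a deliberate decision rather than an oversight. This Python sketch encodes the flight crew rows as laid out in the table above. The column placement of each entry is a reconstruction, and `CATEGORIES`, `FLIGHT_CREW_UCAS`, and `empty_cells` are hypothetical names.

```python
from typing import Dict, List, Tuple

CATEGORIES = (
    "Not providing causes hazard",
    "Providing causes hazard",
    "Wrong timing/order causes hazard",
    "Stopped too soon/applied too long",
)

# Flight crew rows of the table above (FC = flight crew).
FLIGHT_CREW_UCAS: Dict[str, Dict[str, List[str]]] = {
    "Execute ITP": {
        "Providing causes hazard": [
            "ITP executed when not approved",
            "ITP executed when ITP criteria are not satisfied",
            "ITP executed with incorrect climb rate, final altitude, etc.",
        ],
        "Wrong timing/order causes hazard": [
            "ITP executed too soon before approval",
            "ITP executed too late after reassessment",
        ],
        "Stopped too soon/applied too long": [
            "ITP aircraft levels off above requested FL",
            "ITP aircraft levels off below requested FL",
        ],
    },
    "Abnormal termination of ITP": {
        "Not providing causes hazard": ["FC continues with maneuver in dangerous situation"],
        "Providing causes hazard": [
            "FC aborts unnecessarily",
            "FC does not follow regional contingency procedures while aborting",
        ],
    },
}


def empty_cells(table) -> List[Tuple[str, str]]:
    """Cells with no unsafe control action recorded, for the analyst to confirm."""
    return [
        (action, category)
        for action, cells in table.items()
        for category in CATEGORIES
        if not cells.get(category)
    ]


for action, category in empty_cells(FLIGHT_CREW_UCAS):
    print(f"{action}: {category} -> none recorded")
```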


High Level Constraints on Flight Crew

The flight crew must not execute the ITP when it has not been approved by ATC.
The flight crew must not execute an ITP when the ITP criteria are not satisfied.
The flight crew must execute the ITP with correct climb rate, flight levels, Mach number, and other associated performance criteria.
The flight crew must not continue the ITP maneuver when it would be dangerous to do so.
The flight crew must not abort the ITP unnecessarily. (Rationale: An abort may violate separation minimums.)
When performing an abort, the flight crew must follow regional contingency procedures.
The flight crew must not execute the ITP before approval by ATC.
The flight crew must execute the ITP immediately when approved unless it would be dangerous to do so.
The crew shall be given positive notification of arrival at the requested FL.

Potentially Hazardous Control Actions for ATC

Control action: Approve ITP request
  Not providing causes hazard: (none listed)
  Providing causes hazard: Approval given when criteria are not met; Approval given to incorrect aircraft
  Wrong timing/order causes hazard: Approval given too early; Approval given too late
  Stopped too soon or applied too long causes hazard: (none listed)

Control action: Deny ITP request
  (no entries)

Control action: Abnormal Termination Instruction
  Not providing causes hazard: Aircraft should abort but instruction not given
  Providing causes hazard: Abort instruction given when abort is not necessary
  Wrong timing/order causes hazard: Abort instruction given too late
  Stopped too soon or applied too long causes hazard: (none listed)

High-Level Constraints on ATC

Approval of an ITP request must be given only when the ITP criteria are met.
Approval must be given to the requesting aircraft only.
Approval must not be given too early or too late [needs to be clarified as to the actual time limits].
An abnormal termination instruction must be given when continuing the ITP would be unsafe.
An abnormal termination instruction must not be given when it is not required to maintain safety and would result in a loss of separation.
An abnormal termination instruction must be given immediately if an abort is required.

Steps in STPA

Establish the foundation for the analysis:
  Define “accident” for your system
  Define hazards
  Rewrite hazards as constraints on system design
  Draw a preliminary (high-level) safety control structure
Step 1: Identify potentially unsafe control actions (high-level safety requirements and constraints)
Step 2: Determine how each potentially hazardous control action could occur

STPA Step 2


(Figure: a generic control loop annotated with causal factors.)
Controller:
  Inadequate control algorithm (flaws in creation, process changes, incorrect modification or adaptation)
  Process model inconsistent, incomplete, or incorrect
  Control input or external information wrong or missing
  Missing or wrong communication with another controller
  Conflicting control actions (from another controller)
Control path: Inappropriate, ineffective, or missing control action
Actuator: Inadequate operation; Delayed operation
Sensor: Inadequate operation; Measurement inaccuracies; Feedback delays
Feedback path: Inadequate or missing feedback; Feedback delays; Incorrect or no information provided
Controlled process: Component failures; Changes over time; Process input missing or wrong; Unidentified or out-of-range disturbance; Process output contributes to system hazard
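Step 2 is often run as a structured walk around the control loop. This Python sketch turns the causal factors above into guide questions for one unsafe control action. `CAUSAL_FACTORS` paraphrases the figure's labels, and both the question wording and the `step2_prompts` name are hypothetical choices, not part of STPA itself.

```python
from typing import Dict, List

# Causal factors from the Step 2 figure, grouped by where they sit in the loop.
CAUSAL_FACTORS: Dict[str, List[str]] = {
    "controller": [
        "an inadequate control algorithm",
        "an inconsistent, incomplete, or incorrect process model",
        "wrong or missing control input or external information",
        "missing or wrong communication with another controller",
        "conflicting control actions",
    ],
    "actuator": ["inadequate actuator operation", "delayed actuator operation"],
    "control path": ["an inappropriate, ineffective, or missing control action"],
    "sensor": ["inadequate sensor operation", "measurement inaccuracies", "feedback delays"],
    "feedback path": ["inadequate or missing feedback", "incorrect or no information provided"],
    "controlled process": [
        "component failures",
        "changes over time",
        "missing or wrong process input",
        "an unidentified or out-of-range disturbance",
        "process output that contributes to a system hazard",
    ],
}


def step2_prompts(uca: str) -> List[str]:
    """One guide question per causal factor for a single unsafe control action."""
    return [
        f"How could '{uca}' result from {factor} ({element})?"
        for element, factors in CAUSAL_FACTORS.items()
        for factor in factors
    ]


for question in step2_prompts("Pilot executes maneuver when criteria not met")[:3]:
    print(question)
```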

Example Causal Analysis

Unsafe control action: Pilot executes maneuver when criteria are not met
Possible causes?
  Pilot thinks criteria are met (incorrect process model)
    Inadequate feedback provided by ITP box
    Feedback delayed or corrupted
    Receives incorrect information from ATC or ADS-B
  ATC thinks criteria are met and it is safe to perform the maneuver, but it is not …
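The "feedback delayed" cause can be shown with a few lines of Python. In this sketch, the pilot's belief about the distance to the reference aircraft (the process model) is updated from feedback that arrives a fixed number of steps late. The 15 NM threshold, the distances, and the function names are placeholders, not actual ITP criteria.

```python
from collections import deque
from typing import List, Optional, Tuple

ITP_MIN_DISTANCE_NM = 15.0  # placeholder threshold, not an actual ITP criterion

Row = Tuple[float, Optional[float], bool, bool]


def criteria_met(distance_nm: float) -> bool:
    return distance_nm >= ITP_MIN_DISTANCE_NM


def run(actual_distances_nm: List[float], feedback_delay_steps: int) -> List[Row]:
    """Per step: (actual distance, believed distance, belief says OK, actually OK)."""
    in_transit = deque()              # feedback reports not yet received
    believed: Optional[float] = None  # the pilot's process model of the distance
    rows: List[Row] = []
    for actual in actual_distances_nm:
        in_transit.append(actual)
        if len(in_transit) > feedback_delay_steps:
            believed = in_transit.popleft()  # the oldest report finally arrives
        belief_ok = believed is not None and criteria_met(believed)
        rows.append((actual, believed, belief_ok, criteria_met(actual)))
    return rows


def unsafe_steps(rows: List[Row]) -> List[int]:
    """Steps where the process model says 'criteria met' but reality disagrees."""
    return [i for i, (_, _, belief_ok, actual_ok) in enumerate(rows) if belief_ok and not actual_ok]


DISTANCES = [18.0, 17.0, 16.0, 15.0, 14.0, 13.0]
print(unsafe_steps(run(DISTANCES, feedback_delay_steps=2)))
print(unsafe_steps(run(DISTANCES, feedback_delay_steps=0)))
```

With a two-step delay, at steps 4 and 5 the process model says the criteria are met while the real distance has already dropped below the threshold. With no delay, there are no such steps.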

Is it Practical?

STPA has been or is being used in a large variety of industries:
  Spacecraft
  Aircraft
  Air Traffic Control
  UAVs (RPAs)
  Defense
  Automobiles (GM, Ford, Nissan?)
  Medical Devices and Hospital Safety
  Chemical plants
  Oil and Gas
  Nuclear and Electrical Power
  CO2 Capture, Transport, and Storage
  Etc.

Is it Practical? (2)

Social and Managerial
  Analysis of the management structure of the space shuttle program (post-Columbia)
  Risk management in the development of NASA’s new manned space program (Constellation)
  NASA Mission Control: re-planning and changing mission control procedures safely
  Food safety
  Safety in pharmaceutical drug development
  Risk analysis of outpatient GI surgery at Beth Israel Deaconess Hospital
  Analysis and prevention of corporate fraud

Does it Work?

Most of these systems are very complex (e.g., the U.S. Missile Defense System).
In all cases where a comparison was made:
  STPA found the same hazard causes as the old methods
  Plus it found more causes than traditional methods
  In some evaluations, it found accidents that had occurred that other methods missed
  Cost was orders of magnitude less than the traditional hazard analysis methods

Automating STPA (Step 1) (John Thomas)

Requirements can be derived automatically (with some user guidance) using a mathematical foundation.
Allows automated completeness/consistency checking.
(Figure: Hazards → Hazardous Control Actions → Discrete Mathematical Representation (predicate calculus/state machine structure) → Formal (model-based) requirements specification)
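The slide does not show the formal representation itself. The sketch below illustrates the general idea under simplifying assumptions. It enumerates every combination of process-model variables (a context table) for one control action and marks the contexts in which providing the action is hazardous. It then checks mechanically that no context was skipped. The variable names and the rule are hypothetical, loosely based on the flight crew constraints for Execute ITP. This is not John Thomas's actual tool or notation, and it covers completeness only, not consistency.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical process-model variables for the control action "Execute ITP".
VARIABLES: Dict[str, Tuple[bool, ...]] = {
    "itp_approved": (True, False),
    "itp_criteria_met": (True, False),
}


def hazardous_if_provided(ctx: Dict[str, bool]) -> bool:
    """Rule based on the flight crew constraints: no ITP unless approved and criteria met."""
    return not (ctx["itp_approved"] and ctx["itp_criteria_met"])


def context_table() -> List[Tuple[Dict[str, bool], bool]]:
    """One row per combination of variable values: (context, hazardous if provided)."""
    names = list(VARIABLES)
    rows = []
    for values in product(*(VARIABLES[n] for n in names)):
        ctx = dict(zip(names, values))
        rows.append((ctx, hazardous_if_provided(ctx)))
    return rows


def missing_contexts(rows) -> List[Tuple[bool, ...]]:
    """Completeness check: value combinations that no row covers."""
    names = list(VARIABLES)
    covered = {tuple(ctx[n] for n in names) for ctx, _ in rows}
    return [combo for combo in product(*(VARIABLES[n] for n in names)) if combo not in covered]


table = context_table()
print([ctx for ctx, hazardous in table if not hazardous])  # contexts where the action is allowed
print(missing_contexts(table))
```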

Others (that we are doing)

Automating Step 2
Leading Indicators
Sophisticated Human Factors Analysis
Feature Interactions (Automobiles)
Safety Management Systems
Safety-Guided Development (Design)

ConOps (Concept of Operations)

Identify:
  Missing, inconsistent, conflicting information
  Vulnerabilities, risks, tradeoffs
  Potential design or architectural solutions to hazards
Demonstrating on TBO (Trajectory Based Operations)
Cody Fleming

Others (that we are doing)

Changes in Complex Systems
  Air Traffic Control (NextGen)
    ITP
    Interval Management (IMS)
  UAS (unmanned aircraft systems) in national airspace
  Workplace (Occupational) Safety
Some Current Applications
  Hospital Patient Safety
  Flight Test (Air Force)
  Security in aircraft networks
  Defense systems
Defense systemsSlide58

Human Factors in Hazard Analysis

Adding Human Factors to Hazard Analysis

Outline

Accident Causation in Complex Systems: STAMP
New Analysis Methods
  Hazard Analysis
  Accident Analysis
  Security Analysis
Does it Work? Evaluations

Common Traps in Understanding Accident Causes

Root cause seduction and oversimplification
Narrow views of human error
Hindsight bias
Focus on finding someone or something to blame

Root Cause Seduction

Assuming there is a root cause gives us an illusion of control.
  Usually focus on operator error or technical failures
  Ignore systemic and management factors
  Leads to a sophisticated “whack-a-mole” game
    Fix symptoms but not the process that led to those symptoms
    In continual fire-fighting mode
    Having the same accident over and over

Oversimplification of Causes

Almost always there is:
  Operator “error”
  Flawed management decision making
  Flaws in the physical design of equipment
  Safety culture problems
  Regulatory deficiencies
  Etc.
Need to determine why the safety control structure was ineffective in preventing the loss.

Blame is the Enemy of Safety

Two possible goals for an accident investigation:
  Find whom to blame
  Understand why it occurred so it can be prevented in the future
Blame is a legal or moral concept, not an engineering one
Focus on blame can:
  Prevent openness during the investigation
  Lead to finger pointing and cover-ups
  Lead to people not reporting errors and problems before accidents

Human Error: Traditional View

Operator error is the cause of most incidents and accidents
So do something about the human involved (fire them, retrain, admonish)
Or do something about humans in general
  Marginalize them by putting in more automation
  Rigidify their work by creating more rules and procedures

Fumbling for his recline button, Ted unwittingly instigates a disaster

Human Error: Systems View

(Sidney Dekker, Jens Rasmussen, Leveson)

Human error is a symptom, not a cause
All behavior is affected by the context (system) in which it occurs
To do something about error, we must look at the system in which people work:
  Design of equipment
  Usefulness of procedures
  Existence of goal conflicts and production pressures
Human error is a sign that a system needs to be redesigned

Hindsight Bias

Sidney Dekker, 2009

Overcoming Hindsight Bias

Assume nobody comes to work to do a bad job.
  Assume they were doing reasonable things given the complexities, dilemmas, tradeoffs, and uncertainty surrounding them.
  Simply finding and highlighting people’s mistakes explains nothing.
  Saying what they did not do or what they should have done does not explain why they did what they did.
Investigation reports should explain:
  Why it made sense for people to do what they did, rather than judging them for what they allegedly did wrong
  What changes will reduce the likelihood of it happening again

Comair 5191 (Lexington), August 2006

Analysis using CAST by Paul Nelson, Comair pilot and human factors expert
(for report: http://sunnyday.mit.edu/papers/nelson-thesis.pdf)

Identify Hazard and Safety Constraint Violated

Accident: death or injury, hull loss
System hazard: Operation on wrong runways or taxiways.
System safety constraint: The safety control structure must prevent operations on wrong runways or taxiways.
Goal: Figure out why the safety control structure did not do this.

Start with Physical System (Aircraft)

Failures: None
Unsafe interactions:
  Took off on wrong runway
  Runway too short for that aircraft to become safely airborne
Then add the controller of the aircraft to determine why it was on that runway.

(Figure: the Flight Crew as controller of the Aircraft)

Component Analysis in CAST

Safety responsibilities/constraints
Unsafe control actions
Why?
  Mental/process model flaws
  Contextual/environmental influences

5191 Flight Crew

Safety Requirements and Constraints:

  Operate the aircraft in accordance with company procedures, ATC clearances, and FAA regulations.
  Safely taxi the aircraft to the intended departure runway.
  Take off safely from the planned runway.

Unsafe Control Actions:
  Taxied to runway 26 instead of continuing to runway 22.
  Did not use the airport signage to confirm their position short of the runway.
  Did not confirm that runway heading and compass heading matched.
  40-second conversation in violation of the “sterile cockpit” rule.

Mental Model Flaws:

  Believed they were on runway 22 when the takeoff was initiated.
  Thought the taxi route to runway 22 was the same as previously experienced.
  Believed their airport chart accurately depicted the taxi route to runway 22.
  Believed high-threat taxi procedures were unnecessary.
  Believed “lights were out all over the place,” so the lack of runway lights was expected.

Context in Which Decisions Made:

  No communication that the taxi route to the departure runway was different from that indicated on the airport diagram
  No known reason for high-threat taxi procedures
  Dark out
  Comair had no specified procedures to confirm compass heading with runway
  Sleep-loss fatigue
  Runways 22 and 26 looked very similar from that position
  Comair in bankruptcy, tried to maximize efficiency
    Demanded large wage concessions from pilots
    Economic pressures a stressor and frequent topic of conversation for pilots
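The CAST component template (responsibilities, unsafe control actions, and the "why") maps naturally onto a record type. This Python sketch fills that record with an abbreviated version of the 5191 flight crew analysis above. The class name and the `unexplained` helper are hypothetical. The helper flags entries that list unsafe control actions without any process-model flaw or context to explain them, since CAST asks why each action made sense at the time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CastComponentAnalysis:
    """One controller in a CAST analysis, following the slide's template."""
    component: str
    safety_responsibilities: List[str]
    unsafe_control_actions: List[str]
    process_model_flaws: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)

    def unexplained(self) -> bool:
        """True if unsafe control actions are listed but nothing yet explains why."""
        return bool(self.unsafe_control_actions) and not (self.process_model_flaws or self.context)


FLIGHT_CREW_5191 = CastComponentAnalysis(
    component="5191 Flight Crew",
    safety_responsibilities=[
        "Safely taxi the aircraft to the intended departure runway",
        "Take off safely from the planned runway",
    ],
    unsafe_control_actions=[
        "Taxied to runway 26 instead of continuing to runway 22",
        "Did not confirm runway heading and compass heading matched",
    ],
    process_model_flaws=[
        "Believed they were on runway 22 when the takeoff was initiated",
        "Believed their airport chart accurately depicted the taxi route to runway 22",
    ],
    context=[
        "Runways 22 and 26 looked very similar from that position",
        "Sleep loss fatigue",
    ],
)

print(FLIGHT_CREW_5191.component, "unexplained:", FLIGHT_CREW_5191.unexplained())
```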

The Airport Diagram

(Figure: two airport diagrams, “What the Crew Had” and “What the Crew Needed”)

(Figure: safety control structure for the Comair 5191 accident.)
Components: Federal Aviation Administration; ATO: Terminal Services; Airport Safety & Standards District Office; LEX ATC Facility; National Flight Data Center; Jeppesen; Comair: Delta Connection; Blue Grass Airport Authority; ALPA (Safety ALR); 5191 Flight Crew; Airport Diagram
Control and feedback links: Certification, Regulation, Monitoring & Inspection; Certification & Regulation; Certification, Inspection, Federal Grants; Procedures, Staffing, Budget; Procedures & Standards; Aircraft Clearance and Monitoring; Read backs, Requests; ATIS & “L” NOTAMs; Local NOTAMs; NOTAM Data; NOTAMs except “L”; Composite Flight Data, except “L” NOTAM; Charts, NOTAM Data (except “L”) to Customer; Flight release, Charts etc.; Graphical Airport Data; Airport Diagram Verification; Chart Discrepancies; Construction information; Optional construction signage; Pilot perspective information; Reports, Project Plans; Operational Reports; Reports; IOR, ASAP
Legend: a separate line style marks missing feedback lines.

Comair (Delta Connection) Airlines

Safety Requirements and Constraints:
  Responsible for safe, timely transport of passengers within their established route system
  Ensure crews have available all necessary information for each flight
  Facilitate a flight deck environment that enables crew focus on flight safety actions during critical phases of flight
  Develop procedures to ensure proper taxi route progression and runway confirmation
Unsafe Control Actions:
  Internal processes did not provide the LEX local NOTAM on the flight release, even though it was faxed to Comair from LEX.
  In order to advance corporate strategies, tactics were used that fostered work environment stress, precluding crew focus during critical phases of flight.
  Did not develop or train procedures for takeoff runway confirmation.

Comair (2)

Process Model Flaws:
  Trusted that the ATIS broadcast would provide local NOTAMs to crews
  Believed tactics promoting corporate strategy had no connection to safety
  Believed formal procedures and training emphasis on runway confirmation methods were unnecessary
Context in Which Decisions Made:
  In bankruptcy.

Blue Grass Airport Authority (LEX)

Safety Requirements and Constraints: Establish and maintain a facility for the safe arrival and departure of aircraft to service the community. Operate the airport according to FAA certification standards, FAA regulations (FARs) and airport safety bulletin guidelines (ACs).

Ensure taxiway changes are marked in a manner to be clearly understood by aircraft operators.

Airport Authority

Unsafe Control Actions:
  Relied solely on FAA guidelines for determining adequate signage during construction.
  Did not seek FAA-acceptable options other than NOTAMs to inform airport users of the known airport chart inaccuracies.
  Changed taxiway A5 to Alpha without communicating the change by other than minimum signage.
  Did not establish feedback pathways to obtain operational safety information from airport users.

Airport Authority

Process Model Flaws:
  Believed compliance with FAA guidelines and inspections would equal adequate safety.
  Believed the NOTAM system would provide understandable information about inconsistencies of published documents.
  Believed airport users would provide feedback if they were confused.
Context in Which Decisions Made:
  The last three FAA inspections demonstrated complete compliance with FAA regulations and guidelines.
  Last-minute change from Safety Plans Construction Document phase III implementation plan.

Airport Safety & Standards Office

Safety Requirements and Constraints:
  Establish airport design, construction, maintenance, operational and safety standards and issue operational certificates accordingly.
  Ensure airport improvement project grant compliance and release of grant money accordingly.
  Perform airport inspections and surveillance. Enforce compliance if problems are found.
  Review and approve Safety Plans Construction Documents in a timely manner, consistent with safety.
  Assure all stakeholders participate in developing methods to maintain operational safety during construction periods.

Airport Safety & Standards Office

Unsafe Control Actions:
  The FAA review/acceptance process was inconsistent, accepting the original phase IIIA (Paving and Lighting) Safety Plans Construction Documents and then rejecting them during the transition between phases II and IIIA.
  Did not require all stakeholders to be part of the meetings where methods of maintaining operational safety during construction were decided (i.e., a pilot representative was not present).
  Focused on inaccurate runway length depiction without consideration of taxiway discrepancies.
  Did not require methods in addition to NOTAMs to assure safety during periods of construction when the LEX Airport physical environment differed from the LEX Airport charts.

Airport Safety & Standards Office

Process Model Flaws:
  Did not believe pilot input was necessary for development of safe surface movement operations.
  No recognition of negative effects of changes on safety.
  Belief that the accepted practice of using NOTAMs to advise crews of charting differences was sufficient for safety.
Context in Which Decisions Made:
  Priority to keep Airport Facility Directory accurate.

Standard and Enhanced Hold Short Markings

LEX Controller Operations

Safety Requirements and Constraints:
  Continuously monitor all aircraft in the jurisdictional airspace and ensure clearance compliance.
  Continuously monitor all aircraft and vehicle movement on the airport surface and ensure clearance compliance.
  Clearances will clearly direct aircraft for safe arrivals and departures.
  Clearances will clearly direct safe aircraft and vehicle surface movement.
  All local NOTAMs will be included on the ATIS broadcast.
Unsafe Control Actions:
  Issued non-specific taxi instructions, i.e., “Taxi to runway 22” instead of “Taxi to runway 22 via Alpha, cross runway 26”.
  Did not monitor and confirm that 5191 had taxied to runway 22.
  Issued takeoff clearance while 5191 was holding short of the wrong runway.
  Did not include all local NOTAMs on the ATIS.

Mental Model Flaws

Hazard of pilot confusion during north-end taxi operations was unrecognized.
Believed flight 5191 had taxied to runway 22.
Did not recognize personal state of fatigue.
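In STAMP terms, the controller's process model (flight 5191 had taxied to runway 22) no longer matched the actual state (the aircraft was holding short of runway 26), and the takeoff clearance was issued from that model. The sketch below, with hypothetical field and function names, makes the point that a controller can only act on its beliefs. The optional confirmation guard illustrates the missing "monitor and confirm" step listed above; it is not a recommendation taken from the analysis.

```python
from dataclasses import dataclass


@dataclass
class ProcessModel:
    """The controller's beliefs about one aircraft (field names are hypothetical)."""
    believed_runway: str       # where the controller thinks the aircraft is holding
    position_confirmed: bool   # whether that belief was verified, e.g. by observation


def issue_takeoff_clearance(model: ProcessModel, runway: str,
                            require_confirmation: bool) -> bool:
    """Decide using only the process model: the controller never sees the true state."""
    if require_confirmation and not model.position_confirmed:
        return False
    return model.believed_runway == runway


def is_unsafe_control_action(issued: bool, actual_runway: str, runway: str) -> bool:
    """A clearance is unsafe if it is issued while the aircraft is not at the cleared runway."""
    return issued and actual_runway != runway


if __name__ == "__main__":
    model = ProcessModel(believed_runway="22", position_confirmed=False)
    actual = "26"  # the aircraft is actually holding short of runway 26
    for guarded in (False, True):
        issued = issue_takeoff_clearance(model, "22", guarded)
        print(guarded, issued, is_unsafe_control_action(issued, actual, "22"))
```

Without the guard, the decision is correct with respect to the model and still unsafe with respect to the world; the guard only helps because it forces feedback about the actual state before acting.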

Context in Which Decisions Made
A single controller operated both the Tower and Radar functions.
The controller was functioning at a questionable performance level due to fatigue from sleep loss.
From the control tower, the thresholds of runways 22 and 26 appear to overlap.

Slide93

LEX Air Traffic Control Facility

Safety Requirements and Constraints
Responsible for the operation of Class C airspace at LEX airport.
Schedule sufficient controllers to monitor all aircraft within jurisdictional responsibility, i.e., in the air and on the ground.

Unsafe Control Actions
Did not staff Tower and Radar functions separately.
Used the fatigue-inducing 2-2-1 schedule rotation for controllers.

Slide94

LEX Air Traffic Control Facility (2)

Mental Model Flaws
Believed “verbal” guidance requiring 2 controllers was merely a preferred condition.
Believed controllers would manage the fatigue resulting from the 2-2-1 rotating shift.
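One mental model flaw above is treating the two-controller guidance as a preferred condition rather than a requirement. A sketch, assuming a hypothetical roster that maps each hour of the day to the controllers on duty, shows the guidance written as an explicit, checkable constraint. It does not model the 2-2-1 rotation or fatigue itself.

```python
from typing import Dict, List

# The slides describe verbal guidance that Tower and Radar be staffed separately,
# i.e. at least two controllers on duty; encoded here as an explicit constant.
MIN_CONTROLLERS = 2


def understaffed_hours(on_duty: Dict[int, List[str]],
                       minimum: int = MIN_CONTROLLERS) -> List[int]:
    """Hours of the day (0-23) with fewer than `minimum` controllers on duty."""
    return [h for h in range(24) if len(on_duty.get(h, [])) < minimum]


if __name__ == "__main__":
    # Hypothetical roster: one controller overnight, two during the rest of the day.
    roster = {h: ["c1"] if h < 6 else ["c1", "c2"] for h in range(24)}
    print(understaffed_hours(roster))
```

Once the minimum is a constant that a check reads, every hour below it is reported as a violation instead of being left to local judgment.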

Context in Which Decisions Made
Requests for increased staffing were ignored.
The overtime budget was insufficient to make up for the reduced staffing.

Slide95

Air Traffic Organization: Terminal Services

Safety Requirements and Constraints
Ensure appropriate ATC facilities are established to safely and efficiently guide aircraft in and out of airports.
Establish budgets for operation and staffing levels which maintain safety guidelines.
Ensure compliance with minimum facility staffing guidelines.
Provide duty/rest period policies which ensure controllers can perform safely.

Unsafe Control Actions
Issued verbal guidance that Tower and Radar functions were to be separately manned, instead of specifying it in official staffing policies.
Did not confirm the minimum 2-controller guidance was being followed.
Did not monitor the safety effects of limiting overtime.

Slide96

Process Model Flaws

Believed “verbal” guidance (minimum staffing of 2 controllers) was clear.
Believed staffing with one controller was rare and, if it was unavoidable due to sick calls etc., that the facility would coordinate with the Air Route Traffic Control Center (ARTCC) to control traffic.
Believed limiting the overtime budget was unrelated to safety.
Believed controller fatigue was rare and a personal matter, up to the individual to evaluate and mitigate.

Context in Which Decisions Made
Budget constraints.
Air traffic controller contract negotiations.

Feedback
Verbal communication during quarterly meetings.
No feedback pathways for monitoring controller fatigue.

Slide97

[Figure: Safety control structure for the Comair 5191 accident. Components: Federal Aviation Administration; ATO: Terminal Services; Airport Safety & Standards District Office; National Flight Data Center; Blue Grass Airport Authority; LEX ATC Facility; Jeppesen; Comair: Delta Connection; 5191 Flight Crew; ALPA. Links are labeled with control actions (e.g., certification, regulation, monitoring & inspection; procedures, staffing, budget; aircraft clearance and monitoring; charts and NOTAM data) and feedback (e.g., read backs and requests; reports and project plans; operational reports; IOR and ASAP reports). A legend marks missing feedback lines.]

Slide98
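The control structure diagram on the previous slide marks missing feedback lines. A minimal sketch, assuming control and feedback links are stored as (from, to) component pairs, flags any control link with no feedback link running back to the controller. The two control links below are a simplified, assumed subset chosen to echo the Jeppesen slide's point about inadequate customer feedback on charting inaccuracies. They are not a transcription of the figure.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]  # (from_component, to_component)


def missing_feedback(control: Set[Link], feedback: Set[Link]) -> List[Link]:
    """Control links (controller -> controlled) with no feedback link back
    from the controlled component to its controller."""
    return sorted((src, dst) for (src, dst) in control if (dst, src) not in feedback)


if __name__ == "__main__":
    # Simplified, assumed subset of the LEX structure (not a transcription of the figure).
    control = {
        ("LEX ATC Facility", "5191 Flight Crew"),   # clearances
        ("Jeppesen", "Comair: Delta Connection"),   # charts
    }
    feedback = {
        ("5191 Flight Crew", "LEX ATC Facility"),   # read backs, requests
    }
    print(missing_feedback(control, feedback))
```

A check like this only finds links that are absent; judging whether an existing feedback channel is adequate still requires the per-component analysis.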

Jeppesen

Safety Requirements and Constraints
Create accurate aviation navigation charts and information data for safe operation of aircraft in the NAS.
Assure airport charts reflect the most recent NFDC data.

Unsafe Control Actions
Did not sufficiently analyze the software that processed incoming NFDC data to assure its original design assumptions matched those of the application.
Did not make available to the NAS airport structure the type of information necessary to generate the 10-8 “Yellow Sheet” airport construction chart.

Slide99

Jeppesen (2)

Process Model Flaws
Believed the Document Control System software always generated a notice of received NFDC data requiring analyst evaluation.
Believed any extended airport construction included phase and time data as a normal part of FAA-submitted paperwork.

Context in Which Decisions Made
The Document Control System software generated notices of received NFDC data.
Preferred chart provider to airlines.

Feedback
Customer feedback channels are inadequate for providing information about charting inaccuracies.

Slide100

National Flight Data Center

Safety Requirements and Constraints
Collect, collate, validate, store, and disseminate aeronautical information detailing the physical description and operational status of all components of the National Airspace System (NAS).
Operate the US NOTAM system to create, validate, publish, and disseminate NOTAMs.
Provide safety-critical NAS information in a format which is understandable to pilots.
NOTAM dissemination methods will ensure pilot operators receive all necessary information.

Slide101

Unsafe Control Actions
Did not use the FAA Human Factors Design Guide principles to update the NOTAM text format.
Limited dissemination of local NOTAMs (NOTAM-L).
Used multiple and various publications to disseminate NOTAMs, none of which individually contained all NOTAM information.

Process Model Flaws
Believed the NOTAM system successfully communicated NAS changes.

Context in Which Decisions Made
The NOTAM system’s history of over 70 years of operation.
Format based on teletypes.

Coordination
No coordination between the FAA human factors branch and the NFDC on using human factors design principles to revise the NOTAM format.

Slide102
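The NFDC slides note that NOTAMs went out through multiple publications, none of which individually contained all NOTAM information. A sketch with hypothetical publication names and NOTAM identifiers separates two questions: whether the publications together are complete, and whether any single publication is complete on its own.

```python
from typing import Dict, List, Set, Tuple


def notam_coverage(publications: Dict[str, Set[str]],
                   all_notams: Set[str]) -> Tuple[bool, List[str]]:
    """Return (complete_together, complete_alone): whether the publications
    jointly carry every NOTAM, and which single publications carry every
    NOTAM by themselves."""
    together = set().union(*publications.values())
    alone = sorted(name for name, items in publications.items() if all_notams <= items)
    return all_notams <= together, alone


if __name__ == "__main__":
    # Hypothetical publications and NOTAM identifiers, for illustration only.
    pubs = {"publication_a": {"N1", "N2"}, "publication_b": {"N3"}}
    print(notam_coverage(pubs, {"N1", "N2", "N3"}))
```

For the hypothetical data the result is `(True, [])`: nothing is lost system-wide, yet a pilot who reads any one publication misses some NOTAMs, which is the gap the slide describes.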

Federal Aviation Administration

Safety Requirements and Constraints
Establish and administer the National Aviation Transportation System.
Coordinate the internal branches of the FAA to monitor and enforce compliance with safety guidelines and regulations.
Provide budgets which assure the ability of each branch to operate according to safe policies and procedures.
Provide regulations to ensure safety-critical operators can function unimpaired.
Provide and require components to prevent runway incursions.

Slide103

Unsafe Control Actions:

Controller and crew duty/rest regulations were not updated to be consistent with modern scientific knowledge about fatigue and its causes.
Required enhanced taxiway markings at only 15% of air carrier airports: those with more than 1.5 million passenger enplanements per year.

Mental Model Flaws
Enhanced taxiway markings are unnecessary except at the largest US airports.
Crew/controller duty/rest regulations are safe.

Context in Which Decisions Made
FAA funding battles with the US Congress.
Industry pressure to leave duty/rest regulations alone.

Slide104
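The FAA rule on the preceding slide selects airports purely by traffic volume. A sketch using the slide's threshold (more than 1.5 million passenger enplanements per year) and hypothetical airport figures shows that the criterion never looks at layout or construction activity, which is the assumption behind the mental model flaw "unnecessary except for the largest US airports".

```python
from typing import Dict, List

# Threshold from the slide: more than 1.5 million passenger enplanements per year.
ENPLANEMENT_THRESHOLD = 1_500_000


def airports_requiring_enhanced_markings(enplanements: Dict[str, int],
                                         threshold: int = ENPLANEMENT_THRESHOLD) -> List[str]:
    """Airports strictly above the threshold; nothing else about the airport is considered."""
    return sorted(a for a, n in enplanements.items() if n > threshold)


def fraction_covered(enplanements: Dict[str, int],
                     threshold: int = ENPLANEMENT_THRESHOLD) -> float:
    """Share of the given airports that the rule covers."""
    return len(airports_requiring_enhanced_markings(enplanements, threshold)) / len(enplanements)


if __name__ == "__main__":
    # Hypothetical airports and enplanement counts, not real data.
    traffic = {"airport_1": 2_000_000, "airport_2": 900_000,
               "airport_3": 1_500_000, "airport_4": 400_000}
    print(airports_requiring_enhanced_markings(traffic), fraction_covered(traffic))
```

Note that an airport at exactly 1.5 million is excluded, since the slide's wording is "greater than".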

NTSB “Findings”

Probable Cause: The flight crew’s failure to use available cues and aids to identify the airplane’s location on the airport surface during taxi and their failure to cross-check and verify that the airplane was on the correct runway before takeoff.

Contributing to the accident were the flight crew’s non-pertinent conversation during taxi, which resulted in a loss of positional awareness, and the Federal Aviation Administration’s (FAA) failure to require that all runway crossings be authorized only by specific air traffic control (ATC) clearances.

Slide105

Communication Links Theoretically in Place in Uberlingen Accident

Slide106

Communication Links Actually in Place

Slide107

CAST (Causal Analysis using System Theory)

Identify the system hazard violated and the system safety design constraints.
Construct the safety control structure as it was designed to work: component responsibilities (requirements), control actions and feedback loops.
For each component, determine whether it fulfilled its responsibilities or provided inadequate control.
If inadequate control, why? (including changes over time): context, process model flaws.

Slide108
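The per-component questions CAST asks can be recorded in one structure, and the earlier slides for the LEX facilities, Jeppesen, the NFDC, and the FAA already follow this template (constraints, unsafe control actions, process model flaws, context, feedback). A sketch with hypothetical class and method names follows. It also flags a component that provided inadequate control but has no recorded explanation, since CAST asks why for each one. The example entries are abbreviated from the LEX ATC Facility slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CastComponent:
    """One controller in the safety control structure, using the headings of these slides."""
    name: str
    safety_constraints: List[str]                       # responsibilities (requirements)
    unsafe_control_actions: List[str] = field(default_factory=list)
    process_model_flaws: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)

    def provided_inadequate_control(self) -> bool:
        return bool(self.unsafe_control_actions)

    def explanation_missing(self) -> bool:
        """Inadequate control recorded without any 'why' (process model flaws or context)."""
        return self.provided_inadequate_control() and not (
            self.process_model_flaws or self.context)


if __name__ == "__main__":
    facility = CastComponent(
        name="LEX ATC Facility",
        safety_constraints=["Schedule sufficient controllers to monitor all aircraft"],
        unsafe_control_actions=["Did not staff Tower and Radar functions separately"],
        process_model_flaws=["Believed verbal 2-controller guidance was merely preferred"],
        context=["Requests for increased staffing were ignored"],
    )
    print(facility.provided_inadequate_control(), facility.explanation_missing())
```

Keeping "why" fields next to each unsafe control action is one way to resist stopping the analysis at what a component did wrong.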

CAST (2)

For humans, why did it make sense for them to do what they did? (to reduce hindsight bias)
Examine coordination and communication.
Consider dynamics and migration to higher risk.
Determine the changes that could eliminate the inadequate control (lack of enforcement of system safety constraints) in the future.
Generate recommendations.

Slide109

CAST (3)

Continuous Improvement
Assign responsibility for implementing recommendations.
Follow up to ensure they are implemented.
Provide feedback channels to determine whether the changes are effective. If not, why not?

Slide110
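The continuous-improvement steps of CAST (3) can be read as a small progression for each recommendation: no owner, owner but not implemented, implemented but not yet evaluated, and evaluated. A sketch under that assumption, with hypothetical field names and placeholder recommendation IDs, reports the next open question for each item.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recommendation:
    text: str
    owner: Optional[str] = None       # who is responsible for implementing it
    implemented: bool = False
    effective: Optional[bool] = None  # None until feedback shows whether it worked


def needs_follow_up(recs: List[Recommendation]) -> List[str]:
    """Return one open question per recommendation that is not yet closed."""
    issues = []
    for r in recs:
        if r.owner is None:
            issues.append(f"{r.text}: no owner assigned")
        elif not r.implemented:
            issues.append(f"{r.text}: not yet implemented")
        elif r.effective is None:
            issues.append(f"{r.text}: no feedback on effectiveness")
        elif not r.effective:
            issues.append(f"{r.text}: ineffective, determine why")
    return issues


if __name__ == "__main__":
    # Placeholder recommendations and owners, for illustration only.
    recs = [Recommendation("rec-1"),
            Recommendation("rec-2", owner="org-a", implemented=True),
            Recommendation("rec-3", owner="org-b", implemented=True, effective=True)]
    for line in needs_follow_up(recs):
        print(line)
```

A recommendation drops off the list only when feedback shows it was effective, which mirrors the slide's final question.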

Conclusions

The model used in accident or incident analysis determines what we look for, how we go about looking for “facts”, and what facts we see as relevant.

A linear chain of events promotes looking for something that broke or went wrong in the proximal sequence of events prior to the accident.

In accidents where nothing physically broke, we currently look for operator error.

Unless we look further, we limit our learning and almost guarantee future accidents related to the same factors.

The goal is to learn how to improve the safety control structure.

Slide111

Evaluating CAST on Real Accidents

Used on many types of accidents: aviation, trains, chemical plants and offshore oil drilling, road tunnels, medical devices, etc.

All CAST analyses so far have identified important causal factors omitted from the official accident reports.

Slide112

Evaluations (2)

Jon Hickey (US Coast Guard) applied CAST to aviation training accidents.
The US Coast Guard currently uses HFACS (based on the Swiss Cheese Model).
After a spate of recent accidents, the Coast Guard could not find any common factors.
Using CAST, Hickey found common systemic factors not identified by HFACS.
The USCG is now in the process of adopting CAST.

The Dutch Safety Agency is using it on a large variety of accidents (aircraft, railroads, traffic accidents, child abuse, medicine, airport runway incursions, etc.).

Slide113

Outline

Accident Causation in Complex Systems: STAMP
New Analysis Methods
Hazard Analysis
Accident Analysis
Security Analysis
Does it Work? Evaluations

Slide114

Strategy vs. Tactics

Current approaches primarily focus on tactics:
Cyber security is often framed as a battle between adversaries and defenders (tactics).
This requires correctly identifying attackers’ motives, capabilities, and targeting.

We can reframe the problem in terms of strategy:
Identify and control system vulnerabilities (vs. reacting to potential threats).
Top-down vs. bottom-up tactics approach.
Tactics are tackled later.

Slide115

Integrated Approach to Safety and Security:

Safety: prevent losses due to unintentional actions by benevolent actors.
Security: prevent losses due to intentional actions by malevolent actors.
Key difference is intent.

Common goal: loss prevention.
Ensure that critical functions and services provided by networks and services are maintained.
The new paradigm for safety will work for security too.
We may have to add new causes, but the rest of the process is the same.
A top-down, system engineering approach to designing safety and security into systems.

Slide116
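The preceding slide's claim that security analysis reuses the safety process and may only add new causes can be sketched as one analysis whose causal factors carry an intent flag. The unsafe control action and causal factors below are hypothetical. Intent decides only whether a factor is reported under safety or under security; the rest of the analysis is shared.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CausalFactor:
    description: str
    intentional: bool  # True: malevolent actor (security); False: benevolent actor (safety)


def analyze(unsafe_control_action: str, factors: List[CausalFactor]) -> Dict[str, object]:
    """One loss-prevention analysis: the same unsafe control action is traced to its
    causes, and intent only decides whether a cause is filed under safety or security."""
    return {
        "uca": unsafe_control_action,
        "safety": [f.description for f in factors if not f.intentional],
        "security": [f.description for f in factors if f.intentional],
    }


if __name__ == "__main__":
    result = analyze(
        "Controller issues commands based on a corrupted configuration file",  # hypothetical
        [CausalFactor("File corrupted by a software fault", intentional=False),
         CausalFactor("File altered by an attacker", intentional=True)],
    )
    print(result)
```

Both causes lead to the same unsafe control action and therefore to the same loss, so the constraint that prevents it (for example, verifying the file's integrity before use) serves both safety and security.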

STPA-Sec Allows us to Address Security “Left of Design” [Bill Young]

[Figure: Cost of fix, from low to high, across the system engineering phases Concept, Requirements, Design, Build, and Operate. Labels run from secure systems thinking, system security requirements, and secure systems engineering in the early phases (abstract systems) to cyber security “bolt-on” and attack/response in the late phases (physical systems).]

Build security into systems like safety.

Slide117

Real World Evaluation of STPA-Sec to Date

Demonstrated ability to identify previously unknown vulnerabilities in a global DoD mission.
Created a model based on actual planning documents.

Demonstrated ability to identify high-level vulnerabilities in early system concept documents.
Required security constraints were missing.

Demonstrated ability to help network defenders assure a real-world space surveillance mission.
Real mission, real mission owner, real network.
Defenders were able to identify more precisely what to defend and why (e.g., from a set of servers to the integrity of a single file).
Defenders were able to provide traceability, allowing non-cyber experts to better understand the mission impact of cyber disruptions.

Slide118

It’s still hungry … and I’ve been stuffing worms into it all day.

Slide119

Reason Swiss Cheese