Concepts for T&E of Autonomous Systems in Multi-Domain Operations



Presentation Transcript

1. Concepts for T&E of Autonomous Systems in Multi-Domain Operations
20 July 2022
Charlie Middleton, contractor
Lenny Truett, PhD, contractor
www.AFIT.edu/STAT
937-255-3636 x 4486

2. Overview
- Motivation and Introduction to Autonomy T&E
- Potential Future Autonomy in Multi-Domain Operations
- Challenges for T&E of Autonomous Defense Systems
- Methods and Tools to Address Challenges
- Frameworks for Applying Autonomy in MDO Scenarios
- Draft T&E Companion Guide for Autonomy and Follow-ons

3. Advancing T&E of Autonomous Systems: Motivation
Why have autonomous systems required research and development of T&E methods?
- Autonomous systems attract great interest and investment in the Department of Defense.
- Capabilities at the core of autonomy that require innovative application of T&E methods include perception, reasoning, deciding, learning, teaming, and emergent behavior.
- Autonomy will require test and evaluation of tasks traditionally performed by human operators, such as application of combat rules of engagement, threat avoidance and mitigation, risk management, mission integration in a large strike package, emergency and degraded-systems management, reporting of critical intelligence events, and more.
- Because of their intelligence-based capabilities and coexistence with human actors, autonomous systems pose new T&E challenges for developmental and operational mission assurance that are more complex than those of typical weapon systems.
- Methods and processes to mitigate these challenges are limited and not readily available to the test community; some key efforts are stove-piped, which causes duplication of effort.
- Policy and education are limited: DoDI 3000.09 "Autonomy in Weapon Systems" (7 pages) and DAU online course CLE 002 "Introduction to the T&E of Autonomous Systems".
- Autonomy has no clear, existing way of being tested and evaluated: traditional systems require an operator to perform these tasks, and operator performance is historically evaluated through training and proficiency, not through system T&E.
- DTE&A leveraged expertise in T&E strategy and rigor at the STAT Center of Excellence to expose and define autonomy T&E challenges, identify novel T&E methods, and evaluate and share T&E tools.

4. Autonomy Relation with AI and ML
- Autonomy refers to an agent or machine being delegated to perform a task: capable of independent operation without external control.
- Autonomy is NOT only about making systems "adaptive", "intelligent", "smart", or "unmanned/uncrewed" -- autonomy is about making systems self-directed and self-sufficient.
- Autonomous defense systems will need to be evaluated in a system and mission context, emphasizing measures of effectiveness to characterize the system's integrated capabilities and limitations, including software, hardware, and the synergies and/or disconnects between them.
- Artificial Intelligence (AI) refers to the ability of machines to perform tasks that normally require human intelligence; AI may be an element in a system pursuing a goal.
- The AI contribution is evaluated primarily in terms of measures of performance (see the sketch after this list). T&E of AI techniques or models under development includes broad characterization of the performance measures, without necessarily having an application in mind; this characterization is an important stage in making AI components understood and available for inclusion in systems.
- Machine learning (ML) refers to the ability of machines to learn from data without being explicitly programmed; it is a subset of AI techniques. AI that does not utilize machine learning goes by different names, such as expert AI, rule-based AI, symbolic AI, domain ontology and reasoning, multi-agent planners, etc.
- AI and ML are integral to autonomy, so multiple portions of this guide discuss considerations for T&E of AI and ML as integrated into an autonomous defense system.
- Autonomous systems are sometimes called "AI in motion" to differentiate them from "AI at rest".
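The deck's distinction between mission-level measures of effectiveness and component-level measures of performance can be made concrete with a toy example. Below is a minimal Python sketch of characterizing a hypothetical perception classifier by precision and recall; the class names, labels, and data are illustrative assumptions, not drawn from the presentation.

```python
# Minimal sketch: characterizing measures of performance (MOPs) for a
# hypothetical perception component, independent of any mission context.
# The labels and predictions below are illustrative placeholders.

def confusion_counts(truth, predicted, positive="threat"):
    """Count true/false positives and negatives for one class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(truth, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(truth, predicted))
    tn = sum(t != positive and p != positive for t, p in zip(truth, predicted))
    return tp, fp, fn, tn

truth     = ["threat", "clutter", "threat", "clutter", "threat", "clutter"]
predicted = ["threat", "threat",  "threat", "clutter", "clutter", "clutter"]

tp, fp, fn, tn = confusion_counts(truth, predicted)
precision = tp / (tp + fp)   # of declared threats, fraction that were real
recall    = tp / (tp + fn)   # of real threats, fraction that were detected
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Measures of effectiveness, by contrast, would score the integrated system against mission outcomes, which is why the deck evaluates ADS in a system and mission context rather than on component metrics alone.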

5. Potential Future Autonomy in Multi-Domain Operations
[Diagram: Notional Future Battlespace with Autonomous Defense Systems, depicting:
- Autonomous UAVs sensing airborne threats
- Autonomous USVs sensing undersea threats
- Autonomous UUVs with counter-sea munitions
- Autonomous UGVs with counter-CM munitions
- Autonomous C2 satellites providing comms
- Redland threats in the air, sea, and undersea domains
- Blueland defended locations]

6. Central Autonomy T&E Challenges
- Paradigm change from traditional, more segregated T&E of hardware processes to a concept of continuous testing for autonomous systems whose core is software.
- Hardware forms the basis for the physical features of the system; more traditional T&E processes typically apply, such as the 'waterfall' alignment of T&E responsibilities: contractor T&E, DT&E, IOT&E, follow-on OT&E.
- The software T&E community often (but not always) utilizes small, quick "sprints" of development, testing, and fielding. Agile processes and the development-for-security-and-operations (DevSecOps) construct flow very rapidly from contractor T&E through DT&E, OT&E, and fielding on a short development cycle, enabling rapid responsiveness to user needs.
- Because autonomous defense systems will comprise both hardware and software development and capabilities, the practitioner must recognize that a hybrid approach may in some situations be necessary, where some test events occur at the pace of hardware development but other test events occur at the higher frequency of software development.
- Much is new: autonomous system capabilities; CONOPS for employment; human teaming, coordination, and control; adversary responses; strategic, operational, and tactical risks; test methods and tools. Behavior and outcomes will be unexpected.

7. Specific Autonomy T&E Challenges
- Data: files use internal algorithms/states with vendor-specific formats; very large file sizes; simulation data is common; training data may not exhaustively cover real operations; simulations too big; metadata problems.
- Requirements and Measures: autonomy requirements too broad or not defined; coverage impossible; measures of success (trust) not well defined; Concept of Operations (CONOPS) incomplete; sound behavior too subjective; decision systems highly abstract and not explainable; interoperability lacking.
- Infrastructure and Personnel: autonomy experts are few; test experts lack autonomy knowledge; lack of sophisticated software and testing tools; test ranges not prepared for autonomy needs; LVC support; ad hoc instrumentation.
- Exploitable Vulnerabilities: large cyber attack surfaces; use of commercial/open-source code; too complex to exhaustively test; adversaries may implement autonomy sooner.
- Safety: decisions underlying routine safety are taken out of the hands (and minds) of operators; lack of T&E methods for complex software that allows the system to "operate" itself; very few active research programs today for applications with highly asymmetric hazard functions.
- Simulation: modeling and simulation are critical components for mission assurance; in some instances, these techniques are the only ones available to test and evaluate a system; sparse development of high-quality models and simulations to test system behavior.
- Human Systems Teaming: need to address the ability of any combination of humans and machines to perform as partners and to determine how to measure the effectiveness of that team; currently no standard for measuring trust in automated systems.
- Post-Acceptance Testing: systems will continue to change their behavior over time; need to develop periodic regression testing and predictive models of how post-fielding learning might affect system (or team) behavior.
- Design for Test: T&E of autonomous systems is made difficult by the system's internal decision-making process, which essentially operates like a "black box"; need to develop T&E methods for when the reasons for decisions are not transparent.
- Test Adequacy and Integration: interaction not only with the environment but also with other systems leads to emergent, unanticipated behaviors that are difficult for humans to track; lack of T&E methods to analyze the potential for undesirable emergent behavior in order to avoid it.
- Testing Continuum: requires testing throughout the system life cycle; necessary to decide when, how, and what aspects of a system to test under initial assumptions, and how to retest after learning has occurred.
- Testing for Ethical and Legal Requirements: systems must be responsible, equitable, traceable, reliable, and governable and meet the same legal, ethical, and policy standards as other systems; currently no T&E methodologies exist.

Test planning for complex adaptive systems requires all current T&E challenges to be met in addition to the new challenges associated with these AI-enabled systems.

8. Specific Features of MDO ADS that Challenge T&E
Challenging data needs from MDO scenarios:
- Autonomous sensors using raw or processed data from other systems
- Uncertainty in post-processed information must be represented and communicated
- ADS coordination: requesting more information or clarification of the 'picture'
- Independent ADS each orient individually versus collective orientation
- A decision or action by one ADS affects cooperative ADS performance
Key characteristics of effective and efficient T&E (see the coverage sketch after this list):
- Test scenarios need to cover the MDO operational space to avoid extrapolation
- Coverage of all operational factors in MDO scenarios is exceedingly enormous
- Isolating the root cause of a deficient outcome is difficult with interdependent ADS
- Agile, continuous testing of software is problematic with interdependent MDO ADS
The question of 'certification' of autonomous defense systems:
- Exhaustive understanding of MDO scenarios with ADS is unlikely to be achievable
- Dependence on simulation creates risks of mirroring and fighting the last war
- May require checks, monitoring, and assurance by human operators/managers
- Task-based certifications may aid in calibration of appropriate MDO ADS trust
- ADS may need to evolve, grow, and learn to become useful in MDO
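One common statistical answer to the coverage concern above is a space-filling experimental design, so that test points span the operational envelope and evaluation interpolates rather than extrapolates. Here is a minimal sketch using SciPy's Latin hypercube sampler; the three operational factors and their bounds are invented for illustration, not taken from the presentation.

```python
# Minimal sketch: a space-filling (Latin hypercube) design over a
# hypothetical 3-factor MDO operational space, so test points span the
# envelope instead of clustering. Factor names and bounds are assumptions.
from scipy.stats import qmc

factors = ["altitude_ft", "sea_state", "emcon_level"]
lower = [500, 0, 0]       # assumed lower bound per factor
upper = [30000, 6, 3]     # assumed upper bound per factor

sampler = qmc.LatinHypercube(d=len(factors), seed=1)
unit_points = sampler.random(n=12)               # 12 points in [0, 1)^3
design = qmc.scale(unit_points, lower, upper)    # map into factor ranges

for point in design:
    print({name: round(float(v), 1) for name, v in zip(factors, point)})
```

Even so, as the slide notes, full factorial coverage of all MDO factors is infeasible; designs like this only stretch a fixed test budget across the space.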

9. Methods and Tools to Address Autonomy T&E Challenges
- Now that we have discussed the background and challenges for autonomy T&E, how can we address these challenges within a realistic context?
- It is helpful to have a framework for understanding how possible solutions fit together to provide a comprehensive strategy for T&E of autonomy.
- Since the capabilities of interest replace functions formerly performed by human operators, it is useful to consider how a human operator develops and is evaluated over time. Can the system perform at least as well as a human could?
- To understand how a human is evaluated, we need to examine how a human operator is developed and trained as well, since evaluators also provide training feedback.
- This may provide insight into how autonomous systems could or should be developed (and 'trained'), which will affect the strategies for test and evaluation and other verification and validation methods.
- The end result envisioned is that a military commander has confidence in how to deploy and employ the system effectively and can understand and manage the risks involved.
[Graphic: Fighter pilot development alongside an Autonomous Defense System -- what is comparable?]

10. How to Develop and Evaluate a Fighter Pilot
- Screening in a small, light plane
- Academics on aerodynamics, FAA rules
- Closed-book examinations
- Training syllabus
- Simulator training
- Training aircraft
- Flight training with instructor pilot sitting beside
- Daily flight debriefs
- Ops limits
- Emergency procedures and prohibited maneuvers
- Flight training with instructor pilot as wingman
- Check-rides with evaluator pilot
- Graduation to mission airplane
- Mission qualification training
- Wingman, flight lead, mission commander
- Daily flight debriefs and tactics development
- Red air / adversarial training
- Threat study and regional combat spin-up

11. How to Develop and Evaluate an Autonomous Defense System (?)
Fighter pilot stage (in this order) -> autonomous system analog (not in this order):
- Screening in small, light plane -> Initial development in small unmanned systems
- Academics on aerodynamics, weather, physiology, FAA rules -> Accepted world and state models
- Closed-book examinations -> Formal verification / formal methods
- Training syllabus -> Training data sets
- Simulator training -> Simulation for testing
- Training aircraft -> Surrogate platforms
- Flight training with instructor pilot sitting beside -> Runtime assurance and recovery
- Flight training with instructor pilot in formation -> Layered runtime monitoring
- Ops limits -> Adaptive, outlier search, and boundary testing
- Practice emergency procedures and prohibited maneuvers -> Failure path testing
- Daily flight debriefs -> Continuous testing
- Check-rides with evaluator pilot -> Operational mission testing
- Graduation to mission airplane -> Evolved certifications and accreditation; open architecture
- Mission qualification training -> Agile development and new training data
- Wingman, flight lead, mission commander -> Testing continuum
- Daily flight debriefs and tactics development -> Post-acceptance testing
- Red air / adversarial training -> Adversarial testing
- Threat study and regional combat spin-up -> Retraining for mission threats and tactics

12. Early Development Methods
(Initial Flight Training vs. Initial Automation Development)
- Pilot: Beginning pilots undergo flight screening and first start training in a small, light, single-engine civilian airplane. Low cost, low risk, widely understood and used. Examples: Cessna-152, Cessna-172, T-3 Firefly, DA-20.
  ADS: Early automation development and testing may begin in small unmanned systems. Low cost, low risk, less complex to understand and use. Examples: Wasp Micro Air Vehicle, PUMA, Dragon Eye, Raven.
- Pilot: Early ground training involves classroom academics on fundamentals. Students must understand the flying environment, overarching principles, and basic fundamentals. Examples: aerodynamic principles, FAA rules, physiology, weather, communication.
  ADS: Accepted world and state models. Systems must have basic system states to match their operating modes and phases with their operational environments; internal modeling of reality supports accurate perception, valid reasoning, and effective decisions. System state models must have T&E or V&V of logic, processes, and effectiveness.
- Pilot: Closed-book examinations. Students must prove knowledge of basic flight rules, procedures, regulations, etc.; documented assurance of correct understanding of important facts and rules.
  ADS: Formal verification / formal methods (see the sketch after this list). Systems may use formal methods to mathematically prove that certain inputs will always produce certain outputs. Techniques: model checking, state space enumeration, proof obligations, counterexamples. Limitation: complex systems are impossible to formally verify exhaustively. Tools: FRET (NASA).
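The formal-methods entry can be illustrated with a toy explicit-state model checker: breadth-first enumeration of every reachable state of a small mode-transition model, checking a safety invariant in each. The modes, transitions, and safeguard below are illustrative assumptions; real requirement tools such as FRET work at a very different scale and level of rigor, and, as the slide notes, exhaustive enumeration does not scale to complex systems.

```python
# Toy sketch of explicit-state model checking: enumerate every reachable
# state of a small, invented mode-transition model and check a safety
# invariant in each. The model encodes one safeguard: loss of the comm
# link forces return-to-base (RTB), so "engage without link" is unreachable.
from collections import deque

def successors(state):
    mode, link_ok = state
    if not link_ok:
        return {("rtb", False), (mode, True)}      # hold RTB, or link recovers
    moves = {("rtb", True), ("rtb", False)}        # abort path always available
    if mode == "loiter":
        moves.add(("engage", True))
    if mode == "engage":
        moves.add(("loiter", True))
    return moves

def invariant(state):
    mode, link_ok = state
    return not (mode == "engage" and not link_ok)  # never engage without link

frontier = deque([("loiter", True)])
seen = {("loiter", True)}
while frontier:
    state = frontier.popleft()
    if not invariant(state):
        print("counterexample found:", state)      # a model checker's output
        break
    for nxt in successors(state) - seen:
        seen.add(nxt)
        frontier.append(nxt)
else:
    print(f"invariant verified over all {len(seen)} reachable states")
```

Removing the safeguard (e.g., letting the link drop while in "engage") makes the checker print a counterexample instead, which is exactly the diagnostic value the slide attributes to formal methods.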

13. Training and Test Development Methods
(Syllabus Flight Training vs. Automation Development and Training)
- Pilot: Pilot training syllabus. Covers knowledge and proficiency of takeoffs, landings, aerobatics, formations, and emergencies; provides a baseline of minimum proficiency.
  ADS: Training data for autonomy development with machine learning (see the coverage-check sketch after this list). Training data covers the operational space and conditions expected in employment. Need for verification, validation (and accreditation?) of the data used to train the AI. "The intelligence is in the training data, not the algorithm" - Unknown.
- Pilot: Flight simulators. Provide a safe, secure, repeatable way of training; can attempt things you never would do for real; varying levels of fidelity in many aspects.
  ADS: Simulation for testing. Use of Live-Virtual-Constructive testing for safety, security, and efficiency. Examples: software integration labs (SIL), high-performance computing clusters (HPC), desktop simulators. Risks and costs include V&V of the simulation and intended vs. non-intended use.
- Pilot: Training aircraft. Learn first in a safe, cheaper, simpler platform; skills learned transfer to more complex aircraft; risk of negative training.
  ADS: Surrogate autonomy platforms. Use of test surrogate platforms (with or without human operators) for development and sequential integration; cheaper, safer, faster development and integration; risk of missing key issues in transfer. Examples: TPS LearJet, Centaur OPA, UTAP-22.
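The verification-and-validation question for training data can be sketched as a coarse coverage check: bin each operational factor over its employment envelope and flag empty bins, i.e., regions the trained model would only ever meet by extrapolating. The factor names, ranges, and synthetic data below are illustrative assumptions.

```python
# Minimal sketch: a coarse coverage check on training data. Bin each
# operational factor over the employment envelope and flag empty bins,
# which are regions the model never saw in training. Factor names,
# ranges, and sample data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
# columns: altitude (kft), speed (kts) -- hypothetical training records
train = np.column_stack([rng.uniform(5, 20, 400), rng.uniform(150, 400, 400)])

bounds = {"altitude_kft": (0, 40), "speed_kts": (100, 600)}
n_bins = 8

for col, (name, (lo, hi)) in enumerate(bounds.items()):
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(train[:, col], bins=edges)
    empty = [f"[{edges[i]:.0f}, {edges[i+1]:.0f})"
             for i in range(n_bins) if counts[i] == 0]
    status = "uncovered: " + ", ".join(empty) if empty else "fully covered"
    print(f"{name}: {status}")
```

A real check would look at joint (multi-factor) coverage, not just marginals, but even this simple form makes the slide's "intelligence is in the training data" point measurable.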

14. Monitoring and Safety Methods
(Monitoring and Safety of Flight vs. Autonomy Monitoring and Safety)
- Pilot: Instructor pilot in aircraft. Instructor constantly watching performance; can make inputs/corrections (comm dependent); can take the controls if necessary.
  ADS: Runtime assurance and recovery (see the monitor sketch after this list). Monitoring and feedback tools for reporting and data; provide a backup for safe recovery of the test vehicle if unanticipated problems occur; allows re-starts. Examples: TACE (JHU APL / Emerging Tech CTF), R2U2 (NASA). Requires T&E or V&V of the runtime assurance tools themselves.
- Pilot: Instructor pilot in formation. Instructor monitoring to assist but not take over; able to provide inputs to aid in unexpected events or direct mission cancellation / return to base.
  ADS: Layered runtime monitoring. Multiple nested layers of monitoring provide runtime assurance at different levels based on information and authority, and allow graceful degradation. Layers can be removed as system development proceeds and confidence in the system increases.
- Pilot: Operations limits and prohibited maneuvers. Min and max airspeed, altitude, G-loading, engine limits, flight control limits, employment limits; unusual attitude recovery, spins, stalls, abrupt maneuvers, out-of-control procedures.
  ADS: Adaptive, outlier search, and boundary testing. Automate the search for outliers in the system responses/outputs using statistical tools; find the limits of safe/effective operations and the quickest path back to safe operation. Examples: MARGInS (NASA), RAPT (TRMC), Boundary Explorer (STAT COE).
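Runtime assurance has a standard software shape, often called a simplex architecture: a trusted monitor screens each command from the untrusted autonomy and substitutes a simple recovery behavior when the command would leave the safety envelope. The sketch below is a minimal illustration with invented limits and commands; fielded frameworks such as TACE and R2U2 are far more capable.

```python
# Minimal sketch of a runtime-assurance (simplex-style) wrapper: a trusted
# monitor checks each command from the untrusted autonomy against a safety
# envelope and substitutes a simple recovery command when it would violate
# the limits. Envelope values and controllers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Command:
    altitude_ft: float
    bank_deg: float

ENVELOPE = {"min_alt": 1000.0, "max_alt": 25000.0, "max_bank": 60.0}

def violates_envelope(cmd: Command) -> bool:
    return (cmd.altitude_ft < ENVELOPE["min_alt"]
            or cmd.altitude_ft > ENVELOPE["max_alt"]
            or abs(cmd.bank_deg) > ENVELOPE["max_bank"])

def recovery_command() -> Command:
    return Command(altitude_ft=5000.0, bank_deg=0.0)  # wings level, safe altitude

def assured_step(autonomy_cmd: Command) -> Command:
    """Pass the autonomy's command through unless it breaks the envelope."""
    if violates_envelope(autonomy_cmd):
        print(f"monitor: rejected {autonomy_cmd}, engaging recovery")
        return recovery_command()
    return autonomy_cmd

print(assured_step(Command(altitude_ft=12000.0, bank_deg=30.0)))  # passes through
print(assured_step(Command(altitude_ft=400.0, bank_deg=75.0)))    # recovered
```

Nesting several such monitors with different envelopes and authority levels gives the layered runtime monitoring the slide describes, and removing layers as confidence grows maps to the graceful-degradation point.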

15. Contingencies and Assurance
(Flight training is continuous with checkpoints vs. Autonomy testing throughout the lifecycle)
- Pilot: Practice emergency procedures. Walk-through of emergency indications, initial actions, checklist procedures, and considerations; instructor- or evaluator-led and verified.
  ADS: Failure path testing (see the fault-injection sketch after this list). Automated failure path test tools find unexpected failure conditions. Example: AdaStress (NASA).
- Pilot: Daily flight debriefs. Pilots learn and improve every single day of training, which continues throughout the career; learning occurs primarily in the debrief.
  ADS: Continuous testing and cognitive instrumentation. Use of agile and DevSecOps processes for development with continuous integration and testing; automated software integration and regression testing through 24/7 software test labs; cognitive instrumentation to diagnose the causes of incorrect behavior or inadequate performance, distinguishing coding errors from inadequate algorithms, bad training data, or sensor/hardware problems.
- Pilot: Check-ride with evaluator pilot. Capstone evaluation at the end of a phase of training; the evaluator pilot is an observer only.
  ADS: Operational mission testing. CONOPS and tactics (and corresponding training) are part of the system design; testing in realistic mission conditions, integrated with manned systems in mission-representative scenarios.
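Failure path testing can be approximated with simple automated fault injection: enumerate combinations of injected component faults against a model of the system and report any combination that yields an unsafe outcome. Everything in the sketch below (the fault list and the outcome logic) is an invented stand-in; tools like AdaStress use far more sophisticated stress-testing search.

```python
# Minimal sketch of automated failure-path testing: inject single and
# pairwise combinations of component faults into a toy outcome model and
# report which combinations lead to an unsafe result. Faults and logic
# are illustrative assumptions, not any real system.
from itertools import combinations

FAULTS = ["gps_loss", "datalink_loss", "engine_degraded", "sensor_stuck"]

def simulate(active_faults):
    """Toy outcome model: returns 'safe' or 'unsafe' for a fault set."""
    f = set(active_faults)
    # invented logic: these fault pairs defeat the backup chains
    if {"gps_loss", "datalink_loss"} <= f:
        return "unsafe"
    if {"engine_degraded", "sensor_stuck"} <= f:
        return "unsafe"
    return "safe"

for k in (1, 2):
    for combo in combinations(FAULTS, k):
        if simulate(combo) == "unsafe":
            print("failure path found:", " + ".join(combo))
```

Exhaustive enumeration is only tractable for tiny fault sets; practical tools replace the loops with guided search over long fault sequences, which is the "find unexpected failure conditions" capability the slide names.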

16. Testing for Mission Operational Roles
(Earning your wings is only the beginning vs. Autonomy testing will continue after fielding and deployment)
- Pilot: Graduation to mission airplane. Transferable skills and experience from training aircraft allow quicker, cheaper achievement of basic flying proficiency in a complex mission aircraft; some certifications extend even when moving to a new aircraft (FAC-A, targeting pod, night-vision goggles).
  ADS: Open architecture (see the interface sketch after this list). Use of non-proprietary, open architecture standards for integrating subsystems and services into the mission package with government-owned interfaces; allows T&E insight into inner processes; portable. Examples: OMA (USAF), UMAA (Navy), AFSIM. Related: evolved certifications and accreditations -- evolve certification standards as V&V methods for autonomy evolve.
- Pilot: Mission qualification training. Each flying squadron has specific missions, weapons, tactics, and standards that are learned before combat readiness is declared.
  ADS: Agile development and new training data. System performance is optimized under iterative tests as the system matures over time; new operationally relevant data may become available for training the system.
- Pilot: Wingman, flight lead, mission commander. Pilots' responsibilities and authority grow as additional certifications are achieved with experience.
  ADS: Testing continuum. No single predetermined test phase; testing is required throughout the system life cycle; system roles and trust will change over time; roles of DT&E vs. IOT&E vs. FOT&E vs. TD&E.
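The open-architecture point has a direct software analogue: subsystems integrate against a published interface rather than vendor internals, which also gives testers a clean seam for instrumentation and substitution. A minimal sketch of such a seam follows; the interface and classes are invented for illustration and are not taken from OMA, UMAA, or AFSIM.

```python
# Minimal sketch of an open-architecture seam: subsystems implement a
# published interface, so testers can swap in instrumented or simulated
# components without touching vendor internals. The interface and classes
# are illustrative assumptions, not any real standard.
from abc import ABC, abstractmethod

class TrackSource(ABC):
    """Published (notionally government-owned) interface for track producers."""
    @abstractmethod
    def next_track(self) -> dict: ...

class VendorRadar(TrackSource):
    def next_track(self) -> dict:
        return {"id": 42, "range_nm": 18.5, "bearing_deg": 270.0}

class RecordingSource(TrackSource):
    """Test wrapper: logs every track that flows through the seam."""
    def __init__(self, inner: TrackSource):
        self.inner, self.log = inner, []
    def next_track(self) -> dict:
        track = self.inner.next_track()
        self.log.append(track)   # T&E insight into the inner process
        return track

source = RecordingSource(VendorRadar())
print(source.next_track())
print(f"captured {len(source.log)} track(s) for analysis")
```

Because the mission package depends only on TrackSource, the same seam accepts a constructive simulation in place of VendorRadar, which is what makes open interfaces valuable for LVC testing as well as portability.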

17. Testing in Preparation for Combat Missions
(A fighter pilot never stops learning and improving vs. Autonomy testing will adapt and tailor to match a thinking adversary)
- Pilot: Daily flight debriefs and tactics development. Thoughtful analysis of execution errors and lessons learned provides the basis for airpower dominance.
  ADS: Post-acceptance testing and operational instrumentation. Systems that employ unsupervised learning or other adaptive control algorithms during operations will continue to change their behavior over time; unexpected responses/performance may occur.
- Pilot: Red air / adversarial training. Air superiority training requires regular adversary aircraft to train against with realistic TTPs.
  ADS: Adversarial testing (see the robustness sketch after this list). Cyclical testing of the attack surface and attack vectors for disruption of operations as system understanding grows.
- Pilot: Threat study and regional combat spin-up. Pre-deployment training and fleet spin-up prepare crews for specific expected imminent combat missions.
  ADS: Tailoring for mission tactics and threats. Theater-specific or mission-specific tactics and threats may require re-training and re-testing of the system for specific regional or mission scenarios.
CAVEAT EMPTOR: While I have tried to draw parallels and similarities between training a fighter pilot and testing an autonomous system, they are not the same thing. Many methods are analogous, but only loosely! This exercise is primarily intended to inform on the broad range of techniques involved and simplify them for those inexperienced in autonomy and AI test.
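A black-box flavor of adversarial testing can be sketched even without access to model internals: sweep perturbations of growing magnitude over a nominal input and record how often the decision flips. The toy classifier, input, and threshold below are invented stand-ins for a real perception component under test.

```python
# Minimal black-box robustness sketch for adversarial testing: perturb a
# nominal input with growing noise and record how often the (toy)
# classifier's decision flips. The classifier and data are illustrative
# stand-ins, not a real perception component.
import numpy as np

def toy_classifier(signal: np.ndarray) -> str:
    """Invented stand-in: declares a threat when the mean signal is high."""
    return "threat" if float(np.mean(signal)) > 0.5 else "clutter"

rng = np.random.default_rng(7)
nominal = np.full(64, 0.6)          # nominal input, just above the threshold
baseline = toy_classifier(nominal)

for eps in np.linspace(0.0, 1.0, 11):
    flips = sum(
        toy_classifier(nominal + rng.normal(0.0, eps, nominal.shape)) != baseline
        for _ in range(100)
    )
    print(f"noise sigma={eps:.1f}: decision flipped in {flips}/100 trials")
```

The flip rate rising with perturbation size gives a crude robustness curve; dedicated adversarial methods search for the smallest perturbation that flips the decision rather than sampling at random.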

18. Autonomy as C2 Aiding the OODA Loop
[Diagram: OODA loop with autonomy in a command-and-control role, annotated with: prioritization of threats; safety of operations; safety of test; MDO scenario coverage; local optimal solutions; standardized C2 and interfaces; prioritization of tasks; shared CONOPS; shared operational view; priority comms for cooperation; global optimal solutions; forecasting future states; understanding team capabilities; human intervention; success criteria; trust and exploitability; truth data / test oracle; noise, uncertainty, deception]

19. Methods for Addressing Autonomy T&E Challenges
- The draft T&E Companion Guide for Autonomy provides a discussion of background, challenges, methods, and tools in the context of the overarching T&E lifecycle of a program, as divided into T&E phases.
- T&E phases start with the earliest beginnings of a program at development of the acquisition strategy and development strategy for ADS projects, and proceed through test strategy development, test planning, test design, test execution, data analysis, and certification and accreditation processes.
- The Scientific Test and Analysis Techniques (STAT) Process is referenced as the framework for organizing the test phases for application of these ADS T&E solutions.
- Each phase also includes a general discussion of considerations and lessons learned.
Methods discussion by T&E phase: acquisition strategy and development strategy; test strategy; test planning; test design; test execution; data analysis; certification and accreditation.

20. T&E Companion Guide for Autonomy: Draft Outline and Content
T&E Companion Guide outline:
1. Introduction
2. Autonomy Policies
3. Background and Context for ADS T&E
   3.1 T&E Context
   3.2 Autonomy Relation with AI & ML
4. ADS T&E Challenges
   4.1 Central T&E Challenges
   4.2 Specific ADS T&E Challenges
5. Methods Addressing Challenges
   5.1 Where to begin
   5.2 Methods and Lessons Learned by Test Phase
6. ADS T&E Resources
   6.1 Test labs & test range support for ADS T&E
   6.2 Software tools for ADS T&E
   6.3 Expertise in DoD for ADS T&E
7. Summary and Way Ahead
A-1 Lexicon
A-2 References

The 12 ADS T&E challenges: data; requirements/measures; infrastructure/personnel; exploitable vulnerabilities; safety; simulation; human systems teaming; post-acceptance testing; design for test; test adequacy/integration; testing continuum; testing ethics/legal.
Example methods: assurance cases; LVC testing; automated requirements and code analysis; continuous testing; formal methods; surrogate platforms; runtime assurance; automated software testing; design of experiments; autonomy measures; automated outlier search; adversarial testing; human team measures.
Example tools: TACE, R2U2, Boundary Explorer, MARGInS, AdaStress, RAPT, RIOT, FRET, IKOS, ACedit, AdvoCATE, plus the TRMC Toolkit and the DARPA Assured Autonomy site.

The guide is the primary means of capturing autonomy T&E lessons learned. Key contributors include Dr. David Tate and Dr. David Sparrow, Institute for Defense Analyses.

21. Autonomy T&E Community Resources
- The draft guide has a Contacts section for information about DoD offices with autonomy and AI expertise. Access to subject matter experts from across the DoD, US government, industry, and academia will enable continued sharing of lessons and methods. Continued coordination, collaboration, and cooperation will facilitate the continued joint effort needed to update, maintain, and expand best practices and lessons.
- The DRAFT Test and Evaluation Companion Guide for Autonomous Defense Systems is a community effort intended to serve the greater DoD community. The guide is incomplete: challenges can be better defined and characterized; methods can be added, expanded, clarified, or combined; tools must be identified, referenced, assessed, and disseminated.
- This document ought to be available online, through proper channels, to all members. Our vision for this document is as a living, expanding resource. By allowing collaboration in this manner, the guide will leverage the collective networked expertise federated throughout the DoD.
- We welcome feedback, recommendations, and suggestions to improve the guide, and to capture emerging lessons learned and best practices from MDO ADS.

22. Questions?
- The goal is to advance test and evaluation methods for AI-enabled autonomous systems.
- Multi-Domain Operations produce key challenges for autonomy T&E.
- Pilot programs/projects, technical exchanges, and working groups.
- Draft T&E Companion Guide for Autonomous Defense Systems.
- Application of human-analogous processes when practical and effective.

OSD STAT Center of Excellence
Steven N. Thorsen, PhD, Director - Steven.Thorsen@afit.edu
Charlie Middleton, contractor - Charles.Middleton.ctr@afit.edu
Lenny Truett, PhD, contractor - Leonard.Truett.ctr@afit.edu
Joe Lazarus, contractor - Joseph.Lazarus.ctr@afit.edu
www.AFIT.edu/STAT
937-255-3636 x 4736

Disclaimer: The views, opinions, and findings should not be construed as representing the official position of either the Department of Defense or the sponsoring organization.

23. The STAT COE provides independent STAT consultation to designated acquisition programs and special projects to improve Test and Evaluation (T&E) rigor, effectiveness, and efficiency.
Visit: www.AFIT.edu/STAT
Email: COE@afit.edu
Call: 937-255-3636 x4736
MODERNIZING the CULTURE of TEST & EVALUATION