Victoria R. Rodrigues, Mudit - PowerPoint Presentation

Uploaded On 2024-03-13

Presentation Transcript

1. Predicting Cognitive States Using Machine Learning Fusion Paradigms to Reduce Model Uncertainty
Victoria R. Rodrigues, Mudit Paliwal, Nicholas J. Napoli
Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
Human Informatics and Predictive Performance Optimization (HIPPO) Lab
Angela Harrivel, Kellie D. Kennedy, Chad L. Stephens
NASA Langley Research Center, Hampton, VA 23681
Sensor Systems and Information Fusion I, Fusion I – 01/14/2021
Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.

2. Table of Contents
- Introduction
- Motivation
- Prior Work
- Challenges
- Insights
- Methods
- Research Questions
- Results
- Conclusion

3. Motivation: Safety
CAST Airplane State Awareness Joint Safety Analysis Team Report, 2014
Commercial Aviation Safety Team, “Airplane State Awareness (ASA) Joint Safety Analysis Team (JSAT) Report,” 6-17-2014, http://skybrary.aero/bookshelf/content/bookDetails.php?bookId=2999

4. Motivation: States of Distraction
Attention-related Human Performance Limiting States (AHPLS)
- Channelized Attention
- Startle/Surprise
- Low Workload
- High Workload

5. System Objective

6. Prior Work: Limitations
- There is a limited number of experiments that apply proper cross-validation methods.
  - Typical cross-validation results lead to inflated performance measures [1,2].
- Prior work in the field has focused mostly on workload, stress level, and attention level prediction [3,4].
  - There is little research on multiclass AHPLS prediction, and limited experimental data in the aeronautics scenario.
- Most of the research to date uses a single physiological modality [5,6].
  - This leaves a gap for multi-modal physiological sensing and the possibility of exploring sensor fusion.
- Most EEG feature extraction methods still fail to capture the time-frequency dynamics of the data [7].
- Multiclass classification is rarely investigated.
  - When it is, it usually aims at classifying different levels of the same cognitive state (e.g., low vs. medium vs. high workload) [8].

7. Research Questions
- RQ1: How does subject-independent vs. subject-dependent cross-validation affect accuracy?
- RQ2: How accurately can we predict in a 2-class scenario vs. a 3-class scenario vs. a 4-class scenario?
- RQ3: Which subsets of features are the most relevant for multiclass classification?
- RQ4: Is the proposed engagement index measurement a better predictor than the usual engagement index calculation?
- RQ5: How does the fusion of multiclass models perform vs. the classical methods presented? (ongoing research)

8. Challenges
- How can we ensure that each model is trained and tested on a fair dataset?
- The current literature does not fully investigate classification accuracy for AHPLS, especially in a multiclass and multimodal scenario.
- Increasing the number of classes and adding multiple modalities to the problem adds uncertainty to the decision boundaries.
- When using multiple physiological modalities, different subsets of the features are likely to be useful for characterizing different classes.
- How can we capture the time-frequency dynamics of EEG?

9. Insights
- Subject-independent cross-validation should provide realistic results.
- Joining two similar AHPLS datasets may help model generalization and lead to better accuracy in prediction.
- Compare the prediction power of groups of features.
- EEG Wavelet Decomposition is better for capturing time-frequency dynamics.
- Sensor Fusion Framework.

10. Methods #1 – Cross-Validation
RQ1 - How does subject-independent vs. subject-dependent cross-validation affect accuracy?
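The difference between the two validation schemes comes down to how samples are assigned to folds. The sketch below is a minimal illustration on hypothetical toy data (6 subjects, 10 samples each); the function names are assumptions made for this example and are not taken from the presentation. It shows that shuffling individual samples lets the same subject appear in both training and test data, while assigning whole subjects to folds does not.

```python
import random
from collections import defaultdict


def subject_dependent_folds(subject_ids, n_folds, seed=0):
    """Shuffle individual samples into folds, ignoring who produced them."""
    indices = list(range(len(subject_ids)))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def subject_independent_folds(subject_ids, n_folds, seed=0):
    """Assign whole subjects to folds so no subject is split across folds."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    folds = [[] for _ in range(n_folds)]
    for position, subject in enumerate(subjects):
        folds[position % n_folds].extend(by_subject[subject])
    return folds


def leaked_subjects(subject_ids, test_fold):
    """Subjects present in both the test fold and the remaining training data."""
    test = set(test_fold)
    test_subjects = {subject_ids[i] for i in test}
    train_subjects = {s for i, s in enumerate(subject_ids) if i not in test}
    return test_subjects & train_subjects


# Hypothetical toy data: 6 subjects with 10 samples each.
subject_ids = [f"S{n}" for n in range(6) for _ in range(10)]
dependent = subject_dependent_folds(subject_ids, n_folds=3)
independent = subject_independent_folds(subject_ids, n_folds=3)

for name, folds in (("dependent", dependent), ("independent", independent)):
    leaks = [len(leaked_subjects(subject_ids, fold)) for fold in folds]
    print(f"subject-{name}: leaked subjects per fold = {leaks}")
```

A model evaluated on the subject-dependent folds has already seen data from every test subject, which is one plausible reason for the inflated accuracy that RQ1 sets out to measure.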

11. Methods #2 – Binary vs multiclass classification
RQ2 - How accurately can we predict in a 2-class scenario vs. a 3-class scenario vs. a 4-class scenario?
- We will evaluate all combinations of the four AHPLS using classical machine learning models.
- To reduce the effects of an increase in the number of classes, we will join two similar datasets so that our models have more data points from which to generalize.
- We expect a drop in accuracy when comparing 2-class problems with 4-class problems, but we expect this drop to be small because more data is available.
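Evaluating "all combinations of the four AHPLS" yields a fixed set of classification problems. The short sketch below enumerates them, using the abbreviations that appear in the results tables (CA, SS, LW, HW); the function name is a hypothetical helper for this example.

```python
from itertools import combinations

# Abbreviations as used in the results tables.
STATES = ["CA", "SS", "LW", "HW"]


def class_scenarios(states, min_classes=2):
    """Map each problem size k to every k-state combination."""
    return {
        k: list(combinations(states, k))
        for k in range(min_classes, len(states) + 1)
    }


scenarios = class_scenarios(STATES)
for k, combos in scenarios.items():
    print(f"{k}-class problems ({len(combos)}): {combos}")
```

This gives six 2-class, four 3-class, and one 4-class problem, which matches the eleven rows of the RQ3 and RQ4 results tables.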

12. Methods #2 – Binary vs multiclass classification
Scenarios for Human Attention Restoration Using Psychophysiology (SHARP-1) & SHARP-2

| AHPLS | Task |
|---|---|
| Channelized Attention | Tetris |
| Startle / Surprise | Movie Scene Observation |
| Low Workload complex multi-task | MATB* Low Workload |
| High Workload complex multi-task | MATB High Workload |

*Multi-Attribute Task Battery, available at: http://matb.larc.nasa.gov/

13. Methods #2 – Binary vs multiclass classification
Data Collection
- ABM B-Alert
- Muse
- Wrist-worn, for GSR
- Spire
- Benchmark Task performance
Image credit: NASA/CSM team
Harrivel, H., Heinich, C., Milletich, R., Comstock, J., Stephens, C., Last, M., Napoli, N., Abraham, N., Toro, K., Kennedy, K., and Pope, A. (2018). Comparative EEG Sensor Analysis for Attentional State Prediction. Presented at AsMA 2018, Sensors and Symptoms: Research in Physiological Events, May 7, 2018, Dallas, Texas.

14. Methods #2 – Binary vs multiclass classification
Dataset Composition

| Dataset | 4 states | 3 states | 2 states | 1 state | Total |
|---|---|---|---|---|---|
| SHARP-1 | 10 | 14 | 6 | 1 | 31 |
| SHARP-2 | 4 | 2 | 0 | 1 | 7 |
| Subjects | | | | | 38 |

15. Methods #3 – Important features subsets
RQ3 - Which subsets of features are the most relevant for multiclass classification?
We will evaluate classification accuracy using 3 different groups of features:
- EEG & ECG
- EEG & Respiration
- Respiration & ECG

16. Methods #3 – Important features subsets

| Modality | Feature Types | Number of Features |
|---|---|---|
| ECG | HRV entropy, mean and standard deviation | 3 |
| EEG | Engagement | 1 x 4 channels |
| EEG | Engagement index, mean and std | 3 x 4 channels |
| EEG | Activation Variance | 12 x 4 channels |
| EEG | Intensity | 12 x 4 channels |
| EEG | Rank Order Entropy ABT | 1 x 4 channels |
| EEG | Rank Order Entropy Full | 1 x 4 channels |
| Respiration | Respiration rate, respiration rate peak, respiration sample entropy | 3 |
| Total | | 126 |
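The total of 126 features can be checked from the per-modality counts. The sketch below assumes that the six EEG feature types pair with the six per-channel counts in the order in which the slide lists them; that pairing is an assumption, although it is consistent with the stated total.

```python
EEG_CHANNELS = 4

# Per-channel counts, paired with feature types in slide order (assumed pairing).
eeg_features_per_channel = {
    "Engagement": 1,
    "Engagement index, mean and std": 3,
    "Activation variance": 12,
    "Intensity": 12,
    "Rank order entropy ABT": 1,
    "Rank order entropy full": 1,
}

eeg_total = EEG_CHANNELS * sum(eeg_features_per_channel.values())
ecg_total = 3          # HRV entropy, mean, standard deviation
respiration_total = 3  # rate, rate peak, sample entropy

total = eeg_total + ecg_total + respiration_total
print(f"EEG: {eeg_total}, ECG: {ecg_total}, Respiration: {respiration_total}, Total: {total}")
```

EEG accounts for 120 of the 126 features, so any feature group that includes EEG is far higher-dimensional than the Respiration & ECG group.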

17. Methods #4: New Engagement Index
RQ4 - Is the proposed engagement index measurement a better predictor than the usual engagement index calculation?
Classical Engagement Index (where T is the length of the signal)
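The classical EEG engagement index is commonly written as the ratio of beta-band power to the sum of alpha-band and theta-band power. The sketch below assumes that form and computes it over a whole signal with a plain discrete Fourier transform. The band edges (theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz), the function names, and the synthetic test signal are all assumptions made for illustration; they may differ from the exact formulation used on the slide.

```python
import cmath
import math


def power_spectrum(signal, sample_rate):
    """Return (frequency_hz, power) pairs for the non-negative DFT bins."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        spectrum.append((k * sample_rate / n, abs(coeff) ** 2))
    return spectrum


def band_power(spectrum, low_hz, high_hz):
    """Sum the power of all bins with low_hz <= frequency < high_hz."""
    return sum(power for freq, power in spectrum if low_hz <= freq < high_hz)


def engagement_index(signal, sample_rate):
    """beta / (alpha + theta), with assumed band edges."""
    spectrum = power_spectrum(signal, sample_rate)
    theta = band_power(spectrum, 4, 8)
    alpha = band_power(spectrum, 8, 13)
    beta = band_power(spectrum, 13, 30)
    return beta / (alpha + theta)


# Synthetic 2-second signal: 6 Hz (theta), 10 Hz (alpha), 20 Hz (beta, doubled amplitude).
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 6 * t / sample_rate)
    + math.sin(2 * math.pi * 10 * t / sample_rate)
    + 2 * math.sin(2 * math.pi * 20 * t / sample_rate)
    for t in range(2 * sample_rate)
]
index = engagement_index(signal, sample_rate)
print(f"engagement index = {index:.3f}")
```

Because power scales with the square of amplitude, the doubled beta component contributes four units of power against one each for alpha and theta, so the index for this synthetic signal is close to 2.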

18. Methods #4: New Engagement Index
Proposed Engagement Index

19. Methods #4: New Engagement Index
Wavelet Decomposition vs Fourier Transform
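The practical difference is that a Fourier spectrum reports which frequencies are present over the whole window, whereas wavelet coefficients are also indexed by time. The sketch below illustrates this with a two-level Haar decomposition of a toy signal. The presentation does not name the wavelet family it uses, so Haar is an assumption chosen only because it fits in a few lines; the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the Haar transform: scaled pairwise sums and differences."""
    scale = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return [detail_level_1, ..., detail_level_n, final_approximation].

    The signal length must be divisible by 2 ** levels.
    """
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        bands.append(detail)
    bands.append(current)
    return bands


# Toy signal: a brief fast oscillation at samples 4-5, flat elsewhere.
signal = [0, 0, 0, 0, 1, -1, 0, 0]
level1, level2, approx = haar_decompose(signal, levels=2)

active = [i for i, d in enumerate(level1) if abs(d) > 1e-12]
print("level-1 detail coefficients:", level1)
print("active coefficient positions:", active)
```

Only the level-1 coefficient covering samples 4-5 is non-zero, so the decomposition records both that a fast component exists and where it occurs.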

20. Methods #4: New Engagement Index

21. Methods #5: A Sensor Fusion Approach
RQ5 - How does the fusion of multiclass models perform vs the classical machine learning methods?
Naive Adaptive Probabilistic Sensor (NAPS) Fusion
Proposed framework that utilizes numerous, random, small, bagged models with augmented response variables that randomly span the full feature space of our dataset.
This approach aims to:
- Reduce model uncertainty caused by the data structure
- Better handle subject-to-subject variability
- Detect anomalies
- Maintain modularity for future modalities and features
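The general idea of many small bagged models over random feature subsets can be sketched with a generic random-subspace ensemble. The code below is not the NAPS framework: it omits the augmented response variables and any probabilistic fusion, and swaps in a nearest-centroid base learner with majority voting purely for brevity. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def train_small_model(X, y, n_features, rng):
    """Fit a nearest-centroid model on a random feature subset and a bootstrap sample."""
    features = rng.sample(range(len(X[0])), n_features)
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        # Stratified bootstrap: resample within each class so no class goes missing.
        boot = [rng.choice(rows) for _ in rows]
        centroids[label] = centroid([[row[f] for f in features] for row in boot])
    return features, centroids


def predict_one(model, x):
    """Predict the label whose centroid is closest in the model's feature subset."""
    features, centroids = model
    point = [x[f] for f in features]

    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(point, center))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def committee_predict(models, x):
    """Majority vote over all small models."""
    votes = Counter(predict_one(model, x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Hypothetical toy data: 6 features, two well-separated classes.
X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(20)]
X += [[rng.uniform(9, 11) for _ in range(6)] for _ in range(20)]
y = ["low"] * 20 + ["high"] * 20

models = [train_small_model(X, y, n_features=2, rng=rng) for _ in range(25)]
print(committee_predict(models, [0.0] * 6), committee_predict(models, [10.0] * 6))
```

Because each model only sees a small slice of the feature space, adding a new modality means adding more small models rather than retraining one large one, which is one way to read the modularity goal listed above.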

22. Results

23. Results – RQ1
RQ1 - How does subject-independent vs. subject-dependent cross-validation affect accuracy?

| Problem | Subject-dependent average accuracy | Subject-independent average accuracy |
|---|---|---|
| 2-class problem | 98.7% | 65.2% |
| 3-class problem | 97.3% | 47.4% |
| 4-class problem | 96.9% | 39.0% |

Note: accuracy (ACC)

24. Results – RQ2
RQ2 - How accurately can we predict in a 2-class scenario vs. a 3-class scenario vs. a 4-class scenario?

| Problem | Average ACC | Average AUC | Average F1-Score |
|---|---|---|---|
| 2-class problem | 65.2% | 0.675 | 0.693 |
| 3-class problem | 47.4% | 0.635 | 0.512 |
| 4-class problem | 39.0% | 0.610 | 0.418 |

Note: Area Under the (Receiver Operating Characteristic) Curve (AUC) and F1-Score are used to evaluate the predictive performance of multiclass ML models.
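Raw accuracy falls as classes are added, but so does the chance level. The sketch below compares each average accuracy with a uniform-guess baseline of 1/k. That baseline assumes balanced classes, which the slides do not state, so the margins should be read as a rough guide only.

```python
# Subject-independent average accuracies from the RQ2 table.
subject_independent_accuracy = {2: 0.652, 3: 0.474, 4: 0.390}


def chance_level(n_classes):
    """Accuracy of uniform random guessing, assuming balanced classes."""
    return 1.0 / n_classes


margins = {}
ratios = {}
for n_classes, accuracy in subject_independent_accuracy.items():
    chance = chance_level(n_classes)
    margins[n_classes] = accuracy - chance
    ratios[n_classes] = accuracy / chance
    print(
        f"{n_classes}-class: accuracy {accuracy:.1%}, chance {chance:.1%}, "
        f"margin {margins[n_classes]:+.1%}, ratio {ratios[n_classes]:.2f}x"
    )
```

Under this assumption the margin over chance stays at roughly 14 to 15 percentage points across the three problem sizes, even though the raw accuracy drops from 65.2% to 39.0%.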

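25. Results – RQ3
RQ3 - Which subsets of features are the most relevant for multiclass classification?
Classes: CA = Channelized Attention, SS = Startle/Surprise, LW = Low Workload, HW = High Workload.

| Classes | EEG & Respiration ACC Mean | EEG & Respiration AUC Mean | EEG & Respiration F1-Score Mean | EEG & ECG ACC Mean | EEG & ECG AUC Mean | EEG & ECG F1-Score Mean | Respiration & ECG ACC Mean | Respiration & ECG AUC Mean | Respiration & ECG F1-Score Mean |
|---|---|---|---|---|---|---|---|---|---|
| LW, HW | 66.0% | 0.711 | 0.700 | 67.9% | 0.729 | 0.720 | 54.2% | 0.599 | 0.624 |
| SS, LW | 64.0% | 0.660 | 0.676 | 59.6% | 0.618 | 0.633 | 70.2% | 0.739 | 0.747 |
| CA, LW | 53.1% | 0.538 | 0.555 | 52.9% | 0.542 | 0.552 | 50.9% | 0.521 | 0.533 |
| SS, HW | 63.6% | 0.677 | 0.722 | 58.1% | 0.613 | 0.664 | 72.7% | 0.762 | 0.801 |
| CA, HW | 66.9% | 0.721 | 0.722 | 67.2% | 0.718 | 0.718 | 57.7% | 0.649 | 0.662 |
| CA, SS | 70.1% | 0.742 | 0.722 | 68.4% | 0.730 | 0.705 | 72.7% | 0.772 | 0.752 |
| SS, LW, HW | 47.4% | 0.642 | 0.517 | 44.1% | 0.608 | 0.484 | 45.9% | 0.653 | 0.529 |
| CA, LW, HW | 42.3% | 0.603 | 0.454 | 43.0% | 0.607 | 0.457 | 37.4% | 0.573 | 0.426 |
| CA, SS, LW | 42.8% | 0.605 | 0.452 | 40.2% | 0.581 | 0.423 | 42.2% | 0.624 | 0.442 |
| CA, SS, HW | 53.0% | 0.695 | 0.565 | 50.5% | 0.673 | 0.534 | 47.8% | 0.673 | 0.540 |
| CA, SS, LW, HW | 36.1% | 0.616 | 0.384 | 34.0% | 0.595 | 0.360 | 31.9% | 0.600 | 0.352 |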

28. Results – RQ4
RQ4 - Is the proposed engagement index measurement a better predictor than the usual engagement index calculation?

| Classes | Engagement Index ACC Mean | Engagement Index AUC Mean | Engagement Index F1-Score Mean | Proposed Engagement Index ACC Mean | Proposed Engagement Index AUC Mean | Proposed Engagement Index F1-Score Mean | ACC Improv. | AUC Improv. | F1-Score Improv. |
|---|---|---|---|---|---|---|---|---|---|
| LW, HW | 55.1% | 0.604 | 0.631 | 64.2% | 0.704 | 0.697 | 16% | 17% | 10% |
| SS, LW | 70.8% | 0.744 | 0.752 | 73.1% | 0.769 | 0.771 | 3% | 3% | 3% |
| CA, LW | 50.8% | 0.518 | 0.534 | 52.3% | 0.537 | 0.542 | 3% | 4% | 2% |
| SS, HW | 71.6% | 0.767 | 0.794 | 79.7% | 0.853 | 0.848 | 11% | 11% | 7% |
| CA, HW | 57.6% | 0.638 | 0.660 | 64.1% | 0.698 | 0.706 | 11% | 9% | 7% |
| CA, SS | 71.7% | 0.762 | 0.741 | 73.4% | 0.792 | 0.757 | 2% | 4% | 2% |
| SS, LW, HW | 46.9% | 0.663 | 0.539 | 56.6% | 0.756 | 0.618 | 21% | 14% | 15% |
| CA, LW, HW | 37.0% | 0.576 | 0.421 | 39.7% | 0.601 | 0.437 | 7% | 4% | 4% |
| CA, SS, LW | 42.8% | 0.623 | 0.449 | 43.1% | 0.640 | 0.451 | 1% | 3% | 0% |
| CA, SS, HW | 47.3% | 0.675 | 0.532 | 51.2% | 0.707 | 0.561 | 8% | 5% | 5% |
| CA, SS, LW, HW | 32.8% | 0.609 | 0.364 | 35.2% | 0.651 | 0.382 | 7% | 7% | 5% |
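The improvement columns appear to report relative change with respect to the classical index, not a difference in percentage points. The sketch below checks that reading against a few table entries; the function name is a hypothetical helper, and because the table values are already rounded, individual entries can differ by a point from what is recomputed here.

```python
def relative_improvement(baseline, proposed):
    """Relative change of the proposed score over the baseline, in percent."""
    return 100.0 * (proposed - baseline) / baseline


# (classes, metric, classical index score, proposed index score) from the RQ4 table.
examples = [
    ("LW, HW", "AUC", 0.604, 0.704),
    ("SS, LW", "ACC", 70.8, 73.1),
    ("SS, LW, HW", "ACC", 46.9, 56.6),
]

for classes, metric, baseline, proposed in examples:
    gain = relative_improvement(baseline, proposed)
    print(f"{classes} {metric}: {gain:.1f}% relative improvement")
```

Read this way, the 21% accuracy gain for SS, LW, HW corresponds to an absolute change of 9.7 percentage points (46.9% to 56.6%).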

29. Results – RQ5
RQ5 - How does the fusion of multiclass models perform vs the classical machine learning methods?
Ongoing Research
- We expect that our committee All-vs-One voting approach will lead to more generalized binary models.
- Better prediction accuracy overall

30. Conclusion
- We demonstrated how incorrect cross-validation can lead to over-optimistic validation results.
- Our work showed good accuracy (73% for a 2-class problem and 40% for a 4-class problem) in predicting AHPLS in the benchmark tasks.
- We looked at how each subset of features affects accuracy and determined that Respiration is essential for prediction in any subset combination.
- We proposed a different engagement index calculation, which proved to be a better predictor, especially when High Workload is present.
- Current research focuses on a sensor fusion approach.

31. References
[1] Wang, Qiang, and Olga Sourina. "Real-time mental arithmetic task recognition from EEG signals." IEEE Transactions on Neural Systems and Rehabilitation Engineering 21.2 (2013): 225-232.
[2] Cheema, Amandeep, and Mandeep Singh. "An application of phonocardiography signals for psychological stress detection using non-linear entropy based features in empirical mode decomposition domain." Applied Soft Computing 77 (2019): 24-33.
[3] A. Hasanbasic, M. Spahic, D. Bosnjic, H. H. adzic, V. Mesic and O. Jahic, "Recognition of stress levels among students with wearable sensors," 2019 18th International Symposium INFOTEH-JAHORINA (INFOTEH), East Sarajevo, Bosnia and Herzegovina, 2019, pp. 1-4, doi: 10.1109/INFOTEH.2019.8717754.
[4] Kutlu, Y., Yayık, A., Yildirim, E., & Yildirim, S. (2019). LU triangularization extreme learning machine in EEG cognitive task classification. Neural Computing and Applications, 31(4), 1117-1126.
[5] Wang, Q., & Sourina, O. (2013). Real-time mental arithmetic task recognition from EEG signals. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 21(2), 225-232.
[6] Melillo, P., Bracale, M., & Pecchia, L. (2011). Nonlinear Heart Rate Variability features for real-life stress detection. Case study: students under stress due to university examination. Biomedical Engineering Online, 10(1), 96.
[7] Aghajani, H., Garbey, M., & Omurtag, A. (2017). Measuring mental workload with EEG+fNIRS. Frontiers in Human Neuroscience, 11, 359.
[8] Ziheng Wang, Ryan M. Hope, Zuoguan Wang, Qiang Ji, Wayne D. Gray, Cross-subject workload classification with a hierarchical Bayes model, NeuroImage, Volume 59, Issue 1, 2012, Pages 64-69, ISSN 1053-8119, https://doi.org/10.1016/j.neuroimage.2011.07.094.
