Slide 1: Early Implementation Experience with Wearable Cognitive Assistance Applications
Zhuo Chen, Lu Jiang, Wenlu Hu, Kiryong Ha, Brandon Amos, Padmanabhan Pillai, Alex Hauptmann, and Mahadev Satyanarayanan
Slide 2: Wearable Cognitive Assistance
Input: your destination
Guidance: step-by-step directions; knows your location; "Recalculating…"
Generalizes the metaphor from GPS
Slide 3: Wearable Cognitive Assistance
Input: some target task
Guidance: step-by-step instructions; knows your progress; corrective feedback
Generalizes the metaphor from GPS
Cognitive assistance is a very broad concept, so we focus on narrow, well-defined task assistance for now
Slide 4: Real-World Use Cases
Industrial troubleshooting, medical training, cooking, furniture assembly
Slide 5: This Paper
Common platform: Gabriel
Current implementations: Lego Assistant, Drawing Assistant, Ping-pong Assistant, Assistance with Crowd-sourced Tutorials
Lessons and future work
Slide 6: Review of Gabriel
[Architecture diagram: a Glass device sends video, accelerometer, and other sensor streams over Wi-Fi to a cloudlet. A Control VM (DeviceComm, UPnP, PubSub, sensor control, context inference) distributes the sensor flows to Cognitive Engine VMs 1 through n (e.g., face recognition); their cognitive flows feed a User Guidance VM, which returns user assistance to the device. VM boundaries separate the components.]
Slide 7: Key Features of Gabriel
Offload the video stream to a cloudlet
Guarantee low latency with application-level flow control
Encapsulate each application in a VM
Use publish-subscribe (Pub-Sub) to distribute streams
Goal: provide common functionality to simplify the development of each application
Slide 8: Example 1: Lego Assistant
Assemble 2D Lego models with the "Life of George" kit
Slide 9: Two-Phase Processing
Raw stream → "digitize" → symbolic representation → match the current state against all known states in the DB to get guidance → visual + verbal guidance
Example symbolic representation (the Lego board as a matrix of color codes):
[[0, 0, 0, 1, 1, 3],
 [0, 6, 1, 6, 1, 1],
 [0, 1, 1, 1, 1, 0],
 [4, 4, 6, 4, 4, 4],
 [4, 4, 6, 4, 4, 4],
 [1, 4, 4, 4, 4, 1],
 [0, 5, 5, 5, 5, 0],
 [0, 5, 0, 0, 5, 0],
 [6, 6, 0, 6, 6, 0]]
This two-phase structure applies to all applications we have built
The extractor must be tolerant of different lighting, backgrounds, and occlusion
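The two phases can be sketched with toy data: phase 1 "digitizes" a raw frame into a symbolic state, and phase 2 matches that state against a database of known states to look up guidance. The frame format, digitizer, and guidance strings below are illustrative stand-ins, not the real extractor.

```python
def digitize(frame):
    """Phase 1: extract a symbolic representation (here, a matrix of
    color codes) from a raw frame. Real extraction is a CV pipeline;
    this stand-in just reads a precomputed board and makes it hashable."""
    return tuple(tuple(row) for row in frame["board"])

guidance_db = {  # known state -> guidance (illustrative)
    ((0, 0), (1, 1)): "Add a blue 1x2 piece on the top row.",
    ((1, 1), (1, 1)): "Task complete!",
}

def get_guidance(frame):
    """Phase 2: match the current state against all known states in the DB."""
    state = digitize(frame)
    return guidance_db.get(state, "State not recognized; please adjust the board.")

msg = get_guidance({"board": [[0, 0], [1, 1]]})
```

The split matters for engineering: the hard, application-specific CV work lives entirely in phase 1, while phase 2 is a simple lookup that is easy to author and test.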
Slide 10: Lego: Symbolic Representation Extractor
It took four months of effort to make it robust
A great amount of time was spent on tuning parameters and testing
Slide 11: Lego Assistant Demo
Slide 12: Example 2: Drawing Assistant
"Drawing by observation": corrective feedback on construction lines
The original version uses a pen tablet and a screen; we move it to Glass (and any drawing medium)
Slide 13: Drawing Assistant Workflow
Raw stream → find the paper, locate the sketches, remove noise → symbolic representation (a binary image) → feed it to the almost unmodified logic of the original software → visual feedback
Slide 14: Example 3: Ping-pong Assistant
A better chance to win: directs you to hit to the left or right based on opponent and ball position
Not for professionals; not for the visually impaired
Slide 15: Ping-pong Assistant Workflow
Raw stream (processed on frame pairs) → table detection, opponent detection, ball detection → symbolic representation (a 3-tuple): <is_playing, ball_pos, opponent_pos> → suggestion based on recent state history → verbal feedback: "Left" / "Right"
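The suggestion step can be sketched as a rule over the recent 3-tuple history: direct the ball away from where the opponent has been standing. The window size, pixel coordinates, and threshold below are assumptions for illustration, not values from the paper.

```python
def suggest(history, width=640):
    """Return "Left", "Right", or None from a list of recent
    (is_playing, ball_pos, opponent_pos) tuples (positions in pixels)."""
    recent = [s for s in history[-5:] if s[0]]  # keep frames where play is on
    if not recent:
        return None
    # Average the opponent's recent horizontal position to smooth jitter.
    avg_opponent_x = sum(s[2] for s in recent) / len(recent)
    # Hit toward the half of the table the opponent is farther from.
    return "Right" if avg_opponent_x < width / 2 else "Left"

tip = suggest([(True, 300, 100), (True, 310, 120), (False, 0, 0), (True, 320, 110)])
```

Averaging over a short history, rather than reacting to a single frame, keeps the verbal feedback stable even when individual detections are noisy.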
Slide 16: Ping-pong: Opponent Detector
Input: two rotated frames
A1: White wall
A2: Dense optical flow
A3: LK optical flow
One alternative takes 70 milliseconds but is error prone; another increases latency by 50% but is more robust
Slide 17: Example 4: Assistance with Crowd-sourced Tutorials
Deliver context-relevant tutorial videos; there are 87+ million tutorial videos on YouTube
Uses a state-of-the-art context detector
E.g., cooking an omelet: recognize eggs, butter, etc., then recommend a video for the same style of omelet using similar tools
Quickly scales up to new tasks, at the cost of coarse-grained guidance
Slide 18: Tutorial Delivery Workflow
Raw stream (processed on video segments) → dense trajectory features (state of the art but slow: 1 minute of processing for a 6-second video) → symbolic representation (a concept list: objects, people, scene, action) → text search over 72,000 indexed YouTube videos using a standard language model → video feedback
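The retrieval step can be sketched as scoring indexed videos by overlap between the detected concept list and each video's text metadata. A real system uses a proper language model over the 72,000-video index; the toy index, video IDs, and term sets below are assumptions.

```python
video_index = {  # video id -> metadata terms (illustrative toy index)
    "vid1": {"omelet", "egg", "butter", "pan", "cooking"},
    "vid2": {"lego", "assembly", "bricks"},
    "vid3": {"egg", "boiling", "cooking"},
}

def recommend(concepts):
    """Return the video whose metadata best overlaps the detected
    concept list, or None if nothing matches at all."""
    scores = {vid: len(terms & set(concepts))
              for vid, terms in video_index.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

best_video = recommend(["egg", "butter", "omelet"])
```

A language-model ranker would replace the raw set intersection with term weighting, but the shape of the pipeline (concept list in, ranked videos out) is the same.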
Slide 19: Future Directions
Faster prototyping
Improving runtime performance
Extending battery life
Slide 20: Quick Prototyping: State Extractor
Speeding up the development of CV algorithms
Maybe different applications can share libraries?
[Gabriel architecture diagram repeated from Slide 6: Glass device, Wi-Fi, Control VM, Cognitive Engine VMs 1 through n, User Guidance VM, sensor and cognitive flows.]
Slide 21: Quick Prototyping: State Extractor
Speeding up the development of CV algorithms
Maybe different applications can share libraries?
[Modified architecture diagram: the Cognitive Engine VMs are replaced by shared Library VMs (Shared Library 1, 2, …) and lightweight apps (App 1 … App n) connected over PubSub; a Sensor Controller in the Control VM manages the sensor flows, and app feedback goes to the User Guidance VM, which returns guidance to the Glass device over Wi-Fi.]
Slide 22: Quick Prototyping: Guidance
Easy when the state space is small: specify guidance for each state beforehand and match in real time (e.g., Lego, Ping-pong)
Hard when there are too many states (e.g., Drawing, free-style Lego)
"Guidance by example": learn from crowd-sourced experts performing the task
Slide 23: Improving Runtime Performance
Leverage multiple algorithms; alternatives do exist, sometimes differing only in parameters
They trade off accuracy against speed (e.g., the ping-pong opponent detector)
An algorithm's accuracy depends on lighting, background, the user, and the user's state, none of which change quickly within a task
So run them all, then keep using the optimal one!
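"Run all, use optimal" can be sketched as a calibration phase: early in a task, run every candidate detector on the same frames, score each against the most robust one (treated as ground truth for these conditions), then keep the fastest detector whose accuracy is acceptable. The detectors, latencies, and threshold below are simulated, not measurements.

```python
def pick_detector(detectors, frames, reference, min_accuracy=0.9):
    """detectors: {name: (fn, latency_ms)}. Score each fn against the
    reference on the calibration frames and return the fastest name
    that meets min_accuracy, or None if none does."""
    acceptable = []
    for name, (fn, latency) in detectors.items():
        correct = sum(fn(f) == reference(f) for f in frames)
        if correct / len(frames) >= min_accuracy:
            acceptable.append((latency, name))
    return min(acceptable)[1] if acceptable else None

frames = list(range(20))
detectors = {  # simulated behavior: (detector fn, latency in ms)
    "white_wall": (lambda f: f % 2, 70),   # fast but wrong half the time here
    "dense_flow": (lambda f: 0, 200),      # slow, always agrees with reference
    "lk_flow":    (lambda f: 0, 105),      # mid latency, also agrees
}
chosen = pick_detector(detectors, frames, reference=lambda f: 0)
```

Because lighting, background, and the user change little within a task, the choice made during calibration stays valid for the rest of the session.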
Slide 24: Extending Battery Life
A region of interest (ROI) exists for some tasks: the board in Lego, the paper in Drawing
The ROI doesn't move quickly between frames
Do cheap computation on the client and transmit only the potential ROI
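The client-side filtering can be sketched as tracking the last known ROI, expanding it by a small margin (since the ROI moves slowly between frames), and transmitting only that crop instead of the full frame. The frame layout and ROI format are assumptions for illustration.

```python
def crop_to_roi(frame, roi, margin=1):
    """frame: 2D list of pixel values; roi: (top, left, bottom, right),
    inclusive. Expand the ROI by `margin` pixels (clamped to the frame)
    and return just that crop, which is all the client transmits."""
    top, left, bottom, right = roi
    top = max(0, top - margin)
    left = max(0, left - margin)
    bottom = min(len(frame) - 1, bottom + margin)
    right = min(len(frame[0]) - 1, right + margin)
    return [row[left:right + 1] for row in frame[top:bottom + 1]]

frame = [[p + 10 * r for p in range(8)] for r in range(8)]  # toy 8x8 frame
crop = crop_to_roi(frame, roi=(3, 3, 4, 4), margin=1)       # 4x4 crop to send
```

Transmitting a 4x4 crop of an 8x8 frame already cuts the payload by 4x; on real VGA or HD frames the radio savings, and hence the battery savings, are far larger.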
Slide 25: Early Implementation Experience with Wearable Cognitive Assistance Applications
Zhuo Chen, Lu Jiang, Wenlu Hu, Kiryong Ha, Brandon Amos, Padmanabhan Pillai, Alex Hauptmann, and Mahadev Satyanarayanan
Slide 26: Backup Slides
Slide 27: Glass-based Virtual Instructor
Understand the user's state
- Real instructor: uses eyes, ears, and knowledge
- Virtual: sensors, computer vision and speech analysis, plus a task representation
Provide guidance to the user
- Real instructor: speaks, or shows a demo
- Virtual: text, speech, images, or video
Slide 28: An Example
Making butterscotch pudding
Glass gives guidance (e.g., step-by-step instructions): "Gradually whisk in 1 cup of cream until smooth"
Glass checks whether the user is doing well: Is the amount of cream OK? Is it smooth enough?
Guidance is adjusted based on the user's progress: "The next step is …" or "Add more cream!"
Slide 29: Task Representation
Matrix representation of the Lego state
A task is represented as a list of states: [[0, 2, 2, 2], [0, 2, 1, 1], [0, 2, 1, 6], [2, 2, 2, 2]], …
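The task representation can be sketched as an ordered list of board states: given the current state, the next entry in the list is the next assembly target. The slide does not show the full matrices, so each state below is wrapped as a one-row matrix for illustration; the values are the color codes from the slide.

```python
task = [  # ordered Lego states for one task (values from the slide)
    [[0, 2, 2, 2]],
    [[0, 2, 1, 1]],
    [[0, 2, 1, 6]],
    [[2, 2, 2, 2]],
]

def next_target(current_state):
    """Return the next state in the task, or None if the task is done
    or the current state is not on the expected path."""
    for i, state in enumerate(task):
        if state == current_state:
            return task[i + 1] if i + 1 < len(task) else None
    return None

target = next_target([[0, 2, 1, 1]])
```

Comparing the current state to the next target tells the guidance layer both whether the last step succeeded and what to instruct next.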
Slide 30: Guidance to the User
Speech guidance: "Now find a 1x4 piece and add it to the top right of the current model"; "This is incorrect. Now move the … to the left"
Visual guidance: animations showing the three actions
Demo
Slide 31: State Extractor (1)
Slide 32: State Extractor (2)
Slide 33: Guidance
Calls functions in the original software
Slide 34: State Extractor: Table Detection
Slide 35: State Extractor
1 minute to detect context from a 6-second video
Slide 36: Symbolic Representation
Concept list + high-level task; can detect 3000+ concepts
Slide 37: Quick Prototyping: Guidance
Hard when there are too many states: learn from examples
Record task performances from multiple (possibly crowd-sourced) users
Run the state extractor to extract a state chain from each
For a new state from the current user, find the optimal match to provide guidance
Performance improves as more people use the system
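"Guidance by example" can be sketched as matching the current user's state against recorded state chains from other users and returning what the matched expert did next. The chains, state labels, and exact-match lookup below are toy assumptions; a real system would need a similarity measure over symbolic states.

```python
expert_chains = [  # state chains recorded from other users (toy data)
    ["blank", "outline", "shading", "done"],
    ["blank", "outline", "details", "shading", "done"],
]

def guidance_from_examples(current_state):
    """Find current_state in a recorded chain and return the state the
    expert reached next, or None if no chain contains it."""
    for chain in expert_chains:
        if current_state in chain:
            i = chain.index(current_state)
            if i + 1 < len(chain):
                return chain[i + 1]
    return None

next_step = guidance_from_examples("outline")
```

Because guidance is mined from recordings rather than hand-authored per state, coverage of the state space grows automatically as more users perform the task.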
Slide 38: Framework for Easy Parallelism
Inter-frame parallelism: easier, but improves throughput, not latency
Intra-frame parallelism: scanning windows (detection based on recognition); extracting local features from a big picture; Sprout
Slide 39: Identify State Change
Full processing is needed only when a new state appears
The savings can be huge: a 2-minute Lego task has only 10 states, so only 10 images need to be transmitted (not 1800!)
The question is which 10…
Use a "cheap" sensor to detect state changes, and turn the "expensive" sensor on only when there is one
Slide 40: Identify State Change
An instructor doesn't watch you all the time! Right after giving guidance, she probably won't watch you: so turn the camera off for a while after each guidance
An instructor has a time expectation for each step: set expectations learned from other users, then adapt to the current user
An instructor checks in regularly: transmit images at a very low sampling rate, and turn the accelerometer on after some time