
Presentation Transcript

Slide 1

Speech-to-Speech Translation with Clarifications

Julia Hirschberg, Svetlana Stoyanchev

Columbia University

September 18, 2013

Slide 2

Outline

Main Problem

Key Ideas

Solution Details

Impact

Issues, Gaps, and Future Work

Slide 3

Speech Translation

Speech-to-Speech translation system

[Diagram: the L1 speaker's spoken question (L1) is passed through the translation system to the L2 speaker as a translated question (L2); the L2 speaker's answer (L2) is returned through the system as a translated answer (L1).]

Slide 4

Speech Translation

Translation may be impaired by:

Speech recognition errors

Word error rate on the English side of Transtac is 9%

Word error rate in Let's Go bus information is 50%

A speaker may use ambiguous language

A speech recognition error may be caused by the use of out-of-vocabulary words

Slide 5

Speech Translation

Speech-to-Speech translation system

Introduce a clarification component

[Diagram: the same pipeline, now with a dialogue manager on each side of the translation system; a clarification sub-dialogue can take place between the L1 speaker and the system, and between the L2 speaker and the system, before the translation is passed on.]
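A minimal sketch, not the authors' implementation, of where such a clarification sub-dialogue could sit in a speech-to-speech pipeline. The component and method names (recognize, detect, clarification_question, translate, synthesize) are illustrative assumptions:

```python
# Illustrative sketch only: component and method names are assumptions,
# not the described system's API.

def handle_turn(audio_l1, asr, error_detector, dialogue_manager, mt, tts):
    """Process one L1 utterance, clarifying with the L1 speaker before translating."""
    hypothesis = asr.recognize(audio_l1)            # speech recognition (L1)
    errors = error_detector.detect(hypothesis)      # find misrecognized segments

    # Clarification sub-dialogue: resolve errors with the L1 speaker
    # before anything is translated to the L2 speaker.
    while errors:
        question = dialogue_manager.clarification_question(hypothesis, errors)
        answer = asr.recognize(dialogue_manager.ask(question))
        hypothesis = dialogue_manager.merge(hypothesis, errors, answer)
        errors = error_detector.detect(hypothesis)

    translated = mt.translate(hypothesis, source="L1", target="L2")
    return tts.synthesize(translated)               # speak to the L2 speaker
```

The same loop would run symmetrically on the L2 side before the answer is translated back.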

Slide 6

Key Ideas

Use targeted clarifications

Address challenges with targeted clarifications

Data collection for system evaluation

Slide 7

Most Common Clarification Strategies in Dialogue Systems

“Please repeat”

“Please rephrase”

System repeats the previous question

Slide 8

What Clarification Questions Do Human Speakers Ask?

Targeted reprise questions (M. Purver)

Ask a targeted question about the part of an utterance that was misheard or misunderstood, including understood portions of the utterance

Speaker: Do you have anything other than these XXX plans?

Non-Reprise: What did you say? / Please repeat.

Reprise: What kind of plans?

88% of human clarification questions are reprise; 12% are non-reprise

Goal: Introduce targeted (reprise) questions into a spoken system

Slide 9

Advantages of Targeted Clarifications

More natural

User does not have to repeat the whole utterance/command

Provides grounding and implicit confirmation

Speech-to-speech translation

Useful in systems that handle natural language user responses/commands/queries and a wide range of topics and vocabulary

Tutoring systems

Virtual assistants (in car, in home): a user command may contain an ASR error due to noise, background speech, etc.

Slide 10

Types of Clarification Questions in the TBOLT System

Rephrase part

Used when an error is OOV and NOT a name (works on difficult non-OOV words as well)

Asks to rephrase the error segment

"I did not understand when you said: fiscal. Please give me another word or phrase for it."

Spelling

Used for names

"Please spell 'Rockefeller'."

Disambiguation

Used to disambiguate between homophones

"Did you mean plain as in extensive tract of level open land, or plane as in an aircraft?"

Slide 11

Types of Questions (cont.)

Reprise (as found in human-human communication)

Repeats part of the utterance before the error segment

User: We will search some of the XXX to make sure everyone is safe.

System: We will search some of the what?

Reprise / Rephrase-part

Combines a targeted question with a rephrase question

System: We will search some of the what? Please say another word or phrase for this: 'vehicles'.

Confirmation

A yes/no question to confirm an utterance

"Did you say 'the breach is located here'?"

Slide 12

Requirements for a Targeted Question

Error detection

Error segment boundaries

Error type:

Does the error contain a proper name?

Does the error contain an out-of-vocabulary (OOV) word?
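A minimal sketch, under assumptions, of how an error segment meeting these requirements could be represented and mapped to the question types above. The field names, templates, and example are illustrative, not the TBOLT system's actual interface:

```python
# Illustrative sketch: field names and templates are assumptions based on the
# question types described above, not the actual implementation.
from dataclasses import dataclass, field

@dataclass
class ErrorSegment:
    start: int                      # index of the first misrecognized word
    end: int                        # index one past the last misrecognized word
    is_name: bool = False           # does the segment contain a proper name?
    is_oov: bool = False            # does it contain an out-of-vocabulary word?
    homophones: list = field(default_factory=list)  # candidate homophones, if any

def clarification_question(words, seg):
    """Choose a targeted clarification question for one detected error."""
    error_text = " ".join(words[seg.start:seg.end])
    context = " ".join(words[:seg.start])
    if seg.is_name:
        return f"Please spell '{error_text}'."                        # Spelling
    if seg.homophones:
        return "Did you mean " + " or ".join(seg.homophones) + "?"    # Disambiguation
    if seg.is_oov:
        return (f"I did not understand when you said: {error_text}. "
                f"Please give me another word or phrase for it.")      # Rephrase part
    return f"{context} what?"                                          # Reprise

# Example:
# words = "we will search some of the XXX to make sure everyone is safe".split()
# clarification_question(words, ErrorSegment(6, 7))
# -> "we will search some of the what?"
```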

Slide 13

Challenges with Using Targeted Clarifications

A system's clarification question may not be appropriate because

An error segment may not be detected correctly

An error type is not identified correctly

How should a system construct a targeted/reprise clarification question?

How long should the system persist in its clarification?

Our goal is to address these challenges empirically

Slide 14

Examples of Inappropriate Clarification Questions

Rephrase-part (incorrect error type detected)

User: Captain Bell wants to ask you some questions.

Error Detection: Captain <OOV> to ask you some questions.

System: I did not understand when you said "Bell wants". Please give me another word or phrase for it.

Reprise / Targeted Rephrase (error segment is identified incorrectly)

User: The insurgents are denouncing us.

ASR/Error detection: The insurgents are doing now and <ERROR>

System: The insurgents are doing now and what? Please say another word or phrase for "cing us".

Slide 15

Examples (cont.)

Spelling (error type is incorrectly detected as a name)

User: It sounds like there is a kernel of truth to your story.

Error Detection: It sounds like there is a <NAME> of truth to your story.

System: I think I heard a name when you said this: "kernel". Please spell that name.

Slide 16

Examples (cont.)

Disambiguation

(disambiguated word is misunderstood)

User: How often do you have problems with generators?

ASR/Error Detection: How often do you have problems with general?

System: Did you mean general as in broad or general as in a military officer?

Slide 17

Causes of Inappropriate Questions

Rephrase part

Partial word is detected as an error

Detected segment contains a name

Detected segment is a function word (to, from, the …)

Disambiguate

Neither choice for disambiguation is correct

Disambiguated word is misrecognized

Spell

Non-name

Long segment

Reprise

Contains undetected recognition error
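A hedged sketch of how some of these failure causes could be turned into a pre-check that blocks a risky question before it is asked. The function-word list, partial-word heuristic, and length threshold are assumptions for illustration, not part of the described system; causes that depend on what the user actually meant (e.g., a misrecognized disambiguated word) cannot be checked this way and are instead detected from user responses, as the following slides describe:

```python
# Illustrative pre-check based on the failure causes listed above.
# FUNCTION_WORDS, is_partial_word, and the length threshold are assumptions.

FUNCTION_WORDS = {"to", "from", "the", "a", "an", "of", "and"}

def is_partial_word(token):
    """Crude stand-in for detecting a word fragment (assumption)."""
    return len(token) < 3

def question_is_risky(question_type, error_text, detected_as_name, n_choices=0):
    """Return True if a planned clarification question is likely inappropriate."""
    words = error_text.split()
    if question_type == "rephrase_part":
        if any(is_partial_word(w) for w in words):
            return True                  # partial word detected as an error
        if detected_as_name:
            return True                  # detected segment contains a name
        if all(w.lower() in FUNCTION_WORDS for w in words):
            return True                  # segment is only function words
    if question_type == "spell":
        if not detected_as_name or len(words) > 2:
            return True                  # non-name or long segment
    if question_type == "disambiguate":
        if n_choices < 2:
            return True                  # no valid pair of choices to offer
    return False
```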

Slide 18

Goal

Develop a method to automatically identify when an inappropriate question is asked

Use user's answers to detect if a question was inappropriate

Slide 19

Data Collection

Simulated clarification system

Users were asked to read a sentence and then were played a pre-recorded question

Led to believe they were interacting with the actual system

Slide 20

Data Collection (cont.)

Prepared 228 questions

84 appropriate

144 inappropriate

For each type of clarification question, created appropriate and inappropriate questions

19 categories of clarification questions in total

Each subject was asked 144 questions

Recorded their initial utterances and their answers to the questions

Slide 21

User Responses

Subjects tended to be cooperative

Answers varied from subject to subject

Example:

System: "I did not understand when you said: 'Betirma'. Please give me another word or phrase for it."

Answers from different subjects: "No" / "Betirma" / "Betirma bravo echo tango india romeo mike alpha"

Slide 22

User Responses (cont.)

Example 2:

User: “How often do you have problems with generators?”

System: "Did you mean general as in broad or general as in a military officer?"

Answers from different subjects: "generator as in a machine for making electricity" / "no" / "generators"

Slide 23

Method

Extract lexical and prosodic features from responses

Number of pauses, speech energy, speech tempo

Lexical and prosodic difference between the initial utterance and the answer to the clarification

Measure the number of times subjects replay each question

Measure latency: length of the pause before the answer

Determine whether questions are appropriate or inappropriate based on user responses
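A rough sketch of how such response features might be assembled for a classifier. The feature names and the lexical-difference measure are assumptions meant only to make the idea concrete, not the authors' actual feature set:

```python
# Illustrative feature extraction; the field names and the lexical-overlap
# measure are assumptions, not the features used in the described experiments.

def lexical_overlap(initial_words, answer_words):
    """Fraction of answer words that also appear in the initial utterance."""
    if not answer_words:
        return 0.0
    initial = {w.lower() for w in initial_words}
    return sum(w.lower() in initial for w in answer_words) / len(answer_words)

def response_features(answer_words, answer_energy, answer_duration_s,
                      initial_words, n_replays, latency_s, n_pauses):
    """Bundle lexical/prosodic cues that may signal an inappropriate question."""
    return {
        "n_pauses": n_pauses,                    # pauses within the answer
        "mean_energy": answer_energy,            # average speech energy
        "speech_tempo": len(answer_words) / max(answer_duration_s, 1e-6),
        "lexical_overlap": lexical_overlap(initial_words, answer_words),
        "n_replays": n_replays,                  # times the question was replayed
        "latency": latency_s,                    # pause before answering
    }
```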

Slide 24

Challenge 2: Constructing Targeted Clarification Questions

Previous work: collected clarification questions using mturk (Stoyanchev et al. 2012, 2013)

Using the human-generated questions, manually created a set of generation rules

Evaluated generated questions with human subjects

Slide 25

Types of Questions

R_GEN (Generic): <context before error> what?

Applies if no other rule applies

Sentence: The doctor will most likely prescribe XXX

Question: The doctor will most likely prescribe WHAT?

R_SYN (Syntactic): <context before error> what <context after error>?

Applies when there is a VB after the error, and the VB and the error share a parent

Sentence: When was the XXX contacted?

Question: When was WHAT contacted?

R_NMOD: which <parent word>?

Applies when the error's dependency tag is NMOD and the parent POS is NN | NNS

Sentence: Do you have anything other than these XXX plans?

Question: Which plans?

R_START: what about <context after error>?
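A simplified sketch of rules in this spirit. The input parameters (dep_tag, parent_word, parent_pos, verb_shares_parent) are assumptions standing in for a real dependency parse, and the rule ordering is a guess; this is not the authors' generation code:

```python
# Simplified sketch of the R_GEN / R_SYN / R_NMOD / R_START rules described
# above; parse-related parameters are assumptions.

def generate_reprise_question(words, err_start, err_end,
                              dep_tag=None, parent_word=None, parent_pos=None,
                              verb_shares_parent=False):
    before = " ".join(words[:err_start])
    after = " ".join(words[err_end:]).rstrip("?. ")

    # R_NMOD: the error modifies a noun -> "which <parent word>?"
    if dep_tag == "NMOD" and parent_pos in ("NN", "NNS"):
        return f"Which {parent_word}?"

    # R_SYN: a verb follows the error and shares its parent ->
    #        "<context before error> what <context after error>?"
    if verb_shares_parent and after:
        return f"{before} WHAT {after}?"

    # R_START: error at the start of the sentence -> "What about <context after>?"
    if err_start == 0 and after:
        return f"What about {after}?"

    # R_GEN: fallback -> "<context before error> what?"
    return f"{before} WHAT?"

# Example (from the slide):
# words = "when was the XXX contacted ?".split()
# generate_reprise_question(words, 3, 4, verb_shares_parent=True)
# -> "when was the WHAT contacted?"
```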

Slide 26

Evaluation Questionnaire

Generated questions automatically using the rules for a set of 84 sentences

Asked humans (mturk) to create clarification questions for the same sentences

Questionnaire applied to both human- and computer-generated questions

Slide 27

Subjects

Mturk

Recruited 6 subjects from the lab

Inter-annotator Agreement

Slide 28

Results

Slide 29

Results

Slide 30

Discussion

R_GEN and R_SYN performance is comparable to human-generated questions

R_NMOD (which …?) outperforms all other question types, including human-generated questions

The R_START rule did not work

Slide 31

Key Ideas

Use Targeted Clarifications

Address challenges with targeted clarifications

Experiment on automatic detection of inappropriate questions

Experiment on automatic detection of when to terminate clarification

Data collection for system evaluation

Slide 32

Image Description and Questioning

Speaker 1:

A car is burning behind the girl

The girl looks startled

There was a massive explosion

Speaker 2:

A woman is standing in front of a burning car

Everything around her seems to have been destroyed

What caused this destruction?

Show users an image and ask them to describe it and construct questions

Slide 33

Data Collection for System Evaluation

Advantages:

Do not prime users with words in a verbally described scenario

Elicits natural speech compared to reading

Can be extended to a two-way dialogue where the interviewee is given a narrative or video information for answering the interviewer's questions.

Disadvantages:

Uncontrolled vocabulary (cannot force subjects to mispronounce words)

No control across subject pairs

Slide 34

Impact

Impact on Speech-to-Speech Translation

Detecting when a targeted clarification question was inappropriate is an important feature for determining the next dialogue move in clarification

Impact beyond Speech-to-Speech Translation

Targeted clarifications can be used in spoken dialogue systems

Especially useful for non-slot-filling domains (tutoring, virtual assistants)

Slide 35

Future Work

Appropriate and inappropriate questions

Analyze the data collected in responses to appropriate and inappropriate clarification questions

Use machine learning to predict whether an utterance is an answer to an appropriate or an inappropriate clarification question

Targeted (reprise) clarification questions

Which information from the initial sentence should a reprise clarification question contain?

Using human-constructed questions, determine which information is essential to repeat in a targeted question

Clarification length

How long should the system focus on a targeted clarification before backing off?

Collect data and use machine learning to predict, at each system turn, whether the clarification should continue or stop

Slide 36

Conclusions

Used an error-simulation system to collect data

Data collection experiment for automatic detection of answers to 'inappropriate' system clarifications

Evaluation of automatically generated reprise clarification questions shows that they could be used in a system

Proposed an experiment for determining an optimal length of targeted clarification

Collected audio data for system evaluation using an image-description method

Slide 37

Thank you

Questions?

Slide 38

Challenge 3: Clarification Length

How long should the system focus on a targeted clarification before backing off?

In speech-to-speech translation: back off = translate

In spoken dialogue systems: back off = ask a generic question such as 'please rephrase'

The answer depends on how patient and cooperative users are.

Slide 39

Evaluation of Clarification Length

BOLT 2012 system behaviour: System asks targeted clarification at most 3 times before translating.

Goal: Determine dynamically, at each clarification turn, whether the system should terminate the clarification process.

Use data to learn the dialogue strategy

Slide 40

Experiment Design

Simulate a sequence of unsuccessful clarification questions.

Give the user an option to hit the "translate" button

Distractor cases:

Simulate a successful clarification:

User: This computer is not operational

System: Please rephrase "not operational"

User: not working

System: thank you (translate and show next question)

Experimental case:

Loop asking 3–5 different targeted questions

The clarification dialogue continues until the user hits "translate"

Use a combination of distractor and experimental cases

Slide 41

Method

Use data to determine when the system should give up on a targeted clarification

Apply machine learning

Features:

Dialogue length (more likely to give up as dialogue continues to fail)

Question type

Appropriateness of the clarification question

Confidences of the error detection and classification components
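A hedged sketch of how these features might feed a "continue or give up" classifier. The use of scikit-learn, the feature layout, and the decision threshold are assumptions made for illustration, not what the BOLT 2012 system actually does:

```python
# Illustrative sketch: a binary classifier deciding, at each clarification turn,
# whether to keep clarifying or back off (translate / ask a generic question).
# The feature layout and the choice of scikit-learn are assumptions.
from sklearn.linear_model import LogisticRegression

FEATURES = ["n_clarification_turns",   # dialogue length so far
            "question_type_id",        # e.g. rephrase=0, spell=1, disambiguate=2, reprise=3
            "question_appropriate",    # 1 if the question was judged appropriate
            "error_detection_conf",    # confidence of the error detector
            "error_type_conf"]         # confidence of the error-type classifier

def train_giveup_model(X, y):
    """X: one row of FEATURES per clarification turn; y: 1 = give up, 0 = continue."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model

def should_give_up(model, turn_features, threshold=0.5):
    """Decide at the current turn whether to terminate the clarification."""
    return model.predict_proba([turn_features])[0][1] >= threshold
```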