Expertise Finding for Question Answering (QA) Services


October 16, 2014. Prof. Jae-Gil Lee, associate professor at the Department of Knowledge Service Engineering, KAIST. Homepage: http://dm.kaist.ac.kr/jaegil



Presentation Transcript

Slide1

Expertise Finding for Question Answering (QA) Services

October 16, 2014

Department of Knowledge Service Engineering

Prof. Jae-Gil Lee

Slide2

Brief Bio

Currently an associate professor at the Department of Knowledge Service Engineering, KAIST

Homepage: http://dm.kaist.ac.kr/jaegil

Lab homepage: http://dm.kaist.ac.kr/

Previously worked at IBM Almaden Research Center and the University of Illinois at Urbana-Champaign

Areas of interest: data mining and big data

Slide3

Table of Contents

Community-based Question Answering (CQA) Services

Background and Motivation

Methodology Overview

Evaluation Results

Social Search Engines for Location-Based Questions

Background and Motivation

System Architecture and User Interface

Evaluation Results

Slide4

Question Answering (QA) Services

QA services are good at

Recently updated information

Personalized information

Advice & opinion

[Budalakoti et al., 2010]

[Figure: diagram with the labels Questions, Answers, Knowledge Base, Search, and Experts]

Slide5

Community-based Question Answering (CQA) Services

Naver Knowledge-In, Yahoo! Answers

50,000 questions per day

160,000 questions per day

Slide6

Motivation of Our Study

Most contributions (i.e., answers) in CQA services are made by a small number of heavy users

Recently joined users are prone to leave CQA services very soon

Only 8.4% of answerers remained after a year

Making the long tail stay longer before they leave is of prime importance to the success of the services

Slide7

Problem Setting

To whom does the service provider need to pay special attention?

Recently joined (i.e., light) users who are likely to become contributive (i.e., heavy) users

Goal: estimating the likelihood of a light user becoming a heavy user (mainly by his/her expertise)

Challenge: lack of information about the light user

"Keeping them on the hook" (어장관리)?

Slide8

Intuition behind Our Methodology

A person's active vocabulary reveals his/her knowledge

Vocabulary has sharable characteristics so that domain-specific words are repeatedly used by expert answerers
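
One hedged reading of this intuition is sketched below in Python: words that recur across expert answerers are treated as domain-specific, and a user's expertise is approximated by the average level of the words he/she uses. The two vocabularies are the example words from this slide; the function names and the numeric word levels (1.0 and 0.2) are illustrative assumptions, not values from the study.

```python
# Vocabularies of the two example answers shown on this slide.
answerer_1 = {"SSD", "NAND", "ECC", "RAM", "Device", "Memory", "Computer"}
answerer_2 = {"NAND", "ECC", "RAM", "SSD", "Operation", "Data", "Drive"}


def shared_vocabulary(vocabularies):
    """Return the words that appear in every answerer's vocabulary."""
    return set.intersection(*vocabularies)


def estimate_expertise(words, word_level):
    """Average the level of the words a user has used (unknown words are skipped)."""
    levels = [word_level[w] for w in words if w in word_level]
    return sum(levels) / len(levels) if levels else 0.0


shared = shared_vocabulary([answerer_1, answerer_2])

# Hypothetical word levels: shared (domain-specific) words get a high level,
# all other words a low one. Real levels would be derived from heavy users.
word_level = {w: (1.0 if w in shared else 0.2) for w in answerer_1 | answerer_2}
```

With these assumed levels, a light user who writes only about "SSD" and "NAND" scores higher than one who mixes in common words such as "Device".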

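[Figure: Q&A 1 by Answerer 1 uses SSD, NAND, ECC, RAM, Device, Memory, Computer; Q&A 2 by Answerer 2 uses NAND, ECC, RAM, SSD, Operation, Data, Drive. The domain-specific vocabulary (SSD, NAND, ECC, RAM) is shared by both answerers (sharable characteristics) and sits at a different level from the common vocabulary (level difference).]

Slide9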

Estimated Expertise

[Figure: heavy users, words, and light users]

The more expert a user is, the higher the level of the words he/she uses.

Slide10

Availability

Measured simply as the number of a user's answers, with each answer weighted in proportion to its recency

Slide11

Answer Affordance

Defined as the likelihood of a light user becoming a heavy user if he/she is treated specially

Considering both expertise and availability
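
A minimal sketch of how the two ingredients might be combined is given below. The slides do not give the formulas, so the exponential decay with a 30-day half-life and the product used for the affordance are assumptions made only for illustration; all function and parameter names are hypothetical.

```python
from datetime import date


def availability(answer_dates, today, half_life_days=30.0):
    """Recency-weighted answer count (assumed exponential decay)."""
    total = 0.0
    for answered_on in answer_dates:
        age_days = (today - answered_on).days
        total += 0.5 ** (age_days / half_life_days)
    return total


def answer_affordance(expertise, avail):
    """Assumed combination: the product, so both factors must be high."""
    return expertise * avail


def top_k_light_users(users, k):
    """Rank light users by affordance; `users` maps name -> (expertise, availability)."""
    ranked = sorted(users, key=lambda name: answer_affordance(*users[name]), reverse=True)
    return ranked[:k]
```

Under these assumptions an answer posted today counts as 1.0 and an answer posted 30 days ago counts as 0.5, so a user who stopped answering long ago contributes little availability even if his/her expertise is high.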

 Slide12

Data Set

Collected from Naver Knowledge-In (KiN, 지식인)

Spanning ten years (from Sept. 2002 to Aug. 2012)

Including two categories: Computers and Travel

Computers: factual information; Travel: subjective opinions

Entropy was used for measuring the expertise of a user, which works well especially for categories where factual expertise is primarily sought after [Adamic et al., 2008]

Statistics

              Computers    Travel
# of answers  3,926,794    585,316
# of words    191,502      232,076
# of users    228,369      44,866

Slide13

Evaluation Setting (1/2)

Finding the top-k users by Affordance() for light users (our methodology)

Retrieving the top-k directory experts managed by KiN (the competitor)

Measuring the following two measures for the next one month

User availability: the ratio of the number of the top-k users who appeared on the day to the total number of users who appeared on that day

Answer possession: the ratio of the number of the answers posted by the top-k users on the day to the total number of answers posted on that day

Slide14

Evaluation Setting (2/2)

Ten-year period

[Timeline: Sept. 2002, July 2011, July 2012, Aug. 2012]

Used for deriving the word levels

Used for finding top-k experts by our methodology

Picked up the top-k directory experts managed by KiN

Monitored the user availability and answer possession

Slide15

[Figures: the result of the answer possession and the result of the user availability, each for (a) Computers and (b) Travel; series labeled top-200 and top-400]

Slide16

See the paper for the technical details.

Sung, J., Lee, J., and Lee, U., "Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services," In Proc. 7th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Cambridge, Massachusetts, July 2013.

This paper received the Best Paper Award at AAAI ICWSM-13.

Slide17

Table of Contents

Community-based Question Answering (CQA) Services

Background and Motivation

Methodology Overview

Evaluation Results

Social Search Engines for Location-Based Questions

Background and Motivation

System Architecture and User Interface

Evaluation Results

Slide18

Social Search (1/2)

A new paradigm of knowledge acquisition that relies on the people of a questioner's social network

Slide19

Social Search (2/2)

If you want to get some opinions or advice from your online friends, what do you do?

Not knowing whom to ask

Knowing whom to ask

Taking advantage of both approaches

Social Search

Slide20

KiN Here (지식인 위치질문, "Knowledge-In location questions")

A query is routed by finding a match between a target location of a query and a relevant location of a user
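
The matching step can be pictured with a small Python sketch. It assumes the simplest possible match, exact equality between the query's target location and one of a user's relevant locations; the user names, neighborhood names, and the function `route_query` are hypothetical and do not describe the actual service.

```python
def route_query(target_location, relevant_locations):
    """Return, in name order, the users with a relevant location equal to the target."""
    return sorted(
        user
        for user, locations in relevant_locations.items()
        if target_location in locations
    )


# Hypothetical users and neighborhoods, for illustration only.
relevant_locations = {
    "user_a": {"Yeoksam-dong", "Sinsa-dong"},
    "user_b": {"Sinsa-dong"},
    "user_c": {"Daechi-dong"},
}
recipients = route_query("Sinsa-dong", relevant_locations)
```

Here a question about Sinsa-dong would be routed to `user_a` and `user_b` only.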

Added at the dong (neighborhood) level

Slide21

Location-Based Questions

Informally defined as “search for a business or place of interest that is tied to a specific geographical location” [Amin et al., 2009]

Very popular, especially in mobile search, and typically subjective

Mobile search is estimated to comprise 10% to 30% of all searches

About 9% to 10% of the queries from Yahoo! mobile search, over 15% of 1 million Google queries from PDA devices, and about 10% of 10 million Bing mobile queries were identified as location-based questions

In a set of location-based questions, 63% were non-factual and the remaining 37% were factual

Mobile social search is the best way to process location-based questions

Slide22

Glaucus: A Social Search Engine for Location-Based Questions

1. Asking a question to Glaucus

2. Selecting proper experts

3. Routing the question to the experts

4. Returning an answer to the questioner

5. (Optional) Rating the answer
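
The five steps above can be sketched as a small in-memory loop. This is only an illustration under stated assumptions: experts are chosen by the number of check-ins at the asked-about place, which stands in for the real scoring, and `select_experts`, `ask`, `rate`, and the `checkins` structure are hypothetical names rather than Glaucus's interface.

```python
import heapq


def select_experts(place, checkins, k):
    """Step 2: pick up to k users with the most check-ins at the asked-about place."""
    candidates = [user for user, places in checkins.items() if places.get(place, 0) > 0]
    return heapq.nlargest(k, candidates, key=lambda user: checkins[user][place])


def ask(place, question, checkins, answer_fn, k=2):
    """Steps 1, 3, and 4: take a question, route it to experts, collect answers."""
    experts = select_experts(place, checkins, k)
    return {expert: answer_fn(expert, question) for expert in experts}


def rate(ratings, expert, score):
    """Step 5 (optional): record the questioner's rating of an expert's answer."""
    ratings.setdefault(expert, []).append(score)
```

`answer_fn` stands in for the human expert typing a reply; in a real deployment steps 3 and 4 would be asynchronous messages rather than a function call.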

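[Figure: Glaucus architecture. The questioner sends a query to the Glaucus social search engine (1: Query); the engine selects experts from a user database populated by crawling (2: Selected Experts) and forwards the query to those users (3: Query); the answer is returned to the questioner (4: Answer), who may give feedback (5: Feedback).]

Slide23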

User Interface

An Android app has been developed and is under (closed) beta testing

[Figure: app screens for the questioner and the answerer]

Slide24

Data Collection

Being able to collect who visited where and when on geosocial networking services such as Foursquare

Users check in to a venue and may also leave a tip

Our crawler collects such information upon user approval

Slide25

Expert Finding

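[Figure: the location aspect model. The question and each of the other users are described by the aspects Venue, Location, Category, Time, and Misc.; a similarity calculation, which also considers whether the user is an online friend of the questioner, gives each user a score, and the top-k users are selected.]

Slide26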

Evaluation Setting

Collected check-ins and tips from Foursquare (foursquare.com)

Confined to the places in the Gangnam District

Ranging from April 2012 to December 2012

Statistics

Variable              Value
# of users            9,163
# of places (venues)  1,220
# of check-ins        243,114
# of tips             40,248

Slide27

Evaluation Results

[Figure: chart of DCG for SocialTelescope, Aardvark, and Glaucus on Set 1, Set 2, and Set 3; value labels in extraction order: 3.94, 3.99, 4.07, 6.61, 6.31, 6.68, 8.25, 8.82, 7.78, 2.37, 1.97]

Qualification of the Experts: Two human judges investigated the profiles of the experts selected by the three systems for 30 questions (distributed over 3 sets) and gave a score on a 3-point scale.
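
For readers unfamiliar with DCG (discounted cumulative gain), the measure reported on this slide, a short sketch follows. It uses one common formulation, in which the score at rank i is divided by log2(i + 1); the slides do not state which variant was used, so this is an assumption, and the example scores in the usage note are hypothetical.

```python
import math


def dcg(scores):
    """Discounted cumulative gain of judged scores listed in ranked order."""
    return sum(
        score / math.log2(rank + 1)
        for rank, score in enumerate(scores, start=1)
    )
```

For example, `dcg([3, 2, 1])` adds 3 for the first expert, 2 / log2(3) for the second, and 1 / 2 for the third, so well-qualified experts placed near the top of the list raise the value the most.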

Quality of the Answers: Two human judges examined the quality of the answers from both experts and non-experts and gave a score on a 3-point scale.

Slide28

Mobile User Availability

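Motivation

Study Methodology

[Figure: training and prediction pipeline. Context from the smartphone log and external information (time, date) yields 26 features. In training, the 26 features and an availability class label are fed to a classifier (decision tree, SVM, random forest, ...) to build a classification model; in prediction, the model estimates availability from the 26 features.]

Slide29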

User Behavior Collection

 

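Category, data type, and collection method

Smartphone context data (collected in the background):

Battery information (remaining battery level, charging status, charging mode)

Call information (call start time, call duration, incoming/outgoing)

Message information (message time, incoming/outgoing)

GPS information (latitude, longitude)

Device information (vibrate mode, silent mode, airplane mode, CPU usage, headphone mode, screen on)

Ambient information (ambient light brightness, ambient noise level)

WIFI information (WIFI On/Off, SSID, signal strength)

Cellular information (Cellular On/Off, signal strength)

Application information (application name, application running time)

Availability data (entered directly by the user):

Whether the user can respond at a specific time

Slide30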

Preliminary Evaluation Results

Accuracy

10-fold cross validation

10 users for 5 weeks
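
As a reminder of what 10-fold cross validation computes, here is a dependency-free Python sketch. It evaluates only a trivial model that always predicts "available"; the fold construction (contiguous, unshuffled), the function names, and the synthetic labels in the usage note are assumptions for illustration and are not the study's data or code.

```python
def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def always_available(train_features, train_labels):
    """Trivial model: ignore the training data and always predict 'available'."""
    return lambda features: True


def cross_validate(features, labels, train, k=10):
    """Mean held-out accuracy over k folds (requires len(labels) >= k)."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train(train_x, train_y)
        correct = sum(model(features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Calling `cross_validate(features, labels, always_available)` on synthetic labels returns the share of "available" labels when the folds are equal in size; a learned classifier would be plugged in through the same `train` argument. In practice the samples are usually shuffled before the folds are formed.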

Important Features

1st: Time, Day of Week

2nd: Running Apps

3rd: WIFI SSID, # of Apps (30 mins), Time of Day

Model                        Accuracy
Baseline (Always Available)  0.53
Naïve Bayesian               0.66
SVM                          0.64
KNN                          0.62
Decision Tree                0.64
Adaboost                     0.61
Random Forest                0.7

Slide31

See the paper for the technical details.

Choy, M., Lee, J., Gweon, G., and Kim, D., "Glaucus: Exploiting the Wisdom of Crowds for Location-Based Queries in Mobile Environments," In Proc. 8th Int'l AAAI Conf. on Weblogs and Social Media (ICWSM), Ann Arbor, Michigan, June 2014.

Slide32

Thank you very much! Any questions?

E-mail: jaegil@kaist.ac.kr

Homepage: http://dm.kaist.ac.kr/