Query Processing


Uploaded On 2017-10-18



Presentation Transcript


Query Processing of Massive Trajectory Data based on MapReduce

Qiang Ma, Bin Yang (Fudan University)

Weining Qian, Aoying Zhou (ECNU)

Presented by: Xin Cao (Aalborg University)

Outline

Introduction

Preliminary

Trajectory Processing

Execution Overview

Storage

Indexing Methods

Query Processing

Experimental Study

Future Work

Introduction

Location-based services are playing an increasingly important role.

Large volumes of trajectory data in diverse formats have been accumulated.

Traditional centralized technologies may not be able to handle such large numbers of trajectories.

Cloud computing technologies such as GFS and MapReduce provide a promising paradigm for coping with the explosion of trajectory data.

Challenges

Huge volume, frequent updates, and rapid growth.

Trajectory data is “continuous”, i.e., ordered sequentially.

Highly skewed distribution.

MapReduce is good at offline data analysis, but not efficient for online queries.

Our Contributions

Extend the MapReduce framework to manage massive sequential data, such as trajectories of moving objects.

Study what kinds of query processing methods are appropriate for large clusters.

Provide two scalable indexing methods to facilitate efficient query processing.

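Preliminary

Data Model: line-segment model

A trajectory is a polyline in three-dimensional space (two spatial dimensions plus time), stored as a sequence of line segments.

Query Types

Spatio-temporal Range Query: $Q(E_s, E_t) \rightarrow \{S_k\}$

Trajectory-based Query: $Q(O, E_t) \rightarrow \{S_k\}$

Here $E_s$ is a spatial range, $E_t$ a time interval, $O$ an object identifier, and $\{S_k\}$ the set of matching trajectory segments.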
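To make the two query types concrete, here is a minimal, centralized Python sketch. It assumes each segment is a hypothetical `Segment(oid, start, end)` record with `(x, y, t)` endpoints (start time not after end time), that $E_s$ is an axis-aligned rectangle, and that $E_t$ is a closed time interval. The range query clips each segment against the query box (Liang-Barsky clipping extended to three dimensions) instead of comparing bounding boxes, so it returns only segments that actually pass through the box. It illustrates the query semantics only and is not the authors' distributed implementation.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float, float]  # (x, y, t)


class Segment(NamedTuple):
    """Hypothetical record for one line segment of a trajectory polyline."""
    oid: str
    start: Point
    end: Point


def intersects_box(seg: Segment, lo: Point, hi: Point) -> bool:
    """True if the segment passes through the closed box lo..hi (3-D Liang-Barsky clipping)."""
    t0, t1 = 0.0, 1.0  # parameter interval of the segment still inside every slab seen so far
    for d in range(3):
        p, delta = seg.start[d], seg.end[d] - seg.start[d]
        if delta == 0:
            if p < lo[d] or p > hi[d]:
                return False  # parallel to this slab and outside it
            continue
        a, b = (lo[d] - p) / delta, (hi[d] - p) / delta
        if a > b:
            a, b = b, a
        t0, t1 = max(t0, a), min(t1, b)
        if t0 > t1:
            return False  # the inside intervals of the slabs do not overlap
    return True


def range_query(segs: List[Segment], e_s, e_t) -> List[Segment]:
    """Q(E_s, E_t): e_s = ((xmin, ymin), (xmax, ymax)), e_t = (tmin, tmax)."""
    lo = (e_s[0][0], e_s[0][1], e_t[0])
    hi = (e_s[1][0], e_s[1][1], e_t[1])
    return [s for s in segs if intersects_box(s, lo, hi)]


def trajectory_query(segs: List[Segment], oid: str, e_t) -> List[Segment]:
    """Q(O, E_t): segments of object `oid` whose time span overlaps e_t."""
    return [s for s in segs
            if s.oid == oid and s.start[2] <= e_t[1] and s.end[2] >= e_t[0]]
```

Both functions scan every segment; avoiding that full scan is what the PMI and OII indexes are for.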

Trajectory Processing

Execution Overview

Storage

Data are grouped by key and organized into data chunks in GFS-style storage.

The whole data set is divided into several parts; each part is called a partition and is assigned to one data chunk for storage.

Each trajectory is assigned to at least one partition according to its spatio-temporal information.
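A minimal sketch of this assignment step, assuming a static uniform grid over `(x, y, t)` as the partitioning strategy. The slides do not say which grid or cell sizes are used, so `CELL`, `cells_for` and `partition` are hypothetical. A segment is assigned to every cell its bounding box overlaps. This guarantees at least one partition per segment but is conservative: a diagonal segment may be placed in a cell it does not actually cross.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]   # (x, y, t)
Seg = Tuple[str, Point, Point]       # hypothetical record: (object id, start, end)
CellId = Tuple[int, int, int]        # partition key

CELL = (10.0, 10.0, 60.0)  # hypothetical cell size along x, y and t


def cells_for(seg: Seg) -> List[CellId]:
    """Every grid cell overlapped by the segment's bounding box (always at least one)."""
    _, p, q = seg
    spans = []
    for d in range(3):
        lo, hi = min(p[d], q[d]), max(p[d], q[d])
        spans.append(range(floor(lo / CELL[d]), floor(hi / CELL[d]) + 1))
    return [(i, j, k) for i in spans[0] for j in spans[1] for k in spans[2]]


def partition(segs: List[Seg]) -> Dict[CellId, List[Seg]]:
    """Group segments by partition key; each group would be stored as one data chunk."""
    chunks: Dict[CellId, List[Seg]] = defaultdict(list)
    for seg in segs:
        for cell in cells_for(seg):
            chunks[cell].append(seg)
    return dict(chunks)
```

A segment that crosses a cell boundary, such as one running from x = 8 to x = 12 with 10-unit cells, is stored in both neighbouring partitions.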

Storage

A good spatio-temporal partitioning makes the amount of data per chunk fairly uniform.

Static partitioning strategies are easy to control and suitable for distributed scheduling, but may lead to load imbalance.

Dynamic strategies can resolve load imbalance, but re-splitting data can force large volumes of data to migrate between distant nodes in the cluster.

Appropriate partitioning strategies should be trained.
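The static-versus-dynamic trade-off can be illustrated with a one-dimensional sketch on skewed data. The data and helper names are hypothetical. A static strategy fixes equal-width boundaries in advance, while a dynamic strategy recomputes boundaries from the data (here, quantiles) so that each partition holds about the same number of items. The population standard deviation of partition sizes serves as the imbalance measure.

```python
from bisect import bisect_right
from statistics import pstdev
from typing import List


def equal_width_splits(lo: float, hi: float, k: int) -> List[float]:
    """Static strategy: k equal-width partitions fixed before seeing the data."""
    return [lo + (hi - lo) * i / k for i in range(1, k)]


def quantile_splits(xs: List[float], k: int) -> List[float]:
    """Dynamic strategy: boundaries taken from the data so partitions hold equal counts."""
    ordered = sorted(xs)
    return [ordered[len(ordered) * i // k] for i in range(1, k)]


def counts_for_splits(xs: List[float], splits: List[float]) -> List[int]:
    """Number of items falling into each partition delimited by the sorted split points."""
    counts = [0] * (len(splits) + 1)
    for x in xs:
        counts[bisect_right(splits, x)] += 1
    return counts


xs = [(i / 100) ** 3 for i in range(100)]  # skewed toward 0
static = counts_for_splits(xs, equal_width_splits(0.0, 1.0, 4))
dynamic = counts_for_splits(xs, quantile_splits(xs, 4))
print(static, pstdev(static))
print(dynamic, pstdev(dynamic))
```

On this input the static partitions hold 63, 17, 11 and 9 items (standard deviation about 22.1), while the quantile splits give 25 items each. The sketch does not model the cost the slide warns about: whenever the quantile boundaries move, items near them must migrate to other partitions.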

PMI (Partition-based Multilevel Index)

Aims to speed up spatio-temporal range queries.

Generate all candidate partitions by invoking the space partitioning strategy.

Store segments together as key/value pairs:

<PartitionID, S_k>

Each data chunk contains only trajectory segments that belong to the same partition.

A multilevel index can be built locally for each node (using traditional centralized methods).
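The PMI construction maps naturally onto a single MapReduce job. The sketch below simulates it in plain Python without Hadoop: the map phase emits one `<PartitionID, S_k>` pair per candidate partition, the shuffle/reduce phase groups pairs by key, and each group gets its own local index. Assumptions: each segment is simplified to a single hypothetical `(oid, x, y, t)` sample, `partition_of` stands in for the space partitioning strategy, and `LocalIndex` (a time-sorted list searched by binary search) stands in for whatever centralized multilevel index, such as an R-tree, a real node would build.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Seg = Tuple[str, float, float, float]  # hypothetical: (oid, x, y, t), one sample per segment


class LocalIndex:
    """Stand-in for a centralized multilevel index (e.g. an R-tree): segments sorted by t."""

    def __init__(self, segs: List[Seg]):
        self.segs = sorted(segs, key=lambda s: s[3])
        self.times = [s[3] for s in self.segs]

    def time_range(self, t0: float, t1: float) -> List[Seg]:
        """Segments with t0 <= t <= t1, located by binary search."""
        return self.segs[bisect_left(self.times, t0):bisect_right(self.times, t1)]


def map_phase(segs: Iterable[Seg],
              partition_of: Callable[[Seg], List[str]]) -> Iterator[Tuple[str, Seg]]:
    """Map: emit one <PartitionID, S_k> pair for every candidate partition of a segment."""
    for seg in segs:
        for pid in partition_of(seg):
            yield pid, seg


def reduce_phase(pairs: Iterable[Tuple[str, Seg]]) -> Dict[str, LocalIndex]:
    """Shuffle + reduce: group pairs by PartitionID, then index each group locally."""
    groups: Dict[str, List[Seg]] = defaultdict(list)
    for pid, seg in pairs:
        groups[pid].append(seg)
    return {pid: LocalIndex(group) for pid, group in groups.items()}


def by_x(seg: Seg) -> List[str]:
    """Hypothetical partitioning strategy: 10-unit strips along x."""
    return [f"p{int(seg[1] // 10)}"]


index = reduce_phase(map_phase([("a", 1.0, 1.0, 5.0), ("a", 12.0, 1.0, 7.0),
                                ("b", 3.0, 2.0, 1.0)], by_x))
```

Because every chunk holds exactly one partition, each local index can be built independently, which is what lets the step run in parallel across nodes.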

OII (Object Inverted Index)

Aims to speed up trajectory-based queries.

Collect all historical trajectories of each object.

Store them together as key/value pairs:

<OID, {PartitionID, T}>

Access according to the key (object identifier).
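A sketch of building and using the OII, again simulated in plain Python. It assumes, since the slides do not say, that `T` in `<OID, {PartitionID, T}>` is the time interval during which the object's trajectory lies in that partition; `build_oii` and `partitions_for` are hypothetical names. Grouping records by OID plays the role of the reduce step, and a lookup returns only the partitions that have to be read for a trajectory-based query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical posting: (PartitionID, t_start, t_end); T is assumed to be a time interval.
Posting = Tuple[str, float, float]


def build_oii(records: List[Tuple[str, str, float, float]]) -> Dict[str, List[Posting]]:
    """records: (OID, PartitionID, t_start, t_end). Group postings by OID, ordered by time."""
    oii: Dict[str, List[Posting]] = defaultdict(list)
    for oid, pid, t0, t1 in records:
        oii[oid].append((pid, t0, t1))
    for postings in oii.values():
        postings.sort(key=lambda p: p[1])
    return dict(oii)


def partitions_for(oii: Dict[str, List[Posting]], oid: str,
                   t_from: float, t_to: float) -> List[str]:
    """Trajectory-based lookup: partitions holding `oid`'s trajectory during [t_from, t_to]."""
    return [pid for pid, t0, t1 in oii.get(oid, []) if t0 <= t_to and t1 >= t_from]
```

Keeping each object's postings in time order means the object's trajectory can be reassembled by reading the returned partitions in sequence.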

Data Insertion

Query Processing

Trajectory-based Queries

Given any object ID, the system can locate the object's trajectory via the OII.

Range Queries
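For range queries, one plausible reading of the PMI design is a filter-and-refine scan: compare the query box with each partition's extent and read only the chunks of partitions that overlap it. The sketch below assumes hypothetical partition extents and approximates segments by their bounding boxes, so it performs only the filter step. Results are collected in a set because a segment that spans several partitions is stored in each of them and would otherwise be reported twice. In a MapReduce setting, each surviving partition could be processed by its own map task.

```python
from typing import Dict, List, Tuple

Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # (lo, hi) in (x, y, t)


def overlaps(a: Box, b: Box) -> bool:
    """Closed axis-aligned boxes intersect iff their intervals overlap on every axis."""
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d] for d in range(3))


def range_query(partitions: Dict[str, Box],
                chunks: Dict[str, List[Tuple[str, Box]]], q: Box) -> List[str]:
    """Skip partitions whose extent misses q, then scan only the surviving chunks."""
    hits = set()  # de-duplicates segments stored in more than one partition
    for pid, extent in partitions.items():
        if overlaps(extent, q):
            for seg_id, mbr in chunks.get(pid, []):
                if overlaps(mbr, q):
                    hits.add(seg_id)
    return sorted(hits)
```

Only partitions intersecting the query are touched, so the work grows with the query's size rather than with the whole data set.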

Experimental Study

Settings

Hadoop version 0.19.0

8 PC nodes

Ubuntu Linux version 8.04

Pentium IV 1.7 GHz CPU

512 MB memory

Java SDK 1.42

Experimental data: Network-based Generator

Experiments – Load Balance

Standard Deviation of Partitioning

Load Balance of PRADASE

Experiments – Data Importing and Index Creation

Data Importing with PMI

Data Importing with OII

Experiments – Query Processing

Spatio-temporal Range Query Processing with PMI

Trajectory-based Query Processing with OII

Future Work

More heuristic partitioning methods.

Reducing data migration between nodes.

Efficient real-time query processing on cloud infrastructure.

Thanks!