Slide1
Query Processing of Massive Trajectory Data based on MapReduce
Qiang Ma, Bin Yang (Fudan University)
Weining Qian, Aoying Zhou (ECNU)
Presented By: Xin Cao (Aalborg University)
Slide2
Outline
Introduction
Preliminary
Trajectory Processing
Execution Overview
Storage
Indexing Methods
Query Processing
Experimental Study
Future Work
Slide3
Introduction
Location-based services play an increasingly important role.
Large volumes of trajectory data in diverse formats have been accumulated.
Traditional centralized technologies may not cope with such large volumes of trajectories.
Cloud computing technologies, such as GFS and MapReduce, provide a promising paradigm for handling the explosion of trajectory data.
Slide4
Challenge
Huge volume, frequent updates, rapid growth.
Trajectory data is “continuous”, i.e. ordered sequentially.
Highly skewed.
MapReduce is good at offline data analysis, but not efficient for online queries.
Slide5
Our Contributions
Extend the MapReduce framework to manage massive sequential data, such as trajectories of moving objects.
Study what kinds of query processing methods are appropriate for large clusters.
Provide two scalable indexing methods to facilitate efficient query processing.
Slide6
Preliminary
Data Model - line segments model
Each trajectory is modeled as a polyline in three-dimensional space.
Query Types
Spatio-temporal Range Query: $Q(E_s, E_t) \rightarrow \{S_k\}$
Trajectory-based Query: $Q(O, E_t) \rightarrow \{S_k\}$
Slide7
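The line-segment model and the two query types above can be illustrated with a minimal brute-force sketch. The slides do not define the symbols, so this sketch assumes that E_s is an axis-aligned spatial rectangle, E_t is a time interval, O is an object identifier, and each S_k is one segment of a polyline in (x, y, t) space. The names `Segment`, `range_query`, and `trajectory_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """One line segment S_k between two (x, y, t) samples of object `oid`."""
    oid: str
    x1: float
    y1: float
    t1: float
    x2: float
    y2: float
    t2: float


Rect = Tuple[float, float, float, float]  # assumed E_s: (xmin, ymin, xmax, ymax)
Interval = Tuple[float, float]            # assumed E_t: (tmin, tmax)


def overlaps_time(s: Segment, e_t: Interval) -> bool:
    """True if the segment's time span intersects the interval E_t."""
    return min(s.t1, s.t2) <= e_t[1] and max(s.t1, s.t2) >= e_t[0]


def overlaps_space(s: Segment, e_s: Rect) -> bool:
    """Bounding-box filter: may accept a segment whose box, but not the
    segment itself, touches E_s. A refinement step is omitted here."""
    xmin, ymin, xmax, ymax = e_s
    return (min(s.x1, s.x2) <= xmax and max(s.x1, s.x2) >= xmin and
            min(s.y1, s.y2) <= ymax and max(s.y1, s.y2) >= ymin)


def range_query(segments: List[Segment], e_s: Rect, e_t: Interval) -> List[Segment]:
    """Q(E_s, E_t) -> {S_k}: segments overlapping the region and interval."""
    return [s for s in segments if overlaps_space(s, e_s) and overlaps_time(s, e_t)]


def trajectory_query(segments: List[Segment], oid: str, e_t: Interval) -> List[Segment]:
    """Q(O, E_t) -> {S_k}: segments of one object within the interval."""
    return [s for s in segments if s.oid == oid and overlaps_time(s, e_t)]
```

Both functions scan every segment, which is exactly the cost that an index over partitions or objects is meant to avoid.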
Trajectory Processing
Execution Overview
Slide8
Storage
Data are grouped by key and organized into data chunks in GFS-style storage.
The whole data set is divided into several parts; each part is called a partition and is stored in one data chunk.
Each trajectory is assigned to at least one partition according to its spatio-temporal information.
Slide9
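Assigning trajectory data to "at least one partition" can be sketched with a static uniform grid over (x, y, t). The grid and its cell size are assumptions made for illustration, not the partitioning strategy used by the authors, and `partitions_for_segment` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]   # (x, y, t)
PartitionID = Tuple[int, int, int]   # grid cell index along x, y, t


def partitions_for_segment(
    p1: Point,
    p2: Point,
    cell: Tuple[float, float, float] = (10.0, 10.0, 60.0),  # assumed cell size
) -> List[PartitionID]:
    """Return every grid cell overlapped by the segment's bounding box.

    A segment that crosses a cell boundary is returned for several cells,
    which is why a segment may belong to more than one partition.
    """
    lo = [math.floor(min(a, b) / c) for a, b, c in zip(p1, p2, cell)]
    hi = [math.floor(max(a, b) / c) for a, b, c in zip(p1, p2, cell)]
    return [
        (i, j, k)
        for i in range(lo[0], hi[0] + 1)
        for j in range(lo[1], hi[1] + 1)
        for k in range(lo[2], hi[2] + 1)
    ]
```

Using the bounding box is conservative: a long diagonal segment may be assigned to cells it does not actually cross, trading some duplication for a simpler test.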
Storage
A good spatio-temporal partitioning makes the size of data per chunk fairly uniform.
Static partitioning strategies are easy to control and suitable for distributed scheduling, but may lead to load imbalance.
Dynamic strategies can resolve load imbalance, but re-splitting data can cause large volumes of data to migrate over long distances within the cluster.
Appropriate strategies should be trained.
Slide10
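One simple way to quantify how uniform the data per chunk is, sketched here under the assumption that chunk size is measured as a segment count, is the standard deviation of the per-partition counts. The function name and the input layout are hypothetical.

```python
from collections import Counter
from statistics import pstdev
from typing import Hashable, Iterable, Sequence


def chunk_size_stddev(
    assignments: Iterable[Hashable],
    all_partitions: Sequence[Hashable],
) -> float:
    """Population standard deviation of segment counts per partition.

    `assignments` holds one partition id per stored segment;
    `all_partitions` lists every partition, including empty ones,
    so that unused partitions count as size 0.
    """
    counts = Counter(assignments)
    sizes = [counts.get(p, 0) for p in all_partitions]
    return pstdev(sizes)
```

A value near zero means the chunks hold similar amounts of data; a large value signals the kind of skew that static partitioning can produce.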
PMI (Partition based Multilevel Index)
Aims to speed up spatio-temporal range queries.
Generate all candidate partitions by invoking the space partitioning strategy.
Store segments together as key/value pairs: <PartitionID, $S_k$>
Each data chunk contains only trajectory segments that belong to the same partition.
A multilevel index can be built locally on each node (using traditional centralized methods).
Slide11
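The PMI steps above can be sketched as an in-memory simulation of a map and group-by-key pass. This is an illustration under stated assumptions, not the authors' implementation: the uniform grid `CELL` stands in for the space partitioning strategy, and a linear scan inside each chunk stands in for the local multilevel index. All function names are hypothetical.

```python
import math
from collections import defaultdict

CELL = (10.0, 10.0, 60.0)  # assumed grid cell size along x, y, t


def cells_for_box(lo, hi):
    """Candidate partitions: all grid cells overlapped by the box [lo, hi]."""
    ranges = [range(math.floor(l / c), math.floor(h / c) + 1)
              for l, h, c in zip(lo, hi, CELL)]
    return [(i, j, k) for i in ranges[0] for j in ranges[1] for k in ranges[2]]


def bounding_box(seg):
    """Bounding box of a segment given as ((x1, y1, t1), (x2, y2, t2))."""
    p1, p2 = seg
    lo = tuple(min(a, b) for a, b in zip(p1, p2))
    hi = tuple(max(a, b) for a, b in zip(p1, p2))
    return lo, hi


def map_segment(seg):
    """Map step: emit <PartitionID, segment> for every cell the segment touches."""
    lo, hi = bounding_box(seg)
    for pid in cells_for_box(lo, hi):
        yield pid, seg


def build_pmi(segments):
    """Shuffle/reduce step, simulated in memory: one chunk per PartitionID."""
    chunks = defaultdict(list)
    for seg in segments:
        for pid, s in map_segment(seg):
            chunks[pid].append(s)
    return dict(chunks)


def range_query(chunks, lo, hi):
    """Visit only candidate partitions, then filter segments by bounding box."""
    seen, result = set(), []
    for pid in cells_for_box(lo, hi):
        for seg in chunks.get(pid, []):
            seg_lo, seg_hi = bounding_box(seg)
            hit = all(sl <= h and sh >= l
                      for sl, sh, l, h in zip(seg_lo, seg_hi, lo, hi))
            # A segment stored in several partitions is reported once.
            if hit and seg not in seen:
                seen.add(seg)
                result.append(seg)
    return result
```

Because a segment can be stored in several partitions, the query deduplicates its results; a real deployment would also replace the per-chunk scan with a local spatial index.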
OII (Object Inverted Index)
Aims to speed up trajectory-based queries.
Collect all historical trajectories of each object.
Store together as key/value pairs: <OID, {PartitionID, T}>
Access according to key (object identifier).
Slide12
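The <OID, {PartitionID, T}> layout can be sketched as a dictionary keyed by object identifier. The slides do not say what T holds; this sketch assumes it is the time interval the object spends in each partition. The function names and the record layout are likewise hypothetical.

```python
from collections import defaultdict


def build_oii(records):
    """Build {oid: {partition_id: (t_start, t_end)}} from
    (oid, partition_id, t_start, t_end) records, merging the time
    spans of records that fall in the same partition."""
    index = defaultdict(dict)
    for oid, pid, t0, t1 in records:
        if pid in index[oid]:
            lo, hi = index[oid][pid]
            index[oid][pid] = (min(lo, t0), max(hi, t1))
        else:
            index[oid][pid] = (t0, t1)
    return {oid: dict(parts) for oid, parts in index.items()}


def partitions_for_object(oii, oid, e_t):
    """Look up by key (object identifier) and return the partitions
    whose recorded time span overlaps the query interval e_t."""
    lo, hi = e_t
    return sorted(pid for pid, (t0, t1) in oii.get(oid, {}).items()
                  if t0 <= hi and t1 >= lo)
```

A trajectory-based query then needs to read only the chunks named in the returned list, rather than scanning every partition for the object.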
Data Insertion
Slide13
Query Processing
Trajectory-based Queries
Given any object ID, the system can locate the object's trajectory according to the OII.
Range Queries
Slide14
Experimental Study
Settings
Hadoop version 0.19.0
8 PC nodes
Ubuntu Linux version 8.04
Pentium IV 1.7 GHz CPU
512 MB memory
Java SDK 1.42
Experiment data: Network-based Generator
Slide15
Experiments – Load Balance
Standard Deviation of Partitioning
Load Balance of PRADASE
Slide16
Experiments – Data Importing and Index Creating
Data Importing with PMI
Data Importing with OII
Slide17
Experiments – Query Processing
Spatio-temporal Range Query Processing with PMI
Trajectory-based Query Processing with OII
Slide18
Future Work
More heuristic partitioning methods.
Reducing data migration between nodes.
Efficient real-time query processing on cloud infrastructure.
Slide19
Thanks!