6375 Final Project Ming Liu Shuotao Xu Motivation Todays Database Management Systems DBMS software running on a standard operating system on a general purpose CPU DBMS frequently used in analytics and scientific computing but bottlenecked by ID: 338557
Download Presentation The PPT/PDF document "A Relational Algebra Processor" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
A Relational Algebra Processor
6.375 Final Project
Ming Liu,
Shuotao
XuSlide2
Motivation
Today’s Database Management Systems (DBMS): software running on a standard operating system on a general purpose CPU
DBMS frequently used in analytics and scientific computing, but bottlenecked by:
Processor speed, software overhead, latency & bandwidthProposal: FPGA Based Relational Algebra Processor
Host PC
(DBMS)
FPGARelational Algebra Processor
Physical Storage
2Slide3
Background|Relational
Algebra (RA)
Many database
queries are fundamentally decomposable to five basic RA operatorsAlthough SQL is capable of much more
Operator
Functions
SelectionFilter rows based on a Boolean condition
ProjectionEliminate selected attributes (columns) of a table; remove duplicated results
Cartesian Product
Combine several tables with unique attributesUnion
Combine several tables with the same attributes
Difference
Select rows of several tables where the rows do not match
Design dedicated processors on the FPGA for each operator
3Slide4
Project Goal
Design and implement an in-memory relational algebra processor on the FPGA
Explore the types of queries that can benefit from FPGA acceleration
Secondary: Outperform SQLite!Some assumptions:32-bit wide table entries
Tables fit in memoryMax number of columns is 32
Read only4Slide5
Microarchitecture | Host Software
5
FPGASlide6
Microarchitecture |
Top-Level
RAProcessor
Host PC
(C++ functions)
RA Processor
DRAM
PCIe
Host PC
(DBMS)
RA Processor
Physical Storage
6Slide7
Microarchitecture | Row
Marshaller
7
Exposes a simple interface for operators to access tables in DRAM
Address translation, burst aggregation, truncation & alignment
Multiplexes requestsTable values sent/received as 32-bit burstsSlide8
Microarchitecture | Selection
8
Filters rows based on predicates (e.g. age < 40)
16 predicate evaluators
Internally comparators
A tree of gates to qualify the predicatesMax: 4 ORs of 4 ANDsSlide9
Microarchitecture | Projection
9
Select columns of a table
Column mask one-hot encoded
Do not need to buffer row; operate directly on data burstsSlide10
Microarchitecture | Binary Operators
10
Cartesian
Product, Union, Difference and
Deduplication
Nested loop implementationSlide11
Microarchitecture|
Inter-operator
Bypassing11
Operators enabled concurrently; data passed between operators
No intermediate storage Conditions:
A singly link of unary operatorsEach operator has a single target outputNo structural
hazardSoftware reorders and schedules the RA commandsData source/destination encoded in commandSlide12
Microarchitecture|
Inter-operator
Bypassing12
Multiple 32-bit wide output FIFOs to other operatorsSlide13
Implementation Evaluation
13
Timing
Maximum Frequency: 55.786MHz
Critical Path: Row
Marshaller muxArea
Slice Registers: 50%LUTs: 85%BRAM/FIFOs: 47%
Modules
Slice Registers
LUTs
BRAM/FIFOs
TOTAL
34649
(50%)
59328 (85%)71 (47%)
Row
Mashaller
2804
6627
0
Controller
4570
6277
29
Selection
3137
19633
0
Projection
739
654
0
Cartesian
Product
1935
1478
0
Union
1939
1983
0
Difference
1875
1949
0
Deduplication
1822
1970
0Slide14
Performance Benchmark |
Setup
14
SQLite
Internal
SQLite timer to report execution time of the queryThinkpad T430, Core i7-3520M @ 2.90Ghz, 1x8GB DDR3-1600
RA ProcessorPerformance counters: cycles from start to
ack of an operator
Table
Relational Algebra Query
SQL Query
1 table
100k
x 30
SELECT,starLong,tableOut, mass,>,80000,AND,pos_x,>,10, OR,pos_x,<,pos_z, OR,col12,>,col14, AND,col20,<,col21SELECT * FROM starLongWHERE mass > 80000 AND pos_x > 10 OR pos_x < pos_z OR col12 > col14 AND col20 < col21;
1 table
100k x
30
PROJECT,starLong,tableOut
,
pos_x,col19,col25,col29
SELECT pos_x,col19
, col25,
col29 FROM
starLong
;
2 tables
1k
x 30
UNION,starMed1,starMed2,starUnion
SELECT * FROM starMed1
UNION
SELECT * FROM starMed2;
2 tables
1k
x 30
XPROD,starMed1,starMed2,starXprod
RENAME,starXprod,0,iOrder0,1,mass0,8,phi0
SELECT,starXprod,starFiltered
, iOrder0,=,
iOrder
, AND,phi0
,>,1
,
AND,mass0,>,mass
PROJECT,starFiltered,starOut,mass0
SELECT
s1.mass
FROM starMed1
s1
, starMed2
s2
WHERE s1.vx
> s2.vx
AND s1.phi
> 1
AND s1.mass
>
s2.mass;Slide15
Performance Benchmark |
Results
15
Limitation: Memory Bandwidth: 200MB/s
vs
12.8GB/sSlide16
Performance Benchmark |
Results
16
Select operator
most competitive with SQLite
What happens with
more predicates?Slide17
Improvements
17
Increasing data burst width
32-bit to 256-bit:
potential 8x speedup
Area/critical path increaseMaximizing memory bandwidth
Additional row buffers to buffer data from DDR2 Memory
Larger, faster DRAM; Higher clock speedSlide18
Conclusion & Future Work
18
Complex filtering operations performs well on the FPGA
Better than SQLite with sufficient memory bandwidthData intensive operators do not perform wellFuture opportunities:
An accelerator alongside SQLiteIntegration with HDD/SSD controller