Slide1
Enabling Protocol Coexistence: Hardware-Software Codesign of Wireless Transceivers on Heterogeneous Computing Architectures
Benjamin Drozdenko
Graduate Research Assistant & Ph.D. Candidate
Northeastern University, Boston, MA
Dissertation Committee Members:
Prof. Miriam Leeser (RCL)
Prof. Kaushik Chowdhury (GENESYS)
Prof. Stefano Basagni
Ph.D. Dissertation Proposal Review
August 5, 2016
Slide2
Dissertation Proposal Outline
1 Introduction
2 Related Work
3 Preliminary Research
  802.11b on Host PC with USRP
  IEEE 802.11a Background
  802.11a on Zynq SoC with FMComms3
4 Proposed Research
  Codesign Estimation, Automation, Optimization
  LTE / Wi-Fi Coexistence Study
  Automated Transceive Chain Generation
5 Conclusion
Slide3
Introduction: Issues in Wireless Networking
Increasing Congestion
  Many more devices use different protocols, now often on the same bandwidths
Spectrum Scarcity
  All bandwidths are now occupied, so the FCC has opened up new ones for use by multiple protocols
Big Data, Fast Rates
  Streaming applications need to send more in less time; protocols must have higher bit rates and lower error rates
Energy Efficiency
  Wireless mobile devices should need recharging only infrequently; devices must evolve to support lower power consumption
[Figure labels: LTE, Wi-Fi]
Slide4
Introduction: Issues in HW-SW Systems Codesign
Inflexible Hardware
  Existing wireless transceivers are usually made with static HW, in which modern protocols cannot be modified
Slow Software
  Software-defined radios (SDR) are usually made for low-end processors, which are too slow for all PHY layer processing blocks
FPGA Design Difficult
  High-level designers may not have HDL experience; designs don't consider FPGA-CPU-RFFE interfaces
Not Specific to Wireless
  Existing HW-SW codesign environments don't take into account wireless comms and transceiver design issues
Slide5
Introduction: Contributions
Theoretical Framework
  For approaching the problem of HW-SW codesign of modern wireless transceivers
Live Prototype on Heterogeneous Architecture
  Composed of unlike computing elements, such as an FPGA and a processor
  Example: Zynq SoC, with an extension to the latest-generation UltraScale Multi-Processor SoC (MPSoC) devices
Protocol Coexistence at PHY Layer
  A means for researching issues regarding protocol coexistence at the PHY layer using real RF front end HW
HW/SW Codesign for ABS Detection
  The modeling environment can study HW-SW codesign issues using multiple ABS detection as a proof of concept
Slide6
Related Work
Hardware for RF Front End
  AD9361/4, FMComms3, USRP [3Ettus] [6ADI]
SW for Interface with RF Front End
  GNURadio, MathWorks CST/USRP [4GNURadio] [5MathWorks]
High-Level SDR Frameworks
  ATOMIX, CODIPHY
SDR/CR Platforms on CPU w/ FPGA
  WARP, SORA, AirBlue, CoPR
SDR/CR Platforms on Zynq SoC
  Iris, CRASH, NI LTE/Wi-Fi Testbed
LTE / Wi-Fi Coexistence Studies
Slide7
Related Work: Comparison of SDR Systems
[Table: the per-system marks did not survive extraction; only the column and row labels remain.]
Systems compared: ATOMIX, CODIPHY, WARP, Sora/Ziria, CoPR, Airblue, Iris, CRASH, NI LTE / Wi-Fi Testbed, Our Method
Features compared:
  System can be described at a high level
  Combines SW processor & reconfigurable HW
  Developer can program HW/FPGA
  Can use to study HW-SW codesign tradeoffs
  Easily modifiable for Next Generation protocols
  Can use to study protocol coexistence
  Full design open and available to public
Slide8
Preliminary Work: 802.11b on USRP
The first SDR platform we tried connected a host PC to a USRP, which allows flexibility and tunability in RF-related parameters
Two important lessons learned about SDRs:
To enforce timing synchronization, the transceiver system functionality needs to be built around the RF front end
  Rely on the USRP clock for fixed radio time
  Build a transceive() function as the basis for state logic
Regardless of the RF front end used, certain algorithms are always needed to recover a signal and find the packet start
  RF Front End Algorithms: AGC, Frequency Offset Compensation
  Preamble Detection Algorithm
Slide9
Preliminary Work: 802.11b on USRP: Target HW and SW, Experimental Setup
System design must be based on the HW used
We relied on standard PCs running Ubuntu
A general-purpose PC cannot be relied on to complete operations in consistent time intervals
We instead relied upon the USRP N210 to provide our timing information
Slide10
Preliminary Work: 802.11b on USRP: Transceive Function for Synchronized Ops
function dr = transceive(ft, d2s)
  % ft:  termination flag (nonzero on the final call, to release the radio objects)
  % d2s: data to send; dr: data received
  % Initialization (nspf is not defined on the slide)
  persistent hrx htx;
  dr = complex(zeros(nspf,1)); ns = 0;
  if isempty(hrx), hrx = ...; htx = ...; end
  if ~ft
    % Transmit one frame
    step(htx, d2s);
    % Receive one frame using polling: loop until the receiver returns samples
    while (ns == 0), [dr, ns] = step(hrx); end
  else
    % Termination
    release(hrx); release(htx);
  end
Reliance upon an external clock requires us to design the transceiver system around its interactions with the USRP
We create a transceive() function designed to execute every t_radio secs
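A minimal sketch of the polling idea, written in Python rather than the slide's MATLAB and using a made-up stand-in for the radio: because the receive call only returns a frame when the radio has one ready, each call to the function takes one radio period, so the radio's clock paces the caller. The `FakeRadio` class, its `period` parameter, and the frame contents are all hypothetical and are not part of the original system.

```python
class FakeRadio:
    """Hypothetical stand-in for the radio: delivers one frame every `period` polls."""

    def __init__(self, period, frame_len):
        self.period = period
        self.frame_len = frame_len
        self.polls = 0
        self.sent = []

    def send(self, frame):
        self.sent.append(frame)

    def receive(self):
        self.polls += 1
        if self.polls % self.period == 0:
            return [0j] * self.frame_len, self.frame_len
        return [], 0  # nothing ready yet


def transceive(radio, frame_to_send):
    """Transmit one frame, then poll until the radio hands back one frame."""
    radio.send(frame_to_send)
    received, count = [], 0
    while count == 0:
        received, count = radio.receive()
    return received


radio = FakeRadio(period=4, frame_len=8)
frames = [transceive(radio, [1, 0, 1]) for _ in range(3)]
```

Three calls consume exactly three radio periods here, which is the property a higher-level state machine would rely on.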
We build higher levels using a finite state machine (FSM) that executes transceive each time it enters or remains in a state
Slide11
Preliminary Work: 802.11b on USRP: RF Front End and Preamble Detection
RF Front End Algorithms: Packets cannot be detected until sufficient work is done to recover the received signal, including:
  ensuring that the received signal's envelope magnitude is amplified to approximately 1
  removing frequency-domain artifacts that result from differences between the Tx's and Rx's local oscillators, to correct the received signal's spectrum
  Operations involving FFTs must complete in < t_radio sec
Preamble Detection Algorithms: The most essential component for reducing the error rate
  The more accurate the algorithm, the longer it takes
  To save time, we use an FFT-based algorithm and 2 stages to confirm the presence of the SFD bits
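The envelope-normalization step can be illustrated with a very small sketch. This is a block-wise gain, assumed here for simplicity, and not necessarily the feedback AGC used in the actual receiver; the function name, the `floor` guard, and the sample values are hypothetical.

```python
def agc_normalize(samples, target=1.0, floor=1e-12):
    """Scale a block of complex samples so its mean envelope magnitude is about `target`."""
    mean_mag = sum(abs(s) for s in samples) / len(samples)
    gain = target / max(mean_mag, floor)  # floor avoids dividing by zero on silence
    return [s * gain for s in samples], gain


# A weak received block (made-up values) is brought up to unit mean magnitude.
weak = [0.02 + 0.01j, -0.015 + 0.02j, 0.01 - 0.025j, -0.02 - 0.01j]
leveled, gain = agc_normalize(weak)
```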
Slide12
Preliminary Work: 802.11b on USRP: Calibration Results
Compiled MEX shows less overshoot and undershoot; it conforms more strictly to the expected t_radio = 7.04 ms
Compiled MEX shows a better average time for the RFFE processing block, and a lower time for higher frequency resolution
Slide13
Preliminary Work: 802.11a on Zynq: Testbed Requirements
Recognizing the timing limitations of the PC-based 802.11b system, I realized that SDR needs a heterogeneous system
The next SDR platform contains an FPGA, which allows flexibility and tunability in HW
We Develop a HW-SW Modeling Environment for SDR Needs
  Model Wireless Processing Blocks (PBs)
  Given a HW-SW Prototyping Platform
    Reconfigurable Hardware (HW) Components
    Software (SW) Tools for High-Level Design
  Model a HW-SW Divide Point
  Enact HW-SW Interfacing
  Can Reuse & Adapt to Modern Standards
Slide14
Preliminary Work: 802.11a on Zynq: Hardware Components: Xilinx Zynq
Zynq-Based Heterogeneous Computing System
Zynq-7000 series System-on-Chip (SoC)
  Processing System: ARM Cortex-A9 CPU
  Programmable Logic: FPGA with DSPs & BRAM
We prototype on 2 varieties: ZC706 & Zedboard
[Figure: Zynq SoC containing a CPU and an FPGA]
Slide15
Preliminary Work: 802.11a on Zynq: Hardware Components
Host Personal Computer (PC): Runs SW Tools
  JTAG (to FPGA)
  Ethernet (to CPU)
Zynq-Based Heterogeneous Computing System
  Zynq SoC: CPU and FPGA, hosting the numbered Transmit Path and Receive Path blocks
Radio Frequency (RF) Front End: ADI FMComms3
  AD9361, 2Tx 2Rx, attached through the FMC slot
Slide16
IEEE 802.11a Background: Transmitter PHY Layer Processing Blocks [1IEEE]
Scrambler: performs an XOR operation with a constant sequence to make the data more pseudo-random, unbiased, and independent
Convolutional Encoder: adds redundancy by producing parity bits using a Boolean polynomial function, producing a coded output of twice the original length
Block Interleaver: rearranges the bit indices so that burst errors are spread out and look more like random errors
BPSK Modulator: each coded bit is represented as a complex symbol; true becomes 1+0i, false becomes -1+0i
Symbol to Subcarrier Mapping: a set of 48 symbols is mapped to subcarriers (divisions of the transmission bandwidth), and pilot signals are inserted between them
OFDM Modulator: an Inverse Fast Fourier Transform (IFFT) converts 64 frequency-domain subcarriers (incl. 12 guard channels) to a 64-sample time-domain sequence
Prepend Cyclic Prefix: the final 16 time-domain samples are prepended to the start of the OFDM symbol to mitigate inter-symbol interference and to make channel estimation easier
Preamble Switch: decides between sending pre-calculated preamble data samples at the start of the sequence or the modulated DATA samples
[Figure labels: Short Preamble (2 frames), Long Preamble (2 frames)]
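Two of the blocks above are simple enough to sketch directly. The following Python is an illustration only, not the Simulink implementation: it maps bits to BPSK symbols as described (true to 1+0i, false to -1+0i) and prepends the final 16 samples of a 64-sample symbol as a cyclic prefix. The function names are hypothetical, and the 64 time-domain samples are placeholder values standing in for an IFFT output.

```python
def bpsk_modulate(bits):
    """Map each coded bit to a complex symbol: True -> 1+0j, False -> -1+0j."""
    return [complex(1, 0) if bit else complex(-1, 0) for bit in bits]


def prepend_cyclic_prefix(symbol, prefix_len=16):
    """Copy the last `prefix_len` time-domain samples to the front of the symbol."""
    if prefix_len > len(symbol):
        raise ValueError("prefix longer than symbol")
    return symbol[-prefix_len:] + symbol


symbols = bpsk_modulate([True, False, True, True])
time_samples = [complex(n, 0) for n in range(64)]  # placeholder for IFFT output
with_prefix = prepend_cyclic_prefix(time_samples)   # 16 + 64 = 80 samples
```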
Slide17
IEEE 802.11a Background: Receiver PHY Layer Processing Blocks
Frame Recovery
Matched Filter: Short Preamble
Update Rx Buffer
Matched Filter: Long Preamble
Preamble Found Decision: Return Flag & Index
Slide18
Preliminary Work: 802.11a on Zynq: Modeling for Wireless Processing Blocks
PB   Transmitter (Tx)        Receiver (Rx)
1    Scrambling              Preamble Detection
2    Convolutional Coding    OFDM Demodulation
3    Block Interleaving      BPSK Demodulation
4    BPSK Modulation         Block De-interleaving
5    OFDM Modulation         Viterbi Decoding
6    Preamble Insertion      Descrambling
[Figure: Simulink model for Tx or Rx path, with ports Tx: Data Bits, Tx: Samples, Rx: Samples, Rx: Data Bits]
Simulink: Design Synchronous Dataflow (SDF) Models
Integrated Profiling: Look at Entire 802.11a PHY Layer Processing Chain
Slide19
Preliminary Work: 802.11a on Zynq: Software Tools
[Figure: tool flow from the MathWorks Simulink™ model to the Zynq-Based Heterogeneous Computing System. HDL Coder™ produces HDL code, which Xilinx Vivado® turns into an FPGA bitstream, loaded over JTAG (to FPGA). Embedded Coder™ produces C code, built into an ARM executable, loaded over Ethernet (to CPU).]
Host PC: Runs SW Tools
  HDL Coder: Create HW Description Language (HDL) code
  Vivado: Synthesize, Implement, and Generate FPGA Bitstream
  Embedded Coder: Generate C code for ARM Processor
Slide20
Preliminary Work: 802.11a on Zynq: Modeling 7 HW-SW Divide Points
[Figure: Zynq-Based Heterogeneous Computing System with the numbered Transmit Path and Receive Path blocks split between SW (CPU) and HW (FPGA) at divide points V1 through V7]
V1: SW-only model
V2: Adds Tx6 & Rx1 to HW
V3: Adds Tx5 & Rx2 to HW
V4: Adds Tx4 & Rx3 to HW
V5: Adds Tx3 & Rx4 to HW
V6: Adds Tx2 & Rx5 to HW
V7: HW-only model
Tx Path
1: Additive Scrambling
2: Convolutional Encoding
3: Block Interleaving
4: Digital (BPSK) Modulation
5: OFDM Modulation
6: Preamble Insertion
Rx Path
1: Preamble Detection
2: OFDM Demodulation
3: Digital Demodulation
4: Block Deinterleaving
5: Viterbi Decoding
6: Descrambling
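The variant list above follows a regular pattern: each step moves the Tx block nearest the radio and the next Rx block after it into HW. A short Python sketch of that pattern, assuming V7 completes it by moving Tx1 and Rx6 (the slide only says "HW-only"); the `partition` function and dictionary keys are hypothetical names for illustration.

```python
TX_BLOCKS = ["Additive Scrambling", "Convolutional Encoding", "Block Interleaving",
             "Digital (BPSK) Modulation", "OFDM Modulation", "Preamble Insertion"]
RX_BLOCKS = ["Preamble Detection", "OFDM Demodulation", "Digital Demodulation",
             "Block Deinterleaving", "Viterbi Decoding", "Descrambling"]


def partition(variant):
    """Return which Tx/Rx blocks run in SW and which in HW for model variant V1..V7."""
    if not 1 <= variant <= 7:
        raise ValueError("variant must be in 1..7")
    n_hw = variant - 1                 # V1 has 0 blocks per path in HW, V7 has all 6
    split = len(TX_BLOCKS) - n_hw      # Tx blocks move to HW from the end of the chain
    return {
        "tx_sw": TX_BLOCKS[:split],
        "tx_hw": TX_BLOCKS[split:],
        "rx_hw": RX_BLOCKS[:n_hw],     # Rx blocks move to HW from the start of the chain
        "rx_sw": RX_BLOCKS[n_hw:],
    }


v2 = partition(2)
```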
Slide21
Preliminary Work: 802.11a on Zynq: Results: CPU Execution Time: Tx on Zynq
Moving one processing block from SW to HW does not necessarily cause speedup
  The increase in Tx frame time on the ZC706 from V1 to V2 demonstrates this
  V1 is SW-only and requires no AXI communication
    Keeps all operations in SW
  V2 adds a small component to HW
    Time saved < time spent on CPU-FPGA data transfer
Our modeling environment can identify the location at which the HW-SW interface is best placed
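The tradeoff can be made concrete with a toy cost model. The Python below is a sketch under the simplifying assumption that frame time is a plain sum of block times plus a fixed transfer cost; all numbers are invented for illustration and are not measurements from the ZC706 or Zedboard.

```python
def frame_time_after_offload(total_sw_us, block_sw_us, block_hw_us, transfer_us):
    """Frame time once one block leaves SW: drop its SW time, add HW time and transfer."""
    return total_sw_us - block_sw_us + block_hw_us + transfer_us


def offload_pays_off(block_sw_us, block_hw_us, transfer_us):
    """Offloading helps only if the time saved exceeds the CPU-FPGA transfer time."""
    return block_sw_us - block_hw_us > transfer_us


# Hypothetical numbers (microseconds): a small block loses, a large block wins.
small_block = frame_time_after_offload(100, 5, 1, 12)
large_block = frame_time_after_offload(100, 60, 10, 12)
```

With these made-up values the small block raises the frame time above the SW-only 100 µs, mirroring the V1-to-V2 observation, while the large block lowers it.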
Slide22
Preliminary Work: 802.11a on Zynq: Results: CPU Execution Time: Rx on ZC706
Rx maximum CPU frame time decreases as more blocks are moved onto the FPGA
Preamble detection is revealed to be the biggest bottleneck in the Rx model
  Moving it in V2 results in the largest drop in frame time
  Frame time also drops with the FFT in V3 & the Viterbi Decoder in V6
Moving the Descrambler in V7 does not show a decrease, suggesting we can leave it in SW
Slide23
Preliminary Work: 802.11a on Zynq Results: FPGA Resource Utilization & Power Usage
Power
PB   Tx     Rx
1    1.53   1.57
2    1.82   2.34
3    1.84   2.35
4    1.84   2.11
5    1.84   2.11
6    1.85   2.11
7    1.84   2.12
[Figure panels: Transmitter Res Util, Receiver Res Util, Power]
Slide24
Preliminary Work: 802.11a on Zynq Results: Block Variants: Preamble Detection
MF Variant             Default   HDL Long   HDL Training
Data Path Delay (ns)   500       314        132
% LUTs                 8.9       38.2       15.8
% Registers            4.3       2.0        1.3
% DSPs                 99.2      35.3       14.7
Total Power (W)        2.65      2.34       2.09
Block uses a matched filter (MF) to correlate 2 frames with a fixed set of coefficients
1st MF manually assembled from adders & multipliers
  Not ideal: uses 99% of DSPs
2nd MF correlates with full long preamble
  But long preamble composed of repetitions of training seq
3rd MF correlates with only the training sequence
  2.38X reduction in path delay
  1.12X reduction in power
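For readers unfamiliar with matched filtering, here is a minimal Python sketch of the correlation itself: slide a known template over the received samples and report where the correlation magnitude peaks. It is a software illustration only, not the HDL block, and the template is a 13-chip Barker code used as a stand-in because its sharp autocorrelation makes the peak obvious; it is not the 802.11a training sequence.

```python
def matched_filter_peak(rx, template):
    """Correlate rx against the conjugated template; return (best_index, peak_magnitude)."""
    n = len(template)
    best_idx, best_mag = -1, 0.0
    for start in range(len(rx) - n + 1):
        acc = sum(rx[start + k] * template[k].conjugate() for k in range(n))
        if abs(acc) > best_mag:
            best_idx, best_mag = start, abs(acc)
    return best_idx, best_mag


# Stand-in template (Barker-13), embedded in silence at offset 5.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
template = [complex(chip, 0) for chip in barker13]
rx = [0j] * 5 + template + [0j] * 4
index, peak = matched_filter_peak(rx, template)
```

A shorter template means fewer multiply-accumulates per output, which is consistent with the resource savings reported for the training-sequence variant.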
Slide25
Proposed Research Overview
1 Co-design Estimation, Automation, Optimization
  A Theoretical Metric Derivation
  B Optimization Exploration
  C Alternate Heterogeneous Platforms
2 Wi-Fi / LTE Coexistence
  A Commonalities in Processing Blocks
  B Joint Preamble Detection
  C Multiple ABS Detection
3 Automated Transceive Chain Generation
Slide26
Proposed Research: 1A: Theoretical Metric Derivation
Var    Description                 Est Time   Est Freq
t_C    FPGA Logic Fabric Clock     5-10 ns    100-200 MHz
c_ST   Constant # Logic Stages     8
t_D    FPGA Data Path Delay        40 ns      25 MHz
t_S    Sample Time                 50 ns      20 MHz
t_F    Frame Time                  4 μs       250 kHz
c_U    Upsampling Factor           1,000
t_P    Processor Update Time       4 ms       250 Hz

Var    Description                 Zedboard   ZC706
u_S    # Logic Slices              13,300     54,650
u_L    # LUTs                      53,200     218,600
u_R    # 1-bit Registers           106,400    437,200
u_D    # DSPs                      220        900
u_B    # BRAM blocks               140        545
p_H    Power of HW piece
p_S    Power of SW piece           1.530 W    1.566 W
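The entries in the first table are numerically consistent with a few simple products, which the Python below checks. The relationships themselves (t_D = c_ST × t_C at the 5 ns clock, t_F = 80 samples × t_S, t_P = c_U × t_F) are one reading of the table and are not stated on the slide; the 80-sample frame is taken from the 64-sample OFDM symbol plus 16-sample cyclic prefix described earlier.

```python
NS_PER_S = 10**9


def freq_hz(period_ns):
    """Frequency in Hz corresponding to a period given in nanoseconds."""
    return NS_PER_S // period_ns


t_c_ns = 5                            # fastest fabric clock in the table (200 MHz)
c_st = 8                              # constant number of logic stages
t_d_ns = c_st * t_c_ns                # assumed relation: data path delay = stages x clock
t_s_ns = 50                           # sample time (20 MHz)
samples_per_frame = 64 + 16           # assumption: OFDM symbol plus cyclic prefix
t_f_ns = samples_per_frame * t_s_ns   # frame time
c_u = 1000                            # upsampling factor
t_p_ns = c_u * t_f_ns                 # assumed relation: processor update time
```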
A major design problem: we don't know how long a Tx or Rx chain takes, or how much space or energy it requires
I plan to derive metrics for timing, utilization, and energy from the PBs in a chain
Slide27
Proposed Research: 1B: Optimization Exploration
Custom implementations for the processing blocks that show the longest data path delays
MathWorks tools allow developers to create their own Simulink blocks
  S-function for incorporating custom C code
  Black Box Interface for incorporating custom HDL code
  Incorporate Xilinx IP cores to implement LTE downlink
Algorithms ideal for one protocol & target arch
  Example: Schmidl-Cox for 802.11a expects repeated training sequences
  Preamble detection via matched filter for multiple protocols
Slide28
Proposed Research: 1C: Alternate Heterogeneous Platforms
I plan to derive theoretical metrics on alternate SoC devices
Xilinx Zynq UltraScale+ MPSoC has Cortex-A application & Cortex-R real-time processors [7Xilinx]
3 types of HW-SW divide points: HW-SWA, HW-SWR, SWA-SWR
[Figure labels: SWA, SWR, HW]
Slide29
Proposed Research: 2: Wi-Fi / LTE Coexistence Motivation
Due to scarcity, unlicensed spectrum is valuable
  Mobile phones could use 2.4/5.8 GHz if unoccupied
  LTE variants have been developed to operate alongside Wi-Fi (e.g. Qualcomm LTE-U, MulteFire; Ericsson LAA)
Modern smartphones already speak both protocols, only using different bandwidths
  In case of one connection outage, switch to the other
  Issues with how to design a single SDR to accommodate both protocols while optimizing metrics
Challenges of coexistence
  Synchronization, OFDM vs. OFDMA
  DCF vs. flexible resource allocation in FDD/TDD
  CSMA/CA vs. eNodeB subchannel allocation
Slide30
Proposed Research: 2A: Commonalities in Processing Blocks
I plan to identify how easily processing blocks can adapt to support multiple standards
Feature                   802.11a          802.11b        802.11g/n          LTE DL-SCH
Coding                    Convolutional    (opt.)         Convolutional      Turbo
Rates                     1/2, 2/3, 3/4    1/2 only       1/2, 2/3, 3/4      1/3, 1/2, 2/3, 5/6
Digital Modulation: PSK   BPSK, QPSK       DBPSK, DQPSK   (D)BPSK, (D)QPSK   QPSK only
Digital Modulation: QAM   16QAM, 64QAM     n/a            16QAM, 64QAM       16QAM, 64QAM
Interleaving              Block            n/a            Block              n/a
Spectrum Spreading        n/a              DSSS           DSSS               n/a
Mapping/Precoding         To subcarriers   n/a            To subcarriers     To layers, resource grid
OFDM                      X                n/a            X                  OFDMA
FFT size                  64               n/a            64                 128-2,048
Cyclic Prefix (μs)        0.8              n/a            0.8                5.21-33.33
Slide31
Proposed Research: 2A: Block Variant: OFDM IFFT
IFFT Size              64     128    256    512    1024
Data Path Delay (ns)   15.2   16.8   16.8   15.6   18.0
% LUTs                 19.9   22.4   27.8   37.2   54.9
% Registers            12.3   14.5   19.1   27.7   44.1
% DSPs                 6.4    7.7    9.1    10.5   11.8
Total Power (W)        1.84   1.84   1.85   1.85   1.87
In LTE, OFDM modulation uses different IFFT sizes to spread symbols onto a larger number of subcarriers
We vary the IFFT size to identify its impact on FPGA metrics
  Delay, resources, and power generally rise for higher IFFT sizes
  Limiting factor: # LUTs for multiple IFFTs on the FPGA
Slide32
Proposed Research: 2B: Wi-Fi/LTE Joint Preamble Detection [8MathWorks]
I plan to prototype a preamble detection mechanism that suits both the LTE and Wi-Fi protocols
LTE Primary Synchronization Signal (PSS) is a Zadoff-Chu seq
At first glance, no similarities between these preambles
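For reference, the frequency-domain PSS can be generated in a few lines. The Python sketch below follows the commonly cited LTE definition (a length-63 Zadoff-Chu sequence with the middle element removed, using root indices 25, 29, and 34); these specifics come from the LTE specification as generally documented, not from the slide, and should be checked against the standard before use. The function name is hypothetical.

```python
import cmath


def lte_pss(root):
    """62-element PSS: length-63 Zadoff-Chu sequence with the centre element skipped."""
    seq = []
    for n in range(62):
        m = n if n <= 30 else n + 1  # skip the punctured centre index
        seq.append(cmath.exp(-1j * cmath.pi * root * m * (m + 1) / 63))
    return seq


pss = {root: lte_pss(root) for root in (25, 29, 34)}
```

Every element has unit magnitude, unlike the 802.11a preambles, which is one reason the two look dissimilar at first glance.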
Plan to implement matched filters for each, then explore optimization methods
Slide33
Proposed Research: 2C: Multiple ABS Detection [9Do]
In LTE, the Almost-Blank Subframe (ABS) supports the enhanced Inter-Cell Interference Coordination (eICIC) process
By using a variation of energy detection on the PL fabric, I can detect the presence of an ABS
If HW sends a flag from PL to PS each time it detects a single ABS, a SW routine can recognize consistently recurring ABS frames
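A rough Python sketch of this HW-SW split, with both halves simulated in software: the "HW" part flags subframes whose mean energy falls below a threshold, and the "SW" part checks whether the flags recur at a fixed spacing. The threshold, the subframe contents, the 4-subframe spacing, and all function names are invented for illustration; a real ABS still carries some reference signals, so its energy is low rather than zero.

```python
def subframe_energy(samples):
    """Mean energy of one subframe of complex samples."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def abs_flags(subframes, threshold):
    """'HW' side: one flag per subframe, True where energy is below the threshold."""
    return [subframe_energy(sf) < threshold for sf in subframes]


def recurring_period(flags, min_repeats=3):
    """'SW' side: return the spacing if flagged subframes recur at one fixed interval."""
    hits = [i for i, flagged in enumerate(flags) if flagged]
    if len(hits) < min_repeats:
        return None
    gaps = {later - earlier for earlier, later in zip(hits, hits[1:])}
    return gaps.pop() if len(gaps) == 1 else None


def make_subframe(amplitude, length=4):
    return [complex(amplitude, 0)] * length


# Hypothetical capture: every 4th subframe (offset 1) is nearly blank.
subframes = [make_subframe(0.01 if i % 4 == 1 else 1.0) for i in range(12)]
flags = abs_flags(subframes, threshold=0.1)
period = recurring_period(flags)
```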
This can distinguish LTE from 802.11 better than preamble detection
Slide34
Proposed Research: 3: Automated Transceive Chain Generation
I plan to develop a user interface that automates the generation of a transmit and receive chain
HW-SW divide point(s) can be automatically chosen, or set by the user
[Figure labels: SW, HW]
Slide35
Proposed Research: 3: ATCG HW-SW Interfacing
To do this, need to auto-determine how and when to transfer data
  Advanced eXtensible Interface (AXI): Bus to Connect CPU & FPGA
  Direct Memory Access (DMA): To Hold Data Sent b/w CPU & FPGA
  First-In First-Out (FIFO): Queue to Buffer Bits in Transit
Note: the data to transfer between CPU & FPGA has a different size and class for each model variant
[Figure: Zynq-Based Heterogeneous Computing System. The CPU and FPGA in the Zynq SoC exchange data through AXI, a DMA controller, and FIFOs, with pack/unpack and concat/slice stages on the Transmit Path and Receive Path. The FPGA connects to the RF Front End (ADI FMComms3, AD9361, 2Tx 2Rx) through the DAC (I1,2, Q1,2) and ADC (I1,2, Q1,2).]
Slide36
Proposed Research: 3: ATCG Data Packing
Variant   Data to Send   Data Type            Size of 1   # Elements
V1        Samples        Signed Fixed Point   16 bits     80
V2        Samples        Signed Fixed Point   16 bits     64
V3        Symbols        Signed Integer       1-8 bits    64
V4        Coded Bits     Boolean              1 bit       48
V5        Coded Bits     Boolean              1 bit       48
V6        Data Bits      Boolean              1 bit       24
V7        Data Bits      Boolean              1 bit       24
Before sending data between CPU & FPGA, we translate to a 32-bit unsigned integer format for transfer on the AXI interconnect
We build a library of Packing blocks to facilitate this transfer
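As an illustration of what such a packing block has to do for the Boolean cases, the Python sketch below packs a list of bits into 32-bit unsigned words and unpacks them again. The bit order (least significant bit first) and the function names are assumptions for this example; the actual Packing blocks may order bits differently.

```python
def pack_bits(bits):
    """Pack booleans into 32-bit unsigned words, first bit in the least significant position."""
    words = []
    for start in range(0, len(bits), 32):
        word = 0
        for pos, bit in enumerate(bits[start:start + 32]):
            if bit:
                word |= 1 << pos
        words.append(word)
    return words


def unpack_bits(words, count):
    """Recover the first `count` booleans from the packed words."""
    return [bool((words[i // 32] >> (i % 32)) & 1) for i in range(count)]


# 48 coded bits (as in V4/V5) fit in two 32-bit words.
coded = [(i * 7) % 3 == 0 for i in range(48)]
words = pack_bits(coded)
restored = unpack_bits(words, len(coded))
```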
Slide37
Proposed Research: Timeline
Fall 2016:
  Derive theoretical framework for estimating time, utilization, & power metrics.
  Prototype Wi-Fi and LTE DL-SCH protocols on Zynq SoC.
  Research issues regarding protocol coexistence at the PHY layer.
  Study HW-SW codesign issues with one HW-SW divide point.
Spring 2017:
  Expand framework to more precisely model data path delay & AXI-S transfers.
  Employ optimizations to preamble detection for Wi-Fi and LTE DL-SCH chains.
  Research issues regarding protocol coexistence at the MAC layer.
  Study HW-SW codesign issues with multiple HW-SW divide points.
Summer 2017:
  Expand framework to model Zynq UltraScale+ MPSoC.
  Collect accuracy and error rate metrics from Wi-Fi and LTE DL-SCH chains.
  Research issues regarding protocol coexistence with cross-layer interaction.
  Study HW-SW codesign issues with both real-time and application processors.
Slide38
Conclusion (1)
I propose a method for modeling next generation protocol coexistence, and the implementation of a joint Wi-Fi/LTE processing chain as a proof of concept
The modeling environment will allow researchers to analyze any processing chain at a high level
  Derive theoretical estimates for time delay, utilization, power
  Assist in selection of one or multiple HW-SW divide points
Processing blocks will be optimized based on their relative size compared to the overall processing chain
  Save time, improve reliability over alternate SDR solutions
Slide39
Conclusion (2)
Wi-Fi / LTE coexistence study of interest to wireless community
  Analyze the commonalities and differences between Wi-Fi and LTE processing blocks
  Prototype joint preamble detection to handle the most costly processing block in either chain
  Further distinguish an LTE from a Wi-Fi transmission by detecting multiple ABSs in a HW-SW collaborative manner
Automation of transceive chain generation, as time permits
A new path and procedure for studying protocol coexistence
  Can be utilized and expanded upon by the wireless networking and HW-SW systems design research communities
Slide40
Publications
Published:
B. Drozdenko, R. Subramanian, K. Chowdhury, and M. Leeser, “Implementing a MATLAB-Based Self-configurable Software Defined Radio Transceiver,” in Cognitive Radio Oriented Wireless Networks - 10th International Conference, CROWNCOM 2015, Doha, Qatar, April 21-23, Revised Selected Papers, 2015, pp. 164–175.
R. Subramanian, B. Drozdenko, E. Doyle, R. Ahmed, M. Leeser, and K. R. Chowdhury, “High-Level System Design of IEEE 802.11b Standard-Compliant Link Layer for MATLAB-Based SDR,” IEEE Access, vol. 4, pp. 1494–1509, 2016.
Submitted, Pending:
B. Drozdenko, M. Zimmermann, T. Dao, K. Chowdhury, and M. Leeser, “Modeling Considerations for the Hardware-Software Co-design of Flexible Modern Wireless Transceivers,” in 2016 26th International Conference on Field Programmable Logic & Applications (FPL), Lausanne, Switzerland, August 29-September 2, 2016. [Accepted, to appear August 2016]
B. Drozdenko, M. Zimmermann, T. Dao, K. Chowdhury, and M. Leeser, “Hardware-Software Codesign of Wireless Transceivers on Zynq Heterogeneous Systems,” in IEEE Transactions on Emerging Topics in Computing, Special Issue on Next Generation Wireless Computing Systems, vol. 5, no. 1, March 2017. [Accepted with Major Revision in May 2016, expected publication March 2017]
Slide41
Posters & Acknowledgments
Extended Abstracts & Posters:
B. Drozdenko, R. Subramanian, K. Chowdhury, and M. Leeser, "Bi-Directional Transceiver Implementation with Optimal Parameter Selection", 5th Annual New England Workshop on Software Defined Radio (NEWSDR 2015), Worcester, MA, May 22, 2015.
B. Drozdenko, M. Zimmermann, T. Dao, K. Chowdhury, and M. Leeser, "High-Level Hardware-Software Codesign of an 802.11a Transceiver System using Zynq SoC", 2016 Boston Area Architecture Workshop (BARC 2016), Boston, MA, January 29, 2016.
B. Drozdenko, M. Zimmermann, T. Dao, M. Leeser, and K. Chowdhury, "High-Level Hardware-Software Co-design of an 802.11a Transceiver System using Zynq SoC", in Proceedings of the IEEE International Conference on Computer Communications, INFOCOM 2016, San Francisco, CA, April 10-15, 2016, pp. 469-470.
B. Drozdenko, M. Zimmermann, T. Dao, K. Chowdhury, and M. Leeser, “Profiling 802.11a Standard on the Xilinx Zynq System-on-Chip”, in 6th Annual New England Workshop on Software Defined Radio (NEWSDR 2016), Boston, MA, June 3, 2016.
Acknowledgments:
Slide42
References
[1] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High-speed Physical Layer in the 5 GHz Band, IEEE Std. 802.11a-1999, 1999.
[2] United States Department of Commerce, National Telecommunications & Information Administration, Office of Spectrum Management. “United States Frequency Allocations: The Radio Spectrum.” URL: http://www.ntia.doc.gov/files/ntia/publications/2003-allochrt.pdf.
[3] I. O. Ettus Research, “USRP N200/N210 networked series,” 2015. [Online]. URL: http://www.ettus.com.
[4] GNU Radio Project, “GnuRadio: The free and open source radio ecosystem,” 2015. [Online]. Available: http://www.gnuradio.org
[5] MathWorks, Inc. (2016) Zynq SDR support from communications system toolbox. [Online]. Available: http://www.mathworks.com/hardwaresupport/zynq-sdr.html
[6] Analog Devices, Inc. (2015) Integrated transceivers, transmitters, and receivers. [Online]. Available: http://www.analog.com/en/products/rfmicrowave/integrated-transceivers-transmitters-receivers.html.
[7] Xilinx, Inc. (2016) Zynq UltraScale+ All Programmable Heterogeneous MPSoC. [Online]. Available: http://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html.
[8] MathWorks, Inc. (2016) Synchronization Signals (PSS and SSS). [Online]. Available: http://www.mathworks.com/help/lte/ug/synchronization-signals-pss-and-sss.html.
[9] M. M. Do and H. J. Son. (2014) Interference Coordination in LTE/LTE-A (2): eICIC (enhanced ICIC). [Online]. Available: http://www.netmanias.com/en/post/blog/6551/lte-lte-a-eicic/interference-coordination-in-lte-lte-a-2-eicic-enhanced-icic.
Slide43
Supplementary Material: Introduction: Issue: Spectrum Scarcity
54-698 MHz: 802.11af TV Whitespace Reuse
3.55-3.65 GHz: Military RADAR Reuse
2.4, 5.8 GHz: 802.11a/b Designated ISM Bands
[2USNTIA]
Slide44
Supplementary Material: Barriers: Why Making Such Wireless Transceivers Is Hard
Comms protocols evolve; transceiver HW/SW must evolve too!
B1: HW-SW modeling environment
B2: HW & SW must be reconfigurable
B3: Each processing block (PB) must be the same on HW & SW
B4: Map behaviors to HW or SW
[Figure: a Modeling Environment in which y=FFT(x) corresponds to both a HW "module FFT(x,y)" and a SW "function FFT(x,y)"; processing blocks ProcBlk1, ProcBlk2, and ProcBlk3 on the FPGA are linked by an Effective Bus; radio parameters f_c = 5.8 GHz and f_s = 20 Msps]
Slide45
Supplementary Material: Introduction: Approach to Modeling Transceivers for Coexistence
[Figure: diagram with three groups (Enabling Technologies, System goals and challenges, Fundamental Research) whose boxes are tagged R1-R5 and T1-T3. Box text as extracted:]
Signal Processing for Wireless Communications at the PHY Layer
Implementation Portability: Equivalent functionality on HW & SW
Time Synchronization
Low Energy
Low Error Rate
Spectrum Reuse: Put multiple protocols on specific bandwidths
Simulink/Vivado-based HW-SW Modeling Environ
ADI FMComms board for Tunable Radio Parameters
Adapt wireless processing chain to new specifications and to handle contention
Zynq SoC: RISC processor, Reconfigurable HW (FPGA)
Time & Energy Optimization Techniques
Map processing blocks to HW & SW, and est. timing & utilization
Wireless Networking: Study Effects of Multiple Protocols on Same BW
Slide46
Supplementary Material: Introduction to SDR: from Static HW to SW
Radio Hardware: Superheterodyne Transmitter/Receiver [10Wikipedia]
SDR: Universal Software Radio Peripheral (USRP) [3Ettus]
Slide47
Supplementary Material: SDR ESL Design: SWAP Tradeoffs
Slide48
Supplementary Material: IEEE 802.11 Std Characteristics & Processing Blocks
PLCP Preamble
Scrambling
Convolutional Encoding
Block Interleaving
PSK Modulation
Symbol-to-Subcarrier Mapping and Pilot Insertion
OFDM Modulation with Cyclic Prefix Attachment
Slide49
Background: IEEE 802.11b Transmitter Processing Blocks
[Ref 9.3] Scrambling
[Ref 9.4] Modulation
[Ref 3] Spectrum Spreading
[Ref 7] Framing (802.11b)
[Ref 13.1] Raised Cosine Filter
Slide50
Background: IEEE 802.11b Receiver Processing Blocks
Receiver Front End:
Receiver Controller:
[Ref 9.1]
AGC
[Ref 10]
Frequency Offset Compensation
[Ref 9.2]
[Ref 10]
Synchronization
[Ref 2]
Slide51
Supplementary Material: 802.11b on USRP: Designated Transmitter & Receiver
RFFE: Radio Frequency Front End: AGC, Frequency Offset Estimation & Compensation, and Raised Cosine Receive Filter (RCRF)
PD: Preamble Detection
DDD: Despreading, Demodulation, and Descrambling
SMS: Scrambling, Modulation, and Spreading
RCTF: Raised Cosine Transmit Filter
Slide52
Supplementary Material: 802.11b on USRP: Preamble Detection Algorithm
1.) Compare Received Signal (complex samples) to Expected Spread Preamble (real samples)
  Despread and demodulate to get real bits
2.) Compare Demodulated Signal to Expected Scrambled Preamble (real symbols)
3.) Compare Descrambled 2nd USRP Frame to Expected SFD Sequence (real bits)
[Figure labels: −Window1, +Window1; Descrambled Next USRP Frame; Expected SFD Sequence; −Window2, +Window2]
Slide53
Preliminary Work: 802.11a on Zynq Results: Block Variants: Viterbi Decoder
VD Variant             Delay-Based   BRAM-Based
Data Path Delay (ns)   308           314
% LUTs                 41.0          40.3
% Registers            4.2           3.2
BRAM Tiles             0             2
Total Power (W)        2.36          2.36
VD Power (W)           0.011         0.005
Block reverses effects of Convolutional Encoder
  Requires memory to hold intermediate state values
1st VD uses delay blocks to hold state memory
  Exhibits lower path delay
2nd VD uses BRAM tiles to hold state memory
  Uses fewer LUTs and registers
  Slightly lower power
Illustrates tradeoff between time and power
  Can dynamically tune design to target either objective
Slide54
Supplementary Material: 802.11a Transmit and Receive Chains
802.11a: Frequency Band: 5.8 GHz; Rates: 6, 9, 12, 18, 24, 36, 48, 54 Mbps
Transmit chain: Data Bits, Scramble, Convolutional Encode (1/2, 2/3, 3/4 rates), Block Interleave, Digital Modulate (BPSK, QPSK, 16QAM, 64QAM), OFDM Modulate (FFT Size: 64; Cyclic Prefix: 0.8 μs), Preamble Switch, To RF Board
  For changing datasets, insert: Zero Bit Padding, PHY Layer Framing, CRC Calculation & Appending
Receive chain: From RF Board, Preamble Detect, OFDM Demodulate, Digital Demodulate, Block Deinterleave, Viterbi Decode, Descramble, Data Bits
  For live, online tests, insert: Automatic Gain Control (AGC), Frequency Offset Compensation
  For changing datasets, insert: CRC Checking, PHY Layer Deframing, Zero Bit Removal
Slide55
Supplementary Material: 802.11b Transmit and Receive Chains
802.11b: Frequency Band: 2.4 GHz; Rates: 1, 2, 5.5, 11 Mbps
Transmit chain: Data Bits, Scramble, Digital Modulate (DBPSK, DQPSK), Spectrum Spread (DSSS: Direct Sequence Spread Spectrum), Preamble Switch, Raised Cosine Tx Filter, To RF Board
  Optional: Convolutional Encode (PBCC: 1/2 rate only)
  Preamble: 128 one bits, scrambled & spread, + fixed SFD sequence
  For changing datasets, insert: Zero Bit Padding, PHY Layer Framing, CRC Calculation & Appending
Receive chain: From RF Board, Raised Cosine Rx Filter, Preamble Detect, Spectrum Despread, Digital Demodulate, Descramble, Data Bits
  Optional: Convolutional Decode
  For live, online tests, insert: Automatic Gain Control (AGC), Frequency Offset Compensation
  For changing datasets, insert: CRC Checking, PHY Layer Deframing, Zero Bit Removal
Slide56
Supplementary Material: 802.11g/n
802.11g: Frequency Band: 2.4 GHz; Rates: 1, 2, 5.5, 6, 9, 11, 12, 18, 24, 36, 48, 54 Mbps
  Effectively combines the 802.11a and 802.11b standards
  802.11g preamble is the combined 802.11b + 802.11a preambles
802.11n: Frequency Band: 2.4 GHz or 5.8 GHz; Additional Rates up to 600 Mbps
  Uses up to 4 antennas to improve throughput using MIMO SDM
Slide57
Supplementary Material: LTE Downlink Shared Channel
LTE Downlink Shared Channel (DL-SCH): Frequency Bands: 700 MHz, 1.7 GHz, 1.9 GHz; Bit rates of 1.7-403.2 Mbps
LTE standard is up to 8x8 MIMO for DL, but we will prototype for 1 antenna
Transmit chain: Data Bits, Turbo Encode, Rate Match, Scramble, Digital Modulate (QPSK, 16QAM, 64QAM), Layer Map, Precode, OFDMA Modulate (FFT Sizes: 128-2,048; Cyclic Prefix: 5.21-33.33 μs), PSS/SSS Switch, To RF Board
  Code rates: 1/3, 1/2, 2/3, 5/6
  For changing datasets, insert: Zero Bit Padding, CRC Calculation & Appending, Code Block Segmentation
Receive chain: From RF Board, PSS/SSS Detect, OFDMA Demodulate, De-precode, Layer De-map, Digital Demodulate, Turbo Decode, Descramble, Data Bits
  For live, online tests, insert: Automatic Gain Control (AGC), Frequency Offset Compensation, DL Channel Estimation
  For changing datasets, insert: Code Block Desegmentation, CRC Checking, Zero Bit Removal
For LTE, synchronization signals determine the frame timing
Slide58
Supplementary Material: LTE Fragmented Bandwidths
Band   LTE UL              LTE DL
1      1.92-1.98 GHz       2.11-2.17 GHz
2      1.85-1.91 GHz       1.93-1.99 GHz
3      1.71-1.785 GHz      1.805-1.88 GHz
4      1.71-1.755 GHz      2.11-2.155 GHz
5      824-849 MHz         869-894 MHz
6      830-840 MHz         875-885 MHz
7      2.5-2.57 GHz        2.62-2.69 GHz
8      880-915 MHz         925-960 MHz
9      1.75-1.785 GHz      1.845-1.88 GHz
10     1.71-1.77 GHz       2.11-2.17 GHz
11     1.428-1.448 GHz     1.476-1.496 GHz
12     699-716 MHz         729-746 MHz
13     777-787 MHz         746-756 MHz
14     788-798 MHz         758-768 MHz
17     704-716 MHz         734-746 MHz
18     815-830 MHz         860-875 MHz
19     830-845 MHz         875-890 MHz
20     832-862 MHz         791-821 MHz
21     1.448-1.463 GHz     1.496-1.511 GHz
22     3.41-3.49 GHz       3.51-3.59 GHz
23     2-2.02 GHz          2.18-2.2 GHz
24     1.627-1.661 GHz     1.525-1.559 GHz
25     1.85-1.915 GHz      1.93-1.995 GHz
26     814-849 MHz         859-894 MHz
33     1.9-1.92 GHz        1.9-1.92 GHz
34     2.01-2.025 GHz      2.01-2.025 GHz
…      …                   …
43     3.6-3.8 GHz         3.6-3.8 GHz
[Figure annotations: 80 MHz; TDD; 200 MHz, TDD; Rel11]