SEQUENCE ESTIMATION AND CHANNEL EQUALIZATION USING FORWARD DECODING KERNEL MACHINES

Shantanu Chakrabartty and Gert Cauwenberghs
Center for Language and Speech Processing, Dept. of Electrical and Computer Engineering
Johns Hopkins University, Baltimore, MD 21218
{shantanu,gert}@jhu.edu

ABSTRACT

A forward decoding approach to kernel machine learning is presented. The method combines concepts from Markovian dynamics, large margin classifiers and reproducing kernels for robust sequence detection by learning inter-data dependencies. A MAP (maximum a posteriori) sequence estimator is obtained by regressing transition probabilities between symbols as a function of received data. The training procedure involves maximizing a lower bound of regularized cross-entropy on the posterior probabilities, which simplifies into direct estimation of transition probabilities using kernel logistic regression. Applied to channel equalization, forward decoding kernel machines outperform support vector machines and other techniques by about 5 dB in SNR for a given BER, within 1 dB of theoretical limits.

1. INTRODUCTION

Many digital communication receivers require equalizers to combat channel intersymbol interference (ISI) and co-channel interference to obtain reliable data transmission. Traditionally, decision-feedback equalizers (DFE) implemented by FIR filters and maximum likelihood estimation (MLE) have been employed for this purpose [7, 5]. The inherent complexity of a true MLE decoding procedure renders it impractical in many implementations, and MLE performance is known to degrade under time-varying channel conditions. Symbol decision equalizers have a relatively simple architecture and training procedure but do not perform as well as nonlinear equalizers based on neural networks, such as multilayer perceptrons or radial basis functions [6].

Large margin classifiers, like support vector machines, have been the subject of intensive research in the neural network and artificial intelligence communities [2, 13]. They are attractive because they generalize well even with relatively few data points in the training set. Bounds on the generalization error can be directly estimated from the training data. Recently, support vector machines have been used for nonlinear equalization and have been shown to provide very encouraging results compared to other nonlinear equalizers [11]. Figure 1 shows a system block diagram for channel equalization employing an SVM equalizer. The use of a standard SVM classifier inherently assumes that the data points are identically and independently distributed. This is an unlikely scenario in ISI channels, where there exists sequential structure amongst the received symbols.

Fig. 1. System architecture for nonlinear channel equalization, incorporating an SVM-based MAP equalizer to compensate for channel distortion.

This paper describes a new architecture, which we term forward decoding kernel machine (FDKM), that augments the ability of large margin classifiers to perform sequence decoding and to infer the sequential properties of the data. FDKM performs large margin discrimination based on the trajectory of the data rather than solely on individual data points, and hence relaxes the constraint of i.i.d. data. We incorporate Markovian dynamics into the framework of large margin classifiers using kernels, and provide a sequential algorithm to train the kernel machine. The time complexity of the algorithm can be suitably improved by tuning the parameters of the learning model and also by pruning.

The paper is organized as follows. Section 2 introduces and formulates FDKM along with its training procedure. Section 3 applies FDKM to the problem of channel equalization. Section 4 presents experimental results. Finally, Section 5 provides concluding remarks.

2. PROBLEM FORMULATION

The problem of sequence decoding and channel equalization is formulated in the framework of MAP (maximum a posteriori) estimation, combining Markovian dynamics with kernel machines. Consider a Markovian model with symbols belonging to $S$ classes, as illustrated in Figure 2 for $S = 2$.

Transitions between the classes are modulated in probability by the observation (data) vectors $\mathbf{x}[n]$ over time.

2.1. Decoding Formulation

The MAP forward decoder receives the sequence $\mathbf{X}[n] = \{\mathbf{x}[1], \ldots, \mathbf{x}[n]\}$ and produces an estimate of the probability of the state variable $q[n]$ over all classes $i$, $\alpha_i[n] = P(q[n] = i \mid \mathbf{X}[n]; \mathbf{w})$, where $\mathbf{w}$ denotes the set of parameters for the learning machine. Unlike hidden Markov models, the states directly encode the symbols, and the observations modulate the transition probabilities between states [3].

Fig. 2. Two-state Markov model, where the transition probabilities $P(i \mid j, \mathbf{x}[n])$ between states are modulated by the observation vector $\mathbf{x}[n]$.

Estimates of the posterior probability $\alpha_i[n]$ for soft decoding are obtained from estimates of the local transition probabilities using the forward-decoding procedure [1, 3]:

$\alpha_i[n] = \sum_{j=0}^{S-1} P_{ij}[n]\, \alpha_j[n-1]$   (1)

where $P_{ij}[n] = P(q[n] = i \mid q[n-1] = j, \mathbf{x}[n]; \mathbf{w})$ denotes the probability of making a transition from class $j$ at time $n-1$ to class $i$ at time $n$, given the current observation vector $\mathbf{x}[n]$. The forward decoding (1) embeds the sequential dependence of the data, wherein the probability estimate at time instant $n$ depends on all the previous data. An on-line estimate of the symbol is thus obtained as

$\hat{q}[n] = \arg\max_i \alpha_i[n].$   (2)

The BCJR forward-backward algorithm [1] produces in principle a better estimate that accounts for future context, but requires a backward pass through the data, which is impractical in many applications. Accurate estimation of the transition probabilities $P_{ij}[n]$ in (1) is crucial for the decoding (2) to provide good performance. We use kernel logistic regression [9], with regularized maximum cross-entropy, to model the conditional probabilities.
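As an illustration of the decoding pass, the following sketch (not from the paper; the array names, shapes and initial distribution alpha0 are our own assumptions) applies the forward recursion (1) to pre-computed transition probabilities and takes the arg-max of (2) at every time step.

import numpy as np

def forward_decode(P, alpha0):
    # P: (N, S, S) array with P[n, i, j] = P(q[n] = i | q[n-1] = j, x[n]).
    # alpha0: (S,) initial state distribution.
    N, S, _ = P.shape
    alpha = np.zeros((N, S))
    prev = np.asarray(alpha0, dtype=float)
    for n in range(N):
        a = P[n] @ prev              # eq. (1): alpha_i[n] = sum_j P_ij[n] * alpha_j[n-1]
        alpha[n] = a / a.sum()       # renormalize to guard against numerical drift
        prev = alpha[n]
    q_hat = alpha.argmax(axis=1)     # eq. (2): on-line MAP symbol estimate
    return alpha, q_hat

Because the decision at time n uses only alpha[n], such a decoder can emit symbols on-line, without the backward pass required by BCJR.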

2.2. Training Formulation

For training the MAP forward decoder, we assume access to a training sequence with labels (class memberships). Continuous (soft) labels $y_i[n]$ can be assigned, rather than binary indicator labels, to signify uncertainty in the training data over the classes. Like probabilities, the label assignments are normalized: $\sum_{i=0}^{S-1} y_i[n] = 1$, $y_i[n] \geq 0$. The objective of training is to maximize the cross-entropy of the estimated probabilities $\alpha_i[n]$ given by (1) with respect to the labels $y_i[n]$, over all classes and training data,

$H = \sum_n \sum_{i} y_i[n] \log \alpha_i[n].$   (3)

To provide capacity control we introduce a regularizer $\Omega(\mathbf{w})$ in the objective function [8]. The parameter space $\mathbf{w}$ can be partitioned into disjoint parameter vectors $\mathbf{w}_{ij}$ for each pair of classes $i, j$, such that $P_{ij}[n]$ depends only on $\mathbf{w}_{ij}$. The regularizer can then be chosen as the $L_2$ norm of each disjoint parameter vector, and the objective function becomes

$H' = C \sum_n \sum_{i} y_i[n] \log \alpha_i[n] - \sum_{i} \sum_{j} \|\mathbf{w}_{ij}\|^2,$   (4)

where the regularization parameter $C$ controls complexity versus generalization as a bias-variance trade-off [8]. The objective function (4) is similar to the primal formulation of a large margin classifier [13]. Unlike the convex (quadratic) cost function of SVMs, the formulation (4) does not have a unique solution, and direct optimization could lead to poor local optima. However, a lower bound of the objective function can be formulated, so that maximizing this lower bound reduces to a set of convex optimization sub-problems with an elegant dual formulation in terms of support vectors and kernels. Applying the convexity of the $-\log(\cdot)$ function to the convex sum in the forward estimation (1), we obtain directly

$H' \geq \sum_{j} H_j,$   (5)

where

$H_j = \sum_n \sum_{i} C_j[n]\, y_i[n] \log P_{ij}[n] - \sum_{i} \|\mathbf{w}_{ij}\|^2,$   (6)

with the effective regularization sequence

$C_j[n] = C\, \alpha_j[n-1].$   (7)
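The role of the weights in (6) and (7) can be made concrete with a small sketch; the array names and shapes below are our own conventions, not the paper's.

import numpy as np

def effective_regularization(alpha_prev, C):
    # alpha_prev: (N, S) forward variables alpha_j[n-1]; returns C_j[n] = C * alpha_j[n-1], eq. (7).
    return C * alpha_prev

def lower_bound_data_terms(P, y, alpha_prev, C):
    # P: (N, S, S) transition probabilities P_ij[n]; y: (N, S) soft labels y_i[n].
    # Returns, for each outgoing state j, the data term of H_j in eq. (6):
    #   sum over n and i of C_j[n] * y_i[n] * log P_ij[n]
    Cj = effective_regularization(alpha_prev, C)
    logP = np.log(np.clip(P, 1e-12, None))          # clip to avoid log(0)
    return np.einsum('nj,ni,nij->j', Cj, y, logP)

Each outgoing state j thus defines its own weighted regression problem, with the data at time n counted in proportion to how likely the decoder believes state j was occupied at time n-1.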

Disregarding the intricate dependence of (7) on the results of (6), which we defer to the following section, the formulation (6) is equivalent to a regression of the conditional probabilities $P_{ij}[n]$ from the labeled data $\mathbf{x}[n]$ and $y_i[n]$, for a given outgoing state $j$. Estimation of the conditional probabilities $P(i \mid j, \mathbf{x}[n])$ from training data $\mathbf{x}[n]$ and labels $y_i[n]$ can be obtained using a regularized form of kernel logistic regression [9]. For each outgoing state $j$ we construct one such probabilistic model for the incoming state $i$, conditional on $\mathbf{x}[n]$:

$P_{ij}[n] = \dfrac{\exp(\mathbf{w}_{ij} \cdot \mathbf{x}[n])}{\sum_{k} \exp(\mathbf{w}_{kj} \cdot \mathbf{x}[n])}.$   (8)

As with SVMs, the dot products in (8) convert into kernel expansions over the training data $\mathbf{x}[m]$ by transforming the data to feature space [12]:

$\mathbf{w}_{ij} \cdot \mathbf{x} = \sum_m \lambda_{ij}^m K(\mathbf{x}[m], \mathbf{x}),$   (9)

where $K(\cdot, \cdot)$ denotes any symmetric, positive-definite kernel that satisfies the Mercer condition, such as a Gaussian radial basis function or a polynomial spline [8, 14]. The parameters $\lambda_{ij}^m$ are determined by maximizing a dual formulation of the objective function (6) through the Legendre transformation, which for logistic regression takes the form of an entropy-based potential function in the parameters [9]. We use a Newton-Raphson iterative optimization scheme to arrive at the dual parameter estimates.
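A minimal sketch of the transition model (8) with the kernel expansion (9) follows. The coefficient array lam, the polynomial kernel and its degree are our own illustrative choices; the Newton-Raphson fit of these coefficients to the dual of (6) is not shown.

import numpy as np

def poly_kernel(X1, X2, degree=3):
    # Mercer polynomial kernel K(x, x') = (1 + x . x')^degree.
    return (1.0 + X1 @ X2.T) ** degree

def transition_probs(x, X_train, lam, degree=3):
    # x: (d,) current observation; X_train: (M, d) training vectors x[m];
    # lam: (S, S, M) dual coefficients lam[j, i, m] for outgoing state j, incoming state i.
    # Returns P with P[i, j] = P(q[n] = i | q[n-1] = j, x), eq. (8).
    k = poly_kernel(X_train, x[None, :], degree).ravel()   # K(x[m], x), eq. (9)
    scores = lam @ k                                       # scores[j, i] plays the role of w_ij . x
    scores = scores - scores.max(axis=1, keepdims=True)    # stabilize the softmax
    exps = np.exp(scores)
    return (exps / exps.sum(axis=1, keepdims=True)).T      # softmax over incoming states i, per j

For a two-state model (S = 2) this reduces to a pair of kernel logistic regressions, one per outgoing state, as used in the equalization experiments below.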

2.3. Recursive FDKM Training

The weights (7) in (6) are recursively estimated using an iterative procedure reminiscent of (but different from) expectation-maximization. The procedure involves computing new estimates of the sequence $\alpha_j[n]$, and hence of the weights (7), to train (6), based on estimates of $P_{ij}[n]$ obtained with the previous values of the parameters. The training proceeds in a series of epochs, each refining the estimate of the sequence $\alpha_j[n]$ by increasing the size of the time window (decoding depth) over which it is obtained by the forward algorithm (1).
Fig. 3. Iterations involved in training FDKM on a trellis based on the Markov model of Figure 2. During the initial epoch, the parameters of each probabilistic model, conditioned on the observed label of the outgoing state at time n-1, are trained from the observed labels at time n. During subsequent epochs, probability estimates of the outgoing state, obtained over an increasing forward decoding depth, determine the weights assigned to the data for training each of the probabilistic models conditioned on that outgoing state.

The training steps are illustrated in Figure 3 and summarized as follows; a code sketch of the resulting loop is given after the list.

1. To bootstrap the iteration for the first training epoch, obtain initial values for $\alpha_j[n-1]$ from the labels of the outgoing state, $\alpha_j[n-1] = y_j[n-1]$. This corresponds to taking the labels $y_j[n-1]$ as true state probabilities.

2. Train the logistic kernel machines, one for each outgoing class $j$, to estimate the parameters $\lambda_{ij}^m$ in (9) from the training data $\mathbf{x}[n]$ and labels $y_i[n]$, weighted by the sequence $C_j[n]$.

3. Re-estimate $\alpha_j[n]$ using the forward algorithm (1) over an increasing decoding depth, initializing the forward variables at the start of each decoding window to the corresponding labels.

4. Re-train, increment the decoding depth, and re-estimate, until the final decoding depth is reached.
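The loop below sketches these four steps. All of the scaffolding here is ours: fit_klr stands for a regularized kernel logistic regression solver (step 2), and windowed_forward for a version of recursion (1) that restarts from the labels at the head of each decoding window (step 3); neither is part of the paper.

import numpy as np

def train_fdkm(X, Y, C, depths, fit_klr, windowed_forward):
    # X: (N, d) observations; Y: (N, S) soft labels with rows summing to 1;
    # depths: increasing decoding depths, one per training epoch.
    N, S = Y.shape
    alpha = Y.copy()                                   # step 1: bootstrap alpha from the labels
    models = None
    for depth in depths:                               # steps 2-4, one pass per epoch
        alpha_prev = np.vstack([alpha[:1], alpha[:-1]])
        weights = C * alpha_prev                       # effective regularization C_j[n], eq. (7)
        models = [fit_klr(X, Y, weights[:, j])         # step 2: one model per outgoing state j
                  for j in range(S)]
        alpha = windowed_forward(X, models, Y, depth)  # step 3: re-estimate over longer windows
    return models                                      # step 4 ends at the final decoding depth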

3. CHANNEL EQUALIZATION USING FDKM

FDKM can be directly applied to channel equalization, as depicted in Figure 1. Denote by $u[n]$ a source of equiprobable binary symbols sent over the channel. The FDKM model can be made to fit any discrete-time transmission channel by grouping outputs of the channel into feature vectors

$\mathbf{x}[n] = \left[\tilde{x}[n], \tilde{x}[n-1], \ldots, \tilde{x}[n-K+1]\right].$   (10)

For training, the target label is taken as the input to the channel delayed by $D$ samples, i.e., $q[n] = u[n-D]$. The output of the channel $\tilde{x}[n]$ is modeled as the sum of a deterministic function of $u[n]$ and additive white noise $e[n]$. The goal of the equalizer is to reproduce the desired output $u[n-D]$. The deterministic portion of the channel model could consist, for instance, of a linear finite impulse response (FIR) filter followed by a polynomial nonlinearity. Note that FDKM does not actually make use of a channel model; instead it adaptively parameterizes the state transition probabilities in the forward decoding from the training data. Therefore, one can extend the channel model to include, for instance, the effect of source coding.

Fig. 4. Trained probabilistic model $P(i \mid j, \mathbf{x}[n])$ obtained by logistic regression before re-estimation. Axes: $\tilde{x}[n]$ versus $\tilde{x}[n-1]$.

Fig. 5. Trained probabilistic model $P(i \mid j, \mathbf{x}[n])$ obtained by logistic regression after re-estimation. Axes: $\tilde{x}[n]$ versus $\tilde{x}[n-1]$.
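A short sketch of the data layout implied by (10) and of the delayed labels; the variable names and the zero-padding of the earliest rows are our own choices.

import numpy as np

def make_equalizer_data(x_tilde, u, K, D):
    # x_tilde: (N,) received samples; u: (N,) transmitted +/-1 symbols;
    # K: buffer length (feature dimension); D: equalizer delay.
    # Returns X with X[n] = [x_tilde[n], ..., x_tilde[n-K+1]] (eq. 10) and targets q[n] = u[n-D].
    N = len(x_tilde)
    X = np.zeros((N, K))
    for k in range(K):
        X[k:, k] = x_tilde[:N - k]        # column k holds x_tilde delayed by k samples
    q = np.roll(u, D)                     # q[n] = u[n-D] for n >= D
    start = max(D, K - 1)                 # drop rows with incomplete history or undefined target
    return X[start:], q[start:]

Mapping the binary targets q[n] to the two Markov states of Figure 2 gives the (soft) label matrix used for training.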

4. EXPERIMENTS AND RESULTS

For a comparative figure of merit, we replicated the linear channel models of [11, 5, 10]:

$\tilde{x}[n] = u[n] + 0.5\, u[n-1],$   (11)

$\tilde{x}[n] = 0.3482\, u[n] + 0.8704\, u[n-1] + 0.3482\, u[n-2],$   (12)

from which we generated datasets 1 and 2, respectively. 200 samples each were generated for training FDKM, and 10,000 samples each for testing. The buffer length was chosen to be $K = 2$ and the equalizer delay was taken to be $D = 1$. Therefore, the extent of ISI for the second data set exceeds the time horizon of the feature vector. Nevertheless, the decoding depth for FDKM was chosen large enough to cover the ISI. For all the experiments reported here, a polynomial kernel $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x} \cdot \mathbf{x}')^p$ of fixed degree was used. Figures 4 and 5 illustrate the improvements in the margin distribution of the probabilistic model obtained after several epochs of the iterative FDKM procedure.
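To make the experimental setup concrete, the sketch below generates the two datasets; the additive-noise scaling to a target SNR is our own assumption about the noise model, and the tap vectors simply repeat the coefficients written in (11) and (12).

import numpy as np

def make_dataset(h, n_symbols, snr_db, seed=0):
    # h: channel impulse response; returns transmitted symbols u and received samples x_tilde.
    rng = np.random.default_rng(seed)
    u = rng.choice([-1.0, 1.0], size=n_symbols)          # equiprobable binary symbols
    clean = np.convolve(u, h)[:n_symbols]                # linear ISI channel
    noise_power = np.mean(clean ** 2) / (10 ** (snr_db / 10.0))
    x_tilde = clean + rng.normal(0.0, np.sqrt(noise_power), n_symbols)
    return u, x_tilde

h1 = np.array([1.0, 0.5])                    # taps of channel (11)
h2 = np.array([0.3482, 0.8704, 0.3482])      # taps of channel (12)
u_train, x_train = make_dataset(h2, 200, snr_db=6.0)               # 200 training samples
u_test, x_test = make_dataset(h2, 10_000, snr_db=6.0, seed=1)      # 10,000 test samples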
Table 1. Performance (BER) of equalization schemes at 6 dB SNR.

Machine                       Train 1   Test 1   Train 2   Test 2
SVM                            7.3%      8%       14.2%     23%
Logistic regression           14.4%     16%       26%       34%
FDKM (before re-estimation)    0.1%      0.9%      0.4%      1.6%
FDKM (after re-estimation)     0.1%      0.1%      0.6%      0.8%

Fig. 6. Performance evaluation for the channel model (12): BER versus SNR (dB) for ISI-free antipodal signaling, FDKM, SVM, and PDFSVM. The FDKM equalizer delivers performance near that of the ISI-free theoretical limit.

Table 1 gives the comparative BER figures at 6 dB SNR; the regularization parameter $C$ was set to different values for the SVM and for the FDKM configurations. Interestingly, Table 1 indicates that the generative probability model obtained by logistic regression gives worse decoding performance than the discriminant model obtained by SVM classification.

However, the probabilistic model coupled with forward decoding gives a drastic improvement in decoding performance. Figure 6 compares the performance of FDKM with SVM, with Perfect Decision Feedback SVM (PDFSVM) [11], and with the theoretical optimum for non-ISI binary signaling. One can see that FDKM equalization delivers about a 4-5 dB improvement in SNR, for a given BER, over PDFSVM and other techniques. In another experiment we tested the performance of FDKM equalization with nonlinear channels in the presence of colored noise [11]. The FDKM equalizer was found to perform nearly as well as the theoretical Bayes-optimal equalizer.
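A curve like the one in Figure 6 can be reproduced with a simple harness; the sweep below is our own scaffolding and assumes the helpers sketched earlier (dataset generation, feature construction, FDKM training and forward decoding).

import numpy as np

def bit_error_rate(q_hat, q_true):
    # Fraction of decoded symbols that differ from the true delayed symbols.
    return float(np.mean(np.asarray(q_hat) != np.asarray(q_true)))

# Illustrative sweep (pseudocode-level; the FDKM fit and decode calls are hypothetical):
# for snr_db in range(2, 21, 2):
#     u_tr, x_tr = make_dataset(h2, 200, snr_db)              # training set
#     u_te, x_te = make_dataset(h2, 10_000, snr_db, seed=1)   # test set
#     ... fit FDKM on (x_tr, u_tr), forward-decode x_te into q_hat, then:
#     print(snr_db, bit_error_rate(q_hat, q_true))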

5. CONCLUSIONS

We presented a forward decoding architecture for large margin classifiers and evaluated its performance for combating ISI and nonlinearity due to the communication channel. Simulations have shown that the equalizer outperforms several other adaptive estimation techniques. The decoding architecture is feedforward in nature and hence very amenable to hardware implementation [4]. The FDKM approach is model-free, and extends directly to account for various forms of source coding.

6. REFERENCES

[1] Bahl, L.R., Cocke, J., Jelinek, F. and Raviv, J., "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. IT-20, pp. 284-287, 1974.

[2] Boser, B., Guyon, I. and Vapnik, V., "A training algorithm for optimal margin classifiers," in Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pp. 144-152, 1992.

[3] Bourlard, H. and Morgan, N., Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic, 1994.

[4] Chakrabartty, S., Singh, G. and Cauwenberghs, G., "Hybrid Support Vector Machine/Hidden Markov Model Approach for Continuous Speech Recognition," Proc. IEEE Midwest Symp. Circuits and Systems (MWSCAS 2000), Lansing, MI, Aug. 2000.

[5] Chen, S., Mulgrew, B. and McLaughlin, S., "Adaptive Bayesian Equalizer with Decision Feedback," IEEE Transactions on Signal Processing, vol. 41, September 1993.

[6] Chen, S., Mulgrew, B. and Grant, P.M., "A clustering technique for digital communications channel equalization using radial basis function networks," IEEE Transactions on Neural Networks, vol. 4 (4), 1993.

[7] Forney, G.D., "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Transactions on Information Theory, vol. IT-18, pp. 363-378, 1972.

[8] Girosi, F., Jones, M. and Poggio, T., "Regularization Theory and Neural Networks Architectures," Neural Computation, vol. 7, pp. 219-269, 1995.

[9] Jaakkola, T. and Haussler, D., "Probabilistic kernel regression models," Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, 1999.

[10] Proakis, J.G., Digital Communications, 3rd ed., New York: McGraw-Hill, 1995.

[11] Sebald, D.J. and Bucklew, J.A., "Support Vector Machine Techniques for Nonlinear Equalization," IEEE Transactions on Signal Processing, vol. 48, pp. 3217-3226, November 2000.

[12] Schölkopf, B., Burges, C. and Smola, A., eds., Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, 1998.

[13] Vapnik, V., The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.

[14] Wahba, G., Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59, Philadelphia, PA: SIAM, 1990.