JOURNAL OF COMPUTATIONAL BIOLOGY, Volume 8, Number 1, 2001, Mary Ann Liebert, Inc., Pp. 37-52

On Differential Variability of Expression Ratios: Improving Statistical Inference about Gene Expression Changes from Microarray Data

M.A. NEWTON, C.M. KENDZIORSKI, C.S. RICHMOND, F.R. BLATTNER, and K.W. TSUI

ABSTRACT

We consider the problem of inferring fold changes in gene expression from cDNA microarray data. Standard procedures focus on the ratio of measured fluorescent intensities at each spot on the microarray, but to do so is to ignore the fact that the variation of such ratios is not constant. Estimates of gene expression changes are derived within a simple
hierarchical model that accounts for measurement error and fluctuations in absolute gene expression levels. Significant gene expression changes are identified by deriving the posterior odds of change within a similar model. The methods are tested via simulation and are applied to a panel of Escherichia coli microarrays.

Key words: empirical Bayesian analysis, global gene expression, hierarchical modeling.

1. INTRODUCTION

Technology is now becoming widespread for measuring the simultaneous expression levels of thousands to tens of thousands of genes in a given cell type. There is mounting evidence that such data can yield significant insight into the underlying biology of the cell
(e.g., Brown and Botstein, 1999; Lander, 1999). Coordinated expression patterns provide clues about gene function and shed light on complex biomolecular pathways; transcriptional profiles can characterize different cell types, thus potentially enabling improved cancer diagnosis and therapy, for example. All high-throughput methods interrogate the population of mRNA transcribed during gene expression in sample cells, and then basically attempt to measure the abundance of each unique transcript. The methods rely on the highly specific process of hybridization to separate the complex pool of mRNA molecules. On a complementary DNA (cDNA) microarray, unique cDNA molecules are localized
on a glass slide to act as probes against two different transcript samples. The two mRNA samples are prepared separately, labeled with distinct fluorescent dyes, and then cohybridized to the microarray. Fluorescence signal intensity in both channels is captured with a confocal microscope, and, after some image analysis to localize each probe, expression levels are derived for each spot on the microarray within each original sample. Following convention, we use red to indicate the sample tagged with the Cy5 dye and green to indicate the Cy3 dye. Duggan et al. (1999) or Cheung et al. (1999), for example, provide further detail on how to obtain cDNA microarray data. Similar expression data are obtained on oligonucleotide arrays (Lipshutz et al., 1999), though we focus on the cDNA microarray in the present development. There is reason to expect that the statistical methodology described here will apply in both domains.

Department of Statistics, University of Wisconsin, Madison, WI 53792.
Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53792.
Laboratory of Genetics, University of Wisconsin, Madison, WI 53792.

To account for intrinsic differences between the hybridizing samples, intensity measurements are normalized in some fashion. One way is to compare signals at a set of
house-keeping genes, i.e., genes thought to not present significant changes in expression between samples. Richmond et al. (1999) use a simple method on the E. coli microarrays reconsidered here: they normalize by the total signal intensity from all spots. Other possibilities include spiking the prepared samples with known concentrations of specific genes, or combining measurements taken in both orientations of the dyes. A component of each intensity measurement at a given spot is background fluorescence. An estimate of this component can be obtained from pixels near the spot, and the reported intensity is then the original measurement minus the background measurement. In what follows
we consider the measurements to be normalized and to be adjusted for background intensity.

It is typical that inference about differential gene expression between two cell types is based on the ratio of measured expression levels. A basic statistical problem is to know when the measured differential expression is likely to reflect a real biological shift in gene expression. This depends on the amount of variation in the system, so it is difficult to justify fixed rules such as to focus on genes exhibiting more than a 3-fold shift, say. Certainly replication will be critical in applications; as more and more microarrays are measured, some confidence in the expression profiles will
undoubtedly emerge. Our immediate concern is with data from a single microarray; we see some room for improvement in the initial signal processing, which may have bearing on downstream tasks such as clustering or other forms of data analysis (e.g., Eisen et al., 1998; Bassett et al., 1999).

A given fold change in measured expression may have a different interpretation for a gene whose absolute expression is low as compared to a gene that is bright in both fluorescent channels (noted in Bassett et al. [1999], for example). We argue that any procedure which uses the raw intensity ratio alone to infer differential expression may be inefficient and thus may lead to excessive errors. Indeed,
sources of variation are expected to be such that the absolute expression levels provide information on the variation of intensity ratios. This information is ignored in the standard treatment. One solution is to ignore any genes whose transcripts are present at low total abundance. We may have confidence about the differential expression of the remaining genes, but at the price of throwing away potentially valuable data. In any case, the choice of cutoff may be arbitrary; a gene may be deemed below the detection level in one channel but not in the other. Furthermore, the transcript abundance of many interesting genes may be very low, and so the strategy seems far from optimal.

The solution
we describe is based on hierarchical models of measured expression levels in which we account for two obvious sources of variation. The first we call measurement error. In hypothetical repetitions of the experiment, the measured fluorescence signal will fluctuate around some mean value, which is itself a property of the cell type, the particular gene, and other factors. These fluctuations are due to multiple sources of variation that arise in producing the measurements, such as variation in the preparation of the mRNA samples and in the incorporation of fluorescent tags, optical noise, and cross-hybridization. Importantly, this variation may include, but is above and beyond, the background
noise mentioned above. The second main source of variation we consider is due to the different genes spotted onto the microarray. The mean fluorescence value around which measured expression fluctuates changes from gene to gene and will serve as a random effect in our proposed model. The population of mRNAs from a given sample is composed of many distinct molecules, but the partition is not uniform: some mRNAs are abundant and others are rare. As we see in Sections 3 and 4, by formally combining the two sources of variation we can readily obtain probabilistic statements about actual differential expression. We find that the observed ratios are not optimal estimators; we find that focusing on fold
change alone is insufficient and that confidence statements about differential expression depend on transcript abundance.

Perhaps the first statistical treatment of microarray data analysis is contained in Chen et al. (1997). These authors make the interesting argument that, although expected expression levels do fluctuate from gene to gene across the microarray, the measurements are linked by having a constant coefficient of variation, $c$ say. Then the observed differential expression, $T = R/G$ say (the ratio of red to green intensity at a gene), has a sampling distribution dependent only on $c$ under the null hypothesis that $R$ and $G$
have the same expectation, i.e., that there is no real differential expression, and computation is under a Gaussian model for both measurements. Using a set of house-keeping genes, which are thought to not present real differential expression, the maximum likelihood estimate of $c$ is derived, and confidence intervals for actual differential expression are computed from percentiles of the estimated null distribution of $T$. These intervals are easy to compute and are responsive to the intrinsic variation of data on the microarray because they use a data-dependent value of $c$. (By contrast, the procedure to call significant any gene presenting, say, a 3-fold or greater expression differential is not
responsive to such variation.) The method has ignored ancillary information: the product $RG$ contains information about the variation of $T$. In other words, $T$ is not independent of $RG$. There is the minor technical point, too, that $R$ and $G$ are modeled as Gaussian when in fact they must be positive. We consider a sampling model for measured intensities in Section 2. One can view the Chen et al. (1997) method as producing a set of hypothesis tests, one for each gene on the microarray, in which the null hypothesis is that the expectations of both intensity signals are equal and the alternative is that they are unequal. When an observed $T$ falls in the tail of the null sampling distribution, we reject the null and declare significant differential expression. As in other domains of application, we know that some benefit can be attained if the thousands of parameters are considered simultaneously rather than in isolation (Efron and Morris, 1973, 1977; Carlin and Louis, 1996). The calculations presented here attempt to demonstrate the utility of treating the gene-specific parameters themselves as members of an array-specific population.

In Section 3 we consider the problem of estimating, and possibly forming a confidence interval for, the actual differential expression of a given gene. This involves calculations in a two-layer hierarchical model to produce a posterior probability distribution for the actual
differential expression and an empirical Bayes estimate of the same. We illustrate with a panel of four E. coli microarrays, highlighting distinctions between the empirical Bayes estimates and the naive estimates. A simulation study shows that total estimation error can be reduced by using this procedure. The problem of testing for significant differential expression is the focus of Section 4, and here a third layer is added to the hierarchical model. We derive a function which gives the odds of actual differential expression as a function of measured intensities. This provides an effective summary, a sort of quality number for each gene. To simplify our development, we focus on data from a single
microarray, and we suppose that there is one spot for each gene. We address the issue of model validation in Section 5, where predictions from our hierarchical models are compared with features in the available data. Possible model extensions are also discussed.

2. SAMPLING MODEL FOR MEASURED EXPRESSION

Measured intensity levels $R$ (red) and $G$ (green) approximate target mean values $\mu_r$ and $\mu_g$ at a given spot on the microarray. (We avoid using subscripts for distinguishing spots unless it is absolutely necessary.) The goal is to estimate $\rho = \mu_r/\mu_g$. The standard (naive) estimator is $R/G$. Measurement error depends on signal strength (Chen et al., 1997). To account for this explicitly, $R$ and $G$ are modeled as
independent samples from distinct distributions with a common coefficient of variation. We find it convenient to work with Gamma distributions having a constant shape parameter. These exhibit a constant coefficient of variation, they are Gaussian-like, they are supported on the positive line, and they are easy to manipulate. The Gamma model has a long history in statistical ecology (e.g., Fisher et al., 1943), not only for analytical convenience but also because it may possess a deeper biological interpretation. Dennis and Patil (1984) showed that the Gamma is the approximate stationary distribution for the abundance of a population fluctuating around a stable equilibrium. In Section 5 we comment on the
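sensitivity of our results to the Gamma model assumption.

The probability density of a Gamma variable $R$ with scale $\theta_r$ and shape parameter $a$ is

$$p(r \mid a, \theta_r) = \frac{\theta_r^{a}\, r^{a-1} \exp(-r\theta_r)}{\Gamma(a)} \quad \text{for } r > 0, \qquad (1)$$

where $\Gamma(a) = \int_0^\infty u^{a-1} e^{-u}\, du$. We denote this density by Gamma$(a, \theta_r)$. Similarly, we model the measured intensity $G$ as Gamma$(a, \theta_g)$, and we assume that $R$ and $G$ are independent. The expectations of $R$ and $G$ are $\mu_r = a/\theta_r$ and $\mu_g = a/\theta_g$, respectively, and thus the target differential expression is $\rho = \mu_r/\mu_g = \theta_g/\theta_r$. Both $R$ and $G$ have the same coefficient of variation, $1/\sqrt{a}$, even though they may have different scales. By integrating the joint distribution of $R$ and $G$, we can derive the sampling distribution of the measured differential expression $T = R/G$:

$$p(t \mid a, \rho) = \frac{\Gamma(2a)}{\Gamma(a)^2}\, \frac{1}{\rho} \left(\frac{t}{\rho}\right)^{a-1} \left(1 + \frac{t}{\rho}\right)^{-2a} \quad \text{for } t > 0, \qquad (2)$$

where again $\rho = \theta_g/\theta_r$ is the parameter of interest. The tail of this distribution is asymptotic to $t^{-(a+1)}$, so we restrict $a > 1$ to ensure a first moment. The form of this sampling distribution is well known: it is a scale multiple of a Beta distribution of the second kind (Kendall and Stuart, 1969, p. 151). At this point we could follow the development in Chen et al. (1997), using this Gamma model instead of the normal model used in that work. For instance, we see that when $\rho = 1$, i.e., no real differential expression, the distribution of $T$ depends on the coefficient of variation only. A problem with this approach is that we lose information when using the ratio alone to assess differential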
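expression. To see why, consider the conditional sampling distribution of $T = R/G$ given $S = RG$ within the Gamma model. The case of no actual differential expression, $\rho = 1$, is most simple. Denoting by $\theta$ the common value of $\theta_r$ and $\theta_g$, we have

$$p(t \mid s; a, \theta) \propto \frac{1}{t} \exp\left\{-\theta \sqrt{s} \left(\sqrt{t} + \frac{1}{\sqrt{t}}\right)\right\} \quad \text{for } t > 0. \qquad (3)$$

On the multiplicative scale of intensity measurements, $\sqrt{s}$ acts like total abundance from both channels. Inspection shows that the scale of this conditional distribution depends on $s$, i.e., the variation in $T$ is smaller for larger $s$. The variability of differential expression is not constant, and so ignoring these changes can lead to inefficient statistical procedures. One way to modify the procedure from Chen et al. (1997), for example, would be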
to use something like (3) as a reference distribution instead of the analog to (2). We instead take a hierarchical modeling approach, which enables direct parameter estimation and hypothesis testing.

3. ESTIMATING DIFFERENTIAL EXPRESSION

Except for test microarrays or house-keeping genes, we certainly expect real differences in gene expression between cell types, and clearly different genes can exhibit differences in actual expression within a given cell type. A key distinction of the present approach from earlier efforts is the formulation of a specific probability model to characterize these fluctuations. Among the range of possible specifications, we first consider a simple Gamma model for
the scale parameters $\theta_r$ and $\theta_g$. This form is conjugate to the Gamma sampling model and thus permits detailed analysis. It entails independence among all the scale parameters on the microarray and assumes that they follow the common Gamma distribution Gamma$(a_0, \nu)$. Model fit can be improved slightly if we allow different scale parameters, say $\nu_r$ and $\nu_g$, for the two dyes, but we take a common parameter in the present development. This model is reasonably flexible: it is skewed right and presents increasing variation with increasing mean. It represents prior uncertainty in actual expression levels. An extension which allows correlation is described in Section 5. Our main reasons for choosing a Gamma distribution
to govern the latent scale parameters $\theta_r$ and $\theta_g$ are analytical tractability and model flexibility. We note in passing, however, that some theoretical justification may also exist. The target expression levels $\mu_r = a/\theta_r$ and $\mu_g = a/\theta_g$ each represent some kind of true abundance of the given transcript in the two mRNA pools. As such, their distribution concerns relative frequencies of frequencies, and the size-frequency relation characteristic of the Zipf-Pareto law may obtain (Johnson et al., 1994). If so, the relative frequency of genes with transcript abundance $\mu$ is proportional to $1/\mu^{k}$ for some power $k$. The reciprocal Gamma distribution has essentially the same density for moderate to large values.
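
The claim attached to (3), that the spread of $T = R/G$ shrinks as $S = RG$ grows, can be checked numerically. The Python sketch below (standard library only) is an illustration under stated assumptions, not the authors' code: the shape $a = 2$ and the common scale $\theta = 1$ are hypothetical values, and because $\theta$ enters (1) as a rate (the mean is $a/\theta$), `random.gammavariate`, which expects a scale, receives $1/\theta$.

```python
import math
import random
import statistics

# Hypothetical settings: shape a, common scale theta (a rate: mean = a/theta),
# and the number of simulated unchanged spots (rho = 1).
A, THETA, N = 2.0, 1.0, 100_000

rng = random.Random(1)
spots = []
for _ in range(N):
    r = rng.gammavariate(A, 1.0 / THETA)  # gammavariate takes scale = 1/rate
    g = rng.gammavariate(A, 1.0 / THETA)
    spots.append((r * g, math.log(r / g)))  # (S = RG, log T)

# Order spots by S and compare the spread of log T in the extreme quartiles.
spots.sort(key=lambda pair: pair[0])
quarter = N // 4
sd_low = statistics.stdev([lt for _, lt in spots[:quarter]])
sd_high = statistics.stdev([lt for _, lt in spots[-quarter:]])
print(f"sd of log(R/G): lowest RG quartile {sd_low:.3f}, highest {sd_high:.3f}")
```

Even though every simulated spot is unchanged, the lowest quartile of $RG$ shows a visibly larger spread of $\log(R/G)$ than the highest quartile, which is the information a ratio-only procedure discards.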

With the two model components in place, we can derive some interesting consequences. Notably, we can compute the posterior distribution of the true differential expression at a given spot:

$$p(\rho \mid R, G, \eta) = \frac{\Gamma(2(a+a_0))}{\Gamma(a+a_0)^2}\, \frac{\rho^{a+a_0-1}\, \{(R+\nu)(G+\nu)\}^{a+a_0}}{\{R+\nu+\rho(G+\nu)\}^{2(a+a_0)}}, \qquad (4)$$

where $\eta = (a, a_0, \nu)$ denotes the additional parameters yet to be specified. This is the distribution of the ratio of two independent Gamma variables, and it can be derived in the same way as the sampling distribution (2). Uncertainty about the true differential expression at a given spot is characterized by this distribution, and so, depending on our loss function, the Bayes estimate of $\rho$ is some measure of its center.
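
Because the Gamma prior is conjugate, the posterior (4) can also be sampled directly: a posteriori $\theta_r \sim$ Gamma$(a + a_0, R + \nu)$ and $\theta_g \sim$ Gamma$(a + a_0, G + \nu)$, independently, with the second argument again a rate. The sketch below, assuming hypothetical values of $\eta$ and one hypothetical dim spot, draws $\rho = \theta_g/\theta_r$ and compares a Monte Carlo mean with the closed-form mode $\frac{a+a_0-1}{a+a_0+1}\cdot\frac{R+\nu}{G+\nu}$ and mean $\frac{a+a_0}{a+a_0-1}\cdot\frac{R+\nu}{G+\nu}$ of this ratio distribution.

```python
import random
import statistics

def posterior_rho_draws(r, g, a, a0, nu, n_draws, rng):
    """Monte Carlo draws of rho = theta_g / theta_r for one spot.

    By conjugacy, theta_r | R ~ Gamma(a + a0, rate R + nu) and
    theta_g | G ~ Gamma(a + a0, rate G + nu), independently.
    random.gammavariate expects a scale, so we pass 1 / rate.
    """
    alpha = a + a0
    draws = []
    for _ in range(n_draws):
        theta_r = rng.gammavariate(alpha, 1.0 / (r + nu))
        theta_g = rng.gammavariate(alpha, 1.0 / (g + nu))
        draws.append(theta_g / theta_r)
    return draws

# Hypothetical hyperparameters eta = (a, a0, nu) and one hypothetical dim spot.
a, a0, nu = 2.0, 2.0, 1.0
r, g = 3.0, 0.5

alpha = a + a0
eb = (r + nu) / (g + nu)                     # (R + nu) / (G + nu)
exact_mode = eb * (alpha - 1) / (alpha + 1)  # posterior mode of rho
exact_mean = eb * alpha / (alpha - 1)        # posterior mean of rho
mc_mean = statistics.fmean(posterior_rho_draws(r, g, a, a0, nu, 100_000, random.Random(7)))

print(f"naive R/G = {r / g:.3f}, (R+nu)/(G+nu) = {eb:.3f}")
print(f"mode {exact_mode:.3f} < {eb:.3f} < mean {exact_mean:.3f}; Monte Carlo mean {mc_mean:.3f}")
```

For this spot the naive ratio is 6, while $(R+\nu)/(G+\nu) \approx 2.67$ sits between the posterior mode and mean, illustrating how strongly a low-intensity ratio is attenuated.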

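We note that the mode and mean of this right-skewed distribution are

$$\text{mode} = \frac{a+a_0-1}{a+a_0+1} \cdot \frac{R+\nu}{G+\nu} \quad \text{and} \quad \text{mean} = \frac{a+a_0}{a+a_0-1} \cdot \frac{R+\nu}{G+\nu}.$$

As $(R+\nu)/(G+\nu)$ lies between these two values and is somewhat simpler, we take as the Bayes estimate of differential expression

$$\hat\rho_B = \frac{R+\nu}{G+\nu}. \qquad (5)$$

This has the classic form of a shrinkage estimator. For strong signals, $\hat\rho_B$ will be quite close to the naive estimator $R/G$, but there is attenuation of the Bayes estimate, especially when the overall signal intensity is low. Thus the Bayes estimate naturally accounts for the decreased variation in differential expression with increasing signal. The amount of attenuation is governed by the parameter $\nu$, which is yet to be specified. Because $\nu$ is the scale parameter of the expression model component,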
we can represent it in more familiar terms. Consider the marginal expectation of signal intensity in the red channel, say, obtained by integrating uncertainty in $\theta_r$:

$$E(R) = E\{E(R \mid \theta_r)\} = E(a/\theta_r) = \frac{a\nu}{a_0 - 1}, \quad \text{and so} \quad \nu = \frac{(a_0 - 1)\, E(R)}{a}.$$

This simple formulation contains three key quantities. The coefficient of variation in the measurement error is controlled by $a$; the coefficient of variation describing fluctuations of actual expression among genes is controlled by $a_0$; and $E(R)$ is the overall average intensity measurement in the red channel. (We are assuming the data are normalized, and our model asserts that $R$ and $G$ have identical marginal distributions, so we could use $E(G)$ above instead of $E(R)$.) A pragmatic approach to
dealing with the unknown parameters $\eta = (a, a_0, \nu)$ is to estimate them by marginal maximum likelihood. That is, by integrating uncertainty in both $\theta_r$ and $\theta_g$, we obtain the predictive probability density of each measurement pair as

$$p_A(r, g) = \left[\frac{\Gamma(a+a_0)}{\Gamma(a)\, \Gamma(a_0)}\right]^2 \frac{\nu^{2a_0}\, (rg)^{a-1}}{\{(r+\nu)(g+\nu)\}^{a+a_0}}. \qquad (6)$$

Due to the independence assumption, this joint distribution is the product of the marginal distribution of $R$ and the marginal distribution of $G$. Each margin is a scale mixture of Gamma distributions and hence is a compound Gamma distribution belonging to Type VI of the Pearson system (Johnson et al., 1994, p. 381). The marginal loglikelihood $l(\eta)$ is the sum of contributions from all spots on the microarray, $l(\eta) = \sum_t \log p_A(r_t, g_t)$, where $(r_t, g_t)$ are observed at the $t$th spot. We call $l(\eta)$ a marginal loglikelihood rather than an ordinary loglikelihood because the gene-specific parameters have been integrated away. We optimize $l(\eta)$ numerically using the S-Plus function nlminb (Statistical Sciences, 1993).

Table 1. Parameter Estimates in Gamma-Gamma Model

Microarray    NA    $a$     $a_0$    $\nu$
Control       37    2.0     2.1      12.8
Heat shock    82    2.12    1.61     7.13
IPTG-a        20    1.4     1.8      15.2
IPTG-b        14    1.1     1.5      14.7

NA is the number of spots, out of 4,290, in which background is higher than spot intensity; the remaining columns give maximum likelihood parameter values. The scale $\nu$ is for normalization to fractions, times $10^5$.

In the
examples worked so far, we find that $a/(a_0 - 1)$ is estimated to be larger than 1, so the estimate of $\nu$ is smaller than the mean intensity. The inference that results by estimating parameters as above is called empirical Bayes (EB) (Efron and Morris, 1973).

Efforts at whole-genome expression analysis were pioneered in E. coli K-12 (Chuang et al., 1993). Sequencing of the E. coli K-12 genome has enabled the fabrication of high-resolution microarrays containing the entire complement of 4,290 open reading frames in this genome (Blattner et al., 1997; Richmond et al., 1999). To demonstrate our statistical methodology, we reanalyze a panel of four microarrays from this E. coli project. One
of these is a control microarray; two are replicates involving treatment with isopropyl-β-D-thiogalactopyranoside (IPTG), in which it is known that only a few transcripts should be induced; and one is from a heat shock treatment (HS), in which more global changes are expected. For the control microarray, total RNA was isolated from E. coli grown in rich medium. This single pool was split in two, separately labeled, and then mixed and cohybridized to the microarray. In the IPTG replicates, control RNA (labeled Cy3) was cohybridized with RNA (labeled Cy5) from E. coli treated with IPTG. The heat shock microarray was obtained similarly; Richmond et al. (1999) provide details. Following these
authors, we invoke a simple normalization method on each microarray: we first subtract background intensity from each spot, and then we divide each adjusted intensity by the total intensity obtained by combining all positive adjusted measurements. We omit from the estimation process any spots where the background is higher than the signal (column 1, Table 1). Maximum likelihood parameter estimates for the hierarchical Gamma-Gamma model are also given in Table 1.

The EB estimates of differential expression attenuate the naive estimates, as summarized graphically in Fig. 1. Points indicate the normalized, background-adjusted intensities in the two dyes, say $(G, R)$. Line segments run from
each $(G, R)$ to $(G+\nu, R+\nu)$. Owing to the logarithmic scale, of course, this shrinkage is most pronounced for low-intensity spots. It is interesting that on the control microarray the shrinkage constant $\nu$ is fairly large in comparison to the heat-shock microarray, suggesting that the method can distinguish noise from significant signal. The attenuation inherent in the EB estimates affects the ranking of the most highly differentially expressed genes. Figure 2 illustrates the ranking change for two of the E. coli microarrays. Consider the top 100 most differentially expressed genes as measured by $R/G$. For each $k$ in the range 1 to 100, we ask how many of these top $k$ genes are ranked in the top $k$ by the
EB procedure. If the methods agree (i.e., if $\nu$ is very small), then the answer is $k$. About one quarter of the 100 most highly differentially expressed genes measured via $R/G$ are not in the top 100 as measured by the EB procedure. Data analysis methods often focus on the most differentially expressed genes, and so it is quite possible that the use of the more efficient Bayes estimation procedure will have an impact on downstream computations using measured differential expression.

FIG. 1. Shrinkage estimation, E. coli. Points are plotted at measured intensities $(G, R)$, and line segments extend to $(G+\nu, R+\nu)$. On lines parallel to the diagonal, fold change is constant.

The EB estimate $(R+\nu)/(G+\nu)$ is attenuated compared to the naive estimator $R/G$, but is the estimate any better? We report the results of a small simulation to address this question. Green and red intensities were simulated for a synthetic microarray having 4,000 spots. Intensities arose from Gamma distributions with shape parameter $a$ and Gamma-distributed scales having parameters $a_0$ and $\nu = 8$. We incorporated some positive correlation between green and red scale parameters using the model described in Section 5, with dependence parameter 20. Thus the simulation model differed from the model used for fitting. By keeping track of the simulated scales $\theta_r$ and $\theta_g$, of course, we can measure the
error in estimating $\rho$ for both the naive procedure and the EB procedure (Fig. 3). There is a fairly significant error reduction by the EB procedure in this case. Other cases not reported showed similar error reductions.

Beyond point estimates of differential expression, we can use (4) to obtain Bayesian confidence intervals (credible intervals). By a change of variables, $\rho(G+\nu)/\{\rho(G+\nu) + R + \nu\}$ is distributed a posteriori as a symmetric Beta distribution with both shape parameters equal to $a + a_0$, and so endpoints of a credible interval may be computed by back-transforming quantiles from this symmetric Beta. The credible interval provides a measure of uncertainty in $\rho$, but we find that inference beyond point estimation may be more
accurate in models such as the one described next, in which it is recognized that there are no real changes in expression for some subset of genes.
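
The credible-interval recipe of back-transforming quantiles of the symmetric Beta takes only a few lines. This is a sketch, assuming $a + a_0 > 1$; with SciPy one would normally take the quantiles from `scipy.stats.beta.ppf`, but to stay within the standard library the hypothetical helpers `beta_cdf` (Simpson's rule) and `beta_ppf` (bisection) stand in for it.

```python
import math

def beta_cdf(x, alpha, beta, steps=2000):
    """Regularized incomplete beta I_x(alpha, beta) via Simpson's rule.

    A sketch that assumes alpha > 1 and beta > 1 (bounded integrand);
    not a production-grade special-function routine. steps must be even.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)

    def density(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0  # the density vanishes at 0 and 1 when alpha, beta > 1
        return math.exp(log_norm + (alpha - 1) * math.log(u) + (beta - 1) * math.log1p(-u))

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

def beta_ppf(q, alpha, beta, tol=1e-12):
    """Quantile of Beta(alpha, beta) by bisection on beta_cdf."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, alpha, beta) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def credible_interval(r, g, a, a0, nu, level=0.95):
    """Equal-tailed credible interval for rho at one spot.

    V = rho (G + nu) / (rho (G + nu) + R + nu) ~ Beta(a + a0, a + a0) a posteriori,
    so rho = k V / (1 - V) with k = (R + nu) / (G + nu).
    Returns (lower, k, upper).
    """
    alpha = a + a0
    k = (r + nu) / (g + nu)
    tail = (1.0 - level) / 2.0
    v_lo = beta_ppf(tail, alpha, alpha)
    v_hi = beta_ppf(1.0 - tail, alpha, alpha)
    return k * v_lo / (1.0 - v_lo), k, k * v_hi / (1.0 - v_hi)

# Hypothetical spot and hyperparameters.
lo, est, hi = credible_interval(r=3.0, g=0.5, a=2.0, a0=2.0, nu=1.0)
print(f"(R+nu)/(G+nu) = {est:.3f}, 95% credible interval ({lo:.3f}, {hi:.3f})")
```

Because the Beta is symmetric with median 1/2, $(R+\nu)/(G+\nu)$ is also the posterior median of $\rho$, and the interval is symmetric about $\log\{(R+\nu)/(G+\nu)\}$ on the log scale.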

FIG. 2. Effect on ranking of genes.

4. IDENTIFYING SIGNIFICANT DIFFERENTIAL EXPRESSION

We turn attention to deciding whether or not the observed differences at a given gene are sufficiently large to assert significance. Having a third layer in the hierarchical model facilitates the calculations. The true mean intensities of some proportion $p$ of spots change between conditions (i.e., $\theta_r \neq \theta_g$), while the others remain fixed ($\theta_r = \theta_g = \theta$). For spots which change, we use the previous model from Section 3. In other words,
the scale parameters $\theta_r$ and $\theta_g$ are independent Gamma variates with common shape $a_0$ and scale $\nu$. For unchanged spots, the common scale parameter $\theta$ is deemed to arise from the same Gamma distribution. A problem presented by the Gamma-Gamma-Bernoulli specification is that the identity of the changed spots is unknown. The likelihood calculations (which we would use to estimate $a$, $a_0$, $\nu$, and $p$) appear impossibly complex, since they involve summation over all $2^n$ configurations of missing indicators, where $n$ is the number of spots. Fortunately, the model fits well into the EM algorithm framework (Dempster et al., 1977), and so we have a simple recursion to infer these parameters from the marginal likelihood. The
first ingredient in the calculation is the marginal probability of the data at spot $t$ if there is no real differential expression. This is obtained by integrating the Gamma model for the common scale.

FIG. 3. Error reduction. The average log-scale estimation error is 0.88 for the naive estimate, whereas it is 0.42 for the empirical Bayes estimate. The empirical Bayes estimate gives a 50% reduction in error.

In contrast to (6), in which there are different scales, the marginal probability in the null case is

$$p_0(r, g) = \frac{\Gamma(2a+a_0)}{\Gamma(a)^2\, \Gamma(a_0)}\, \frac{\nu^{a_0}\, (rg)^{a-1}}{(r+g+\nu)^{2a+a_0}}. \qquad (7)$$

Letting $r_t$ and $g_t$ denote the measured intensities at spot $t$, and introducing the binary indicator variable $z_t$, set to 0 unless there is true differential expression, the
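complete-data loglikelihood is

$$l_c(a, a_0, \nu, p) = \sum_t \Big[ z_t \log p_A(r_t, g_t) + (1 - z_t) \log p_0(r_t, g_t) + z_t \log p + (1 - z_t) \log(1 - p) \Big].$$

The E-step is to obtain the conditional expectation of $l_c$, which simply involves replacing $z_t$ by the posterior probability of change,

$$P(z_t = 1 \mid r_t, g_t, \eta, p) = \frac{p\, p_A(r_t, g_t)}{p\, p_A(r_t, g_t) + (1 - p)\, p_0(r_t, g_t)}, \qquad (8)$$

with parameters $\eta$ and $p$ fixed at tentative values. The M-step is to maximize the resultant form in the four parameters. Having broken the mixture structure, this maximization is simplified. Immediately we find that the updated estimate of $p$ is the arithmetic mean of the posterior probabilities (8). An off-the-shelf numerical procedure, such as nlminb in S-Plus, readily optimizes the remaining parameters in each iteration. Forty iterations of EM were used to obtain the estimates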
reported in Table 2, and results were checked from various starting configurations.

Table 2. Parameter Estimates in Gamma-Gamma-Bernoulli Model via EM Algorithm

Microarray    $a$     $a_0$    $\nu$    $p$
Control       22.9    0.9      0.2      0.00
Heat shock    2.7     1.3      4.1      0.05
IPTG-a        12.5    0.8      0.3      0.00
IPTG-b        9.6     0.6      0.2      0.00

Placing a prior distribution over $p$ stabilizes the computations and enables a nice interpretation of the output. We use a Beta(2, 2) prior in what we report in Table 2, which amounts to a prior assumption of exchangeability of the $z_t$ and that $P(z_t = 1) = 1/2$ upon integrating uncertainty in $p$. It is convenient to fix the other parameters $a$, $a_0$, and $\nu$ at their estimated values rather than integrating
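against a prior. Owing to the large sample size, there should not be significant error in the present examples.

Our goal is to compute posterior odds of change at each spot. The odds summarize our inference about actual differential expression at each spot using all the data on the microarray. With $D$ denoting expression measurements on the whole microarray, the posterior odds of change at spot $t$ are

$$\text{odds}_t = \frac{P(z_t = 1 \mid D)}{P(z_t = 0 \mid D)}, \qquad P(z_t = 1 \mid D) = \int P(z_t = 1 \mid r_t, g_t, p, \eta)\, p(p \mid D)\, dp, \qquad (9)$$

where, by conditional independence of the data at different spots given the parameters, the integrand depends on $D$ only through $(r_t, g_t)$. Bayes' rule determines $P(z_t = 1 \mid r_t, g_t, p, \eta)$ in terms of $p_A(r_t, g_t)$ and $p_0(r_t, g_t)$; see (8). Also, the EM algorithm finds the posterior mode of $p(p \mid D)$, say $\hat p$. To a first approximation, the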
integral (9) equals the integrand $P(z_t = 1 \mid r_t, g_t, p, \eta)$ evaluated at the modal value $p = \hat p$. Therefore

$$\text{odds}_t \approx \frac{\hat p\, p_A(r_t, g_t)}{(1 - \hat p)\, p_0(r_t, g_t)}. \qquad (10)$$

These posterior odds may also be called Bayes factors, because the prior odds for change equal unity. An inspection of the profile loglikelihood curve for $p$ indicates that $p(p \mid D)$ is highly concentrated for the E. coli examples, and so (10) provides a good approximation. Figure 4 shows contours of the posterior odds of true differential expression computed in the Gamma-Gamma-Bernoulli model using (10). The contour map provides an interesting summary of significant changes. The gray area in each panel indicates that the odds favor no change. An important feature of the map is that the
contour lines are not straight on this log-log scale, indicating that to consider fold change alone is not enough. As we expected, we are less confident about naive fold change at low signal intensity. Dotted lines on each panel correspond to the 99% rule from Chen et al. (1997). (We use all the spots to estimate the coefficient of variation, and this was so large in the heat-shock data that the Chen et al. approximation to the upper band broke down.) These lines are designed so that they will be exceeded only for 1% of spots which in fact have not changed. Within the present model, most of this 1% can be expected to occur at low total abundance, a sign of inefficiency.

The Bayes factor
computation allows us to rank order genes by their probability of real differential expression. We have replicate microarrays for the IPTG treatment, so it is instructive to check the reproducibility of these assessments. As a technical note, we had removed in the estimation phase any spots for which the measured intensity was below the estimated background. For assessing changes, we include these spots and deem them to present zero signal in that channel. (Interestingly, the log Bayes factor is continuous at zero intensity, so there is no computational problem in doing so.) On IPTG-a, 20 genes have odds better than 10:1 favoring changed expression. The replicate IPTG-b shows
seven changed genes. Five valid genes are in common between the replicates, and all are induced by IPTG. These five are exactly the genes identified by Richmond et al. (1999) as being induced, on the basis of the same microarray data and other radioactivity data.
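
Once $\hat p$ and $\eta$ are in hand, the approximate odds (10) are cheap to evaluate. In the sketch below the $(rg)^{a-1}$ factors of (6) and (7) are cancelled analytically, so the log Bayes factor stays finite when an intensity is set to zero, in line with the remark about zero-signal spots. The parameter values ($a = 12$, $a_0 = 1$, $\nu = 1$, $\hat p = 0.25$) are hypothetical, and the function names are illustrative.

```python
import math

def log_bayes_factor(r, g, a, a0, nu):
    """log{p_A(r, g) / p_0(r, g)} using (6) and (7).

    The (rg)^(a - 1) factors cancel, so the value is finite at r = 0 or g = 0.
    """
    const = 2 * math.lgamma(a + a0) - math.lgamma(a0) - math.lgamma(2 * a + a0)
    return (const + a0 * math.log(nu)
            - (a + a0) * (math.log(r + nu) + math.log(g + nu))
            + (2 * a + a0) * math.log(r + g + nu))

def log_posterior_odds(r, g, a, a0, nu, p_hat):
    """Log of the approximate posterior odds of change (10)."""
    return math.log(p_hat) - math.log1p(-p_hat) + log_bayes_factor(r, g, a, a0, nu)

# Hypothetical parameter values.
params = dict(a=12.0, a0=1.0, nu=1.0, p_hat=0.25)
for r, g in [(0.4, 0.1), (4.0, 1.0), (40.0, 10.0), (10.0, 10.0), (0.0, 5.0)]:
    fold = r / g if g else float("inf")
    print(f"R={r:5.1f} G={g:5.1f} fold={fold:5.2f} log-odds={log_posterior_odds(r, g, **params):7.2f}")
```

For these values, the same fourfold ratio gives negative log-odds at $(R, G) = (0.4, 0.1)$ and positive log-odds at $(40, 10)$: the curvature of the contours in Fig. 4, in miniature.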

FIG. 4. Odds of real differential expression, E. coli data. In the shaded region, odds are for no change. Contours are at odds of change of 1:1, 10:1, and 100:1, respectively.

The heat-shock data provide an interesting demonstration of the methodology. Using Bayes factors, we find that 38 genes exhibit differential expression (25 induced and 13 repressed). This
represents about 1% of the genes on the microarray, and prior work suggests that more genes have changed (Richmond et al., 1999). If we look at the parameter estimates for this case (Table 2), we see that the estimated proportion of changed spots is about 5% (about 200 genes), and this number is in line with the earlier report. Being the optimal parameter value, $\hat p$ satisfies the equation

$$\hat p = \frac{1}{n} \sum_t P_t(\hat p),$$

where $P_t(p)$ is the posterior probability of change at spot $t$, as in (8). At the same time, the Bayes factor (odds for change) is the ratio of $P_t(\hat p)$ to $1 - P_t(\hat p)$. Something interesting is going on. On average over spots, the posterior probability of change is about 5%, and this leads us to infer that about 5% of
spots have changed. When it comes to deciding which spots have changed, however, only 38 spots, about 1%, have $P_t(\hat p) > 1/2$ and thus have odds favoring change. Our inference about the proportion of changed spots does not need to be the same as the proportion of spots which we can confidently say have changed. In fact, by Markov's inequality, the proportion of spots in which $P_t(\hat p) > 1/2$ is no greater than $2\hat p$, and some refinement to this bound may be possible.
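
A stripped-down EM sketch makes the gap between $\hat p$ and the number of confident calls concrete. It is a deliberate simplification of the procedure in the text: $(a, a_0, \nu)$ are held at their true (hypothetical) values instead of being re-optimized in each M-step, and no Beta(2, 2) prior is placed on $p$, so the M-step is just the arithmetic mean of the posterior probabilities (8). The data come from the Gamma-Gamma-Bernoulli model with hypothetical settings, and all function names are illustrative.

```python
import math
import random

def log_p_alt(r, g, a, a0, nu):
    """log p_A(r, g), eq. (6): independent scales in the two channels."""
    c = math.lgamma(a + a0) - math.lgamma(a) - math.lgamma(a0)
    return (2 * c + 2 * a0 * math.log(nu) + (a - 1) * math.log(r * g)
            - (a + a0) * (math.log(r + nu) + math.log(g + nu)))

def log_p_null(r, g, a, a0, nu):
    """log p_0(r, g), eq. (7): one common scale for both channels."""
    c = math.lgamma(2 * a + a0) - 2 * math.lgamma(a) - math.lgamma(a0)
    return (c + a0 * math.log(nu) + (a - 1) * math.log(r * g)
            - (2 * a + a0) * math.log(r + g + nu))

def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def posterior_change_probs(diffs, p):
    """Eq. (8) written on the log-odds scale."""
    prior_log_odds = math.log(p) - math.log1p(-p)
    return [logistic(prior_log_odds + d) for d in diffs]

def em_for_p(data, a, a0, nu, p=0.5, iters=200):
    """EM for the mixing proportion p only; (a, a0, nu) stay fixed."""
    diffs = [log_p_alt(r, g, a, a0, nu) - log_p_null(r, g, a, a0, nu) for r, g in data]
    for _ in range(iters):
        probs = posterior_change_probs(diffs, p)  # E-step
        p = sum(probs) / len(probs)               # M-step: arithmetic mean
    return p, posterior_change_probs(diffs, p)

def simulate(n, a, a0, nu, p, rng):
    """Hypothetical Gamma-Gamma-Bernoulli data; scales are rates, so pass 1/rate."""
    data = []
    for _ in range(n):
        theta_r = rng.gammavariate(a0, 1.0 / nu)
        changed = rng.random() < p
        theta_g = rng.gammavariate(a0, 1.0 / nu) if changed else theta_r
        data.append((rng.gammavariate(a, 1.0 / theta_r), rng.gammavariate(a, 1.0 / theta_g)))
    return data

A, A0, NU, P_TRUE = 12.0, 1.0, 1.0, 0.25  # hypothetical values
data = simulate(4000, A, A0, NU, P_TRUE, random.Random(2024))
p_hat, probs = em_for_p(data, A, A0, NU)
frac_called = sum(pr > 0.5 for pr in probs) / len(probs)
print(f"p_hat = {p_hat:.3f}; fraction with odds > 1 = {frac_called:.3f}")
print(f"Markov bound 2 * mean(P_t) = {2 * sum(probs) / len(probs):.3f}")
```

The last line is Markov's inequality in action: however close $\hat p$ is to the true proportion, the fraction of spots with $P_t > 1/2$ cannot exceed twice the average posterior probability of change.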

Table 3. Error Rates, Simulated Gamma-Gamma-Bernoulli Model

               Odds-bigger-than-one rule     Less stringent rule
               P > 0.5      P ≤ 0.5          P > 0.236    P ≤ 0.236
True change    577          423              694          306
No change      73           2927             311          2689
Total          650          3350             1005         3995

Two decision
rules are examined for a 4,000-spot microarray in which 1,000 genes have changed in true gene expression. Significant change is inferred if the posterior probability of change exceeds a cutoff.

To study our methodology further, we performed a small simulation. We considered a single microarray with 4,000 spots, in which data for 1,000 spots arose from the Gamma-Gamma model with $a = 12$, $a_0 = 1$, and $\nu = 1$. The remaining 3,000 spots had variable expression levels, but there was no change in true expression from green to red. The same Gamma model was used to generate these common expression levels, and then also to generate the measured intensities. So basically we simulated the Gamma-Gamma-Bernoulli
model, but we forced exactly 1,000 spots to change. Parameter estimates obtained by the EM algorithm were $\hat a = 12.5$, $\hat a_0 = 1.0$, $\hat\nu = 0.95$, and $\hat p = 0.26$, and thus we recovered the model parameters extremely well. Table 3 records error rates under two decision rules. Taking our standard rule to call a spot as changed if the odds exceed unity, we inferred only 650 changed spots, much less than the 1,000 or so which we conclude have changed from $\hat p$. This underestimation mirrors what happened with the heat-shock data. In total, we make 496 incorrect calls using the odds-bigger-than-one rule. A second, much less stringent rule is to call a total of $n\hat p$ spots as changed, rank-ordered by their individual posterior probabilities.
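
The two decision rules differ only in where the probability bar is set. The sketch below applies both to a hypothetical list of posterior probabilities and then recomputes the incorrect-call totals from the counts in Table 3 (missed true changes plus false calls among unchanged spots); the function names are illustrative.

```python
def odds_rule(probs):
    """Spots whose posterior odds of change exceed one, i.e. P_t > 1/2."""
    return {i for i, pr in enumerate(probs) if pr > 0.5}

def ranked_rule(probs, p_hat):
    """Call the round(n * p_hat) spots with the largest posterior probabilities."""
    k = round(len(probs) * p_hat)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k])

# Hypothetical posterior probabilities for ten spots.
probs = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01]
strict = odds_rule(probs)
lenient = ranked_rule(probs, p_hat=0.3)
cutoff = min(probs[i] for i in lenient)  # the implied, lowered probability bar
print(sorted(strict), sorted(lenient), cutoff)  # prints: [0, 1] [0, 1, 2] 0.4

# Table 3: incorrect calls = missed true changes + false calls.
errors_odds_rule = 423 + 73
errors_less_stringent = 306 + 311
print(errors_odds_rule, errors_less_stringent)  # prints: 496 617
```

The implied cutoff plays the role of the lowered bar of 0.236 in the simulation: the ranked rule matches the estimated number of changes, at the cost of more false calls.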

For these simulated data, we thus lowered the bar from a change probability of 0.5 to a change probability of 0.236, and by so doing we produced a list of about 1,000 genes. This rule has the advantage that our conclusion about the number of changed genes is in line with our reporting of particular genes. But, at least in this simulation, we amplified the overall error rate with this rule: 617 spots were called incorrectly. The simulation highlights difficulties with inferring differential gene expression, but we point out that specific error rates may depend on the application.

5. MODEL VALIDATION AND DISCUSSION

Both the Gamma-Gamma model and the Gamma-Gamma-Bernoulli model attempt to capture
some structural features expected in microarray data, but of course they are highly parameterized, and it is important to check whether predictions implied by them are in line with available data. We have considered several simple checks. For instance, we can compare histograms of measured intensities to the fitted marginal model (Fig. 5). Plotted on each histogram (on the log scale) is the fitted marginal density from the Gamma-Gamma model used for shrinkage estimation (dotted line) and the fitted density from the Gamma-Gamma-Bernoulli model (solid line). Clearly there is room for improvement in the fit, but the primary features of the data are captured. A second interesting check is based on
well-know propert of th Gamm distribution If and ar measurement on on spot an ther is no rea differentia expression the bot measurement aris fro commo Gamm distributio wit shap and scale Th renormalize differenc .R G/=.R G/ ha symmetri Bet distributio wit shap parameter and i.e. it densit is proportiona to b/ for Notabl thi densit doe no depen on th scal parameter so it is th commo distributio fo all unchange spot on th microarra Figur compare th histogra of value to th tte Bet density Fo th treatmen microarrays we focuse onl on values from spot deeme to be probabl unchange by th Baye facto

computation Thi is tru mode check th ttin procedur doe no attemp to captur variation on th show scale Indeed th is poo in som respects bu agai primar feature of variatio ar captured
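Both diagnostic checks reduce to simple formulas. The Python sketch below uses hypothetical function names and the standard library only. It does three things:

- It evaluates the density of $B$ on $(-1, 1)$. Because $B = 2\,\text{Beta}(a, a) - 1$, that density is $(1-b^2)^{a-1} / \{2^{2a-1} \mathrm{B}(a,a)\}$.
- It checks by simulation that $E[B^2] = 1/(2a+1)$ regardless of the common scale.
- It computes the fitted marginal density of a log intensity. This part assumes, as before, the shape/rate Gamma-Gamma parameterization with $\theta \sim \text{Gamma}(a_0, \nu)$.

The helper `expected_bin_counts` integrates either density over histogram bins, so expected counts can be overlaid on observed histograms like those in Figs. 5 and 6.

```python
import math
import random


def renormalized_difference_density(b, a):
    """Density of B = (R - G)/(R + G) for -1 < b < 1 when R and G are iid Gamma
    with shape a and any common scale: (1 - b^2)^(a-1) / (2^(2a-1) Beta(a, a))."""
    log_beta_aa = 2 * math.lgamma(a) - math.lgamma(2 * a)
    return math.exp((a - 1) * math.log1p(-b * b)
                    - (2 * a - 1) * math.log(2.0) - log_beta_aa)


def renormalized_differences(n, a, scale, rng):
    """Simulate B for n unchanged spots whose R and G share one Gamma(a, scale) law."""
    out = []
    for _ in range(n):
        r = rng.gammavariate(a, scale)
        g = rng.gammavariate(a, scale)
        out.append((r - g) / (r + g))
    return out


def log_intensity_density(x, a, a0, nu):
    """Marginal density of X = log(intensity) under the Gamma-Gamma model, assuming
    intensity | theta ~ Gamma(shape a, rate theta) and theta ~ Gamma(shape a0, rate nu).
    Change of variables: f_X(x) = e^x * p(e^x)."""
    log_p = ((a - 1) * x + a0 * math.log(nu) + math.lgamma(a + a0)
             - math.lgamma(a) - math.lgamma(a0)
             - (a + a0) * math.log(math.exp(x) + nu))
    return math.exp(x + log_p)


def expected_bin_counts(edges, n, density, step=1e-3):
    """Expected histogram counts for n observations: integrate the density over
    each bin with the midpoint rule (endpoints are never evaluated)."""
    counts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        k = max(1, int(round((hi - lo) / step)))
        h = (hi - lo) / k
        counts.append(n * h * sum(density(lo + (j + 0.5) * h) for j in range(k)))
    return counts


if __name__ == "__main__":
    rng = random.Random(7)
    a = 12.0
    for scale in (0.01, 100.0):
        bs = renormalized_differences(20000, a, scale, rng)
        # E[B^2] = 1/(2a+1) whatever the scale, since B = 2*Beta(a, a) - 1.
        print(scale, sum(b * b for b in bs) / len(bs), 1 / (2 * a + 1))
    edges = [-1.0 + 0.25 * i for i in range(9)]
    print(expected_bin_counts(edges, 20000, lambda b: renormalized_difference_density(b, a)))
```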
FIG. 5. Diagnostic check. Histograms are of intensities (both colors pooled) on the natural log scale. Dashed curve is from the Gamma-Gamma model; solid curve is from the Gamma-Gamma-Bernoulli model.

We are using just four parameters to describe marginal variation and dependence between the red and green channels, and improvements may come from increasing the number of parameters. We did let the scale parameter be color specific. This improves the fits somewhat, especially on the IPTG microarrays, as the estimated value of $a$ goes up. We also let the shape parameter be color specific, but this did not significantly improve fits. The use of additional scale parameters leads to problems with the Bayes factor contours in Fig. 4, so we are currently investigating model elaborations and identifiability questions. We conjecture that improvements may arise if some positive correlation is added to the expression model. We can do this by retaining the Gamma sampling model but adding correlation between $\theta_r$ and $\theta_g$ in the expression model. For example, we might say $\theta \sim \text{Gamma}(a_0, \nu)$ and take two multipliers $m_r, m_g$ iid $\text{Gamma}(c, c)$ for a positive dependence parameter $c$. Then write $\theta_r = \theta m_r$ and $\theta_g = \theta m_g$. The multipliers are centered on unity and will be close to unity if $c$ is large. This model is intermediate between the null and alternative models studied so far. It requires more sophisticated machinery to fit, but it may be effective at identifying subtle expression changes.

Our methods use Gamma distributions, but other parametric forms can be considered. Various parametric models entail a constant coefficient of variation on the positive line. The log-normal model has been used for this purpose, and a comparison of the different formulations will be useful (Wiens, 1999). On the basis of preliminary computations, we can say that the same qualitative features of the analysis carry over to the log-normal; in particular, the shape of the contours in the Bayes factor plot is similar.

FIG. 6. Diagnostic check. Histograms are of renormalized differences $(R-G)/(R+G)$ for spots deemed not to have changed. Curves are predicted Beta densities.

We have used one particular method of normalization and background noise adjustment. Probably some advantage can be gained by combining these tasks with the present modeling method, to better account for these sources of variation. For instance, on normalization, we could say that the scale parameters in one sample are a global constant multiple of those in the other sample. We would then treat this constant as another model parameter to be estimated from unnormalized intensities. This is similar to the calibration procedure described in Chen et al. (1997), but in the context of hierarchical models. Our methodology deals with a single microarray at a time and does not attempt to combine data, though the modeling framework certainly allows this elaboration. One approach would be to decompose, say, the log of the expression scale parameter into contributions from different genes, different RNA preparations, and different growth conditions. Combining information from multiple microarrays may be an effective way to obtain accurate estimates of the contributions of different sources of variation. Kerr et al. (2000) provide details of a related method, which expresses the expected value of log-transformed intensity measurements in terms of contributions from such factors.
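A short simulation makes the intermediate nature of the multiplier model concrete. The sketch below assumes the following notation:

- $\theta \sim \text{Gamma}(a_0, \nu)$, in shape/rate form.
- The multipliers $m_r, m_g$ are iid Gamma with shape and rate both equal to $c$.

These symbols and the function names are illustrative assumptions, and the code only simulates from the model rather than fitting it.

Because $\log\theta_r = \log\theta + \log m_r$, the correlation of the log scale parameters is $\psi'(a_0)/\{\psi'(a_0)+\psi'(c)\}$, where $\psi'$ is the trigamma function. This correlation approaches 1 (the null model, $\theta_r = \theta_g$) as $c \to \infty$, and it falls toward 0 as $c$ shrinks.

```python
import math
import random


def simulate_correlated_scales(n, a0, nu, c, rng):
    """Draw n pairs (theta_r, theta_g) from the assumed multiplier construction:
    a shared theta ~ Gamma(shape a0, rate nu) times multipliers m_r, m_g that are
    iid Gamma(shape c, rate c), i.e. mean 1 and variance 1/c."""
    pairs = []
    for _ in range(n):
        theta = rng.gammavariate(a0, 1.0 / nu)  # gammavariate takes (shape, scale)
        m_r = rng.gammavariate(c, 1.0 / c)
        m_g = rng.gammavariate(c, 1.0 / c)
        pairs.append((theta * m_r, theta * m_g))
    return pairs


def simulate_intensities(pairs, a, rng):
    """Gamma sampling model retained: each intensity ~ Gamma(shape a, rate theta)."""
    return [(rng.gammavariate(a, 1.0 / tr), rng.gammavariate(a, 1.0 / tg))
            for tr, tg in pairs]


def log_scale_correlation(pairs):
    """Sample correlation between log(theta_r) and log(theta_g)."""
    xs = [math.log(tr) for tr, _ in pairs]
    ys = [math.log(tg) for _, tg in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(11)
    for c in (0.5, 1.0, 10.0, 100.0):
        pairs = simulate_correlated_scales(20000, 1.0, 1.0, c, rng)
        print(c, round(log_scale_correlation(pairs), 3))
```

With $a_0 = 1$, the printed correlations should rise from about 0.5 at $c = 1$ toward 1 at $c = 100$. This matches the sense in which the model sits between the null and alternative models.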
Hierarchical statistical modeling allows for efficient data processing in large-scale expression studies. By accounting for differential variability in the data, it provides more precise estimates of differential gene expression than standard methods. It also gives more accurate assessments of significant changes. The calculations account for the measurement error process and for natural fluctuations in absolute expression levels. Preprocessing image data via these methods may reduce errors in downstream tasks such as cluster analysis or classification.

ACKNOWLEDGMENTS

The authors thank Bob Mau for his critical review of an earlier draft and a referee for helpful comments. They also acknowledge interesting discussions with Gary Churchill, who independently has considered the use of shrinkage estimation for differential expression. They thank Alex Loguinov, who identified a bug in one of the plotting programs. Code and data used in this article are freely available at the first author's web site, www.stat.wisc.edu/~newton/. This research was funded in part by National Cancer Institute grant R29CA64364-0 to M.A.N. and training grant A-C 0956 for C.M.K. NIH grant R0 GM3568 supported C.S.R. and F.R.B.

REFERENCES

Bassett, D.E., Jr., Eisen, M.B., and Boguski, M.S. 1999. Gene expression informatics: it's all in your mine. Nature Genet. Suppl. 21, 51-55.
Blattner, F.R., Plunkett, G., III, Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., Gregor, J., Davis, N.W., Kirkpatrick, H.A., Goeden, M.A., Rose, D.J., Mau, R., and Shao, Y. 1997. The complete genome sequence of Escherichia coli K-12. Science 277, 1453-1474.
Brown, P.O., and Botstein, D. 1999. Exploring the new world of the genome with DNA microarrays. Nature Genet. Suppl. 21, 33-37.
Carlin, B.P., and Louis, T.A. 1996. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, New York.
Chen, Y., Dougherty, E.R., and Bittner, M.L. 1997. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Optics 2(4), 364-374.
Cheung, V.G., Morley, M., Aguilar, F., Massimi, A., Kucherlapati, R., and Childs, G. 1999. Making and reading microarrays. Nature Genet. Suppl. 21, 15-19.
Chuang, S.-E., Daniels, D.L., and Blattner, F.R. 1993. Global regulation of gene expression in Escherichia coli. J. Bacteriol. 175(7), 2026-2036.
Dempster, A.P., Laird, N.M., and Rubin, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Royal Statistical Society Series B 39, 1-38.
Dennis, B., and Patil, G.P. 1984. The gamma distribution and weighted multimodal gamma distributions as models of population abundance. Mathematical Biosciences 68, 187-212.
Duggan, D.J., Bittner, M., Chen, Y., Meltzer, P., and Trent, J.M. 1999. Expression profiling using cDNA microarrays. Nature Genet. Suppl. 21, 10-14.
Efron, B., and Morris, C. 1973. Combining possibly related estimation problems (with discussion). J. Royal Statistical Society Series B 35, 379-421.
Efron, B., and Morris, C. 1977. Stein's paradox in statistics. Sci. Am. 236, 119-127.
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.
Fisher, R.A., Corbet, A.S., and Williams, C.B. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Animal Ecology 12, 42-58.
Johnson, N.L., Kotz, S., and Balakrishnan, N. 1994. Continuous Univariate Distributions, vol. 1, 2nd ed. Wiley, New York.
Kendall, M.G., and Stuart, A. 1969. The Advanced Theory of Statistics, vol. 1, 3rd ed. Hafner, New York.
Kerr, M.K., Martin, M., and Churchill, G.A. 2000. Analysis of variance for gene expression microarray data. Manuscript. http://www.jax.org/research/churchill
Lander, E.S. 1999. Array of hope. Nature Genet. Suppl. 21, 3-4.
Lipshutz, R.J., Fodor, S.P.A., Gingeras, T.R., and Lockhart, D.J. 1999. High density synthetic oligonucleotide arrays. Nature Genet. Suppl. 21, 20-24.
Richmond, C.S., Glasner, J.D., Mau, R., Jin, H., and Blattner, F.R. 1999. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res. 27(19), 3821-3835.
Statistical Sciences. 1993. S-PLUS Guide to Statistical and Mathematical Analysis, Version 3. StatSci, a division of MathSoft, Inc., Seattle.
Wiens, B.L. 1999. When log-normal and gamma models give different results: a case study. Am. Statistician 53, 89-93.

Address correspondence to: Michael A. Newton, Department of Statistics, 121 Dayton Street, University of Wisconsin, Madison, WI 53706-168. E-mail: newton@stat.wisc.edu