Align, Attend and Locate: Chest X-ray Diagnosis via Contrast Induced Attention Network with Limited Supervision

Jingyu Liu1*, Gangming Zhao2*, Yu Fei1, Ming Zhang1, Yizhou Wang1,2,3, Yizhou Yu2†
1 Department of Computer Science, Peking University  2 Deepwise AI Lab  3 Peng Cheng Laboratory
* Equal contribution  † Corresponding author

Abstract

Obstacles facing accurate identification and localization of diseases in chest X-ray images lie in the lack of high-quality images and annotations. In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured property of chest X-ray images and localizes diseases via contrastive learning on aligned positive and negative samples. To force the attention module to focus on abnormalities, we also introduce a learnable alignment module that adjusts all the input images, eliminating the variations in scale, angle, and displacement of X-ray images generated under poor scan conditions. We show that the use of the contrastive attention and the alignment module allows the model to learn rich identification and localization information using only a small amount of location annotations, resulting in state-of-the-art performance on the NIH chest X-ray dataset.

1. Introduction

Chest X-ray image analysis serves a crucial role in the clinical diagnosis of chest diseases. Traditionally, it requires years of accumulated expertise and consistent concentration to finish the task, adding heavy workloads to radiologists. Fortunately, we can formulate chest X-ray image analysis as a classification task, which assigns a particular type of disease to an image, together with a detection task, which provides the location annotation of the abnormality. Therefore, automatic image analysis systems can be implemented with the help of deep Convolutional Neural Network (CNN) methods [28, 5, 11, 12, 27].

Figure 1. Our proposed framework consists of two branches. The upper branch extracts convolutional features from the input image. The lower branch computes the contrast induced attention on the extracted feature map. The information from each branch is merged to produce disease identification and localization results for the input chest X-ray images.

To achieve good performance on natural images, classic CNN approaches require tons of samples with image-level labels for image classification, and samples with both category and location labels for object detection. However, these requirements raise two challenges when it comes to chest X-ray image diagnosis. First, accurate location labels are expensive to acquire for chest X-ray images, making it hard to train an accurate detector. Second, the diversity of location, shape, and texture makes certain categories of abnormality vague and mutually confusing.

In this paper, we propose a novel Contrast Induced Attention Network (CIA-Net) (Figure 1) to address these problems. The motivation of CIA-Net originates from the consistency of thoracic structure among humans. Through contrastive study, which exploits the visual contrast between a pair of a positive image (with diseases) and a negative image (without diseases), CIA-Net captures additional identification and localization information in the absence of annotations. Specifically, we extract high-level feature representations of the image pair from a CNN. Then, to utilize the highly structured property of the inputs, we compute the L1-distance between corresponding pixels in the negative image and the positive one, the result of which serves as an indication of possible lesion sites on the latter. However, some images, especially the positive ones, suffer from geometric deformation caused by poor scan conditions. Therefore, to rationalize the process of contrastive learning, we propose a learnable
alignment module to adjust input images to be geometrically canonical. Finally, to further utilize the limited location annotations, we apply Multiple Instance Learning (MIL) to perform end-to-end training of CIA-Net. We show that, with the help of the alignment module and CIA-Net, even for vague, tiny, and randomly appearing lesions, CIA-Net makes more accurate predictions than previous methods.

Above all, our contribution is threefold:
- We propose CIA-Net, which is the first to capture information by contrasting positive and negative images. More generally, it provides inspiration for addressing vision tasks with samples sharing high similarity in their visual structure.
- We propose a learnable alignment module, which is effective in transforming and aligning images taken under different scan conditions. This technique can also be utilized in other medical image analysis tasks requiring alignment.
- We achieve state-of-the-art results on both classification and localization on ChestX-ray14.

2. Related Work

Automatic chest X-ray analysis. The release of large-scale chest X-ray datasets has allowed wide application of deep learning methods to automatic chest X-ray analysis. Wang et al. [30] introduce the ChestX-ray14 dataset, which is by far the largest, with 112,120 front-view images covering 14 types of thoracic diseases. Before that, the large public dataset Open-i [15], containing 3,955 radiology reports and 7,470 associated chest X-rays, enabled the use of early deep models. However, chest X-ray datasets usually suffer from limited annotations and data. Recent surveys [15, 25] have indicated the potential of deep learning methods in chest X-ray image classification [30, 19, 4, 34, 32] and detection [19, 18, 31]. Technically, Rajpurkar et al. [19] and Wang et al. [30] apply CNN models developed for more comprehensive datasets to address the classification task, and use class activation maps (CAM) [34] to obtain the locations of diseases. Yan et al. [31] add the Squeeze-and-Excitation block [6] to DenseNet [7] and utilize multi-map and max-min pooling techniques. Later, Li et al. propose to use a fully convolutional neural network (FCN) [14] to address the problem; they unify the training of image-level and box-level data in one framework with a customized MIL loss. Different from previous approaches, which mainly adapt models or losses developed for other tasks, our proposed contrastive attention exploits the intrinsic property of chest X-rays to address the problem.

Many works apply attention mechanisms to chest X-ray analysis. Ypsilantis et al. [32] propose a recurrent attention model to identify cardiomegaly regions. Later, Pesce et al. [18] introduce a soft attention mechanism, which locates lesions by highlighting part of the saliency map generated by a CNN. Guan et al. [4] use attention to generate masks, which help to amplify lesion regions. Most of these attention mechanisms are implicitly built and rely heavily on the classification results. They may suffer from noisy labels, considering that the image labels are not directly from manual annotation by experts but are mined from the associated radiological reports using natural language processing [30]. In contrast, our CIA-Net focuses on relations between images and explicitly builds attention by utilizing the highly structured property of the data.

Object detection. Object detection has long been fundamental and heavily studied in computer vision. After the advent of deep learning, two main lines of approaches have matured in object detection. The first is the two-stage detectors, mainly based on the Region-CNN series. The second is the one-stage detectors, mainly represented by YOLO [20] and SSD [16]. In Faster R-CNN [21], the region proposal network (RPN) in the first stage pre-computes an objectness score for each candidate region and preserves the top-K candidates. The Fast R-CNN [3] network in the second stage then classifies each candidate region and adjusts its location via regression. In YOLO, the objectness score
, classification, and location regression are computed in the same stage. Our approach is similar in spirit to one-stage detectors: we split the image into cells and decide whether each cell is positive or not based on its overlap with the ground truth, which mimics the role of anchor boxes in detectors.

Based on the problem setting, weakly supervised object detection [26, 1, 33, 24, 13, 10, 22, 29] is also closely related to our approach. Given only image-level labels, most approaches formulate object detection as a MIL problem. The loss is often designed based on the paradigm that a positive image contains at least one positive object, while a negative image contains none. Another effective method is to find the peak in the feature map or heatmap, among which CAM is the most commonly used. One drawback of this line of approaches is that the localization is often partial, and heavy engineering work is needed to tune the results. Our approach performs end-to-end training and does not rely on any post-processing techniques.

3. Approach

Our proposed framework is illustrated in Figure 2 and comprises two parts: 1. An alignment module that automatically adjusts the input images towards canonical by an affine transformation. 2. CIA-Net, which consists of two branches: the contrast induced attention branch generates attention for every class of diseases from a pair of positive and negative images, and the attention, which contains localization information, assists the identification and localization branch in making predictions. Next, we introduce the details of every key component of our framework.

Figure 2. Our proposed framework consists of two parts: (a) The alignment module that automatically affine-transforms the input images towards canonical. (b) The CIA-Net that consists of two branches. The upper branch extracts convolutional features from the input image. The lower branch computes the contrast induced attention on the extracted feature map. The BCE loss and MIL loss take charge of box-level annotated data and class-level data, respectively.

A standard high-quality front-view chest X-ray image should be upright and symmetrical. However, scanned X-ray images are sometimes far from standard due to improper distance, angle, or displacement between the camera and the patient. The geometric deformation of images can be approximated as an affine transform, as shown in Figure 3. To enable chest X-rays to share the same structure, we introduce an alignment network to align all the images. Our alignment network is similar in spirit to the spatial transformer network (STN) [8], but we frame it with more explicit supervision. We align all the images to a single target image, which we term the Canonical Chest. To obtain the canonical chest image, we simply collect 500 randomly sampled negative images from the dataset and average them. After that, we crop out the central view tightly bounding the two lungs. The final canonical chest is shown in Figure 3(a). Statistically, we expect the averaged chest X-ray image to approach a standard one.

Figure 3. From left to right: (a) the canonical chest, (b) the original image, and (c) the aligned image.

3.1. Alignment Module

After obtaining the canonical chest as the target image, we frame the transformation learning as minimizing the structure divergence between the transformed image and the target image. Formally, let I and T denote the input image to be transformed and the target image, respectively. Given I, the alignment network A transforms it into A(I). To let A(I) have a standard structure, we minimize the structure loss L_s = f(A(I), T). Specifically, we use a light-weight ResNet-18 as the backbone of A. The output of the alignment network is a group of parameters (t_x, t_y, s_x, s_y, \alpha) of an affine transformation: t_x and t_y stand for horizontal and vertical displacements, s_x and s_y for horizontal and vertical scaling, and \alpha for the rotation angle. To this end, I is transformed into A(I) following

A(I) = B\left( \begin{bmatrix} s_x \cos\alpha & -s_y \sin\alpha & t_x \\ s_x \sin\alpha & s_y \cos\alpha & t_y \end{bmatrix} G(I),\; I \right),

where B stands for a bilinear interpolation function and G represents a regular grid function.
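The following is a minimal PyTorch sketch of this warp in the style of STN [8]: the five predicted parameters are assembled into the 2×3 affine matrix above, a sampling grid plays the role of G(I), and bilinear grid sampling plays the role of B. The function and variable names are ours, not taken from any released code.

```python
import torch
import torch.nn.functional as F

def warp_affine(image, tx, ty, sx, sy, alpha):
    """Warp `image` (N, C, H, W) with predicted affine parameters,
    mirroring A(I) = B(theta * G(I), I). All parameters have shape (N,)."""
    cos_a, sin_a = torch.cos(alpha), torch.sin(alpha)
    # Two rows of the 2x3 affine matrix from the equation above.
    row1 = torch.stack([sx * cos_a, -sy * sin_a, tx], dim=-1)
    row2 = torch.stack([sx * sin_a,  sy * cos_a, ty], dim=-1)
    theta = torch.stack([row1, row2], dim=1)                          # (N, 2, 3)
    grid = F.affine_grid(theta, image.size(), align_corners=False)   # G(I)
    return F.grid_sample(image, grid, align_corners=False)           # B(., I)
```

Because both `affine_grid` and `grid_sample` are differentiable, a structure loss on A(I) can back-propagate into the ResNet-18 that predicts (t_x, t_y, s_x, s_y, \alpha).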
To encourage A(I) to have a structure similar to T, an ideal solution would be to extract the chest structure from the X-ray images. However, structure annotations are not available, so we need to find an alternative. Inspired by perceptual losses [9], which are capable of preserving content and structure in style transfer, we adopt them in our task. Specifically, we adopt the feature reconstruction loss used in [9]:

L_{feat}(A(I), T) = \frac{1}{CHW} \left\lVert N_{feat}(A(I)) - N_{feat}(T) \right\rVert_2^2,

where C, H, and W give the feature map size, and N_{feat} is the network used to extract features. In practice, we also use a consistency loss, which computes the Euclidean distances between corresponding pixels of the image pair. An exemplar pair of I and A(I) is shown in Figure 3(b) and (c).
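A minimal sketch of this structure loss is given below. The paper does not pin down here which network realizes N_feat, so a frozen VGG-16 trunk (as in [9]) is assumed purely for illustration; the class name and the exact feature depth are ours.

```python
import torch
import torch.nn as nn
import torchvision

class StructureLoss(nn.Module):
    """Sketch of L_s: feature reconstruction loss [9] plus a pixel-wise
    consistency term. N_feat is assumed to be a frozen VGG-16 trunk;
    inputs are assumed to be 3-channel, ImageNet-normalized images."""
    def __init__(self):
        super().__init__()
        weights = torchvision.models.VGG16_Weights.IMAGENET1K_V1
        self.n_feat = torchvision.models.vgg16(weights=weights).features[:16].eval()
        for p in self.n_feat.parameters():   # N_feat stays fixed
            p.requires_grad = False

    def forward(self, aligned, target):
        fa, ft = self.n_feat(aligned), self.n_feat(target)
        c, h, w = fa.shape[1:]
        # Feature reconstruction loss, 1/(CHW) * ||N(A(I)) - N(T)||_2^2.
        l_feat = (fa - ft).pow(2).sum(dim=(1, 2, 3)) / (c * h * w)
        # Pixel-wise consistency between the aligned image and the target.
        l_cons = (aligned - target).pow(2).mean(dim=(1, 2, 3))
        return (l_feat + l_cons).mean()
```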
3.2. CIA-Net

Different from natural images, which have flexible structures, chest X-ray images have relatively fixed structures. Basically, a positive sample (an image with diseases) may exhibit three types of visual abnormalities: 1. Opacity and complex textures caused by accumulated liquid or abnormal tissues, e.g., effusion, consolidation, and mass. 2. Over-transparency caused by air, e.g., emphysema and pneumothorax. 3. Visually abnormal shapes of organs, e.g., cardiomegaly. Most diseases in our evaluated dataset fall into these three types. These abnormalities render an apparent visual contrast compared with negative samples. To this end, we propose to use the visual contrast as an attention signal indicating the possible location of the disease.

As shown in Figure 2(b), the CIA-Net is composed of two branches. The upper branch extracts the convolutional feature map F_i of size c \times h \times w from a positive image I_i^+. The lower branch takes the positive image I_i^+ and a negative image I_i^- as a pair of inputs. The shared encoder network maps I_i^+ and I_i^- into attention maps M_i^+ and M_i^- of size h \times w, respectively. After that, we compute the absolute difference

\Delta M = \left| M_i^+ - M_i^- \right|.

Finally, the spatial attention map \Delta M is multiplied with F_i element by element to obtain the weighted feature map \tilde{F}_i as follows:

\tilde{F}_i = \sum_k^{w \times h} \Delta_k f_k,

where \Delta_k is the k-th weight in \Delta M and f_k is the k-th grid in F_i. We normalize \Delta M to make \sum_k \Delta_k = w \times h, which keeps the activations of \tilde{F}_i properly scaled.

More specifically, the input images of both branches are resized to 512 \times 512. ResNet-50 pre-trained on the ImageNet dataset is adopted as the backbone for both branches. For the upper branch, we use the feature map F_i after C5 (the last convolutional output of the 5th stage), which is 32 times down-sampled and hence of size 16 \times 16. For the attention branch, we use C4 (the last convolutional output of the 4th stage) as the encoder module and obtain an attention blob of size 32 \times 32 after 16 times down-sampling. The attention blob is then passed through a max pooling layer and a convolutional layer to obtain the attention map of size 16 \times 16.
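The core of the attention branch reduces to a few tensor operations. Below is a hedged PyTorch sketch of the contrast induced attention under the definitions above; the function name and the epsilon guard are ours.

```python
import torch

def contrast_attention(feat_pos, att_pos, att_neg, eps=1e-6):
    """Weight the positive image's feature map by dM = |M+ - M-|,
    normalized so the weights sum to h*w.
    feat_pos: (N, c, h, w); att_pos, att_neg: (N, 1, h, w)."""
    n, c, h, w = feat_pos.shape
    dm = (att_pos - att_neg).abs()
    # Normalize so that sum_k dm_k = h*w, keeping activations scaled.
    dm = dm * (h * w) / (dm.sum(dim=(2, 3), keepdim=True) + eps)
    return feat_pos * dm   # element-wise weighting of each grid f_k
```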
Loss function. After obtaining the weighted feature map \tilde{F}_i, we pass it through a convolutional layer and a sigmoid layer to obtain the class-aware feature map of size C \times H' \times W', where C is the number of classes. Each grid in the feature map denotes the existence probability of a disease. We then follow the paradigm used in [14], computing losses and making predictions in each channel for the corresponding class. For images with box-level annotations, if a grid in the feature map overlaps with the projected ground-truth box, we assign label 1 to the grid; otherwise, we assign 0. Therefore, we use the binary cross-entropy loss for each grid:

L_{ki}(\theta) = -\sum_j y_{kij} \log(p_{kij}) - \sum_j (1 - y_{kij}) \log(1 - p_{kij}),

where k, i, and j index classes, samples, and grids, respectively; y_{kij} denotes the target label of a grid and p_{kij} the predicted probability of that grid. For images with only image-level annotations, we use the MIL loss from [14]:

L_{ki}(\theta) = -y_{ki} \log\Big(1 - \prod_j (1 - p_{kij})\Big) - (1 - y_{ki}) \log\Big(\prod_j (1 - p_{kij})\Big),

where y_{ki} denotes the target label of the image.

The total loss across all classes of all samples is

L(\theta) = \sum_i \sum_k \lambda_{ki}\, \eta_B\, L^{B}_{ki}(\theta) + (1 - \lambda_{ki})\, L_{ki}(\theta),

where \lambda_{ki} \in \{0, 1\} denotes whether the k-th class in the i-th sample has a box annotation, L^{B}_{ki} is the box-level (BCE) loss, and \eta_B is the balance weight between the two losses.
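For concreteness, here is a minimal sketch of the two loss terms in PyTorch; the helper names and the numerically stable log-product form are ours, but the formulas follow the equations above.

```python
import torch

EPS = 1e-6

def grid_bce_loss(p, y):
    """Per-grid BCE for box-annotated images. p, y: (C, H', W')."""
    p = p.clamp(EPS, 1 - EPS)
    return -(y * p.log() + (1 - y) * (1 - p).log()).sum()

def mil_loss(p, y_img):
    """MIL loss [14] for image-level labels. p: (C, H', W'), y_img: (C,).
    A positive image must contain at least one positive grid."""
    log_none = (1 - p).clamp(EPS, 1 - EPS).log().flatten(1).sum(dim=1)  # log prod(1-p)
    p_any = (1 - log_none.exp()).clamp(EPS, 1 - EPS)                    # 1 - prod(1-p)
    return -(y_img * p_any.log() + (1 - y_img) * log_none).sum()

# For sample i and class k, the two terms are mixed as
#   lambda_ki * eta_B * grid_bce_loss(...) + (1 - lambda_ki) * mil_loss(...),
# with lambda_ki = 1 iff that class carries a box annotation.
```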
3.3. Training and Testing

Training. We use the SGD algorithm with Nesterov momentum to train all models for 15 epochs on the chest X-ray dataset. For CIA-Net, we use a total mini-batch size of 6 on a single GPU. The learning rate starts at 0.001 and is reduced by a factor of 10 after every 5 epochs. In addition, the weight decay and the momentum are set to 0.0001 and 0.9, respectively. All weights are initialized from ResNet [5] models pre-trained on ImageNet [23]. Our implementation is based on PyTorch [17].

Testing. We use a threshold of 0.5 to distinguish positive grids from negative grids in the class-wise feature map. In practice, the feature map is up-sampled to achieve more accurate predictions; the up-sampling operation is inserted before the last two convolutions.

4. Experiments

4.1. Dataset and Evaluation Metrics

There are 112,120 frontal-view X-ray images covering 14 classes of diseases in the NIH chest X-ray dataset [30]. Note that each image can have multiple diseases. Furthermore, the dataset contains 880 images with 984 labeled bounding boxes. We follow the terms in [14] and call the 880 images annotated and the remaining 111,240 images unannotated. We resize the original 3-channel images from a resolution of 1024 \times 1024 to 512 \times 512 without any data augmentation techniques.

Evaluation metrics. We follow the metrics used in [14]. For localization, accuracy is calculated from the IoU (Intersection over Union) between predictions and ground truths. Note that predictions can be discrete small rectangles. We only report localization results for the eight diseases with ground-truth boxes. A localization result is regarded as correct when IoU > T(IoU), where T(*) is the threshold. For classification, we also use AUC scores (the area under the ROC curve) [2] to measure the performance of our model.
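For a single predicted rectangle, the correctness check reduces to the sketch below; in the paper the prediction may be a union of discrete grid rectangles, so this is a simplified illustration with hypothetical helper names.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def localization_correct(pred_box, gt_box, t_iou):
    """A prediction counts as correct when IoU exceeds T(IoU)."""
    return iou(pred_box, gt_box) > t_iou
```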
4.2. Comparison with the State-of-the-art

Disease localization. Following [14], we conduct 5-fold cross-validation and design two experiments to verify the effectiveness of our method. In the first experiment, we train our model with 80% of the annotated images and 50% of the unannotated images, and compare the corresponding localization accuracy with [14] and [30] (Table 1). The model is evaluated on the remaining 20% of the annotated images. In the second experiment, we train the model with 100% of the unannotated images and no annotated images, and compare the localization accuracy with [14] (Table 2). The model is evaluated on all annotated images.

T(IoU)  Model         Atelectasis  Cardiomegaly  Effusion  Infiltration  Mass  Nodule  Pneumonia  Pneumothorax  Mean
0.3     X, Wang [30]  0.24  0.46  0.30  0.28  0.15  0.04  0.17  0.13  0.22
0.3     Z, Li [14]    –     –     –     –     –     –     –     –     0.49
0.3     Ours          –     –     –     –     –     –     –     –     0.53
0.5     X, Wang [30]  0.05  0.18  0.11  0.07  0.01  0.01  0.03  0.03  0.06
0.5     Z, Li [14]    –     –     –     –     –     –     –     –     0.27
0.5     Ours          –     –     –     –     –     –     –     –     0.39
0.7     X, Wang [30]  0.01  0.03  0.02  0.00  0.00  0.00  0.01  0.02  0.01
0.7     Z, Li [14]    0.04  0.52  0.07  0.09  0.11  0.01  0.05  0.05  0.12
0.7     Ours          0.18  0.70  0.28  0.41  0.27  0.04  0.25  0.18  0.29

Table 1. Comparison of results trained using 80% annotated and 50% unannotated images. Localization accuracy is evaluated at various T(IoU) in {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7}; results are rounded to two decimal places for readability (–: value not available). Our model consistently outperforms previous methods in most cases; the advantage is especially evident at high T(IoU).

T(IoU)  Model       Atelectasis  Cardiomegaly  Effusion  Infiltration  Mass  Nodule  Pneumonia  Pneumothorax  Mean
0.1     Z, Li [14]  0.59  0.81  0.72  0.84  0.68  0.28  0.22  0.37  0.57
0.1     Base        0.61  0.88  0.73  0.78  0.67  0.23  0.09  0.36  0.54
0.1     Ours        0.39  0.90  0.65  0.85  0.69  0.38  0.30  0.39  0.60
0.3     Base        0.33  0.71  0.34  0.68  0.36  0.06  0.05  0.20  0.34
0.3     Ours        0.34  0.71  0.39  0.65  0.48  0.09  0.16  0.20  0.38
0.5     Base        0.19  0.57  0.14  0.49  0.21  0.01  0.03  0.08  0.21
0.5     Ours        0.19  0.53  0.19  0.47  0.33  0.03  0.08  0.11  0.24
0.7     Base        0.11  0.40  0.06  0.29  0.11  0.00  0.01  0.06  0.13
0.7     Ours        0.08  0.30  0.09  0.25  0.19  0.01  0.04  0.07  0.13

Table 2. Comparison of results trained using 100% unannotated images and no annotated images. Disease localization accuracy is evaluated at various T(IoU) in {0.1, 0.3, 0.5, 0.7}. Our model outperforms [14] and our own implemented baseline model at different IoU thresholds in most cases.

Table 1 shows the results of the first experiment: our model performs better in most cases. Especially, as T(IoU) increases, our model gradually achieves greater improvement over the reference models in all 8 classes used for evaluation. For example, when evaluated at T(IoU) = 0.7, the accuracy on an easy class such as Cardiomegaly is 0.70, while the reference models achieve 0.52 [14] and 0.03 [30]. For relatively small-object classes such as Mass and Nodule, our accuracy reaches 0.27 and 0.04, while the reference models achieve only 0.00 for both classes in [30], and 0.11 and 0.01 in [14]. We also calculate the mean accuracy over all classes to compare the general performance of different methods. At T(IoU) = 0.3, our approach achieves an accuracy of 0.53, a 0.03 lead over [14]. At T(IoU) = 0.5 and T(IoU) = 0.7, our approach achieves accuracies of 0.39 and 0.29, with leads of 0.12 and 0.17 over [14], respectively. Overall, the experimental results shown in Table 1 demonstrate that our approach is more capable of accurate localization, which provides greater support for clinical practice.

Table 2 shows the results of the second experiment. Since [14] only provides results at T(IoU) = 0.1, we also evaluate a baseline model following [14], implemented by ourselves, at T(IoU) = 0.3, 0.5, 0.7 for a more complete comparison. The results at T(IoU) = 0.1 show that our implemented baseline performs similarly to [14], validating the latter comparison. The overall results show that even without annotated data used for training, our approach achieves decent localization results. Compared with the baseline model, our proposed approach performs better in most classes at T(IoU) = 0.3, 0.5, 0.7, demonstrating the advantages of our model over baseline methods. Another interesting observation is that for hard classes like Nodule and Mass, our model achieves results comparable to those of the first experiment without any annotated data. This shows that our model is able to utilize information from unannotated data to make up for the lack of localization annotations and achieve good performance on some hard types of abnormality in chest X-rays.

In Figure 4, we illustrate some qualitative results for the eight classes used for evaluation from the second experiment. From left to right are the original images, baseline results, and our results. The green boxes and blue boxes stand for ground truth and prediction, respectively. Our approach produces more accurate localizations in most cases.

Figure 4. Some localization results for the eight classes with box annotations. The original images, baseline results, and our results are shown in the left, middle, and right columns, respectively. Our approach outputs more accurate localization results.

Disease identification. Table 4 shows the AUC scores for all 14 classes. We compare our results with the previous state-of-the-art [14]. Following [14], we use 70% of the images for training and 20% for testing. Our model achieves better AUC scores for most diseases, and the mean AUC score is improved from 0.80 to 0.83, showing the effectiveness of CIA-Net for identification.

Model       Atelectasis  Cardiomegaly  Consolidation  Edema  Effusion  Emphysema  Fibrosis
Z, Li [14]  0.80  0.87  0.80  0.88  0.87  0.91  0.78
Ours        0.79  0.87  0.79  0.91  0.88  0.93  0.80

Model       Hernia  Infiltration  Mass  Nodule  Pleural Thickening  Pneumonia  Pneumothorax  Mean
Z, Li [14]  0.77  0.70  0.83  0.75  0.79  0.66  0.80  0.80
Ours        0.92  0.69  0.81  0.73  0.80  0.75  0.89  0.83

Table 4. The AUC scores of our method and the baseline. Here, 70% and 20% of the images are used for training and testing, respectively.

4.3. Ablation Studies

In this section, we conduct ablation studies from three aspects. First, we explore the influence of different numbers of annotated samples on our method. Second, we study the contribution of different modules. Third, we explore different negative sampling strategies used in training and testing.

4.3.1 CIA-Net Gains Localization Information

As shown in Table 3, with an increasing number of annotated images, the localization accuracy is further improved. Specifically, at T(IoU) = 0.7, the mean accuracy is improved from 0.22 to 0.31 when the ratio of annotated images in training increases from 40% to 80%. Furthermore, using 40% annotated images, our model gains higher mean accuracy than using 0% annotated images (0.22 vs. 0.13) at T(IoU) = 0.7. In addition, as shown in Table 3, CIA-Net shows larger improvements when fewer annotated images are used: in most cases our model shows higher mean performance at annotation ratios of 0% and 40%. These results demonstrate that, with the help of the localization information provided by CIA-Net, our model works effectively with limited annotated images.

T(IoU)  Anno. ratio  Model  Atelectasis  Cardiomegaly  Effusion  Infiltration  Mass  Nodule  Pneumonia  Pneumothorax  Mean
0.3     80%   Base  0.46  0.86  0.59  0.77  0.40  0.07  0.63  0.51  0.54
0.3     80%   Ours  0.54  0.82  0.55  0.81  0.49  0.29  0.51  0.40  0.55
0.3     40%   Base  0.41  0.74  0.53  0.79  0.31  0.08  0.49  0.29  0.46
0.3     40%   Ours  0.55  0.73  0.55  0.76  0.48  0.22  0.39  0.30  0.50
0.3     0%    Base  0.33  0.71  0.34  0.68  0.36  0.06  0.05  0.20  0.34
0.3     0%    Ours  0.34  0.71  0.39  0.65  0.48  0.09  0.16  0.20  0.38
0.5     80%   Base  0.27  0.79  0.44  0.55  0.23  0.04  0.55  0.38  0.41
0.5     80%   Ours  0.38  0.77  0.42  0.63  0.34  0.26  0.39  0.27  0.43
0.5     40%   Base  0.22  0.60  0.34  0.56  0.19  0.03  0.31  0.17  0.30
0.5     40%   Ours  0.36  0.57  0.37  0.62  0.34  0.13  0.23  0.17  0.35
0.5     0%    Base  0.19  0.57  0.14  0.49  0.21  0.01  0.03  0.08  0.21
0.5     0%    Ours  0.19  0.53  0.19  0.47  0.33  0.03  0.08  0.11  0.24
0.7     80%   Base  0.11  0.74  0.33  0.40  0.18  0.03  0.45  0.25  0.31
0.7     80%   Ours  0.18  0.71  0.31  0.42  0.25  0.11  0.26  0.23  0.31
0.7     40%   Base  0.12  0.42  0.15  0.37  0.15  0.00  0.19  0.08  0.19
0.7     40%   Ours  0.19  0.47  0.20  0.41  0.22  0.06  0.12  0.11  0.22
0.7     0%    Base  0.11  0.40  0.06  0.29  0.11  0.00  0.01  0.06  0.13
0.7     0%    Ours  0.08  0.30  0.09  0.25  0.19  0.01  0.04  0.07  0.13

Table 3. Localization results of models trained using different numbers of annotated images with 100% unannotated images.

4.3.2 Negative Sampling

In the training and testing phases, we use a perceptual hash algorithm to choose a similarly structured pair image for every training sample. Specifically, we generate a hash code dictionary by resizing all 63,000 negative images to a fixed low resolution and flattening them. During training and testing, we resize every sample in the same way and choose the nearest hash code based on cosine distance. The corresponding negative image is then paired with the positive one and sent to the later modules. To justify this approach, we compare it with two other sampling methods: 1. Randomly sampling from the negative images. 2. Utilizing the canonical chest as the negative image. From the results in Table 5, we find that structural-similarity-based sampling is generally better than the other two methods in most classes. Random sampling introduces too much randomness into the model, making it hard to capture meaningful information with contrastive learning. The second method suffers from the domain gap between real images and the averaged one.
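The paper specifies a perceptual hash matched by cosine distance but not the exact hash function. The sketch below uses a simple average hash for illustration; the names, the 8×8 code size, and the precomputed `neg_hashes` matrix of all negative images' codes are assumptions.

```python
import numpy as np

def average_hash(img, size=8):
    """Toy perceptual hash: block-average a 2-D grayscale image down to
    `size` x `size`, then threshold at the mean. `size` is illustrative."""
    h, w = img.shape
    img = img[:h - h % size, :w - w % size]
    blocks = img.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).astype(np.float32).ravel()

def nearest_negative(pos_img, neg_hashes):
    """Index of the negative image whose hash code is closest to the
    positive sample's code under cosine similarity."""
    q = average_hash(pos_img)
    sims = neg_hashes @ q / (np.linalg.norm(neg_hashes, axis=1) * np.linalg.norm(q) + 1e-6)
    return int(np.argmax(sims))
```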
T(IoU)  Model       Atelectasis  Cardiomegaly  Effusion  Infiltration  Mass  Nodule  Pneumonia  Pneumothorax  Mean
0.7     Ours+Canon  0.05  0.62  0.18  0.16  0.12  0.07  0.26  0.20  0.21
0.7     Ours+Rand   0.17  0.62  0.30  0.46  0.21  0.08  0.20  0.15  0.27
0.7     Ours+Sim    0.18  0.71  0.31  0.42  0.25  0.11  0.26  0.23  0.31

Table 5. Influence of different negative sampling strategies. All models are trained using 100% unannotated and 80% annotated images. Rand: randomly sampling negative samples. Canon: always using the canonical chest. Sim: sampling based on structural similarity.

4.3.3 Contribution of Different Modules

Figure 5 shows some examples of original and aligned images. The aligned samples come closer to the canonical chest: they are more symmetrical, vertical, and focused on the thoracic cavity. Table 6 shows the quantitative contribution of the alignment module. For the baseline method, our alignment module improves the mean localization accuracy from 0.23 to 0.25. For CIA-Net, the alignment module also improves the mean accuracy, from 0.26 to 0.28. These results prove the effectiveness of the alignment module.

Figure 5. Some aligned results output by the alignment module. Each pair is composed of an original and an aligned sample. The aligned samples have more canonical views than the original ones.

In addition, by comparing CIA-Net with the baseline model, we demonstrate the effectiveness of CIA-Net itself: it improves the mean localization accuracy from 0.23 to 0.26 without the alignment module, and from 0.25 to 0.28 with the alignment module. Figure 6 shows visualized attention maps for some examples. From small lesions like Nodule to large-region classes like Pneumothorax and Cardiomegaly, CIA-Net generates attention maps that provide helpful cues for the locations of diseases.

T(IoU)  Model       Atelectasis  Cardiomegaly  Effusion  Infiltration  Mass  Nodule  Pneumonia  Pneumothorax  Mean
0.7     Base        0.11  0.60  0.21  0.42  0.23  0.01  0.21  0.11  0.23
0.7     Base+Align  0.22  0.62  0.24  0.44  0.23  0.02  0.18  0.11  0.25
0.7     CIA         0.06  0.64  0.24  0.46  0.24  0.04  0.26  0.14  0.26
0.7     CIA+Align   0.09  0.68  0.28  0.46  0.26  0.06  0.29  0.15  0.28

Table 6. Influence of the alignment module on localization results. All models are trained using 100% unannotated and 80% annotated images.

Figure 6. Attention maps generated by CIA-Net. The left shows the predicted images, where green and blue boxes stand for ground truths and predictions, respectively. The right shows the generated attention maps, which provide helpful cues for the locations of abnormalities.

5. Conclusion

In this paper, we propose CIA-Net to tackle the challenging problem of automatic disease diagnosis in chest X-rays, where the images share similar thoracic structures. Our proposed CIA-Net captures contrastive information from pairs of positive and negative images, and the contrast induced attention provides localization cues for the possible sites of abnormalities. To rationalize CIA-Net, we also propose a learnable alignment module to adjust all input images to be canonical. Qualitative and quantitative experimental results on the NIH Chest X-ray dataset demonstrate the effectiveness of our approach.

Acknowledgments

This paper is partially supported by the Beijing Municipal Commission of Science and Technology under Grant No. Z181100008918005, the National Key Research and Development Program of China with Grant No. SQ2018AAA010010, and NSFC-61772039, NSFC-91646202, NSFC-61625201, NSFC-61527804.

References

[1] Chunshui Cao, Xianming Liu, Yi Yang, Yinan Yu, Jiang Wang, Zilei Wang, Yongzhen Huang, Liang Wang, Chang Huang, Wei Xu, Deva Ramanan, and Thomas S. Huang. Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks. In International Conference on Computer Vision (ICCV), 2015.
[2] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861-874, 2006.
[3] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440-1448, 2015.
[4] Qingji Guan, Yaping Huang, Zhun Zhong, Zhedong Zheng, Liang Zheng, and Yi Yang. Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification. arXiv preprint arXiv:1801.09927, 2018.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
[6] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132-7141, 2018.
[7] Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer. DenseNet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.
[8] Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017-2025, 2015.
[9] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694-711. Springer, 2016.
[10] Vadim Kantorov, Maxime Oquab, Minsu Cho, and Ivan Laptev. ContextLocNet: Context-aware deep network models for weakly supervised localization. In European Conference on Computer Vision (ECCV), 2016.
[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012.
[12] Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems (NIPS), 1989.
[13] Dong Li, Jia-Bin Huang, Yali Li, Shengjin Wang, and Ming-Hsuan Yang. Weakly supervised object localization with progressive domain adaptation. In Computer Vision and Pattern Recognition (CVPR), 2016.
[14] Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei Wei, Li-Jia Li, and Li Fei-Fei. Thoracic disease identification and localization with limited supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8290-8299, 2018.
[15] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I. Sanchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60-88, 2017.
[16] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21-37. Springer, 2016.
[17] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
[18] Emanuele Pesce, Petros-Pavlos Ypsilantis, Samuel Withey, Robert Bakewell, Vicky Goh, and Giovanni Montana. Learning to detect chest radiographs containing lung nodules using visual attention networks. arXiv preprint arXiv:1712.00996, 2017.
[19] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, et al. CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225, 2017.
[20] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779-788, 2016.
[21] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28, pages 91-99. Curran Associates, Inc., 2015.
[22] Mrigank Rochan and Yang Wang. Weakly supervised localization of novel objects using appearance transfer. In Computer Vision and Pattern Recognition (CVPR), 2015.
[23] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.
[24] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. 2016.
[25] Hoo-Chang Shin, Holger R. Roth, Mingchen Gao, Le Lu, Ziyue Xu, Isabella Nogues, Jianhua Yao, Daniel Mollura, and Ronald M. Summers. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5):1285-1298, 2016.
[26] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In International Conference on Learning Representations Workshop, 2014.
[27] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[28] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[29] Yuxing Tang, Josiah Wang, Xiaofang Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, and Liming Chen. Large scale semi-supervised object detection using visual and semantic knowledge transfer. In Computer Vision and Pattern Recognition (CVPR), 2016.
[30] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2097-2106, 2017.
[31] Chaochao Yan, Jiawen Yao, Ruoyu Li, Zheng Xu, and Junzhou Huang. Weakly supervised deep learning for thoracic disease classification and localization on chest X-rays. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 103-110. ACM, 2018.
[32] Petros-Pavlos Ypsilantis and Giovanni Montana. Learning what to look in chest X-rays with a recurrent visual attention model. arXiv preprint arXiv:1701.06452, 2017.
[33] Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. Top-down neural attention by excitation backprop. In European Conference on Computer Vision (ECCV), 2016.
[34] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2921-2929, 2016.