
Align, Attend and Locate: Chest X-ray Diagnosis via Contrast Induced Attention Network with Limited Supervision

Jingyu Liu1*, Gangming Zhao2*, Yu Fei1, Ming Zhang1, Yizhou Wang1,2,3, Yizhou Yu2†
1Department of Computer Science, Peking University  2Deepwise AI Lab  3Peng Cheng Laboratory
*Equal contribution. †Corresponding author.

Abstract

Obstacles facing accurate identification and localization of diseases in chest X-ray images lie in the lack of high-quality images and annotations. In this paper, we propose a Contrast Induced Attention Network (CIA-Net), which exploits the highly structured property of chest X-ray images and localizes diseases via contrastive learning on aligned positive and negative samples. To force the attention module to focus on abnormalities, we also introduce a learnable alignment module to adjust all the input images, which eliminates the variations in scale, angle, and displacement of X-ray images generated under bad scan conditions. We show that the use of contrastive attention and the alignment module allows the model to learn rich identification and localization information using only a small amount of location annotations, resulting in state-of-the-art performance on the NIH chest X-ray dataset.

1. Introduction

Chest X-ray image analysis serves a crucial role in the clinical diagnosis of chest diseases. Traditionally, it requires years of accumulated expertise and consistent concentration to finish the task, adding heavy workloads to radiologists. Fortunately, we can formulate chest X-ray image analysis as a classification task, which assigns a particular type of disease to an image, together with a detection task, which provides the location annotation of the abnormality. Therefore, automatic image analysis systems can be implemented with the help of deep Convolutional Neural Network (CNN) methods [28, 5, 11, 12, 27].

To achieve good performance on natural images, classic CNN approaches require tons of samples with image-level labels for image classification, and samples with both category and location labels for object detection. However, these requirements raise two challenges when it comes to chest X-ray image diagnosis. First, accurate location labels are expensive to acquire for chest X-ray images, making it hard to train an accurate detector. Second, the diversity of location, shape and texture makes certain categories of abnormality vague and mutually confusable.

In this paper, we propose a novel Contrast Induced Attention Network (CIA-Net) (Figure 1) to address these problems. The motivation of CIA-Net originates from the consistency of thoracic structure among humans. Through contrastive study, which exploits the visual contrast between a pair of a positive image (with diseases) and a negative image (without diseases), CIA-Net captures additional identification and localization information in the absence of annotation. Specifically, we extract high-level feature representations of the image pair from a CNN. Then, to utilize the highly structured property of the inputs, we compute the L1-distance between corresponding pixels in the negative image and the positive one, the result of which serves as an indication of possible lesion sites on the latter.

Figure 1. Our proposed framework consists of two branches. The upper branch extracts convolutional features from the input image. The lower branch computes the contrast induced attention on the extracted feature map. The information from each branch is merged to produce disease identification and localization results for the input chest X-ray images.
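The pixel-wise contrast just described reduces, in its simplest form, to an aligned, per-pixel L1 difference between feature maps. A minimal PyTorch sketch of the idea (tensor shapes and names are illustrative; this is not the authors' released code):

```python
import torch

def contrast_attention(feat_pos, feat_neg):
    # feat_pos, feat_neg: (C, H, W) features of an aligned positive/negative pair
    diff = (feat_pos - feat_neg).abs()         # per-pixel L1 distance
    attn = diff.mean(dim=0)                    # collapse channels -> (H, W) map
    return attn * (attn.numel() / attn.sum())  # rescale so the weights sum to H*W

feat_pos = torch.rand(256, 16, 16)             # stand-ins for CNN feature maps
feat_neg = torch.rand(256, 16, 16)
attn = contrast_attention(feat_pos, feat_neg)
weighted = feat_pos * attn                     # reweight the positive features
```

High values in `attn` mark positions where the diseased image departs from its healthy counterpart, which is exactly the signal CIA-Net turns into attention.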

However, some images, especially the positive ones, suffer from geometric deformation caused by poor scan conditions. Therefore, to rationalize the process of contrastive learning, we propose a learnable alignment module to adjust the input images to be geometrically canonical. Finally, to further utilize the limited location annotations, we apply Multiple Instance Learning (MIL) to perform end-to-end training of CIA-Net. We show that with the help of the alignment module and CIA-Net, even for vague, tiny and randomly appearing lesions, CIA-Net makes more accurate predictions than previous methods.

Above all, our contribution is threefold:

- We propose CIA-Net, which is the first to capture information by contrasting positive and negative images. More generally, it provides inspiration for addressing vision tasks with samples that share high similarity in their visual structure.
- We propose a learnable alignment module, which is effective in transforming and aligning images taken under different scan conditions. This technique can also be utilized in other medical image analysis tasks requiring alignment.
- We achieve state-of-the-art results on both classification and localization on ChestX-ray14.

2. Related Work

Automatic Chest X-ray Analysis. The release of large-scale chest X-ray datasets has allowed wide application of deep learning methods to automatic chest X-ray analysis. Wang et al. [30] introduce the ChestX-ray14 dataset, which is by far the largest, with 112,120 front-view images covering 14 types of thoracic diseases. Before that, Open-i [15], a large public dataset containing 3,955 radiology reports and 7,470 associated chest X-rays, enabled the use of early deep models. However, chest X-ray datasets usually suffer from limited annotation and data. Recent surveys [15, 25] have indicated the potential of deep learning methods in chest X-ray image classification [30, 19, 4, 34, 32] and detection [19, 18, 31]. Technically, Rajpurkar et al. [19] and Wang et al. [30] apply CNN models developed for more comprehensive datasets to address the classification task, and use class activation maps (CAM) [34] to obtain the locations of diseases. Yan et al. [31] add the Squeeze-and-Excitation block [6] to DenseNet [7] and utilize multi-map and max-min pooling techniques. Later, Li et al. [14] propose to use a fully convolutional network (FCN) to address the problem. They unify the training of image-level and box-level data in one framework with a customized MIL loss. Different from previous approaches, which mainly adapt models or losses developed for other tasks, our proposed contrastive attention exploits the property of chest X-rays to address the problem.

Many works apply attention mechanisms to chest X-ray analysis. Ypsilantis et al. [32] propose a recurrent attention model to identify cardiomegaly regions. Later, Pesce et al. [18] introduce a soft attention mechanism, which locates lesions by highlighting part of the saliency map generated by a CNN. Guan et al. [4] use attention to generate masks, which help to amplify lesion regions. Most of these attention mechanisms are implicitly built and rely heavily on the results of classification. They may suffer from noisy labels, considering that the image labels do not come directly from manual annotation by experts, but are mined from the associated radiological reports using natural language processing [30]. Our CIA-Net, in contrast, focuses on relations between images and explicitly builds attention utilizing the highly structured property of the data.

Object Detection. Object detection has long been a fundamental problem in computer vision and has been studied extensively. Since the advent of deep learning, two main lines of approaches have matured in object detection. The first is the two-stage detectors, mainly based on the Region-CNN series. The second is the one-stage detectors, mainly represented by YOLO [20] and SSD [16]. In Faster R-CNN [21], the region proposal network (RPN) in the first stage pre-computes an objectness score for each candidate region and preserves the top-K candidates. Then the Fast R-CNN [3] network in the second stage classifies each candidate region and adjusts its location via regression. In YOLO, the objectness score, classification and location regression are computed in the same stage.

Our approach is similar in spirit to one-stage detectors. We split the image into cells and decide whether each cell is positive or not based on its overlap with the ground truth, which mimics the role of anchor boxes in detectors.

Given the problem setting, weakly supervised object detection [26, 1, 33, 24, 13, 10, 22, 29] is also closely related to our approach. Given only image-level labels, most approaches formulate object detection as a MIL problem. The loss is often designed around the paradigm that a positive image contains at least one positive object, while a negative image contains none. Another effective method is to find the peak in the feature map or heatmap, among which CAM is the most commonly used. One drawback of this line of approaches is that the localization is often partial, and heavy engineering work is needed to tune the results. Our approach performs end-to-end training and does not rely on any post-processing techniques.

3. Approach

Our proposed framework is illustrated in Figure 2 and comprises two parts: 1. An alignment module that automatically adjusts the input images towards canonical by affine transformation. 2. CIA-Net, which consists of two branches: the contrast induced attention branch generates attention for every class of disease from a pair of positive and negative images; the attention, which contains localization information, assists the identification and localization branch in making predictions. Next, we introduce the details of every key component of our framework.

Figure 2. Our proposed framework consists of two parts: (a) The alignment module that automatically affine-transforms the input images towards canonical. (b) CIA-Net, which consists of two branches. The upper branch extracts convolutional features from the input image. The lower branch computes the contrast induced attention on the extracted feature map. The BCE loss and the MIL loss take charge of box-level annotated data and class-level data, respectively.

A standard high-quality front-view chest X-ray image should be upright and symmetrical. However, scanned X-ray images are sometimes far from standard due to improper distance, angle or displacement between the camera and the patient. The geometric deformation of such images can be approximated by an affine transform, as shown in Figure 3. To enable chest X-rays to share the same structure, we introduce an alignment network to align all the images. Our alignment network is similar in spirit to the spatial transformer network (STN) [8], but we frame it with more explicit supervision. We align all the images to a single target image, which we term the Canonical Chest. To obtain the canonical chest image, we simply collect 500 negative images at random from the dataset and average them to obtain an averaged image. After that, we crop out the central view tightly bounding the two lungs. The final canonical chest is shown in Figure 3 (a). Statistically, we believe that the averaged chest X-ray image should approach a standard image.

Figure 3. From left to right: the canonical chest, the original image and the aligned image, respectively.
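A rough sketch of this canonical-chest construction (the 512-pixel working size and the hand-chosen crop box are assumptions; the text only specifies averaging 500 random negatives and cropping around the lungs):

```python
import numpy as np
from PIL import Image

def build_canonical_chest(negative_paths, size=(512, 512), crop=None):
    """Average a few hundred disease-free scans, then crop the central view."""
    acc = np.zeros(size, dtype=np.float64)
    for path in negative_paths:                        # e.g. 500 random negatives
        img = Image.open(path).convert("L").resize(size)
        acc += np.asarray(img, dtype=np.float64)
    mean = acc / len(negative_paths)                   # the averaged chest image
    if crop is not None:                               # (left, top, right, bottom)
        l, t, r, b = crop                              # chosen to bound the lungs
        mean = mean[t:b, l:r]
    lo, hi = mean.min(), mean.max()
    return (mean - lo) / (hi - lo + 1e-8)              # rescale to [0, 1]
```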

3.1. Alignment Module

After obtaining the canonical chest as the target image, we frame the transformation learning as minimizing the structure divergence between the transformed image and the target image. Formally, let I and T denote the input image to be transformed and the target image, respectively. Given I, the alignment network A outputs the transformed image A(I). To let A(I) have a standard structure, we minimize the structure loss $L_s = f(A(I), T)$. Specifically, we use a light-weight ResNet-18 as the backbone of A. The output of the alignment network is a group of parameters $(t_x, t_y, s_x, s_y, \theta)$ of an affine transformation: $t_x$ and $t_y$ stand for horizontal and vertical displacements, $s_x$ and $s_y$ for horizontal and vertical scaling, and $\theta$ stands for the rotation angle. To this end, I is transformed to A(I) following:

$$A(I) = B\left( \begin{bmatrix} s_x\cos\theta & -s_y\sin\theta & t_x \\ s_x\sin\theta & s_y\cos\theta & t_y \end{bmatrix} \otimes G(I);\; I \right)$$

where B stands for a bilinear interpolation function and G represents a regular grid function.

To encourage A(I) to have a structure similar to T, an ideal solution would be to extract the chest structure from the X-ray images. However, structure annotation is not available, so we need to find an alternative. Inspired by perceptual losses [9], which are capable of preserving content and structure in style transfer, we adopt them here for our task. Specifically, we adopt the feature reconstruction loss used in [9]:

$$L_{feat}(A(I), T) = \frac{1}{CHW} \left\| N_{feat}(A(I)) - N_{feat}(T) \right\|_2^2$$

where C, H, W give the feature map size, and $N_{feat}$ is the network used to extract features. In practice, we also use a consistency loss, which computes the Euclidean distances between corresponding pixels of the image pair. An exemplar pair of I and A(I) is shown in Figure 3 (b) and (c).
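A sketch of how such an alignment module can be wired up with differentiable warping. PyTorch's `affine_grid` and `grid_sample` play the roles of G and B; the 5-parameter regression head and the use of plain MSE for the feature term are our assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class AlignmentModule(nn.Module):
    """ResNet-18 regresses (tx, ty, sx, sy, theta); the image is then warped
    with the corresponding affine grid, as in spatial transformers [8]."""
    def __init__(self):
        super().__init__()
        # in practice the final layer should be initialized so the warp
        # starts near identity (sx = sy = 1, tx = ty = theta = 0)
        self.backbone = torchvision.models.resnet18(num_classes=5)

    def forward(self, img):
        tx, ty, sx, sy, theta = self.backbone(img).unbind(dim=1)
        cos, sin = torch.cos(theta), torch.sin(theta)
        row0 = torch.stack([sx * cos, -sy * sin, tx], dim=1)
        row1 = torch.stack([sx * sin,  sy * cos, ty], dim=1)
        mat = torch.stack([row0, row1], dim=1)            # (N, 2, 3) affine matrix
        grid = F.affine_grid(mat, img.size(), align_corners=False)   # G(I)
        return F.grid_sample(img, grid, align_corners=False)         # bilinear B

def feature_reconstruction_loss(n_feat, aligned, target):
    """Perceptual term of [9]: L2 between features of A(I) and the canonical
    chest; mse_loss averages over all entries, matching the 1/CHW scaling."""
    return F.mse_loss(n_feat(aligned), n_feat(target))
```

The consistency loss mentioned above is the same comparison applied directly to pixels, e.g. `F.mse_loss(aligned, target)`.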

3.2. CIA-Net

Different from natural images, which have flexible structures, chest X-ray images have relatively fixed structures. Basically, a positive sample (an image with diseases) may exhibit three types of visual abnormalities: 1. Opacity and complex textures caused by accumulated liquid or abnormal tissues, e.g. effusion, consolidation, and mass. 2. Over-transparency caused by air, e.g. emphysema and pneumothorax. 3. Visually abnormal shapes of organs, e.g. cardiomegaly. Most diseases in our evaluated dataset fall into these three types. These abnormalities render apparent visual contrast compared with negative samples. To this end, we propose to use the visual contrast as an attention signal indicating the possible location of the disease.

As shown in Figure 2 (b), CIA-Net is composed of two branches. The upper branch extracts the convolutional feature map $F_i$ of size $c \times h \times w$ from a positive image $I_i$. The lower branch takes the positive image $I_i$ and a negative image $I_i^-$ as a pair of inputs. The shared encoder network maps $I_i$ and $I_i^-$ into attention maps $M_i$ and $M_i^-$ of size $h \times w$, respectively. After that, we compute the absolute difference

$$\Delta M = \left| M_i - M_i^- \right|.$$

Finally, the spatial-wise attention map $\Delta M$ is multiplied with $F_i$ element by element to obtain the weighted feature $\tilde{F}_i$:

$$\tilde{F}_i = \Delta M \odot F_i, \qquad \tilde{f}_k = \Delta_k f_k,$$

where $\Delta_k$ is the kth weight in $\Delta M$, and $f_k$ is the kth grid in $F_i$. We normalize $\Delta M$ so that $\sum_k \Delta_k = w \times h$, to keep the activations of $\tilde{F}_i$ properly scaled.

More specifically, the input images of both branches are resized to 512×512. ResNet-50 pre-trained on the ImageNet dataset is adopted as the backbone for both branches. For the upper branch, we use the feature map $F_i$ after C5 (the last convolutional output of the 5th stage), which is 32 times down-sampled and of size 16×16. For the attention branch, we use C4 (the last convolutional output of the 4th stage) as the encoder module and obtain an attention blob of size 32×32 after 16 times down-sampling. The attention blob is then passed through a 2×2 max pooling layer and a 1×1 convolutional layer to obtain the attention map of size 16×16.

Loss function. After obtaining the weighted feature map $\tilde{F}_i$, we pass it through a 1×1 convolutional layer and a sigmoid layer to obtain a class-aware feature map of size $C \times H' \times W'$, where C is the number of classes. Each grid in this feature map denotes the existence probability of a disease. We then follow the paradigm used in [14], computing losses and making predictions in each channel for the corresponding class. For images with box-level annotations, if a grid in the feature map overlaps the projected ground truth box, we assign label 1 to the grid; otherwise we assign 0. We therefore use the binary cross-entropy loss over the grids:

$$L_{ki}(\theta) = -\sum_j y_{kij} \log(p_{kij}) - \sum_j (1 - y_{kij}) \log(1 - p_{kij})$$

where k, i, and j index classes, samples, and grids, respectively; $y_{kij}$ denotes the target label of a grid and $p_{kij}$ its predicted probability. For images with only image-level annotations, we use the MIL loss of [14]:

$$L_{ki}(\theta) = -y_{ki} \log\Big(1 - \prod_j (1 - p_{kij})\Big) - (1 - y_{ki}) \log\Big(\prod_j (1 - p_{kij})\Big)$$

where $y_{ki}$ denotes the target label of the image. The total loss across all classes of all samples is:

$$L(\theta) = \sum_i \sum_k \Big[ \eta_{ki}\, \lambda_B\, L^{BCE}_{ki}(\theta) + (1 - \eta_{ki})\, L^{MIL}_{ki}(\theta) \Big]$$

where $\eta_{ki} \in \{0, 1\}$ denotes whether the kth class in the ith sample has a box annotation, and $\lambda_B$ is the balance weight between the two losses.
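Putting the pieces of this subsection together, a skeleton of the two-branch design might look as follows (a sketch, not the authors' code; the 2048-channel C5 output of ResNet-50 and a single-channel attention encoder are assumed):

```python
import torch
import torch.nn as nn

class CIANet(nn.Module):
    def __init__(self, features, encoder, num_classes=14):
        super().__init__()
        self.features = features      # image -> (N, 2048, h, w) feature map F_i
        self.encoder = encoder        # image -> (N, 1, h, w) attention map M
        self.classifier = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, pos, neg):
        feat = self.features(pos)                          # F_i
        m_pos, m_neg = self.encoder(pos), self.encoder(neg)
        delta = (m_pos - m_neg).abs()                      # ΔM = |M_i - M_i^-|
        h, w = delta.shape[2:]
        # normalize so the weights sum to h*w, keeping activations scaled
        delta = delta * (h * w) / delta.sum(dim=(2, 3), keepdim=True)
        weighted = feat * delta                            # element-wise reweighting
        # 1x1 conv + sigmoid -> class-aware map: each grid holds the
        # probability that the corresponding disease is present there
        return torch.sigmoid(self.classifier(weighted))
```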

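The grid-level BCE, the MIL term and their combination translate almost directly into code. A sketch (the value of λ_B is a placeholder, and small epsilons are added for numerical stability):

```python
import torch

EPS = 1e-6

def cia_loss(p, grid_gt, img_gt, has_box, lambda_b=1.0):
    """p: predicted grid probabilities (N, C, H, W); grid_gt: 0/1 labels from
    projected boxes (N, C, H, W); img_gt: image-level labels (N, C);
    has_box: eta_ki in {0, 1} (N, C); lambda_b: the balance weight."""
    # per-(sample, class) BCE summed over grid cells, for box-annotated entries
    bce = -(grid_gt * torch.log(p + EPS)
            + (1 - grid_gt) * torch.log(1 - p + EPS)).flatten(2).sum(dim=2)
    # MIL: the image is positive iff at least one grid cell is positive,
    # i.e. P(positive) = 1 - prod_j (1 - p_kij)
    prob_all_neg = (1 - p).flatten(2).prod(dim=2)
    mil = -(img_gt * torch.log(1 - prob_all_neg + EPS)
            + (1 - img_gt) * torch.log(prob_all_neg + EPS))
    return (has_box * lambda_b * bce + (1 - has_box) * mil).sum()
```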
3.3. Training and Testing

Training. We use the SGD algorithm with Nesterov momentum to train all the models for 15 epochs on the chest X-ray dataset. For CIA-Net, we use a total mini-batch size of 6 on a single GPU. The learning rate starts at 0.001 and is reduced by a factor of 10 every 5 epochs. The weight decay and the momentum are set to 0.0001 and 0.9, respectively. All weights are initialized from ResNet [5] models pre-trained on ImageNet [23]. Our implementation is based on PyTorch [17].

Testing. We use a threshold of 0.5 to distinguish positive grids from negative grids in the class-wise feature map. In practice, the feature map is up-sampled from its 16×16 resolution to achieve more accurate predictions; the up-sampling operation is inserted before the last two convolutions.

4. Experiments

4.1. Dataset and Evaluation Metrics

There are 112,120 frontal-view X-ray images of 14 classes of diseases in the NIH chest X-ray dataset [30]. Note that each image can carry multiple diseases. Furthermore, the dataset contains 880 images with 984 labeled bounding boxes. We follow the terminology of [14] and call these 880 images "annotated" and the remaining 111,240 images "unannotated". We resize the original 3-channel images from a resolution of 1024×1024 to 512×512 and use no data augmentation techniques.

Evaluation Metrics. We follow the metrics used in [14]. For localization, accuracy is calculated from the IoU (Intersection over Union) between predictions and ground truths; note that predictions can be discrete small rectangles. We only report localization results for the eight diseases with ground truth boxes. A localization result is regarded as correct when IoU > T(IoU), where T(*) is the threshold. For classification, we also use AUC scores (the area under the ROC curve) [2] to measure the performance of our model.
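Both metrics are standard; a small sketch of how they might be computed (the rule for matching multiple predicted rectangles to a ground-truth box is our reading of the protocol, which the text does not spell out):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def iou(a, b):
    """IoU of two rectangles given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def localization_accuracy(preds, gts, t_iou=0.3):
    """preds[i]: predicted rectangles for image i; gts[i]: its ground-truth
    box. A case counts as correct when some prediction exceeds T(IoU)."""
    hits = sum(any(iou(p, g) > t_iou for p in ps) for ps, g in zip(preds, gts))
    return hits / len(gts)

def per_class_auc(scores, labels):
    """scores, labels: (N, 14) arrays of image-level predictions and labels."""
    return [roc_auc_score(labels[:, k], scores[:, k])
            for k in range(scores.shape[1])]
```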

4.2. Comparison with the State-of-the-art

Disease Localization. Following [14], we conduct a 5-fold cross-validation. We design two experiments to verify the effectiveness of our method. In the first experiment, we train our model with 80% of the annotated images and 50% of the unannotated images and compare the corresponding localization accuracy with [14] and [30] (Table 1); the model is evaluated on the remaining 20% of the annotated images. In the second experiment, we train the model with 100% of the unannotated images and no annotated images, and compare the localization accuracy with [14] (Table 2); the model is evaluated on all annotated images.

| T(IoU) | Model | Atelectasis | Cardiomegaly | Effusion | Infiltration | Mass | Nodule | Pneumonia | Pneumothorax | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | X. Wang [30] | 0.24 | 0.46 | 0.30 | 0.28 | 0.15 | 0.04 | 0.17 | 0.13 | 0.22 |
| 0.3 | Z. Li [14] | 0.36 | 0.56 | 0.66 | 0.45 | | | | | 0.49 |
| 0.3 | Ours | 0.57 | 0.73 | 0.48 | | | | | | 0.53 |
| 0.5 | X. Wang [30] | 0.05 | 0.18 | 0.11 | 0.07 | 0.01 | 0.01 | 0.03 | 0.03 | 0.06 |
| 0.5 | Z. Li [14] | 0.14 | 0.22 | 0.30 | 0.22 | 0.17 | 0.19 | | | 0.27 |
| 0.5 | Ours | 0.40 | 0.61 | 0.33 | 0.37 | 0.23 | | | | 0.39 |
| 0.7 | X. Wang [30] | 0.01 | 0.03 | 0.02 | 0.00 | 0.00 | 0.00 | 0.01 | 0.02 | 0.01 |
| 0.7 | Z. Li [14] | 0.04 | 0.52 | 0.07 | 0.09 | 0.11 | 0.01 | 0.05 | 0.05 | 0.12 |
| 0.7 | Ours | 0.18 | 0.70 | 0.28 | 0.41 | 0.27 | 0.04 | 0.25 | 0.18 | 0.29 |

Table 1. Comparison of results trained using 80% annotated and 50% unannotated images. Localization accuracies are evaluated at various T(IoU); results are rounded to two decimal digits for readability. Our model consistently outperforms previous methods in most cases; the advantage is especially evident at high T(IoU).

| T(IoU) | Model | Atelectasis | Cardiomegaly | Effusion | Infiltration | Mass | Nodule | Pneumonia | Pneumothorax | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | Z. Li [14] | 0.59 | 0.81 | 0.72 | 0.84 | 0.68 | 0.28 | 0.22 | 0.37 | 0.57 |
| 0.1 | Base | 0.61 | 0.88 | 0.73 | 0.78 | 0.67 | 0.23 | 0.09 | 0.36 | 0.54 |
| 0.1 | Ours | 0.39 | 0.90 | 0.65 | 0.85 | 0.69 | 0.38 | 0.30 | 0.39 | 0.60 |
| 0.3 | Base | 0.33 | 0.71 | 0.34 | 0.68 | 0.36 | 0.06 | 0.05 | 0.20 | 0.34 |
| 0.3 | Ours | 0.34 | 0.71 | 0.39 | 0.65 | 0.48 | 0.09 | 0.16 | 0.20 | 0.38 |
| 0.5 | Base | 0.19 | 0.57 | 0.14 | 0.49 | 0.21 | 0.01 | 0.03 | 0.08 | 0.21 |
| 0.5 | Ours | 0.19 | 0.53 | 0.19 | 0.47 | 0.33 | 0.03 | 0.08 | 0.11 | 0.24 |
| 0.7 | Base | 0.11 | 0.40 | 0.06 | 0.29 | 0.11 | 0.00 | 0.01 | 0.06 | 0.13 |
| 0.7 | Ours | 0.08 | 0.30 | 0.09 | 0.25 | 0.19 | 0.01 | 0.04 | 0.07 | 0.13 |

Table 2. Comparison of results trained using 100% unannotated images and no annotated images. Disease localization accuracies are evaluated at various T(IoU) in {0.1, 0.3, 0.5, 0.7}. Our model outperforms [14] and our own implemented baseline model at different IoU thresholds in most cases.

Table 1 shows the results of the first experiment: our model performs better in most cases. Especially, as T(IoU) increases, our model gradually achieves greater improvement over the reference models in all 8 classes used for evaluation. For example, when evaluated at T(IoU) = 0.7, the accuracy on easy classes, e.g. "Cardiomegaly", is 0.70, while the reference models achieve 0.52 [14] and 0.03 [30]. For relatively small-object classes, e.g. "Mass" and "Nodule", our accuracy reaches 0.27 and 0.04, while the reference models achieve only 0.00 for both classes in [30], and 0.11 and 0.01 in [14]. We also calculate the mean accuracy over all classes to compare the general performance of the different methods. At T(IoU) = 0.3, our approach achieves an accuracy of 0.53, with a 0.03 lead over [14]. At T(IoU) = 0.5 and T(IoU) = 0.7, our approach achieves accuracies of 0.39 and 0.29, with leads of 0.12 and 0.17 over [14], respectively. Overall, the experimental results in Table 1 demonstrate that our approach is more capable of accurate localization, which provides greater support for clinical practice.

Table 2 shows the results of the second experiment. Since [14] only provides results at T(IoU) = 0.1, we use a baseline model following [14], implemented by ourselves, and evaluate it at T(IoU) = 0.3, 0.5, 0.7 for a better comparison. The results at T(IoU) = 0.1 show that our implemented baseline performs similarly to [14], validating the latter comparison. The overall results show that even without annotated data used for training, our approach can achieve decent localization results. Compared with the baseline model, our proposed approach performs better in most classes at T(IoU) = 0.3, 0.5, 0.7, demonstrating the advantages of our model over baseline methods. Another interesting observation is that for hard classes like "Nodule" and "Mass", our model achieves results comparable to those of the first experiment without any annotated data. These results show that our model is able to utilize information from unannotated data to make up for the lack of localization annotation, achieving good performance on some hard types of abnormality in chest X-rays.

In Figure 4, we illustrate some qualitative results for the eight classes used for evaluation from the second experiment. From left to right are the original images, the baseline results and our results. The green boxes and blue boxes stand for ground truth and prediction, respectively. The figure shows that our approach produces more accurate localizations in most cases.

Figure 4. Some localization results for the eight classes with box annotations. The original images, baseline results, and our results are shown in the left, middle, and right columns, respectively. Our approach outputs more accurate localization results.

Disease Identification. Table 4 shows the AUC scores for all 14 classes. We compare our results with the previous state-of-the-art [14]. Following [14], we use 70% of the images for training and 20% for testing. Our model achieves better AUC scores for most diseases, and the mean AUC score improves from 0.80 to 0.83, showing the effectiveness of CIA-Net for identification.

| Model | Atelectasis | Cardiomegaly | Consolidation | Edema | Effusion | Emphysema | Fibrosis |
|---|---|---|---|---|---|---|---|
| Z. Li [14] | 0.80 | 0.87 | 0.80 | 0.88 | 0.87 | 0.91 | 0.78 |
| Ours | 0.79 | 0.87 | 0.79 | 0.91 | 0.88 | 0.93 | 0.80 |

| Model | Hernia | Infiltration | Mass | Nodule | Pleural Thickening | Pneumonia | Pneumothorax | Mean |
|---|---|---|---|---|---|---|---|---|
| Z. Li [14] | 0.77 | 0.70 | 0.83 | 0.75 | 0.79 | 0.66 | 0.80 | 0.80 |
| Ours | 0.92 | 0.69 | 0.81 | 0.73 | 0.80 | 0.75 | 0.89 | 0.83 |

Table 4. The AUC scores of our method and the baseline. Here, 70% and 20% of the images are used for training and testing, respectively.

4.3. Ablation Studies

In this section, we conduct ablation studies from three aspects. First, we explore the influence of different numbers of annotated samples on our method. Second, we study the contribution of the different modules. Third, we explore different negative sampling strategies used in training and testing.

4.3.1 CIA-Net Gains Localization Information

| T(IoU) | Anno ratio | Model | Atelectasis | Cardiomegaly | Effusion | Infiltration | Mass | Nodule | Pneumonia | Pneumothorax | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 80% | Base | 0.46 | 0.86 | 0.59 | 0.77 | 0.40 | 0.07 | 0.63 | 0.51 | 0.54 |
| 0.3 | 80% | Ours | 0.54 | 0.82 | 0.55 | 0.81 | 0.49 | 0.29 | 0.51 | 0.40 | 0.55 |
| 0.3 | 40% | Base | 0.41 | 0.74 | 0.53 | 0.79 | 0.31 | 0.08 | 0.49 | 0.29 | 0.46 |
| 0.3 | 40% | Ours | 0.55 | 0.73 | 0.55 | 0.76 | 0.48 | 0.22 | 0.39 | 0.30 | 0.50 |
| 0.3 | 0% | Base | 0.33 | 0.71 | 0.34 | 0.68 | 0.36 | 0.06 | 0.05 | 0.20 | 0.34 |
| 0.3 | 0% | Ours | 0.34 | 0.71 | 0.39 | 0.65 | 0.48 | 0.09 | 0.16 | 0.20 | 0.38 |
| 0.5 | 80% | Base | 0.27 | 0.79 | 0.44 | 0.55 | 0.23 | 0.04 | 0.55 | 0.38 | 0.41 |
| 0.5 | 80% | Ours | 0.38 | 0.77 | 0.42 | 0.63 | 0.34 | 0.26 | 0.39 | 0.27 | 0.43 |
| 0.5 | 40% | Base | 0.22 | 0.60 | 0.34 | 0.56 | 0.19 | 0.03 | 0.31 | 0.17 | 0.30 |
| 0.5 | 40% | Ours | 0.36 | 0.57 | 0.37 | 0.62 | 0.34 | 0.13 | 0.23 | 0.17 | 0.35 |
| 0.5 | 0% | Base | 0.19 | 0.57 | 0.14 | 0.49 | 0.21 | 0.01 | 0.03 | 0.08 | 0.21 |
| 0.5 | 0% | Ours | 0.19 | 0.53 | 0.19 | 0.47 | 0.33 | 0.03 | 0.08 | 0.11 | 0.24 |
| 0.7 | 80% | Base | 0.11 | 0.74 | 0.33 | 0.40 | 0.18 | 0.03 | 0.45 | 0.25 | 0.31 |
| 0.7 | 80% | Ours | 0.18 | 0.71 | 0.31 | 0.42 | 0.25 | 0.11 | 0.26 | 0.23 | 0.31 |
| 0.7 | 40% | Base | 0.12 | 0.42 | 0.15 | 0.37 | 0.15 | 0.00 | 0.19 | 0.08 | 0.19 |
| 0.7 | 40% | Ours | 0.19 | 0.47 | 0.20 | 0.41 | 0.22 | 0.06 | 0.12 | 0.11 | 0.22 |
| 0.7 | 0% | Base | 0.11 | 0.40 | 0.06 | 0.29 | 0.11 | 0.00 | 0.01 | 0.06 | 0.13 |
| 0.7 | 0% | Ours | 0.08 | 0.30 | 0.09 | 0.25 | 0.19 | 0.01 | 0.04 | 0.07 | 0.13 |

Table 3. Localization results of models trained using different numbers of annotated images together with 100% of the unannotated images.

As shown in Table 3, as the number of annotated images increases, the localization accuracy improves further. Specifically, at T(IoU) = 0.7, the mean accuracy improves from 0.22 to 0.31 when the annotated-image ratio in training increases from 40% to 80%. Furthermore, by using 40% of the annotated images, our model attains a higher mean accuracy than with 0% annotated images (0.22 vs. 0.13) at T(IoU) = 0.7. In addition, Table 3 shows that CIA-Net yields the larger improvement when fewer annotated images are used; in most cases our model shows higher mean performance at anno ratios of 0% and 40%. These experimental results demonstrate that, with the help of the localization information provided by CIA-Net, our model works effectively with limited annotated images.

4.3.2 Negative Sampling

In the training and testing phases, we use a perceptual-hash algorithm to choose a similarly structured pair image for every training sample. Specifically, we generate a hash code dictionary by resizing all 63,000 negative images to a small fixed size and flattening them. During training and testing, we resize every sample the same way and choose the nearest hash code based on cosine distance. The corresponding negative image is then paired with the positive one and sent to the later modules. To justify this approach, we compare it with two other sampling methods: 1. Randomly sampling from the negative images. 2. Using the canonical chest as the negative image. From the results in Table 5, we find that structural-similarity-based sampling is generally better than the other two methods in most classes. Random sampling introduces too much randomness into the model, making it hard to capture meaningful information with contrastive learning. The second method suffers from the domain gap between real images and the averaged one.

| T(IoU) | Model | Atelectasis | Cardiomegaly | Effusion | Infiltration | Mass | Nodule | Pneumonia | Pneumothorax | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.7 | Ours + Canon | 0.05 | 0.62 | 0.18 | 0.16 | 0.12 | 0.07 | 0.26 | 0.20 | 0.21 |
| 0.7 | Ours + Rand | 0.17 | 0.62 | 0.30 | 0.46 | 0.21 | 0.08 | 0.20 | 0.15 | 0.27 |
| 0.7 | Ours + Sim | 0.18 | 0.71 | 0.31 | 0.42 | 0.25 | 0.11 | 0.26 | 0.23 | 0.31 |

Table 5. Influence of different negative sampling strategies. All models are trained using 100% unannotated and 80% annotated images. Rand: randomly sampled negatives. Canon: always using the canonical chest. Sim: sampling based on structural similarity.
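A sketch of this pairing scheme (the hash resolution is an assumption, since the exact size did not survive the transcript; the flattened, normalized vectors act as the hash codes):

```python
import numpy as np
from PIL import Image

def hash_code(img, size=32):
    """img: 2-D uint8 array -> L2-normalized vector of the resized image."""
    small = np.asarray(Image.fromarray(img).resize((size, size)), dtype=np.float32)
    vec = small.ravel()
    return vec / (np.linalg.norm(vec) + 1e-8)

def build_hash_dict(negative_images):
    """Stack one code per negative image: the 'hash code dictionary'."""
    return np.stack([hash_code(img) for img in negative_images])

def nearest_negative(sample, codes):
    """Cosine similarity reduces to a dot product on normalized codes."""
    return int(np.argmax(codes @ hash_code(sample)))  # index of paired negative
```

Each positive sample is thus paired with the negative scan whose global layout is closest, which keeps the contrast ΔM focused on lesions rather than on anatomy mismatch.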

4.3.3 Contribution of Different Modules

Figure 5 shows some examples of original images and their aligned counterparts. The aligned samples approach the canonical chest more closely: they are more symmetrical, more vertical and focused on the thoracic cavity.

Figure 5. Some aligned results output by the alignment module. Each pair is composed of an original and an aligned sample. The aligned samples have more canonical views than the original ones.

Table 6 shows the quantitative contribution of the alignment module. For the baseline method, our alignment module improves the mean localization accuracy from 0.23 to 0.25. For CIA-Net, the alignment module likewise improves the mean accuracy from 0.26 to 0.28. These results prove the effectiveness of the alignment module.

| T(IoU) | Model | Atelectasis | Cardiomegaly | Effusion | Infiltration | Mass | Nodule | Pneumonia | Pneumothorax | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.7 | Base | 0.11 | 0.60 | 0.21 | 0.42 | 0.23 | 0.01 | 0.21 | 0.11 | 0.23 |
| 0.7 | Base + Align | 0.22 | 0.62 | 0.24 | 0.44 | 0.23 | 0.02 | 0.18 | 0.11 | 0.25 |
| 0.7 | CIA | 0.06 | 0.64 | 0.24 | 0.46 | 0.24 | 0.04 | 0.26 | 0.14 | 0.26 |
| 0.7 | CIA + Align | 0.09 | 0.68 | 0.28 | 0.46 | 0.26 | 0.06 | 0.29 | 0.15 | 0.28 |

Table 6. Influence of the alignment module on localization results. All models are trained using 100% unannotated and 80% annotated images.

In addition, by comparing CIA-Net with the baseline model, we demonstrate the effectiveness of CIA-Net: it improves the mean localization accuracy from 0.23 to 0.26 without the alignment module, and from 0.26 to 0.28 with the alignment module. Figure 6 shows visualized attention maps for some examples. From small lesions like Nodule to classes covering large regions like Pneumothorax and Cardiomegaly, CIA-Net generates attention maps that provide helpful cues for the locations of diseases.

Figure 6. Attention maps generated by CIA-Net. The left shows the predicted images, where green and blue boxes stand for ground truths and predictions, respectively. The right shows the generated attention maps, which provide helpful cues for the locations of abnormalities.

5. Conclusion

In this paper, we propose CIA-Net to tackle the challenging problem of automatic disease diagnosis in chest X-rays, where the images share similar thoracic structures. Our proposed CIA-Net captures contrastive information from pairs of positive and negative images. The contrast induced attention provides localization cues for the possible sites of abnormalities. To rationalize CIA-Net, we also propose a learnable alignment module to adjust all the input images to be canonical. Qualitative and quantitative experimental results on the NIH Chest X-ray dataset demonstrate the effectiveness of our approach.

Acknowledgments

This paper is partially supported by the Beijing Municipal Commission of Science and Technology under Grant No. Z181100008918005, the National Key Research and Development Program of China under Grant No. SQ2018AAA010010, and NSFC-61772039, NSFC-91646202, NSFC-61625201, and NSFC-61527804.

References

[1] Chunshui Cao, Xianming Liu, Yi Yang, Yinan Yu, Jiang Wang, Zilei Wang, Yongzhen Huang, Liang Wang, Chang Huang, Wei Xu, Deva Ramanan, and Thomas S. Huang. Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks. In ICCV, 2015.
[2] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861-874, 2006.
[3] Ross Girshick. Fast R-CNN. In ICCV, pages 1440-1448, 2015.
[4] Qingji Guan, Yaping Huang, Zhun Zhong, Zhedong Zheng, Liang Zheng, and Yi Yang. Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification. arXiv preprint arXiv:1801.09927, 2018.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770-778, 2016.
[6] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, pages 7132-7141, 2018.
[7] Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer. DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.
[8] Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In NIPS, pages 2017-2025, 2015.
[9] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, pages 694-711, 2016.
[10] Vadim Kantorov, Maxime Oquab, Minsu Cho, and Ivan Laptev. ContextLocNet: Context-aware deep network models for weakly supervised localization. In ECCV, 2016.
[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[12] Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. Handwritten digit recognition with a back-propagation network. In NIPS, 1989.
[13] Dong Li, Jia-Bin Huang, Yali Li, Shengjin Wang, and Ming-Hsuan Yang. Weakly supervised object localization with progressive domain adaptation. In CVPR, 2016.
[14] Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei Wei, Li-Jia Li, and Li Fei-Fei. Thoracic disease identification and localization with limited supervision. In CVPR, pages 8290-8299, 2018.
[15] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A. W. M. van der Laak, Bram van Ginneken, and Clara I. Sanchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60-88, 2017.
[16] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In ECCV, pages 21-37, 2016.
[17] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.

[18] Emanuele Pesce, Petros-Pavlos Ypsilantis, Samuel Withey, Robert Bakewell, Vicky Goh, and Giovanni Montana. Learning to detect chest radiographs containing lung nodules using visual attention networks. arXiv preprint arXiv:1712.00996, 2017.
[19] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, et al. CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225, 2017.
[20] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In CVPR, pages 779-788, 2016.
[21] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, pages 91-99, 2015.
[22] Mrigank Rochan and Yang Wang. Weakly supervised localization of novel objects using appearance transfer. In CVPR, 2015.
[23] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.
[24] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. 2016.
[25] Hoo-Chang Shin, Holger R. Roth, Mingchen Gao, Le Lu, Ziyue Xu, Isabella Nogues, Jianhua Yao, Daniel Mollura, and Ronald M. Summers. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5):1285-1298, 2016.
[26] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR Workshop, 2014.
[27] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[28] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[29] Yuxing Tang, Josiah Wang, Xiaofang Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, and Liming Chen. Large scale semi-supervised object detection using visual and semantic knowledge transfer. In CVPR, 2016.
[30] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In CVPR, pages 2097-2106, 2017.
[31] Chaochao Yan, Jiawen Yao, Ruoyu Li, Zheng Xu, and Junzhou Huang. Weakly supervised deep learning for thoracic disease classification and localization on chest X-rays. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 103-110, 2018.
[32] Petros-Pavlos Ypsilantis and Giovanni Montana. Learning what to look in chest X-rays with a recurrent visual attention model. arXiv preprint arXiv:1701.06452, 2017.
[33] Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. Top-down neural attention by excitation backprop. In ECCV, 2016.
[34] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In CVPR, pages 2921-2929, 2016.
