Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation

Haoran Wang 1*, Tong Shen 2*, Wei Zhang 2, Ling-Yu Duan 3, and Tao Mei 2
1 ETH Zurich   2 JD AI Research   3 Peking University

Abstract. Despite great progress in supervised semantic segmentation, a large performance drop is usually observed when deploying the model in the wild. Domain adaptation methods tackle the issue by aligning the source domain and the target domain. However, most existing methods attempt to perform the alignment from a holistic view, ignoring the underlying class-level data structure in the target domain. To fully exploit the supervision in the source domain, we propose a fine-grained adversarial learning strategy for class-level feature alignment while preserving the internal structure of semantics across domains. We adopt a fine-grained domain discriminator that not only plays as a domain distinguisher, but also differentiates domains at class level. The traditional binary domain labels are also generalized to domain encodings as the supervision signal to guide the fine-grained feature alignment. An analysis with Class Center Distance (CCD) validates that our fine-grained adversarial strategy achieves better class-level alignment compared to other state-of-the-art methods. Our method is easy to implement and its effectiveness is evaluated on three classical domain adaptation tasks, i.e., GTA5 → Cityscapes, SYNTHIA → Cityscapes and Cityscapes → Cross-City. Large performance gains show that our method outperforms other global feature alignment based and class-wise alignment based counterparts. The code is publicly available at https://github.com/JDAI-CV/FADA.

* These authors contributed equally. This work was performed when Haoran Wang was visiting JD AI Research as a research intern.

1 Introduction

The success of semantic segmentation [26] in recent years is mostly driven by a large amount of accessible labeled data. However, collecting massive densely annotated data for training is usually a labor-intensive task [9]. Recent advances in computer graphics provide an alternative for replacing expensive human labor. Through physically based rendering, we can obtain photo-realistic images with the pixel-level ground truth readily available in an effortless way [23,24].

However, a performance drop is observed when a model trained with synthetic data (a source domain) is applied in realistic scenarios (a target domain), because the data from different domains usually follow different distributions.

Fig. 1: Illustration of traditional and our fine-grained adversarial learning. Traditional adversarial learning pursues marginal distribution alignment while ignoring the semantic structure inconsistency between domains. We propose to use a fine-grained discriminator to enable class-level alignment.

This phenomenon is known as the domain shift problem [27], which poses a challenge to cross-domain tasks [16]. Domain adaptation aims to alleviate the domain shift problem by aligning the feature distributions of the source and the target domain. A group of works focus on adopting an adversarial framework, where a domain discriminator is trained to distinguish the target samples from the source ones, while the feature network tries to fool the discriminator by generating domain-invariant features [8,15,16,20,25,30,34,35,38].

Although impressive progress has been achieved in domain adaptive semantic segmentation, most prior works strive to align global feature distributions without paying much attention to the underlying structures among classes. However, as discussed in recent works [3,17], matching the marginal feature distributions does not guarantee a small expected error on the target domain. The class-conditional distributions should also be aligned, meaning that class-level alignment also plays an important role. As illustrated in Figure 1, the upper part shows the result of global feature alignment, where the two domains are well aligned but some samples are falsely mixed up. This motivates us to incorporate class information into the adversarial framework to enable fine-grained feature alignment. As illustrated in the bottom of Figure 1, features are then expected to be aligned according to specific classes.

There have been some pioneering works [7,20] trying to address this problem. Chen et al. [7] propose to use several independent discriminators to perform class-wise alignment, but independent discriminators might fail to capture the relationships between classes. Luo et al. [20] introduce a self-adaptive adversarial loss to apply different weights to each region. However, they do not explicitly incorporate class information in their methods, which might fail to promote class-level alignment.

Our motivation is to directly incorporate class information into the discriminator and encourage it to align features at a fine-grained level. Traditional adversarial training has been proven effective for aligning features by using a binary domain discriminator to model the distribution P(d | f), where d refers to the domain and f is the feature extracted from the input data. By confusing such a discriminator, i.e., driving P(d=0 | f) ≈ P(d=1 | f), where 0 stands for the source domain and 1 for the target domain, the features become domain-invariant and well aligned. To further take classes into account, we split the output into multiple channels according to P(d | f) = \sum_{c=1}^{K} P(d, c | f), where c refers to the classes {1, ..., K}. We directly model the discriminator as P(d, c | f) to formulate a fine-grained domain alignment task. Although in the domain adaptation setting the category-level labels for the target domain are inaccessible, we find that the model predictions on the target domain also contain class information and prove that it is possible to supervise the discriminator with the predictions on both domains. In the adversarial learning process, class information is incorporated and the features are expected to be aligned according to their specific classes.

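To make the channel-splitting idea concrete, the following is a minimal PyTorch sketch (PyTorch being the framework the paper reports using) of a 2K-channel discriminator output from which the ordinary binary domain probability is recovered by marginalizing over classes. The tensor names and shapes (K, logits) are illustrative assumptions, not the released FADA code.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: the discriminator emits 2K logits per
# feature location (K class channels for the source domain, K for the target).
K = 19                                   # number of shared semantic classes
logits = torch.randn(4, 2 * K, 65, 65)   # discriminator output for a feature map

# Joint distribution P(d, c | f): softmax over all 2K channels.
p_joint = F.softmax(logits, dim=1)
p_source = p_joint[:, :K]                # P(d = 0, c | f), one channel per class
p_target = p_joint[:, K:]                # P(d = 1, c | f)

# Marginalizing over classes recovers the ordinary binary domain probability,
# so the fine-grained discriminator strictly generalizes the binary one.
p_domain0 = p_source.sum(dim=1)          # P(d = 0 | f) = sum_c P(d = 0, c | f)
p_domain1 = p_target.sum(dim=1)          # P(d = 1 | f)
assert torch.allclose(p_domain0 + p_domain1, torch.ones_like(p_domain0), atol=1e-5)
```
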
In this paper, we propose such a fine-grained adversarial learning framework for domain adaptive semantic segmentation (FADA). As illustrated in Figure 1, we represent the supervision of the traditional discriminator at a fine-grained semantic level, which enables our fine-grained discriminator to capture rich class-level information. The adversarial learning process is performed at the fine-grained level, so the features are expected to be adaptively aligned according to their corresponding semantic categories. The class mismatch problem, which broadly exists in global feature alignment, is expected to be further suppressed. Correspondingly, by incorporating class information, the binary domain labels are also generalized to a more complex form, called "domain encodings", to serve as the new supervision signal. Domain encodings can be extracted from the network's predictions on both domains. Different strategies for constructing domain encodings will be discussed. We conduct an analysis with Class Center Distance to demonstrate the effectiveness of our method regarding class-level alignment. Our method is also evaluated on three popular cross-domain benchmarks and presents new state-of-the-art results.

The main contributions of this paper are summarized below.
- We propose a fine-grained adversarial learning framework for cross-domain semantic segmentation that explicitly incorporates class-level information.
- The fine-grained learning framework enables class-level feature alignment, which is further verified by an analysis using Class Center Distance.
- We evaluate our method with comprehensive experiments. Significant improvements compared to other state-of-the-art methods are achieved on popular domain adaptive segmentation tasks including GTA5 → Cityscapes, SYNTHIA → Cityscapes and Cityscapes → Cross-City.

2 Related Work

2.1 Semantic Segmentation

Semantic segmentation is the task of predicting a unique semantic label for each pixel of the input image. With the advent of deep convolutional neural networks, the computer vision community has witnessed huge progress in this field. FCN [26] triggered the interest in applying deep learning to this task. Many follow-up methods have been proposed to enlarge the receptive fields to cover more context information [4-6,36]. Among all these works, the family of Deeplab [4-6] attracts a lot of attention and has been widely applied for its simplicity and effectiveness.

2.2 Domain Adaptation

Domain adaptation strives to address the performance drop caused by the different distributions of training data and testing data. In recent years, several works have approached this problem in image classification [3,25]. Inspired by the theoretical upper bound on the risk in the target domain [2], some pioneering works suggest optimizing distance measurements between the two domains to align the features [18,29]. Recently, motivated by GAN [13], adversarial training has become popular for its power to align features globally [7,25,30].

2.3 Domain Adaptive Semantic Segmentation

Unlike domain adaptation for the image classification task, domain adaptive semantic segmentation has received less attention due to its difficulty, even though it supports many important applications including autonomous driving in the wild [8,16]. Based on the theoretical insight [2] on domain adaptive classification, most works follow the path of shortening the domain discrepancy between the two domains. Large progress has been achieved through optimization by adversarial training or explicit domain discrepancy measures [15,16,30]. In the context of the domain adaptive semantic segmentation task, AdaptSegNet [30] attempts to align the distributions in the output space. Inspired by CycleGAN [37], CyCADA [15] adapts the representation at the pixel level and the feature level. There are also many works focusing on aligning different properties between the two domains, such as entropy [32] and information [19].

Although huge progress has been made in this field, most existing methods share a common limitation: enforcing global feature alignment inevitably mixes samples with different semantic labels together when drawing the two domains closer, which usually results in a mismatch of classes from different domains. CLAN [20] is a pioneering work on category-level alignment. It suggests applying different adversarial weights to different regions, but it does not directly and explicitly incorporate the classes into the model.

Fig. 2: Overview of the proposed fine-grained adversarial framework. Images from the source domain and target domain are randomly picked and fed to the feature extractor and the classifier. A segmentation loss is computed with the source predictions and the source annotations to help the segmentation network to generate discriminative features and learn task-specific knowledge. The semantic features from both domains are fed to the convolutional fine-grained domain discriminator. The discriminator strives to distinguish the feature's domain information at a fine-grained class level using the domain encodings processed from the sample predictions.

3 Method

3.1 Revisit Traditional Feature Alignment

Semantic segmentation aims to predict a per-pixel unique label for the input image [26]. In an unsupervised domain adaptation setting for semantic segmentation, we have access to a collection of labeled data X_S = {(x_i^{(s)}, y_i^{(s)})}_{i=1}^{n_s} in a source domain S, and unlabeled data X_T = {x_j^{(t)}}_{j=1}^{n_t} in a target domain T, where n_s and n_t are the numbers of samples from the different domains. Domain S and domain T share the same K semantic class labels {1, ..., K}. The goal is to learn a segmentation model G which could achieve a low expected risk on the target domain. Generally, the segmentation network G can be divided into a feature extractor F and a multi-class classifier C, where G = C ∘ F.

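As a concrete picture of this setup, here is a minimal PyTorch sketch of the G = C ∘ F decomposition applied to a labeled source batch and an unlabeled target batch. FeatureExtractor, Classifier and all channel/shape values are placeholder assumptions for illustration only, not the DeepLab-v2 networks actually used in the paper.

```python
import torch
import torch.nn as nn

K = 19  # number of shared semantic classes (e.g., 19 for GTA5 -> Cityscapes)

class FeatureExtractor(nn.Module):          # F: images -> semantic features
    def __init__(self, channels=64):
        super().__init__()
        # stand-in for a real backbone such as VGG-16 or ResNet-101
        self.backbone = nn.Conv2d(3, channels, kernel_size=3, padding=1)
    def forward(self, x):
        return self.backbone(x)

class Classifier(nn.Module):                # C: features -> per-pixel class logits
    def __init__(self, channels=64, num_classes=K):
        super().__init__()
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)
    def forward(self, f):
        return self.head(f)

F_net, C_net = FeatureExtractor(), Classifier()
x_s = torch.randn(2, 3, 128, 128)            # labeled source images (labels y_s available)
x_t = torch.randn(2, 3, 128, 128)            # unlabeled target images (no labels)
f_s, f_t = F_net(x_s), F_net(x_t)            # features from both domains
logits_s, logits_t = C_net(f_s), C_net(f_t)  # predictions G(x) = C(F(x))
```
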
Traditional feature-level adversarial training relies on a binary domain discriminator D to align the features extracted by F on both domains. Domain adaptation is tackled by alternately optimizing G and D in two steps:

(1) D is trained to distinguish features from different domains. This is usually achieved by fixing F and C and solving:

\min_D L_D = -\sum_{i=1}^{n_s} (1-d) \log P(d=0 \mid f_i) - \sum_{j=1}^{n_t} d \log P(d=1 \mid f_j)    (1)

where f_i and f_j are the features extracted by F from source sample x_i^{(s)} and target sample x_j^{(t)}, and d refers to the domain variable, where 0 refers to the source domain and 1 refers to the target domain. P(d | f) is the probability output of the discriminator.

(2) G is trained with the task loss L_seg on the source domain and the adversarial loss L_adv on the target domain. This step requires fixing D and updating F and C:

\min_{F,C} L_seg + \lambda_{adv} L_adv    (2)

The cross-entropy loss L_seg on the source domain minimizes the difference between the prediction and the ground truth, which helps G learn the task-specific knowledge:

L_seg = -\sum_{i=1}^{n_s} \sum_{k=1}^{K} y_{ik}^{(s)} \log p_{ik}^{(s)}    (3)

where p_{ik}^{(s)} is the probability (confidence) of source sample x_i^{(s)} belonging to semantic class k as predicted by C, and y_{ik}^{(s)} is the corresponding entry of the one-hot label.

The adversarial loss L_adv is used to confuse the discriminator so as to encourage F to generate domain-invariant features:

L_adv = -\sum_{j=1}^{n_t} \log P(d=0 \mid f_j)    (4)

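The PyTorch sketch below illustrates one round of this alternating optimization under Eqs. (1)-(4). The module and optimizer names (feat_net, clf, disc, opt_g, opt_d) are placeholders, the discriminator is assumed to emit a single logit whose sigmoid is P(d=1 | f), and the snippet illustrates the standard scheme rather than the released FADA code.

```python
import torch
import torch.nn.functional as nnF

def binary_adversarial_step(feat_net, clf, disc, opt_g, opt_d,
                            x_s, y_s, x_t, lambda_adv=0.001):
    # Step (1), Eq. (1): update D to separate source (d = 0) from target (d = 1) features.
    # Any stale gradients D accumulated during the previous generator step are cleared here.
    opt_d.zero_grad()
    f_s, f_t = feat_net(x_s).detach(), feat_net(x_t).detach()   # F and C are fixed in this step
    d_s, d_t = disc(f_s), disc(f_t)
    loss_d = nnF.binary_cross_entropy_with_logits(d_s, torch.zeros_like(d_s)) \
           + nnF.binary_cross_entropy_with_logits(d_t, torch.ones_like(d_t))
    loss_d.backward()
    opt_d.step()

    # Step (2), Eqs. (2)-(4): update F and C with the source segmentation loss plus the
    # adversarial loss that asks D to classify target features as source (d = 0).
    opt_g.zero_grad()
    f_s, f_t = feat_net(x_s), feat_net(x_t)
    loss_seg = nnF.cross_entropy(clf(f_s), y_s, ignore_index=255)  # Eq. (3); y_s: per-pixel class ids
    d_t = disc(f_t)
    loss_adv = nnF.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t))  # Eq. (4)
    (loss_seg + lambda_adv * loss_adv).backward()
    opt_g.step()
    return loss_seg.item(), loss_adv.item(), loss_d.item()
```
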
3.2 Fine-grained Adversarial Learning

To incorporate the class information into the adversarial learning framework, we propose a novel discriminator and enable a fine-grained adversarial learning process. The whole pipeline is illustrated in Figure 2.

Traditional adversarial training strives to align the marginal distributions by confusing a binary discriminator. To make the discriminator not merely focus on distinguishing domains, we split each of the two output channels of the binary discriminator into K channels and encourage adversarial learning at a fine-grained level. With this design, the predicted confidence for each domain is represented as a confidence distribution over the different classes, which enables the new fine-grained discriminator to model more complex underlying structures between classes, thus encouraging class-level alignment.

Correspondingly, the binary domain labels are also converted to a more general form, namely domain encodings, to incorporate class information. Traditionally, the domain labels used for training the binary discriminator are [1, 0] and [0, 1] for the source and target domains respectively. The domain encodings are instead represented as the vectors [a, 0] and [0, a] for the two domains respectively, where a is the knowledge extracted from the classifier C, represented as a K-dimensional vector, and 0 is an all-zero K-dimensional vector. The choices of how to generate the domain knowledge a will be discussed in Section 3.3.

Fig. 3: Illustration of different strategies to generate domain encodings. Here we compare three different strategies to extract knowledge from the segmentation network for constructing domain encodings: binary domain labels, one-hot hard labels and multi-channel soft labels.

During the training process, the discriminator not only tries to distinguish domains, but also learns to model class structures. The L_D in Equation 1 becomes:

L_D = -\sum_{i=1}^{n_s} \sum_{k=1}^{K} a_{ik}^{(s)} \log P(d=0, c=k \mid f_i) - \sum_{j=1}^{n_t} \sum_{k=1}^{K} a_{jk}^{(t)} \log P(d=1, c=k \mid f_j)    (5)

where a_{ik}^{(s)} and a_{jk}^{(t)} are the k-th entries of the class knowledge for source sample i and target sample j.

The adversarial loss L_adv used to confuse the discriminator and guide the generation of domain-invariant features in Equation 4 becomes:

L_adv = -\sum_{j=1}^{n_t} \sum_{k=1}^{K} a_{jk}^{(t)} \log P(d=0, c=k \mid f_j)    (6)

L_adv is designed to maximize the probability of features from the target domain being considered as source features without hurting the relationship between features and classes.

The overall network in Figure 2 is used in the training stage. During inference, the domain adaptation components are removed and one only needs the original segmentation network with the adapted weights.

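A minimal PyTorch sketch of Eqs. (5) and (6) is given below, assuming a 2K-channel discriminator output (channels [0, K) for the source half, [K, 2K) for the target half) and K-channel domain encodings enc_s and enc_t built from the classifier's predictions as in Section 3.3. Averaging over pixels instead of summing over samples is a scaling choice made here for readability; all names are illustrative rather than the official implementation.

```python
import torch

def fine_grained_losses(d_logits_s, d_logits_t, enc_s, enc_t, K):
    # Joint probabilities P(d, c | f) over the 2K discriminator channels.
    logp_s = torch.log_softmax(d_logits_s, dim=1)     # [N, 2K, H, W]
    logp_t = torch.log_softmax(d_logits_t, dim=1)

    # Eq. (5): train D with domain encodings [a, 0] for source and [0, a] for target.
    # For the D update the input features would be detached, as in the binary sketch.
    loss_d = -(enc_s * logp_s[:, :K]).sum(dim=1).mean() \
             -(enc_t * logp_t[:, K:]).sum(dim=1).mean()

    # Eq. (6): train F so that target features are scored as *source* features of the
    # same class, preserving the feature-class relationship while confusing the domain.
    loss_adv = -(enc_t * logp_t[:, :K]).sum(dim=1).mean()
    return loss_d, loss_adv
```
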
3.3 Extracting Class Knowledge for Domain Encodings

Now that we have a fine-grained domain discriminator, which can adaptively align features according to the class-level information contained in the domain encodings, another challenge arises: how do we obtain the class knowledge a_{ik}^{(s)} and a_{jk}^{(t)} in Equations 5 and 6 to construct the domain encoding for each sample? Considering that in the unsupervised domain adaptive semantic segmentation task no annotations in the target domain are accessible, it seems contradictory to use class knowledge on the target domain to guide class-level alignment. However, during training, with ground-truth annotations from the source domain, the classifier C learns to map features to the semantic classes. Considering that the source domain and the target domain share the same semantic classes, it is a natural choice to use the predictions of C as the knowledge to supervise the discriminator.

As shown in Equations 5 and 6, the class knowledge for optimizing the fine-grained discriminator serves as the supervision signal. The choices of a_{ik}^{(s)} and a_{jk}^{(t)} are open to many possibilities. For specific tasks, one could design different forms of class knowledge based on prior knowledge. Here we discuss two general solutions for extracting class knowledge from network predictions to construct domain encodings. Because the class-level knowledge for the different domains can be extracted in the same way, in the following discussion we use a_k to denote the k-th entry for a single sample without differentiating the domain.

One-hot hard labels are a straightforward solution for generating the knowledge:

a_k = \begin{cases} 1 & \text{if } k = \arg\max_{k'} p_{k'} \\ 0 & \text{otherwise} \end{cases}    (7)

where p_k is the softmax probability output of C for class k. In this way, only the most confident class is selected. In practice, in order to remove the impact of noisy samples, we can select samples whose confidence is higher than a certain threshold and ignore those with low confidence.

Another alternative is multi-channel soft labels, defined as:

a_k = \frac{\exp(z_k / T)}{\sum_{j=1}^{K} \exp(z_j / T)}    (8)

where z_k is the k-th entry of the logits and T is a temperature that encourages a soft probability distribution over classes. Note that during training, an additional regularization can also be applied. For example, we practically find that clipping the values of the soft labels at a given threshold achieves more stable performance because it prevents overfitting to certain classes.

An illustrative comparison of these two strategies with the traditional binary domain labels is presented in Figure 3. We also conduct experiments in Section 4.6 to demonstrate the performance of the different strategies.

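Below is a short PyTorch sketch of the two strategies, Eqs. (7) and (8), applied per pixel to the classifier logits. The default confidence threshold of 0.9 and temperature of 1.8 follow the settings reported later in Sections 4.3 and 4.6, but the function names and code are an illustrative assumption rather than the official implementation.

```python
import torch
import torch.nn.functional as nnF

def hard_encoding(seg_logits, conf_threshold=0.9):
    # Eq. (7): one-hot label of the most confident class; low-confidence pixels are dropped.
    prob = nnF.softmax(seg_logits, dim=1)                        # [N, K, H, W]
    conf, idx = prob.max(dim=1)                                  # per-pixel confidence and argmax class
    one_hot = nnF.one_hot(idx, num_classes=seg_logits.size(1))   # [N, H, W, K]
    one_hot = one_hot.permute(0, 3, 1, 2).float()
    return one_hot * (conf >= conf_threshold).unsqueeze(1).float()  # ignore noisy pixels

def soft_encoding(seg_logits, temperature=1.8, clip=0.9):
    # Eq. (8): temperature-softened class distribution, optionally clipped as a regularizer
    # ("confidence clipping", cf. Section 4.6).
    soft = nnF.softmax(seg_logits / temperature, dim=1)          # [N, K, H, W]
    return soft.clamp(max=clip)
```
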
Table 1: Experimental results for Cityscapes → Cross-City (per-class IoU and mIoU).

Classes: road, sidewalk, building, light, sign, veg, sky, person, rider, car, bus, mbike, bike | mIoU

Rome
  Source (Dilation-Frontend):  77.7 21.9 83.5 0.1 10.7 78.9 88.1 21.6 10.0 67.2 30.4 6.1 0.6 | 38.2
  Cross-City [7]:              79.5 29.3 84.5 0.0 22.2 80.6 82.8 29.5 13.0 71.7 37.5 25.9 1.0 | 42.9
  Source (DeepLab-v2):         83.9 34.3 87.7 13.0 41.9 84.6 92.5 37.7 22.4 80.8 38.1 39.1 5.3 | 50.9
  AdaptSegNet [30]:            83.9 34.2 88.3 18.8 40.2 86.2 93.1 47.8 21.7 80.9 47.8 48.3 8.6 | 53.8
  FADA:                        84.9 35.8 88.3 20.5 40.1 85.9 92.8 56.2 23.2 83.6 31.8 53.2 14.6 | 54.7

Rio
  Source (Dilation-Frontend):  69.0 31.8 77.0 4.7 3.7 71.8 80.8 38.2 8.0 61.2 38.9 11.5 3.4 | 38.5
  Cross-City [7]:              74.2 43.9 79.0 2.4 7.5 77.8 69.5 39.3 10.3 67.9 41.2 27.9 10.9 | 42.5
  Source (DeepLab-v2):         76.6 47.3 82.5 12.6 22.5 77.9 86.5 43.0 19.8 74.5 36.8 29.4 16.7 | 48.2
  AdaptSegNet [30]:            76.2 44.7 84.6 9.3 25.5 81.8 87.3 55.3 32.7 74.3 28.9 43.0 27.6 | 51.6
  FADA:                        80.6 53.4 84.2 5.8 23.0 78.4 87.7 60.2 26.4 77.1 37.6 53.7 42.3 | 54.7

Tokyo
  Source (Dilation-Frontend):  81.2 26.7 71.7 8.7 5.6 73.2 75.7 39.3 14.9 57.6 19.0 1.6 33.8 | 39.2
  Cross-City [7]:              83.4 35.4 72.8 12.3 12.7 77.4 64.3 42.7 21.5 64.1 20.8 8.9 40.3 | 42.8
  Source (DeepLab-v2):         83.4 35.4 72.8 12.3 12.7 77.4 64.3 42.7 21.5 64.1 20.8 8.9 40.3 | 42.8
  AdaptSegNet [30]:            81.5 26.0 77.8 17.8 26.8 82.7 90.9 55.8 38.0 72.1 4.2 24.5 50.8 | 49.9
  FADA:                        85.8 39.5 76.0 14.7 24.9 84.6 91.7 62.2 27.7 71.4 3.0 29.3 56.3 | 51.3

Taipei
  Source (Dilation-Frontend):  77.2 20.9 76.0 5.9 4.3 60.3 81.4 10.9 11.0 54.9 32.6 15.3 5.2 | 35.1
  Cross-City [7]:              78.6 28.6 80.0 13.1 7.6 68.2 82.1 16.8 9.4 60.4 34.0 26.5 9.9 | 39.6
  Source (DeepLab-v2):         78.6 28.6 80.0 13.1 7.6 68.2 82.1 16.8 9.4 60.4 34.0 26.5 9.9 | 39.6
  AdaptSegNet [30]:            81.7 29.5 85.2 26.4 15.6 76.7 91.7 31.0 12.5 71.5 41.1 47.3 27.7 | 49.1
  FADA:                        86.0 42.3 86.1 6.2 20.5 78.3 92.7 47.2 17.7 72.2 37.2 54.3 44.0 | 52.7

4 Experiments

4.1 Datasets

We present a comprehensive evaluation of our proposed method on three popular unsupervised domain adaptive semantic segmentation benchmarks, i.e., Cityscapes → Cross-City, SYNTHIA → Cityscapes, and GTA5 → Cityscapes.

Cityscapes. Cityscapes [9] is a real-world urban scene dataset consisting of a training set with 2,975 images, a validation set with 500 images and a testing set with 1,525 images. Following the standard protocols [15,16,30], we use the 2,975 images from the Cityscapes training set as the unlabeled target-domain training set and evaluate our adapted model on the 500 images from the validation set.

Cross-City. Cross-City [7] is an urban scene dataset collected with Google Street View. It contains 3,200 unlabeled images and 100 annotated images for each of four different cities. The annotations of Cross-City share 13 classes with Cityscapes.

SYNTHIA. SYNTHIA [24] is a synthetic urban scene dataset. We pick its subset SYNTHIA-RAND-CITYSCAPES, which shares 16 semantic classes with Cityscapes, as the source domain. In total, 9,400 images from the SYNTHIA dataset are used as source-domain training data for this task.

GTA5. The GTA5 dataset [23] is another synthetic dataset, sharing 19 semantic classes with Cityscapes. Its 24,966 urban scene images are collected from the physically based rendered video game Grand Theft Auto V (GTA V) and are used as source training data.

Table 2: Experimental results for SYNTHIA → Cityscapes (per-class IoU; mIoU is averaged over the 16 classes, mIoU* over 13 classes).

Classes: Road, SW, Build, Wall, Fence, Pole, TL, TS, Veg., Sky, PR, Rider, Car, Bus, Motor, Bike | mIoU | mIoU*

VGG-16
  FCNs in the wild [16]:      11.5 19.6 30.8 4.4 0.0 20.3 0.1 11.7 42.3 68.7 51.2 3.8 54.0 3.2 0.2 0.6 | 20.2 | 22.9
  CDA [34]:                   65.2 26.1 74.9 0.1 0.5 10.7 3.5 3.0 76.1 70.6 47.1 8.2 43.2 20.7 0.7 13.1 | 29.0 | 34.8
  ST [38]:                    0.2 14.5 53.8 1.6 0.0 18.9 0.9 7.8 72.2 80.3 48.1 6.3 67.7 4.7 0.2 4.5 | 23.9 | 27.8
  CBST [38]:                  69.6 28.7 69.5 12.1 0.1 25.4 11.9 13.6 82.0 81.9 49.1 14.5 66.0 6.6 3.7 32.4 | 35.4 | 36.1
  AdaptSegNet [30]:           78.9 29.2 75.5 - - - 0.1 4.8 72.6 76.7 43.4 8.8 71.1 16.0 3.6 8.4 | - | 37.6
  SIBAN [19]:                 70.1 25.7 80.9 - - - 3.8 7.2 72.3 80.5 43.3 5.0 73.3 16.0 1.7 3.6 | - | 37.2
  CLAN [20]:                  80.4 30.7 74.7 - - - 1.4 8.0 77.1 79.0 46.5 8.9 73.8 18.2 2.2 9.9 | - | 39.3
  AdaptPatch [31]:            72.6 29.5 77.2 3.5 0.4 21.0 1.4 7.9 73.3 79.0 45.7 14.5 69.4 19.6 7.4 16.5 | 33.7 | 39.6
  ADVENT [32]:                67.9 29.4 71.9 6.3 0.3 19.9 0.6 2.6 74.9 74.9 35.4 9.6 67.8 21.4 4.1 15.5 | 31.4 | 36.6
  Source only:                10.0 14.7 52.4 4.2 0.1 20.9 3.5 6.5 74.3 77.5 44.9 4.9 64.0 21.6 4.2 6.4 | 25.6 | 29.6
  Baseline (feat. only) [30]: 63.6 26.8 67.3 3.8 0.3 21.5 1.0 7.4 76.1 76.5 40.5 11.2 62.1 19.4 5.3 13.2 | 31.0 | 36.2
  FADA:                       80.4 35.9 80.9 2.5 0.3 30.4 7.9 22.3 81.8 83.6 48.9 16.8 77.7 31.1 13.5 17.9 | 39.5 | 46.0

ResNet-101
  SIBAN [19]:                 82.5 24.0 79.4 - - - 16.5 12.7 79.2 82.8 58.3 18.0 79.3 25.3 17.6 25.9 | - | 46.3
  AdaptSegNet [30]:           84.3 42.7 77.5 - - - 4.7 7.0 77.9 82.5 54.3 21.0 72.3 32.2 18.9 32.3 | - | 46.7
  CLAN [20]:                  81.3 37.0 80.1 - - - 16.1 13.7 78.2 81.5 53.4 21.2 73.0 32.9 22.6 30.7 | - | 47.8
  AdaptPatch [31]:            82.4 38.0 78.6 8.7 0.6 26.0 3.9 11.1 75.5 84.6 53.5 21.6 71.4 32.6 19.3 31.7 | 40.0 | 46.5
  ADVENT [32]:                85.6 42.2 79.7 8.7 0.4 25.9 5.4 8.1 80.4 84.1 57.9 23.8 73.3 36.4 14.2 33.0 | 41.2 | 48.0
  Source only:                55.6 23.8 74.6 9.2 0.2 24.4 6.1 12.1 74.8 79.0 55.3 19.1 39.6 23.3 13.7 25.0 | 33.5 | 38.6
  Baseline (feat. only) [30]: 62.4 21.9 76.3 11.5 0.1 24.9 11.7 11.4 75.3 80.9 53.7 18.5 59.7 13.7 20.6 24.0 | 35.4 | 40.8
  FADA:                       84.5 40.1 83.1 4.8 0.0 34.3 20.1 27.2 84.8 84.0 53.5 22.6 85.4 43.7 26.8 27.8 | 45.2 | 52.5

4.2 Evaluation Metrics

The metrics for evaluating our algorithm are consistent with the common semantic segmentation task. Specifically, we compute the PASCAL VOC intersection-over-union (IoU) [11] between our predictions and the ground-truth labels: IoU = TP / (TP + FP + FN), where TP, FP and FN are the numbers of true positive, false positive and false negative pixels respectively. In addition to the per-class IoU, the mIoU, i.e., the mean of the IoUs over all classes, is also reported.

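The following is a minimal sketch of this metric, using a histogram-based confusion matrix as is common for Cityscapes-style evaluation; the function names and the ignore label 255 are illustrative assumptions, not the official evaluation script.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    # Accumulate a [num_classes, num_classes] matrix: rows = ground truth, cols = prediction.
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_iou(conf):
    tp = np.diag(conf)                       # true positives per class
    fp = conf.sum(axis=0) - tp               # predicted as the class but wrong
    fn = conf.sum(axis=1) - tp               # pixels of the class that were missed
    return tp / np.maximum(tp + fp + fn, 1)  # IoU = TP / (TP + FP + FN)

# Usage sketch: accumulate the confusion matrix over all validation images, then
# report per-class IoU and mIoU = mean of the per-class IoUs.
#   conf = sum(confusion_matrix(p, g, 19) for p, g in zip(preds, gts))
#   miou = per_class_iou(conf).mean()
```
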
4.3 Implementation Details

Our pipeline is implemented in PyTorch [22]. For a fair comparison, we employ DeeplabV2 [4] with VGG-16 [28] and ResNet-101 [14] as the segmentation base networks. All models are pre-trained on ImageNet [10]. For the fine-grained discriminator, we adopt a simple structure consisting of 3 convolution layers with channel numbers {256, 128, 2K}, 3×3 kernels, and stride 1. Each convolution layer except the last is followed by a Leaky-ReLU [21] parameterized by 0.2.

To train the segmentation network, we use the Stochastic Gradient Descent (SGD) optimizer with momentum 0.9 and weight decay 10^-4. The learning rate is initially set to 2.5×10^-4 and is decreased following a 'poly' learning rate policy with power 0.9. For training the discriminator, we adopt the Adam optimizer with β1 = 0.9, β2 = 0.99 and an initial learning rate of 10^-4. The same 'poly' learning rate policy is used. λ_adv is constantly set to 0.001. The temperature T is set to 1.8 for all experiments.

Regarding the training procedure, the network is first trained on source data for 20k iterations and then fine-tuned using our framework for 40k iterations. The batch size is eight: four images are source images and the other four are target images. Some data augmentations are used, including random flipping and color jittering, to prevent overfitting. Although our model is already able to achieve new state-of-the-art results, we further boost the performance by using self-distillation [1,12,33] and multi-scale testing. A detailed ablation study is conducted in Section 4.6 to reveal the effect of each component, which, we hope, could provide more insights into the topic.

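As a sketch of these settings, the snippet below builds a discriminator matching the textual description (3 conv layers with channels {256, 128, 2K}, 3×3 kernels, stride 1, Leaky-ReLU(0.2) after all but the last layer) and its optimizer with the 'poly' decay. The padding, input channel count and helper names are assumptions made for illustration.

```python
import torch.nn as nn
import torch.optim as optim

def build_fine_grained_discriminator(in_channels, num_classes):
    return nn.Sequential(
        nn.Conv2d(in_channels, 256, kernel_size=3, stride=1, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(128, 2 * num_classes, kernel_size=3, stride=1, padding=1),  # 2K output channels
    )

def poly_lambda(max_iter, power=0.9):
    # 'poly' policy: lr = base_lr * (1 - iter / max_iter) ** power
    return lambda it: (1 - it / max_iter) ** power

disc = build_fine_grained_discriminator(in_channels=2048, num_classes=19)
opt_d = optim.Adam(disc.parameters(), lr=1e-4, betas=(0.9, 0.99))
sched_d = optim.lr_scheduler.LambdaLR(opt_d, lr_lambda=poly_lambda(max_iter=40000))
# The segmentation network would analogously use
# SGD(lr=2.5e-4, momentum=0.9, weight_decay=1e-4) with the same poly schedule.
```
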
Table 3: Experimental results for GTA5 → Cityscapes (per-class IoU and mIoU).

Classes: Road, SW, Build, Wall, Fence, Pole, TL, TS, Veg., Terrain, Sky, PR, Rider, Car, Truck, Bus, Train, Motor, Bike | mIoU

VGG-16
  FCNs in the wild [16]:      70.4 32.4 62.1 14.9 5.4 10.9 14.2 2.7 79.2 21.3 64.6 44.1 4.2 70.4 8.0 7.3 0.0 3.5 0.0 | 27.1
  CDA [34]:                   74.9 22.0 71.7 6.0 11.9 8.4 16.3 11.1 75.7 13.3 66.5 38.0 9.3 55.2 18.8 18.9 0.0 16.8 14.6 | 28.9
  ST [38]:                    83.8 17.4 72.1 14.6 2.9 16.5 16.0 6.8 81.4 24.2 47.2 40.7 7.6 71.7 10.2 7.6 0.5 11.1 0.9 | 28.1
  CBST [38]:                  90.4 50.8 72.0 18.3 9.5 27.2 28.6 14.1 82.4 25.1 70.8 42.6 14.5 76.9 5.9 12.5 1.2 14.0 28.6 | 36.1
  CyCADA [15]:                85.2 37.2 76.5 21.8 15.0 23.8 22.9 21.5 80.5 31.3 60.7 50.5 9.0 76.9 17.1 28.2 4.5 9.8 0.0 | 35.4
  AdaptSegNet [30]:           87.3 29.8 78.6 21.1 18.2 22.5 21.5 11.0 79.7 29.6 71.3 46.8 6.5 80.1 23.0 26.9 0.0 10.6 0.3 | 35.0
  SIBAN [19]:                 83.4 13.0 77.8 20.4 17.5 24.6 22.8 9.6 81.3 29.6 77.3 42.7 10.9 76.0 22.8 17.9 5.7 14.2 2.0 | 34.2
  CLAN [20]:                  88.0 30.6 79.2 23.4 20.5 26.1 23.0 14.8 81.6 34.5 72.0 45.8 7.9 80.5 26.6 29.9 0.0 10.7 0.0 | 36.6
  AdaptPatch [31]:            87.3 35.7 79.5 32.0 14.5 21.5 24.8 13.7 80.4 32.0 70.5 50.5 16.9 81.0 20.8 28.1 4.1 15.5 4.1 | 37.5
  ADVENT [32]:                86.9 28.7 78.7 28.5 25.2 17.1 20.3 10.9 80.0 26.4 70.2 47.1 8.4 81.5 26.0 17.2 18.9 11.7 1.6 | 36.1
  Source only:                35.4 13.2 72.1 16.7 11.6 20.7 22.5 13.1 76.0 7.6 66.1 41.1 19.0 69.8 15.2 16.3 0.0 16.2 4.7 | 28.3
  Baseline (feat. only) [30]: 85.7 22.8 77.6 24.8 10.6 22.2 19.7 10.8 79.7 27.8 64.8 41.5 18.4 79.7 19.9 21.8 0.5 16.2 4.2 | 34.1
  FADA:                       92.3 51.1 83.7 33.1 29.1 28.5 28.0 21.0 82.6 32.6 85.3 55.2 28.8 83.5 24.4 37.4 0.0 21.1 15.2 | 43.8

ResNet-101
  AdaptSegNet [30]:           86.5 36.0 79.9 23.4 23.3 23.9 35.2 14.8 83.4 33.3 75.6 58.5 27.6 73.7 32.5 35.4 3.9 30.1 28.1 | 42.4
  SIBAN [19]:                 88.5 35.4 79.5 26.3 24.3 28.5 32.5 18.3 81.2 40.0 76.5 58.1 25.8 82.6 30.3 34.4 3.4 21.6 21.5 | 42.6
  CLAN [20]:                  87.0 27.1 79.6 27.3 23.3 28.3 35.5 24.2 83.6 27.4 74.2 58.6 28.0 76.2 33.1 36.7 6.7 31.9 31.4 | 43.2
  AdaptPatch [31]:            92.3 51.9 82.1 29.2 25.1 24.5 33.8 33.0 82.4 32.8 82.2 58.6 27.2 84.3 33.4 46.3 2.2 29.5 32.3 | 46.5
  ADVENT [32]:                89.4 33.1 81.0 26.6 26.8 27.2 33.5 24.7 83.9 36.7 78.8 58.7 30.5 84.8 38.5 44.5 1.7 31.6 32.4 | 45.5
  Source only:                65.0 16.1 68.7 18.6 16.8 21.3 31.4 11.2 83.0 22.0 78.0 54.4 33.8 73.9 12.7 30.7 13.7 28.1 19.7 | 36.8
  Baseline (feat. only) [30]: 83.7 27.6 75.5 20.3 19.9 27.4 28.3 27.4 79.0 28.4 70.1 55.1 20.2 72.9 22.5 35.7 8.3 20.6 23.0 | 39.3
  FADA:                       92.5 47.5 85.1 37.6 32.8 33.4 33.8 18.4 85.3 37.7 83.5 63.2 39.7 87.5 32.9 47.8 1.6 34.9 39.5 | 49.2
  FADA-MST:                   91.0 50.6 86.0 43.4 29.8 36.8 43.4 25.0 86.8 38.3 87.4 64.0 38.0 85.2 31.6 46.1 6.5 25.4 37.1 | 50.1

4.4 Comparison with State-of-the-art Methods

Small shift: cross-city adaptation. Adaptation between real images from different cities is a scenario with great potential for practical applications. Table 1 shows the results of domain adaptation on the Cityscapes → Cross-City dataset. Our method has different performance gains for the four cities. On average over four cities, our FADA achieves 8.5% improvement compared with the source-only baselines, and 2.25% gain compared with the previous best method.

Large shift: synthetic-to-real adaptation. Tables 2 and 3 show the semantic segmentation performance on the SYNTHIA → Cityscapes and GTA5 → Cityscapes tasks in comparison with existing state-of-the-art domain adaptation methods. We observe that our FADA outperforms the existing methods by a large margin and obtains new state-of-the-art performance in terms of mIoU. Compared to the source model without any adaptation, gains of 16.4% and 13.9% are achieved for VGG-16 and ResNet-101 respectively on SYNTHIA → Cityscapes. FADA also obtains 15.5% and 12.4% improvement over the different baselines for the GTA5 → Cityscapes task. Besides, compared to the state-of-the-art feature-level methods, a general improvement of over 4% is observed. Note that, as mentioned in [34], the "train" images in Cityscapes are more visually similar to the "bus" in GTA5 than to the "train" in GTA5, which is also a challenge for other methods. Qualitative results for the GTA5 → Cityscapes task are presented in Figure 5, reflecting that FADA also brings a significant visual improvement.

Fig. 4: Quantitative analysis of the feature joint distributions. For each class, we show the Class Center Distance as defined in Equation 9. Our FADA shows a better-aligned class-level structure compared with other state-of-the-art methods.

4.5 Feature Distribution

To verify whether our fine-grained adversarial framework aligns features at the class level, we design an experiment to investigate to what degree the class-level features are aligned. Considering that different networks map features to different feature spaces, it is necessary to find a stable metric. CLAN [20] suggests using a Cluster Center Distance, defined as the ratio of the intra-class distance between the trained model and the initial model, to measure the degree of class-level alignment. To better evaluate the effectiveness of class-level feature alignment on the same scale, we propose to modify the Cluster Center Distance into the Class Center Distance (CCD) by taking the inter-class distance into account. The CCD for class i is defined as follows:

CCD(i) = \frac{1}{K-1} \sum_{j=1, j \neq i}^{K} \frac{\frac{1}{|S_i|} \sum_{x \in S_i} \lVert x - \mu_i \rVert_2}{\lVert \mu_i - \mu_j \rVert_2}    (9)

where \mu_i is the class center of class i and S_i is the set of all features belonging to class i. With CCD, we can measure the ratio of intra-class compactness to inter-class distance. A low CCD suggests that the features of the same class are clustered densely while the distance between different classes is relatively large.

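A minimal PyTorch sketch of Eq. (9) is shown below, assuming an [N, D] matrix of sampled features with their class labels and that every class appears at least once among the samples; the function name and shapes are illustrative, and the feature-sampling step itself is omitted.

```python
import torch

def class_center_distance(feats, labels, num_classes):
    centers, spreads = [], []
    for k in range(num_classes):
        fk = feats[labels == k]                                    # S_k: features of class k
        centers.append(fk.mean(dim=0))                             # mu_k
        spreads.append((fk - fk.mean(dim=0)).norm(dim=1).mean())   # (1/|S_k|) * sum ||x - mu_k||
    centers = torch.stack(centers)
    ccd = torch.zeros(num_classes)
    for i in range(num_classes):
        inter = (centers[i] - centers).norm(dim=1)                 # ||mu_i - mu_j|| for all j
        ratio = spreads[i] / inter                                 # intra-class / inter-class
        ratio[i] = 0.0                                             # exclude the j = i term
        ccd[i] = ratio.sum() / (num_classes - 1)                   # average over j != i
    return ccd                                                     # low CCD = compact, well-separated classes
```
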
We randomly pick 2,000 source samples and 2,000 target samples and compare the CCD values with other state-of-the-art methods: AdaptSegNet for global alignment and CLAN for class-wise alignment without explicitly modeling the class relationships. As shown in Figure 4, FADA achieves a much lower CCD on most classes and obtains the lowest mean CCD value of 1.1 compared to the other algorithms. With FADA, we can achieve better class-level alignment and preserve consistent class structures between domains.

Table 4: Ablation studies of each component. F-Adv refers to fine-grained adversarial training; SD refers to self distillation; MST refers to multi-scale testing.

  F-Adv  SD  MST | mIoU
    -     -   -  | 36.8
    X     -   -  | 46.9
    X     X   -  | 49.2
    X     X   X  | 50.1

4.6 Ablation Studies

Analysis of different components. Table 4 presents the impact of each component on DeeplabV2 with ResNet-101 on the GTA5 → Cityscapes task. The fine-grained adversarial training brings an improvement of 10.1%, which already makes it the new state of the art. To further explore the potential of the model, the self-distillation strategy leads to an improvement of 2.3% and multi-scale testing further boosts the performance by 0.7%.

Hard labels vs. soft labels. As discussed in Section 3.3, the knowledge extracted from the classifier C can be produced from hard labels or soft labels. Here we compare these two forms of labels on the GTA5 → Cityscapes and SYNTHIA → Cityscapes tasks with DeeplabV2 ResNet-101. For soft labels, we use "confidence clipping" with threshold 0.9 as regularization. For hard labels, we only keep high-confidence samples, ignoring the samples with confidence lower than 0.9. The results are reported in Table 5. Both choices give a large boost over the baseline global feature alignment model. We observe that soft labels are a more flexible choice and deliver superior performance.

Impact of confidence clipping. In our experiments, we use "confidence clipping" as a regularizer to prevent overfitting on noisy soft labels. The confidence values are truncated at a given threshold, so they are not encouraged to fit heavily to a certain class. We test several thresholds and the results are shown in Table 6. Note that a threshold of 1.0 means no regularization is used. We observe a consistent performance gain from confidence clipping, with the best result at a threshold of 0.9.

Table 5: Comparison of different strategies for extracting class-level knowledge on the GTA5 → Cityscapes and SYNTHIA → Cityscapes tasks (mIoU).

  Method          GTA5   SYNTHIA
  baseline [30]   39.4   35.4
  hard labels     45.7   40.8
  soft labels     46.9   41.5

Table 6: Influence of the threshold for confidence clipping (GTA5 → Cityscapes).

  threshold   0.7    0.8    0.9    1.0
  mIoU        46.2   46.3   46.9   45.7

5 Conclusion

In this paper, we address the problem of domain adaptive semantic segmentation by proposing a fine-grained adversarial training framework. A novel fine-grained discriminator is designed to not only distinguish domains, but also capture category-level information to guide fine-grained feature alignment. The binary domain labels used to supervise the discriminator are correspondingly generalized to domain encodings to incorporate class information. Comprehensive experiments and analysis validate the effectiveness of our method. Our method achieves new state-of-the-art results on three popular tasks, outperforming other methods by a large margin.

Fig. 5: Qualitative segmentation results for GTA5 → Cityscapes (columns: input image, before adaptation, after adaptation, ground truth).

Acknowledgement: This work was partially supported by the Beijing Academy of Artificial Intelligence (BAAI).
