Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation

Haoran Wang1*, Tong Shen2*, Wei Zhang2, Ling-Yu Duan3, and Tao Mei2
1 ETH Zurich   2 JD AI Research   3 Peking University

Abstract. Despite great progress in supervised semantic segmentation, a large performance drop is usually observed when deploying the model in the wild. Domain adaptation methods tackle the issue by aligning the source domain and the target domain. However, most existing methods attempt to perform the alignment from a holistic view, ignoring the underlying class-level data structure in the target domain. To fully exploit the supervision in the source domain, we propose a fine-grained adversarial learning strategy for class-level feature alignment while preserving the internal structure of semantics across domains. We adopt a fine-grained domain discriminator that not only plays as a domain distinguisher, but also differentiates domains at class level. The traditional binary domain labels are also generalized to domain encodings as the supervision signal to guide the fine-grained feature alignment. An analysis with Class Center Distance (CCD) validates that our fine-grained adversarial strategy achieves better class-level alignment compared to other state-of-the-art methods. Our method is easy to implement and its effectiveness is evaluated on three classical domain adaptation tasks, i.e., GTA5→Cityscapes, SYNTHIA→Cityscapes and Cityscapes→Cross-City. Large performance gains show that our method outperforms other global feature alignment based and class-wise alignment based counterparts. The code is publicly available at https://github.com/JDAI-CV/FADA.

1 Introduction

The success of semantic segmentation [26] in recent years is mostly driven by a large amount of accessible labeled data. However, collecting massive densely annotated data for training is usually a labor-intensive task [9]. Recent advances in computer graphics provide an alternative for replacing expensive human labor.
Through physically based rendering, we can obtain photo-realistic images with the pixel-level ground truth readily available in an effortless way [23,24]. However, a performance drop is observed when the model trained with synthetic data (a source domain) is applied in realistic scenarios (a target domain), because the data from different domains usually share different distributions.

* These authors contributed equally. This work was performed when Haoran Wang was visiting JD AI Research as a research intern.

Fig. 1: Illustration of traditional and our fine-grained adversarial learning. Traditional adversarial learning pursues marginal distribution alignment while ignoring the semantic structure inconsistency between domains. We propose to use a fine-grained discriminator to enable class-level alignment.

This phenomenon is known as the domain shift problem [27], which poses a challenge to cross-domain tasks [16]. Domain adaptation aims to alleviate the domain shift problem by aligning the feature distributions of the source and the target domain. A group of works focus on adopting an adversarial framework, where a domain discriminator is trained to distinguish the target samples from the source ones, while the feature network tries to fool the discriminator by generating domain-invariant features [8,15,16,20,25,30,34,35,38].

Although impressive progress has been achieved in domain adaptive semantic segmentation, most prior works strive to align global feature distributions without paying much attention to the underlying structures among classes. However, as discussed in recent works [3,17], matching the marginal feature distributions does not guarantee a small expected error on the target domain. The class conditional distributions should also be aligned, meaning that class-level alignment also plays an important role. As illustrated in Figure 1, the upper part shows the result of global feature alignment, where the two domains are well aligned but some samples are falsely mixed up. This motivates us to incorporate class information into the adversarial framework to enable fine-grained feature alignment.
As illustrated in the bottom of Figure 1, features are expected to be aligned according to specific classes.

There have been some pioneering works [7,20] trying to address this problem. Chen et al. [7] propose to use several independent discriminators to perform class-wise alignment, but independent discriminators might fail to capture the relationships between classes. Luo et al. [20] introduce a self-adaptive adversarial loss to apply different weights to each region. In fact, however, they do not explicitly incorporate class information in their methods, which might fail to promote class-level alignment.

Our motivation is to directly incorporate class information into the discriminator and encourage it to align features at a fine-grained level. Traditional adversarial training has been proven effective for aligning features by using a binary domain discriminator to model the distribution P(d | f) (d refers to the domain and f is the feature extracted from the input data). By confusing such a discriminator, expecting P(d=0 | f) ≈ P(d=1 | f), where 0 stands for the source domain and 1 for the target domain, the features become domain invariant and well aligned. To further take classes into account, we split the output into multiple channels according to P(d | f) = Σ_{c=1..K} P(d, c | f) (where c refers to the classes {1, ..., K}). We directly model the discriminator as P(d, c | f) to formulate a fine-grained domain alignment task. Although in the setting of domain adaptation the category-level labels for the target domain are inaccessible, we find that the model predictions on the target domain also contain class information, and prove that it is possible to supervise the discriminator with the predictions on both domains. In the adversarial learning process, class information is incorporated and the features are expected to be aligned according to specific classes.

In this paper, we propose such a fine-grained adversarial learning framework for domain adaptive semantic segmentation (FADA).
As illustrated in Figure 1, we represent the supervision of the traditional discriminator at a fine-grained semantic level, which enables our fine-grained discriminator to capture rich class-level information. The adversarial learning process is performed at a fine-grained level, so the features are expected to be adaptively aligned according to their corresponding semantic categories. The class mismatch problem, which broadly exists in global feature alignment, is expected to be further suppressed. Correspondingly, by incorporating class information, the binary domain labels are also generalized to a more complex form, called "domain encodings", to serve as the new supervision signal. Domain encodings can be extracted from the network's predictions on both domains. Different strategies for constructing domain encodings will be discussed. We conduct an analysis with Class Center Distance to demonstrate the effectiveness of our method regarding class-level alignment. Our method is also evaluated on three popular cross-domain benchmarks and presents new state-of-the-art results.

The main contributions of this paper are summarized below.
- We propose a fine-grained adversarial learning framework for cross-domain semantic segmentation that explicitly incorporates class-level information.
- The fine-grained learning framework enables class-level feature alignment, which is further verified by an analysis using Class Center Distance.
- We evaluate our method with comprehensive experiments. Significant improvements compared to other state-of-the-art methods are achieved on popular domain adaptive segmentation tasks including GTA5→Cityscapes, SYNTHIA→Cityscapes and Cityscapes→Cross-City.

2 Related Work

2.1 Semantic Segmentation

Semantic segmentation is the task of predicting a unique semantic label for each pixel of the input image. With the advent of deep convolutional neural networks, computer vision has witnessed huge progress in this field. FCN [26] triggered the interest in introducing deep learning to this task.
Many follow-up methods have been proposed to enlarge the receptive fields to cover more context information [4-6,36]. Among all these works, the DeepLab family [4-6] attracts a lot of attention and has been widely applied in many works for its simplicity and effectiveness.

2.2 Domain Adaptation

Domain adaptation strives to address the performance drop caused by the different distributions of training data and testing data. In recent years, several works have been proposed to approach this problem in image classification [3,25]. Inspired by the theoretical upper bound of the risk in the target domain [2], some pioneering works suggest optimizing distance measurements between the two domains to align the features [18,29]. Recently, motivated by GAN [13], adversarial training has become popular for its power to align features globally [7,25,30].

2.3 Domain Adaptive Semantic Segmentation

Unlike domain adaptation for the image classification task, domain adaptive semantic segmentation receives less attention for its difficulty, even though it supports many important applications including autonomous driving in the wild [8,16]. Based on the theoretical insight [2] on domain adaptive classification, most works follow the path of shortening the domain discrepancy between the two domains. Large progress has been achieved through optimization by adversarial training or explicit domain discrepancy measures [15,16,30]. In the context of the domain adaptive semantic segmentation task, AdaptSegNet [30] attempts to align the distributions in the output space. Inspired by CycleGAN [37], CyCADA [15] suggests adapting the representation at pixel level and feature level. There are also many works focusing on aligning different properties between the two domains, such as entropy [32] and information [19].

Although huge progress has been made in this field, most existing methods share a common limitation: enforcing global feature alignment inevitably mixes samples with different semantic labels together when drawing the two domains closer, which usually results in a mismatch of classes from different domains.
CLAN [20] is a pioneering work addressing category-level alignment. It suggests applying different adversarial weights to different regions, but it does not directly and explicitly incorporate the classes into the model.

Fig. 2: Overview of the proposed fine-grained adversarial framework. Images from the source domain and target domain are randomly picked and fed to the feature extractor and the classifier. A segmentation loss is computed with the source predictions and the source annotations to help the segmentation network generate discriminative features and learn task-specific knowledge. The semantic features from both domains are fed to the convolutional fine-grained domain discriminator. The discriminator strives to distinguish the feature's domain information at a fine-grained class level using the domain encodings processed from the sample predictions.

3 Method

3.1 Revisiting Traditional Feature Alignment

Semantic segmentation aims to predict a unique per-pixel label for the input image [26]. In an unsupervised domain adaptation setting for semantic segmentation, we have access to a collection of labeled data X_S = {(x_i^(s), y_i^(s))}_{i=1..n_s} in a source domain S, and unlabeled data X_T = {x_j^(t)}_{j=1..n_t} in a target domain T, where n_s and n_t are the numbers of samples from the two domains. Domain S and domain T share the same K semantic class labels {1, ..., K}. The goal is to learn a segmentation model G which achieves a low expected risk on the target domain. Generally, the segmentation network G can be divided into a feature extractor F and a multi-class classifier C, where G = C ∘ F.

Traditional feature-level adversarial training relies on a binary domain discriminator D to align the features extracted by F on both domains. Domain adaptation is tackled by alternately optimizing G and D in two steps:

(1) D is trained to distinguish features from the two domains. This is usually achieved by fixing F and C and solving:

    min_D L_D = - Σ_{i=1..n_s} (1 - d) log P(d=0 | f_i) - Σ_{j=1..n_t} d log P(d=1 | f_j)    (1)
where f_i and f_j are the features extracted by F on source sample x_i^(s) and target sample x_j^(t); d refers to the domain variable, where 0 refers to the source domain and 1 refers to the target domain; and P(d | f) is the probability output of the discriminator.

(2) G is trained with the task loss L_seg on the source domain and the adversarial loss L_adv on the target domain. This step requires fixing D and updating F and C:

    min_{F,C} L_seg + λ_adv L_adv    (2)

The cross-entropy loss L_seg on the source domain minimizes the difference between the prediction and the ground truth, which helps G learn the task-specific knowledge:

    L_seg = - Σ_{i=1..n_s} Σ_{k=1..K} y_{ik}^(s) log p_{ik}^(s),    (3)

where p_{ik}^(s) is the probability confidence of source sample x_i^(s) belonging to semantic class k predicted by C, and y_{ik}^(s) is the corresponding entry of the one-hot label. The adversarial loss L_adv is used to confuse the discriminator so as to encourage F to generate domain-invariant features:

    L_adv = - Σ_{j=1..n_t} log P(d=0 | f_j)    (4)
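To make the alternating optimization concrete, here is a minimal NumPy sketch of the three losses in Equations 1, 3 and 4. The function names and array shapes are our own illustration, not the released FADA code; in practice these quantities are computed per pixel on mini-batches with automatic differentiation.

```python
import numpy as np

def discriminator_loss(p_src_d0, p_tgt_d1):
    """Binary discriminator loss (Eq. 1).
    p_src_d0: P(d=0|f_i) predicted for each source feature.
    p_tgt_d1: P(d=1|f_j) predicted for each target feature."""
    return -np.sum(np.log(p_src_d0)) - np.sum(np.log(p_tgt_d1))

def segmentation_loss(y_onehot, p_pred):
    """Source-domain cross-entropy task loss (Eq. 3).
    y_onehot, p_pred: (n_s, K) one-hot labels and softmax outputs."""
    return -np.sum(y_onehot * np.log(p_pred))

def adversarial_loss(p_tgt_d0):
    """Adversarial loss (Eq. 4): target features should look like
    source (d=0) to the frozen discriminator."""
    return -np.sum(np.log(p_tgt_d0))
```

Step (1) minimizes `discriminator_loss` over D with F and C fixed; step (2) minimizes `segmentation_loss + λ_adv · adversarial_loss` over F and C with D fixed.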
3.2 Fine-grained Adversarial Learning

To incorporate class information into the adversarial learning framework, we propose a novel discriminator and enable a fine-grained adversarial learning process. The whole pipeline is illustrated in Figure 2.

Traditional adversarial training strives to align the marginal distribution by confusing a binary discriminator. To make the discriminator not merely focus on distinguishing domains, we split each of the two output channels of the binary discriminator into K channels and encourage adversarial learning at a fine-grained level. With this design, the predicted confidence for a domain is represented as a confidence distribution over the different classes, which enables the new fine-grained discriminator to model more complex underlying structures between classes, thus encouraging class-level alignment.

Correspondingly, the binary domain labels are also converted to a more general form, namely domain encodings, to incorporate class information. Traditionally, the domain labels used for training the binary discriminator are [1, 0] and [0, 1] for the source and target domains respectively. The domain encodings are instead represented as the vectors [a, 0] and [0, a] for the two domains respectively, where a is the knowledge extracted from the classifier C, represented by a K-dimensional vector, and 0 is an all-zero K-dimensional vector. The choices of how to generate the domain knowledge a will be discussed in Section 3.3.

Fig. 3: Illustration of different strategies to generate domain encodings. Here we compare three different strategies to extract knowledge from the segmentation network for constructing domain encodings: binary domain labels, one-hot hard labels and multi-channel soft labels.

During the training process, the discriminator not only tries to distinguish domains, but also learns to model class structures. The L_D in Equation 1 becomes:

    L_D = - Σ_{i=1..n_s} Σ_{k=1..K} a_{ik}^(s) log P(d=0, c=k | f_i) - Σ_{j=1..n_t} Σ_{k=1..K} a_{jk}^(t) log P(d=1, c=k | f_j)    (5)

where a_{ik}^(s) and a_{jk}^(t) are the k-th entries of the class knowledge for source sample i and target sample j. The adversarial loss L_adv used to confuse the discriminator and guide the generation of domain-invariant features in Equation 4 becomes:

    L_adv = - Σ_{j=1..n_t} Σ_{k=1..K} a_{jk}^(t) log P(d=0, c=k | f_j)    (6)

L_adv is designed to maximize the probability of features from the target domain being considered as source features, without hurting the relationship between features and classes.

The overall network in Figure 2 is used in the training stage. During inference, the domain adaptation component is removed and one only needs the original segmentation network with the adapted weights.
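The fine-grained losses of Equations 5 and 6 can be sketched in NumPy as follows, assuming the discriminator outputs a (2K)-channel distribution per feature with the first K channels covering (d=0, c=1..K) and the last K covering (d=1, c=1..K); the names and shapes are illustrative, not the paper's released code.

```python
import numpy as np

def fine_grained_d_loss(P_src, P_tgt, a_src, a_tgt):
    """Fine-grained discriminator loss (Eq. 5).
    P_src, P_tgt: (n, 2K) discriminator outputs P(d, c|f);
                  first K channels are d=0, last K are d=1.
    a_src, a_tgt: (n, K) class knowledge from the domain encodings."""
    K = a_src.shape[1]
    loss_src = -np.sum(a_src * np.log(P_src[:, :K]))  # d=0 block
    loss_tgt = -np.sum(a_tgt * np.log(P_tgt[:, K:]))  # d=1 block
    return loss_src + loss_tgt

def fine_grained_adv_loss(P_tgt, a_tgt):
    """Adversarial loss (Eq. 6): push target features toward the
    source (d=0) channels of their own classes."""
    K = a_tgt.shape[1]
    return -np.sum(a_tgt * np.log(P_tgt[:, :K]))
```

Note how the class knowledge a only weights channels of the matching class, so confusing the discriminator moves target features toward the source side of the same class rather than toward the source distribution as a whole.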
3.3 Extracting Class Knowledge for Domain Encodings

Now that we have a fine-grained domain discriminator, which can adaptively align features according to the class-level information contained in domain encodings, another challenge arises: how do we get the class knowledge a_{ik}^(s) and a_{jk}^(t) in Equations 5 and 6 to construct the domain encoding for each sample? Considering that in the unsupervised domain adaptive semantic segmentation task no annotations in the target domain are accessible, it seems contradictory to use class knowledge on the target domain for guiding class-level alignment. However, during training, with the ground-truth annotations from the source domain, the classifier C learns to map features to the semantic classes. Considering that the source domain and the target domain share the same semantic classes, it is a natural choice to use the predictions of C as the knowledge to supervise the discriminator.

As illustrated in Equations 5 and 6, the class knowledge for optimizing the fine-grained discriminator works as the supervision signal. The choices of a_{ik}^(s) and a_{jk}^(t) are open to many possibilities, and for specific tasks one could design different forms of class knowledge using prior knowledge. Here we discuss two general solutions for extracting class knowledge from network predictions. Because the class-level knowledge for the two domains can be extracted in the same way, in the following discussion we use a_k to denote the k-th entry for a single sample without differentiating the domain.

One-hot hard labels are a straightforward solution for generating the knowledge:

    a_k = 1 if k = argmax_{k'} p_{k'}, and a_k = 0 otherwise,    (7)

where p_k is the softmax probability output of C for class k. In this way, only the most confident class is selected. In practice, in order to remove the impact of noisy samples, we can select samples whose confidence is higher than a certain threshold and ignore those with low confidence.

Another alternative is multi-channel soft labels, defined as:

    a_k = exp(z_k / T) / Σ_{j=1..K} exp(z_j / T),    (8)

where z_k is the k-th entry of the logits and T is a temperature that encourages a soft probability distribution over classes. Note that during training, additional regularization can also be applied. For example, we practically find that clipping the values of the soft labels at a given threshold achieves more stable performance because it prevents overfitting to certain classes. An illustrative comparison of these two strategies with the traditional binary domain labels is presented in Figure 3.
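Both strategies can be sketched in a few lines of NumPy. The default threshold (0.9) and temperature (1.8) follow values reported in Sections 4.3 and 4.6; the function names are our own illustration.

```python
import numpy as np

def hard_encoding(p, thresh=0.9):
    """One-hot hard labels (Eq. 7): keep only the most confident
    class; samples below `thresh` are ignored (all-zero rows)."""
    a = np.zeros_like(p)
    top = np.argmax(p, axis=-1)
    rows = np.arange(p.shape[0])
    keep = p[rows, top] >= thresh
    a[rows[keep], top[keep]] = 1.0
    return a

def soft_encoding(logits, T=1.8, clip=0.9):
    """Multi-channel soft labels (Eq. 8) with temperature T, plus
    'confidence clipping' at `clip` as regularization."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return np.minimum(p, clip)
```

The resulting (n, K) array is the class-knowledge vector a, which is then placed into the source half [a, 0] or target half [0, a] of the domain encoding.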
We also conduct experiments in Section 4.6 to demonstrate the performance of the different strategies.

Table 1: Experimental results for Cityscapes→Cross-City. Columns: road, sidewalk, building, light, sign, veg, sky, person, rider, car, bus, mbike, bike | mIoU.

Rome
  Source (Dilation-Frontend): 77.7 21.9 83.5 0.1 10.7 78.9 88.1 21.6 10.0 67.2 30.4 6.1 0.6 | 38.2
  Cross-City [7]:             79.5 29.3 84.5 0.0 22.2 80.6 82.8 29.5 13.0 71.7 37.5 25.9 1.0 | 42.9
  Source (DeepLab-v2):        83.9 34.3 87.7 13.0 41.9 84.6 92.5 37.7 22.4 80.8 38.1 39.1 5.3 | 50.9
  AdaptSegNet [30]:           83.9 34.2 88.3 18.8 40.2 86.2 93.1 47.8 21.7 80.9 47.8 48.3 8.6 | 53.8
  FADA:                       84.9 35.8 88.3 20.5 40.1 85.9 92.8 56.2 23.2 83.6 31.8 53.2 14.6 | 54.7

Rio
  Source (Dilation-Frontend): 69.0 31.8 77.0 4.7 3.7 71.8 80.8 38.2 8.0 61.2 38.9 11.5 3.4 | 38.5
  Cross-City [7]:             74.2 43.9 79.0 2.4 7.5 77.8 69.5 39.3 10.3 67.9 41.2 27.9 10.9 | 42.5
  Source (DeepLab-v2):        76.6 47.3 82.5 12.6 22.5 77.9 86.5 43.0 19.8 74.5 36.8 29.4 16.7 | 48.2
  AdaptSegNet [30]:           76.2 44.7 84.6 9.3 25.5 81.8 87.3 55.3 32.7 74.3 28.9 43.0 27.6 | 51.6
  FADA:                       80.6 53.4 84.2 5.8 23.0 78.4 87.7 60.2 26.4 77.1 37.6 53.7 42.3 | 54.7

Tokyo
  Source (Dilation-Frontend): 81.2 26.7 71.7 8.7 5.6 73.2 75.7 39.3 14.9 57.6 19.0 1.6 33.8 | 39.2
  Cross-City [7]:             83.4 35.4 72.8 12.3 12.7 77.4 64.3 42.7 21.5 64.1 20.8 8.9 40.3 | 42.8
  Source (DeepLab-v2):        83.4 35.4 72.8 12.3 12.7 77.4 64.3 42.7 21.5 64.1 20.8 8.9 40.3 | 42.8
  AdaptSegNet [30]:           81.5 26.0 77.8 17.8 26.8 82.7 90.9 55.8 38.0 72.1 4.2 24.5 50.8 | 49.9
  FADA:                       85.8 39.5 76.0 14.7 24.9 84.6 91.7 62.2 27.7 71.4 3.0 29.3 56.3 | 51.3

Taipei
  Source (Dilation-Frontend): 77.2 20.9 76.0 5.9 4.3 60.3 81.4 10.9 11.0 54.9 32.6 15.3 5.2 | 35.1
  Cross-City [7]:             78.6 28.6 80.0 13.1 7.6 68.2 82.1 16.8 9.4 60.4 34.0 26.5 9.9 | 39.6
  Source (DeepLab-v2):        78.6 28.6 80.0 13.1 7.6 68.2 82.1 16.8 9.4 60.4 34.0 26.5 9.9 | 39.6
  AdaptSegNet [30]:           81.7 29.5 85.2 26.4 15.6 76.7 91.7 31.0 12.5 71.5 41.1 47.3 27.7 | 49.1
  FADA:                       86.0 42.3 86.1 6.2 20.5 78.3 92.7 47.2 17.7 72.2 37.2 54.3 44.0 | 52.7

4 Experiments

4.1 Datasets
We present a comprehensive evaluation of our proposed method on three popular unsupervised domain adaptive semantic segmentation benchmarks, i.e., Cityscapes→Cross-City, SYNTHIA→Cityscapes, and GTA5→Cityscapes.

Cityscapes. Cityscapes [9] is a real-world urban scene dataset consisting of a training set with 2,975 images, a validation set with 500 images and a testing set with 1,525 images. Following the standard protocols [15,16,30], we use the 2,975 images from the Cityscapes training set as the unlabeled target domain training set and evaluate our adapted model on the 500 images from the validation set.

Cross-City. Cross-City [7] is an urban scene dataset collected with Google Street View. It contains 3,200 unlabeled images and 100 annotated images for each of four different cities. The annotations of Cross-City share 13 classes with Cityscapes.

SYNTHIA. SYNTHIA [24] is a synthetic urban scene dataset. We pick its subset SYNTHIA-RAND-CITYSCAPES, which shares 16 semantic classes with Cityscapes, as the source domain. In total, 9,400 images from the SYNTHIA dataset are used as source domain training data for the task.

GTA5. The GTA5 dataset [23] is another synthetic dataset, sharing 19 semantic classes with Cityscapes. Its 24,966 urban scene images are collected from the physically based rendered video game Grand Theft Auto V (GTA V) and are used as source training data.

Table 2: Experimental results for SYNTHIA→Cityscapes. Columns: Road, SW, Build, Wall, Fence, Pole, TL, TS, Veg., Sky, PR, Rider, Car, Bus, Motor, Bike | mIoU | mIoU*.

VGG-16
  FCNs in the wild [16]: 11.5 19.6 30.8 4.4 0.0 20.3 0.1 11.7 42.3 68.7 51.2 3.8 54.0 3.2 0.2 0.6 | 20.2 | 22.9
  CDA [34]:              65.2 26.1 74.9 0.1 0.5 10.7 3.5 3.0 76.1 70.6 47.1 8.2 43.2 20.7 0.7 13.1 | 29.0 | 34.8
  ST [38]:               0.2 14.5 53.8 1.6 0.0 18.9 0.9 7.8 72.2 80.3 48.1 6.3 67.7 4.7 0.2 4.5 | 23.9 | 27.8
  CBST [38]:             69.6 28.7 69.5 12.1 0.1 25.4 11.9 13.6 82.0 81.9 49.1 14.5 66.0 6.6 3.7 32.4 | 35.4 | 36.1
  AdaptSegNet [30]:      78.9 29.2 75.5 - - - 0.1 4.8 72.6 76.7 43.4 8.8 71.1 16.0 3.6 8.4 | - | 37.6
  SIBAN [19]:            70.1 25.7 80.9 - - - 3.8 7.2 72.3 80.5 43.3 5.0 73.3 16.0 1.7 3.6 | - | 37.2
  CLAN [20]:             80.4 30.7 74.7 - - - 1.4 8.0 77.1 79.0 46.5 8.9 73.8 18.2 2.2 9.9 | - | 39.3
  AdaptPatch [31]:       72.6 29.5 77.2 3.5 0.4 21.0 1.4 7.9 73.3 79.0 45.7 14.5 69.4 19.6 7.4 16.5 | 33.7 | 39.6
  ADVENT [32]:           67.9 29.4 71.9 6.3 0.3 19.9 0.6 2.6 74.9 74.9 35.4 9.6 67.8 21.4 4.1 15.5 | 31.4 | 36.6
  Source only:           10.0 14.7 52.4 4.2 0.1 20.9 3.5 6.5 74.3 77.5 44.9 4.9 64.0 21.6 4.2 6.4 | 25.6 | 29.6
  Baseline (feat. only) [30]: 63.6 26.8 67.3 3.8 0.3 21.5 1.0 7.4 76.1 76.5 40.5 11.2 62.1 19.4 5.3 13.2 | 31.0 | 36.2
  FADA:                  80.4 35.9 80.9 2.5 0.3 30.4 7.9 22.3 81.8 83.6 48.9 16.8 77.7 31.1 13.5 17.9 | 39.5 | 46.0

ResNet-101
  SIBAN [19]:            82.5 24.0 79.4 - - - 16.5 12.7 79.2 82.8 58.3 18.0 79.3 25.3 17.6 25.9 | - | 46.3
  AdaptSegNet [30]:      84.3 42.7 77.5 - - - 4.7 7.0 77.9 82.5 54.3 21.0 72.3 32.2 18.9 32.3 | - | 46.7
  CLAN [20]:             81.3 37.0 80.1 - - - 16.1 13.7 78.2 81.5 53.4 21.2 73.0 32.9 22.6 30.7 | - | 47.8
  AdaptPatch [31]:       82.4 38.0 78.6 8.7 0.6 26.0 3.9 11.1 75.5 84.6 53.5 21.6 71.4 32.6 19.3 31.7 | 40.0 | 46.5
  ADVENT [32]:           85.6 42.2 79.7 8.7 0.4 25.9 5.4 8.1 80.4 84.1 57.9 23.8 73.3 36.4 14.2 33.0 | 41.2 | 48.0
  Source only:           55.6 23.8 74.6 9.2 0.2 24.4 6.1 12.1 74.8 79.0 55.3 19.1 39.6 23.3 13.7 25.0 | 33.5 | 38.6
  Baseline (feat. only) [30]: 62.4 21.9 76.3 11.5 0.1 24.9 11.7 11.4 75.3 80.9 53.7 18.5 59.7 13.7 20.6 24.0 | 35.4 | 40.8
  FADA:                  84.5 40.1 83.1 4.8 0.0 34.3 20.1 27.2 84.8 84.0 53.5 22.6 85.4 43.7 26.8 27.8 | 45.2 | 52.5

4.2 Evaluation Metrics

The metrics for evaluating our algorithm are consistent with the common semantic segmentation task. Specifically, we compute the PASCAL VOC intersection-over-union (IoU) [11] between our prediction and the ground-truth label: IoU = TP / (TP + FP + FN), where TP, FP and FN are the numbers of true positive, false positive and false negative pixels respectively. In addition to the IoU for each class, the mIoU is also reported as the mean of the IoUs over all classes.
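For reference, the per-class IoU and mIoU used throughout the tables can be computed with a straightforward NumPy sketch over flat label maps (our own illustration, not tied to any particular evaluation toolkit):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """IoU_c = TP / (TP + FP + FN) for each class c, given integer
    label maps `pred` and `gt` of identical shape."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

def mean_iou(pred, gt, num_classes):
    """mIoU: mean of the per-class IoUs, skipping absent classes."""
    return float(np.nanmean(per_class_iou(pred, gt, num_classes)))
```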
4.3 Implementation Details

Our pipeline is implemented in PyTorch [22]. For fair comparison, we employ DeepLab-v2 [4] with VGG-16 [28] and ResNet-101 [14] as the segmentation base networks. All models are pre-trained on ImageNet [10]. For the fine-grained discriminator, we adopt a simple structure consisting of 3 convolution layers with channel numbers {256, 128, 2K}, 3×3 kernels, and stride 1. Each convolution layer except the last is followed by a Leaky-ReLU [21] parameterized by 0.2.

To train the segmentation network, we use the Stochastic Gradient Descent (SGD) optimizer with momentum 0.9 and weight decay 10^-4. The learning rate is initially set to 2.5×10^-4 and is decreased following a 'poly' learning rate policy with power 0.9. For training the discriminator, we adopt the Adam optimizer with β1 = 0.9, β2 = 0.99 and an initial learning rate of 10^-4; the same 'poly' learning rate policy is used. λ_adv is constantly set to 0.001. The temperature T is set to 1.8 for all experiments.

Table 3: Experimental results for GTA5→Cityscapes. Columns: Road, SW, Build, Wall, Fence, Pole, TL, TS, Veg., Terrain, Sky, PR, Rider, Car, Truck, Bus, Train, Motor, Bike | mIoU.

VGG-16
  FCNs in the wild [16]: 70.4 32.4 62.1 14.9 5.4 10.9 14.2 2.7 79.2 21.3 64.6 44.1 4.2 70.4 8.0 7.3 0.0 3.5 0.0 | 27.1
  CDA [34]:              74.9 22.0 71.7 6.0 11.9 8.4 16.3 11.1 75.7 13.3 66.5 38.0 9.3 55.2 18.8 18.9 0.0 16.8 14.6 | 28.9
  ST [38]:               83.8 17.4 72.1 14.6 2.9 16.5 16.0 6.8 81.4 24.2 47.2 40.7 7.6 71.7 10.2 7.6 0.5 11.1 0.9 | 28.1
  CBST [38]:             90.4 50.8 72.0 18.3 9.5 27.2 28.6 14.1 82.4 25.1 70.8 42.6 14.5 76.9 5.9 12.5 1.2 14.0 28.6 | 36.1
  CyCADA [15]:           85.2 37.2 76.5 21.8 15.0 23.8 22.9 21.5 80.5 31.3 60.7 50.5 9.0 76.9 17.1 28.2 4.5 9.8 0.0 | 35.4
  AdaptSegNet [30]:      87.3 29.8 78.6 21.1 18.2 22.5 21.5 11.0 79.7 29.6 71.3 46.8 6.5 80.1 23.0 26.9 0.0 10.6 0.3 | 35.0
  SIBAN [19]:            83.4 13.0 77.8 20.4 17.5 24.6 22.8 9.6 81.3 29.6 77.3 42.7 10.9 76.0 22.8 17.9 5.7 14.2 2.0 | 34.2
  CLAN [20]:             88.0 30.6 79.2 23.4 20.5 26.1 23.0 14.8 81.6 34.5 72.0 45.8 7.9 80.5 26.6 29.9 0.0 10.7 0.0 | 36.6
  AdaptPatch [31]:       87.3 35.7 79.5 32.0 14.5 21.5 24.8 13.7 80.4 32.0 70.5 50.5 16.9 81.0 20.8 28.1 4.1 15.5 4.1 | 37.5
  ADVENT [32]:           86.9 28.7 78.7 28.5 25.2 17.1 20.3 10.9 80.0 26.4 70.2 47.1 8.4 81.5 26.0 17.2 18.9 11.7 1.6 | 36.1
  Source only:           35.4 13.2 72.1 16.7 11.6 20.7 22.5 13.1 76.0 7.6 66.1 41.1 19.0 69.8 15.2 16.3 0.0 16.2 4.7 | 28.3
  Baseline (feat. only) [30]: 85.7 22.8 77.6 24.8 10.6 22.2 19.7 10.8 79.7 27.8 64.8 41.5 18.4 79.7 19.9 21.8 0.5 16.2 4.2 | 34.1
  FADA:                  92.3 51.1 83.7 33.1 29.1 28.5 28.0 21.0 82.6 32.6 85.3 55.2 28.8 83.5 24.4 37.4 0.0 21.1 15.2 | 43.8

ResNet-101
  AdaptSegNet [30]:      86.5 36.0 79.9 23.4 23.3 23.9 35.2 14.8 83.4 33.3 75.6 58.5 27.6 73.7 32.5 35.4 3.9 30.1 28.1 | 42.4
  SIBAN [19]:            88.5 35.4 79.5 26.3 24.3 28.5 32.5 18.3 81.2 40.0 76.5 58.1 25.8 82.6 30.3 34.4 3.4 21.6 21.5 | 42.6
  CLAN [20]:             87.0 27.1 79.6 27.3 23.3 28.3 35.5 24.2 83.6 27.4 74.2 58.6 28.0 76.2 33.1 36.7 6.7 31.9 31.4 | 43.2
  AdaptPatch [31]:       92.3 51.9 82.1 29.2 25.1 24.5 33.8 33.0 82.4 32.8 82.2 58.6 27.2 84.3 33.4 46.3 2.2 29.5 32.3 | 46.5
  ADVENT [32]:           89.4 33.1 81.0 26.6 26.8 27.2 33.5 24.7 83.9 36.7 78.8 58.7 30.5 84.8 38.5 44.5 1.7 31.6 32.4 | 45.5
  Source only:           65.0 16.1 68.7 18.6 16.8 21.3 31.4 11.2 83.0 22.0 78.0 54.4 33.8 73.9 12.7 30.7 13.7 28.1 19.7 | 36.8
  Baseline (feat. only) [30]: 83.7 27.6 75.5 20.3 19.9 27.4 28.3 27.4 79.0 28.4 70.1 55.1 20.2 72.9 22.5 35.7 8.3 20.6 23.0 | 39.3
  FADA:                  92.5 47.5 85.1 37.6 32.8 33.4 33.8 18.4 85.3 37.7 83.5 63.2 39.7 87.5 32.9 47.8 1.6 34.9 39.5 | 49.2
  FADA-MST:              91.0 50.6 86.0 43.4 29.8 36.8 43.4 25.0 86.8 38.3 87.4 64.0 38.0 85.2 31.6 46.1 6.5 25.4 37.1 | 50.1

Regarding the training procedure, the network is first trained on source data for 20k iterations and then fine-tuned with our framework for 40k iterations. The batch size is eight: four source images and four target images. Some data augmentations are used, including random flip and color jittering, to prevent overfitting. Although our model is already able to achieve new state-of-the-art results, we further boost the performance by using self distillation [1,12,33] and multi-scale testing. A detailed ablation study is conducted in Section 4.6 to reveal the effect of each component, which, we hope, can provide more insight into the topic.

4.4 Comparison with State-of-the-art Methods

Small shift: Cross-city adaptation. Adaptation between real images from different cities is a scenario with great potential for practical applications. Table 1 shows the results of domain adaptation on the Cityscapes→Cross-City dataset. Our method shows different performance gains for the four cities.
On average over the four cities, our FADA achieves an 8.5% improvement over the source-only baselines, and a 2.25% gain over the previous best method.

Large shift: Synthetic-to-real adaptation. Tables 2 and 3 show the semantic segmentation performance on the SYNTHIA→Cityscapes and GTA5→Cityscapes tasks in comparison with existing state-of-the-art domain adaptation methods. We observe that our FADA outperforms the existing methods by a large margin and obtains new state-of-the-art performance in terms of mIoU. Compared to the source model without any adaptation, gains of 16.4% and 13.9% are achieved for VGG-16 and ResNet-101 respectively on SYNTHIA→Cityscapes. FADA also obtains 15.5% and 12.4% improvements over the respective baselines on the GTA5→Cityscapes task. Besides, compared to the state-of-the-art feature-level methods, a general improvement of over 4% is observed. Note that, as mentioned in [34], the "train" images in Cityscapes are more visually similar to the "bus" in GTA5 than to the "train" in GTA5, which is also a challenge for the other methods. Qualitative results for the GTA5→Cityscapes task are presented in Figure 5, reflecting that FADA also brings a significant visual improvement.

4.5 Feature Distribution

To verify whether our fine-grained adversarial framework aligns features at a class level, we design an experiment to investigate to what degree the class-level features are aligned.

Fig. 4: Quantitative analysis of the feature joint distributions. For each class, we show the Class Center Distance as defined in Equation 9. Our FADA shows a better aligned structure at class level compared with other state-of-the-art methods.

Considering that different networks map features to different feature spaces, it is necessary to find a stable metric. CLAN [20] suggests using a Cluster Center Distance, defined as the ratio of the intra-class distance between the trained model and the initial model, to measure the degree of class-level alignment.
To better evaluate the effectiveness of class-level feature alignment on the same scale, we propose to modify the Cluster Center Distance into the Class Center Distance (CCD) by taking the inter-class distance into account. The CCD for class i is defined as follows:

    CCD(i) = (1 / (K - 1)) Σ_{j=1..K, j≠i} [ (1 / |S_i|) Σ_{x∈S_i} ||x - μ_i||_2 ] / ||μ_i - μ_j||_2    (9)

where μ_i is the class center of class i and S_i is the set of all features belonging to class i. With CCD, we measure the ratio of intra-class compactness to inter-class distance. A low CCD suggests that the features of the same class are clustered densely while the distance between different classes is relatively large. We randomly pick 2,000 source samples and 2,000 target samples, and compare the CCD values with other state-of-the-art methods: AdaptSegNet for global alignment and CLAN for class-wise alignment without explicitly modeling the class relationship. As shown in Figure 4, FADA achieves a much lower CCD on most classes and obtains the lowest mean CCD value of 1.1 among the compared algorithms. With FADA, we achieve better class-level alignment and preserve consistent class structures between domains.

Table 4: Ablation studies of each component. F-Adv refers to fine-grained adversarial training; SD refers to self distillation; MST refers to multi-scale testing.
  F-Adv | SD | MST | mIoU
        |    |     | 36.8
    X   |    |     | 46.9
    X   |  X |     | 49.2
    X   |  X |  X  | 50.1

4.6 Ablation Studies

Analysis of different components. Table 4 presents the impact of each component on DeepLab-v2 with ResNet-101 on the GTA5→Cityscapes task. The fine-grained adversarial training brings an improvement of 10.1%, which already makes it the new state of the art. To further explore the potential of the model, the self-distillation strategy leads to an improvement of 2.3% and multi-scale testing further boosts the performance by 0.7%.
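The CCD analysis of Section 4.5 (Equation 9) can be reproduced with a short NumPy sketch; the function below is our own illustration of the formula, operating on a sampled set of feature vectors and their class labels.

```python
import numpy as np

def class_center_distance(features, labels, i):
    """Class Center Distance (Eq. 9) for class i: mean intra-class
    distance to the center mu_i, divided by the distance from mu_i
    to every other class center mu_j, averaged over j != i."""
    classes = np.unique(labels)
    centers = {c: features[labels == c].mean(axis=0) for c in classes}
    intra = np.linalg.norm(features[labels == i] - centers[i], axis=1).mean()
    ratios = [intra / np.linalg.norm(centers[i] - centers[j])
              for j in classes if j != i]
    return float(np.mean(ratios))
```

A low value means the features of class i are compact relative to their separation from the other class centers.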
Hard labels vs. soft labels. As discussed in Section 3.3, the knowledge extracted from the classifier C can be produced from hard labels or soft labels. Here we compare these two forms of labels on the GTA5→Cityscapes and SYNTHIA→Cityscapes tasks with DeepLab-v2 ResNet-101. For soft labels, we use "confidence clipping" with threshold 0.9 as regularization. For hard labels, we only keep high-confidence samples, ignoring the samples with confidence lower than 0.9. The results are reported in Table 5. Both choices give a large boost over the baseline global feature alignment model. We observe that soft labels are a more flexible choice and deliver superior performance.

Table 5: Comparison of different strategies for extracting class-level knowledge on the GTA5→Cityscapes and SYNTHIA→Cityscapes tasks (mIoU).
  Method        | GTA5 | SYNTHIA
  baseline [30] | 39.4 | 35.4
  hard labels   | 45.7 | 40.8
  soft labels   | 46.9 | 41.5

Impact of confidence clipping. In our experiments, we use "confidence clipping" as a regularizer to prevent overfitting on noisy soft labels. The confidence values are truncated at a given threshold, so they are not encouraged to fit heavily to a certain class. We test several thresholds and the results are shown in Table 6. Note that a threshold of 1.0 means no regularization is used. We observe a consistent performance gain from confidence clipping, with the best result at threshold 0.9.

Table 6: Influence of the threshold for confidence clipping (GTA5→Cityscapes).
  threshold | 0.7  | 0.8  | 0.9  | 1.0
  mIoU      | 46.2 | 46.3 | 46.9 | 45.7

5 Conclusion

In this paper, we address the problem of domain adaptive semantic segmentation by proposing a fine-grained adversarial training framework. A novel fine-grained discriminator is designed to not only distinguish domains, but also capture category-level information to guide fine-grained feature alignment. The binary domain labels used to supervise the discriminator are correspondingly generalized to domain encodings to incorporate class information. Comprehensive experiments and analysis validate the effectiveness of our method. Our method achieves new state-of-the-art results on three popular tasks, outperforming other methods by a large margin.

Fig. 5: Qualitative segmentation results for GTA5→Cityscapes (columns: image, before adaptation, after adaptation, ground truth).

Acknowledgement: This work was partially supported by the Beijing Academy of Artificial Intelligence (BAAI).