Invariant Information Clustering for Unsupervised Image Classification and Segmentation: Supplementary Material

Xu Ji, University of Oxford, xuji@robots.ox.ac.uk
João F. Henriques, University of Oxford, joao@robots.ox.ac.uk
Andrea Vedaldi, University of Oxford, vedaldi@robots.ox.ac.uk

1. Release

We implemented IIC in PyTorch [7]. The code, datasets and trained models have been released at github.com/xu-ji/IIC.

2. Further experimental details

We used three generic CNN bases b across our experiments: A (ResNet34 [5]), B (4 convolutional layers) and C (6 convolutional layers). For details see table 1. See table 2 for per-experiment details including b, batch size, input channels, input size, and the number of clusters used in overclustering, denoted by k. Recall the latter refers to the sole output head for semi-supervised overclustering but to the auxiliary head for unsupervised IIC, where the main head produces output with dimensionality k_gt. For segmentation, bilinear resampling is used to resize the network output back to input size for implementational simplicity. Since there is one pooling layer in network C, which halves spatial size, this is by a factor of 2.

  A                  B            C
  1 Conv@64          1 Conv@64    1 Conv@64
  3 BasicBlock@64    1 MaxPool    1 Conv@128
  4 BasicBlock@128   1 Conv@128   1 MaxPool
  6 BasicBlock@256   1 MaxPool    2 Conv@256
  3 BasicBlock@512   1 Conv@256   2 Conv@512
  1 AvgPool          1 MaxPool    1 Conv@512

Table 1: Architecture bases b, showing layer type and output channels. Pooling layers do not change channel size. Convolutional layers have filter size 3 or 5 and stride 1 or 2. The models used are standard ResNet and VGG-style networks. Implementations are given in the code.

3. Semi-supervised overclustering study

Paper fig. 6 contains accuracies normalised by dividing by the maximum accuracy for each series. The absolute accuracies are given in table 3 and table 4.

                 b   n     h  r  k_in  k_gt  k    crop size(s)  input size
  IIC
  STL10          A   700   5  5  2     10    70   64            64
  CIFAR10        A   660   5  3  2     10    70   20            32
  CIFAR100-20    A   1000  5  5  2     20    140  20            32
  MNIST          B   700   5  5  1     10    50   16,20,24      24
  COCO-Stuff-3   C   120   1  1  5     3     15   128           128
  COCO-Stuff     C   60    1  1  5     15    45   128           128
  Potsdam-3      C   75    1  1  4     3     24   200           200
  Potsdam        C   60    1  1  4     6     36   200           200
  IIC*
  STL10          A   1400  5  5  2     10    140  64            64
  CIFAR10        A   1320  5  3  2     10    140  20            32
  CIFAR100-20    B   2800  5  5  5     20    280  20            24
  MNIST          B   350   5  5  1     10    25   16,20,24      24
  COCO-Stuff-3   C   180   1  1  5     3     15   128           128
  COCO-Stuff     C   90    1  1  5     15    45   128           128
  Potsdam-3      C   75    1  1  4     3     9    200           200
  Potsdam        C   60    1  1  4     6     24   200           200

Table 2: IIC denotes unsupervised clustering, IIC* denotes semi-supervised overclustering. n denotes batch size, h and r denote number of sub-heads and sample repeats (see paper section 4.1), k_in denotes input channels (1 for greyscale, 2 for Sobel filtered, 4 for RGBIR, 5 for Sobel filtered with RGB), k_gt denotes number of ground truth clusters, k denotes number of output channels for overclustering. COCO-Stuff and COCO-Stuff-3 are scaled by 0.33 prior to cropping; cropped images are scaled to final input size with bilinear resampling.
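The factor-of-2 resampling for network C (section 2) can be checked by tracing shapes through the Table 1 layer schedule. The sketch below assumes padded stride-1 convolutions and a single stride-2 pooling layer; these strides are assumptions for illustration, and the released code is authoritative for the actual filter sizes and strides.

```python
# Trace spatial size and channel count through base C of Table 1, to show
# why its output is smaller than the input by a factor of 2.
# Assumed (hypothetical): convolutions are padded with stride 1, so they
# preserve spatial size; the single MaxPool has stride 2.

# (layer type, count, output channels) rows of column C in Table 1.
BASE_C = [
    ("Conv", 1, 64),
    ("Conv", 1, 128),
    ("MaxPool", 1, None),
    ("Conv", 2, 256),
    ("Conv", 2, 512),
    ("Conv", 1, 512),
]

def output_shape(side, layers):
    """Return (spatial side, channels) after running through `layers`."""
    channels = None
    for kind, count, out_ch in layers:
        if kind == "MaxPool":
            side //= 2 ** count  # pooling halves spatial size, keeps channels
        else:
            channels = out_ch    # padded stride-1 conv keeps spatial size
    return side, channels

side, channels = output_shape(128, BASE_C)  # COCO-Stuff input size
print(side, channels)  # -> 64 512
```

The single halving is why a bilinear upsample by 2 suffices to restore the network output to input resolution.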
4. Rendering predictions

To generate the visualisation in paper fig. 3, the entire MNIST dataset was run through each network snapshot. The prediction for each image x, say z = \Phi(x) \in [0, 1]^C for C classes (see paper section 3.1), was rendered as a point with coordinate position p:

  p = \left[ \sum_{c=1}^{C} z_c \sin\frac{2\pi c}{C}, \; \sum_{c=1}^{C} z_c \cos\frac{2\pi c}{C} \right].

                 STL10       CIFAR10     CIFAR100-20   CIFAR100     MNIST
  % of max k     k    ACC    k    ACC    k    ACC      k     ACC    k    ACC
  100            140  63.1   140  65.0   280  34.7     1000  20.3   100  98.6
  50             70   61.4   70   62.2   140  33.1     500   20.3   50   98.6
  25             35   59.7   35   60.5   70   30.0     250   19.1   25   98.7
  12.5           18   54.8   18   53.7   35   25.7     125   15.0   13   97.9

Table 3: Absolute accuracy for semi-supervised overclustering experiments in paper fig. 6-right.

  STL10          1.0          0.5          0.25         0.1         0.01
  % of max k     n_a   ACC    n_a   ACC    n_a   ACC    n_a  ACC    n_a  ACC
  100            5000  63.1   2500  61.0   1250  58.6   500  52.4   50   25.5
  50             5000  61.4   2500  59.8   1250  59.1   500  57.8   50   30.7
  25             5000  59.7   2500  59.2   1250  58.5   500  57.6   50   44.1
  12.5           5000  54.8   2500  54.8   1250  54.1   500  50.6   50   41.3

                 1.0           0.5           0.25          0.1          0.01
                 n_a    ACC    n_a    ACC    n_a    ACC    n_a   ACC    n_a  ACC
  STL10          5000   63.1   2500   61.0   1250   58.6   500   52.4   50   25.5
  CIFAR10        50000  62.9   25000  62.7   12500  62.6   5000  62.0   500  53.9
  CIFAR100-20    50000  34.5   25000  34.0   12500  33.6   5000  31.9   500  20.1
  CIFAR100       50000  20.3   25000  19.2   12500  17.9   5000  15.1   500  7.4
  MNIST          60000  98.9   30000  98.9   15000  98.9   6000  98.9   600  98.9

Table 4: Absolute accuracy for semi-supervised overclustering experiments in paper fig. 6-left (top) and fig. 6-center (bottom). n_a denotes the number of labels used to find the mapping from output k to k_gt for evaluation.

5. Optional entropy coefficient

Consider inserting a coefficient \lambda \geq 1 into the definition of mutual information (eq. 3, paper section 3.1):

  I_\lambda(z, z') = \sum_{c=1}^{C} \sum_{c'=1}^{C} P_{cc'} \ln \frac{P_{cc'}}{(P_c P_{c'})^\lambda}    (1)
                   = I_1(z, z') + (\lambda - 1)(H(z) + H(z')).                                           (2)

For \lambda = 1, this reduces to the standard mutual information definition. However, inserting an exponent of \lambda > 1 into the denominator of (1) translates into prioritising the maximisation of prediction entropy (2).

6. Expectation over all shifts t \in T

Recall that IIC for segmentation involves maximising mutual information between a patch and all its neighbours within the local box given by T (paper section 3.3). An alternative formulation of paper eq. (5) would involve bringing the expectation over T within the computation for information, as follows:

  \max_\Phi I(P), \quad P = \frac{1}{n |T| |G| |\Omega|} \sum_{i=1}^{n} \sum_{t \in T} \sum_{g \in G} \overbrace{\sum_{u \in \Omega} \Phi_u(x_i) \, [g^{-1} \Phi(g x_i)]_{u+t}^\top}^{\text{convolution}}.

We found paper eq. (5) to work marginally but consistently better, for example by 0.1% for COCO-Stuff-3 and 0.02% for Potsdam-3. This is likely because closer neighbours are more informative than farther ones, and an external expectation avoids entangling the signal between close and far neighbours prior to computing mutual information.
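The identity between (1) and (2) can be verified numerically. A minimal sketch, using an arbitrary toy joint distribution P (illustrative values only, not taken from any experiment):

```python
from math import log

# Check identity (2): the entropy-weighted objective of eq. (1) equals
# I_1(z, z') + (lambda - 1) * (H(z) + H(z')).
# Toy joint distribution over C = 2 cluster pairs (illustrative values).
P = [[0.3, 0.2],
     [0.1, 0.4]]

def marginals(P):
    """Row marginal P_c and column marginal P_c' of the joint P."""
    Pc = [sum(row) for row in P]
    Pc_ = [sum(col) for col in zip(*P)]
    return Pc, Pc_

def I_lambda(P, lam):
    """Eq. (1): sum_cc' P_cc' * ln( P_cc' / (P_c * P_c')^lambda )."""
    Pc, Pc_ = marginals(P)
    return sum(P[c][d] * log(P[c][d] / (Pc[c] * Pc_[d]) ** lam)
               for c in range(len(P)) for d in range(len(P)))

def H(p):
    """Entropy of a marginal distribution, eq. (3)."""
    return -sum(q * log(q) for q in p)

Pc, Pc_ = marginals(P)
lam = 1.5
lhs = I_lambda(P, lam)                                    # eq. (1)
rhs = I_lambda(P, 1.0) + (lam - 1) * (H(Pc) + H(Pc_))     # eq. (2)
print(abs(lhs - rhs) < 1e-12)  # -> True
```

The identity holds exactly, because the extra exponent contributes −(λ−1) Σ P_cc' ln(P_c P_c') = (λ−1)(H(z) + H(z')).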
7. Random transformations g

Horizontal flipping, random crops and random colour changes in hue, saturation and brightness constitute the g used in most of our experiments. We also tried random affine transforms but found our models performed better without them, as the presence of skew and scaling materially affected the network's ability to distill visual correspondences between pairs of images.

8. Dataset sizes

For the sizes of the training and testing sets used in our experiments, see table 5 and table 6.

                     STL10         CIFAR10       CIFAR100-20   MNIST
                     Train  Test   Train  Test   Train  Test   Train  Test
  IIC                113k   13k    60k    60k    60k    60k    70k    70k
  Semi-supervised    105k   8k     50k    10k    50k    10k    60k    10k

Table 5: Datasets for image clustering.

                     COCO-Stuff-15   COCO-Stuff-3    Potsdam-6     Potsdam-3
                     Train   Test    Train   Test    Train  Test   Train  Test
  IIC                51804   51804   36660   36660   8550   5400   8550   5400
  Semi-supervised    49629   2175    35228   1432    7695   855    7695   855

Table 6: Datasets for segmentation.

Figure 1: Additional unsupervised clustering (IIC) results on STL10 (classes: plane, bird, car, cat, deer, dog, horse, monkey, ship, truck). Predicted cluster probabilities shown as bars. Prediction corresponds to tallest, ground truth is green, incorrectly predicted classes are red, and all others are blue. The bottom row shows failure cases.

Figure 2: Semi-supervised overclustering results on STL10. Predicted cluster probabilities shown as bars. Prediction corresponds to tallest, ground truth is green, incorrectly predicted classes are red, and all others are blue. The bottom row shows failure cases.

Figure 3: Additional unsupervised segmentation (IIC) results on COCO-Stuff-3 (non-stuff pixels in black). Left to right for each triplet: image, prediction, ground truth.

9. Baseline experiments

DeepCluster [1], also originally implemented in PyTorch, was adapted from the released image clustering code for both purely unsupervised image clustering and segmentation. Since this is not the intended task for the method, DeepCluster was used as a feature learner, with k-means performed on learned feature representations in order to obtain cluster assignments for evaluation. Data augmentation transforms are used as with IIC, the same base b as IIC is used for each model's feature representation, and the number of output clusters is set to 10·k_gt, as suggested by the paper. The feature descriptor lengths range from 4096 (image clustering) to 512 (segmentation). For image clustering, the k-means procedures at training and test time are trained and evaluated on the full training and test sets respectively. For segmentation, since all descriptors for the training set cannot fit in RAM (needed not only for the implementation of k-means, but also for the PCA dimensionality reduction), it was necessary to use sampling for k-means both during computation of the pseudolabels for training, and for evaluation. This was done with 10M and 50M samples for the Potsdam* and COCO-Stuff* datasets respectively. Once the k-means centroids were obtained, training still occurred over
the entire training set, with accuracy computed over the entire test set. For the semi-supervised experiment, finetuning of the learned representation was used, as with IIC.

ADC [4], originally implemented in TensorFlow, was adapted from the released code for image clustering only. For the fully unsupervised CIFAR100-20 experiment (paper table 1), since ADC was already implemented for CIFAR100, we adopted the existing architecture and training settings for CIFAR100 when training CIFAR100-20. Similarly, we adopted the existing architecture and settings included for STL10 for the semi-supervised experiment, training an SVM on top of fixed features, as this is the semi-supervised implementation provided in their code.

Triplets [8] was implemented as a representation learner by setting the positive example for each image to be its random transform, using the same transformations as the IIC experiments for fairness. The negative example for each image was set to a randomly selected image. K-means was run on the learned embeddings to obtain cluster assignments.

For the segmentation baselines Isola [6] and Doersch [3], which are unsupervised feature learning methods without segmentation code, we use our own implementation. Since both operate by predicting the spatial relationship between pairs of patches (spatial proximity and exact relative position respectively), we adapted them to segmentation by randomly sampling pairs from the dense features produced by b (which are either close or far for Isola, for example), using additional linear layers to predict the spatial relationship, minimising the distance between this prediction and known ground truth, and backpropagating gradients end-to-end.

Figure 4: Additional semi-supervised clustering for segmentation results on COCO-Stuff-3 (non-stuff pixels in black). Left to right for each triplet: image, prediction, ground truth.

Figure 5: Additional segmentation results for unsupervised IIC and semi-supervised overclustering on Potsdam-3. Left to right for each quadruplet: image, IIC prediction, semi-supervised overclustering prediction, ground truth.
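The baselines above produce k cluster assignments that must be mapped to the k_gt ground-truth classes before accuracy can be computed (the n_a labels in table 4 serve this purpose). The sketch below shows one standard many-to-one scheme, majority voting per cluster; the function names are illustrative, and the released code is authoritative for the exact procedure used in the paper.

```python
from collections import Counter

def majority_map(clusters, labels):
    """Map each cluster id to the most frequent ground-truth label
    among the labelled (cluster, label) pairs provided."""
    votes = {}
    for c, y in zip(clusters, labels):
        votes.setdefault(c, Counter())[y] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}

def accuracy(clusters, labels, mapping):
    """Fraction of samples whose mapped cluster equals the true label."""
    hits = sum(mapping[c] == y for c, y in zip(clusters, labels))
    return hits / len(labels)

# Toy example: 4 output clusters, 2 ground-truth classes.
clusters = [0, 0, 0, 1, 1, 2, 2, 3]
labels   = [0, 0, 1, 1, 1, 0, 0, 1]
m = majority_map(clusters, labels)   # {0: 0, 1: 1, 2: 0, 3: 1}
print(accuracy(clusters, labels, m))  # -> 0.875
```

With k > k_gt this mapping is many-to-one, which is what allows an overclustering head to be scored against the ground-truth classes at all.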
10. Further discussion of evaluation

For image clustering we use h = 5 sub-heads, which are different randomly initialised instantiations of the same final layer, for increased robustness (paper section 4.1). We evaluate the accuracy of the model by identifying the sub-head with the lowest training loss on the main output head and reporting its test set accuracy. This evaluation procedure makes no use of labels at all, as the sub-head is selected using the unsupervised loss. In table 7, we additionally report the test set accuracy of the sub-head with the highest training set accuracy, which illustrates the maximum score achieved by the sub-heads. For IIC, since the training and test sets are the same (unlabelled data notwithstanding), the latter is equivalent to running an identical training procedure h times, selecting a different sub-head each time, and reporting the best performance out of these runs.
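The label-free selection described above amounts to an argmin over the per-sub-head unsupervised training losses. A minimal sketch, with made-up loss and accuracy values for illustration:

```python
# Unsupervised sub-head selection: pick the sub-head with the lowest
# training loss, then report that sub-head's test accuracy. No labels
# are used for the selection itself. All values below are illustrative.
train_losses = [0.41, 0.38, 0.45, 0.39, 0.44]   # one per sub-head, h = 5
test_accs    = [60.2, 61.0, 58.9, 60.7, 59.3]   # known only to the evaluator

best = min(range(len(train_losses)), key=train_losses.__getitem__)
print(best, test_accs[best])  # -> 1 61.0
```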
                        STL10   CIFAR10   CIFAR100-20   MNIST
  IIC (best sub-head)   61.0    61.7      25.7          99.3

Table 7: Image clustering accuracy of the best sub-head.

11. Further discussion of avoiding degeneracy

Here we continue the discussion of how IIC avoids undesirable solutions (paper section 3.1). Recall that maximising mutual information entails maximising entropy and minimising conditional entropy. Consider the malevolent case of all images being assigned the ambivalent prediction [1/C, ..., 1/C] for C clusters. For example, for C = 2, if for both data points in all pairs the model predicts [0.5, 0.5], then P = [[0.25, 0.25], [0.25, 0.25]]. Conditional entropy H(z|z') would not be minimised to 0 in this case, as the predictions are not deterministic (one-hot) [2]. To achieve H(z|z') = 0, predictions would need to be [0, 1] or [1, 0]. Thus, as IIC minimises H(z|z'), it avoids ambivalent solutions.

On the other hand, if all images are assigned to the same cluster i, the joint distribution would be all-zero except at P_ii = 1. Likewise, the marginals would be all-zero except at P_i = 1. Entropy H(z) would not be maximised, and indeed would be minimised to H(z) = 0, as:

  H(z) = -\sum_{c=1}^{C} P_c \ln P_c.    (3)

Thus, as IIC maximises H(z), it also avoids this degenerate solution.

12. Additional examples

For more examples of image clustering and segmentation results for both unsupervised IIC and semi-supervised overclustering, see fig. 1, fig. 2, fig. 3, fig. 4 and fig. 5.

References

[1] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. arXiv preprint arXiv:1807.05520, 2018.
[2] Thomas M. Cover and Joy A. Thomas. Entropy, relative entropy and mutual information. Elements of Information Theory, 2:1-55, 1991.
[3] Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422-1430, 2015.
[4] Philip Haeusser, Johannes Plapp, Vladimir Golkov, Elie Aljalbout, and Daniel Cremers. Associative deep clustering: Training a classification network with no labels. In German Conference on Pattern Recognition, pages 18-32. Springer, 2018.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770-778, 2016.
[6] Phillip Isola, Daniel Zoran, Dilip Krishnan, and Edward H. Adelson. Learning visual groups from co-occurrences in space and time. arXiv preprint arXiv:1511.06811, 2015.
[7] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
[8] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems, pages 41-48, 2004.