
Invariant Information Clustering for Unsupervised Image Classification and Segmentation: Supplementary Material

Xu Ji, University of Oxford, xuji@robots.ox.ac.uk
João F. Henriques, University of Oxford, joao@robots.ox.ac.uk
Andrea Vedaldi, University of Oxford, vedaldi@robots.ox.ac.uk

1. Release

We implemented IIC in PyTorch [7]. The code, datasets and trained models have been released: github.com/xu-ji/IIC

2. Further experimental details

We used three generic CNN bases b across our experiments: A (ResNet34 [5]), B (4 convolutional layers) and C (6 convolutional layers). For details see table 1. See table 2 for per-experiment details including b, batch size, input channels, input size, and the number of clusters used in overclustering, denoted by k. Recall the latter refers to the sole output head for semi-supervised overclustering but to the auxiliary head for unsupervised IIC, where the main head produces output with dimensionality k_gt. For segmentation, bilinear resampling is used to resize the network output back to input size for implementational simplicity. Since there is one pooling layer in network C which halves spatial size, this is by a factor of 2.

  A                    B              C
  1 x Conv@64          1 x Conv@64    1 x Conv@64
  3 x BasicBlock@64    1 x MaxPool    1 x Conv@128
  4 x BasicBlock@128   1 x Conv@128   1 x MaxPool
  6 x BasicBlock@256   1 x MaxPool    2 x Conv@256
  3 x BasicBlock@512   1 x Conv@256   2 x Conv@512
  1 x AvgPool          1 x MaxPool
                       1 x Conv@512

Table 1: Architecture bases b, showing layer count, type and output channels. Pooling layers do not change channel size. Convolutional layers have filter size 3 or 5 and stride 1 or 2. The models used are standard ResNet and VGG-style networks. Implementations are given in the code.

                b  n     h  r  k_in  k_gt  k    crop size(s)  input size
  IIC
  STL10         A  700   5  5  2     10    70   64            64
  CIFAR10       A  660   5  3  2     10    70   20            32
  CIFAR100-20   A  1000  5  5  2     20    140  20            32
  MNIST         B  700   5  5  1     10    50   16,20,24      24
  COCO-Stuff-3  C  120   1  1  5     3     15   128           128
  COCO-Stuff    C  60    1  1  5     15    45   128           128
  Potsdam-3     C  75    1  1  4     3     24   200           200
  Potsdam       C  60    1  1  4     6     36   200           200
  IIC*
  STL10         A  1400  5  5  2     10    140  64            64
  CIFAR10       A  1320  5  3  2     10    140  20            32
  CIFAR100-20   B  2800  5  5  5     20    280  20            24
  MNIST         B  350   5  5  1     10    25   16,20,24      24
  COCO-Stuff-3  C  180   1  1  5     3     15   128           128
  COCO-Stuff    C  90    1  1  5     15    45   128           128
  Potsdam-3     C  75    1  1  4     3     9    200           200
  Potsdam       C  60    1  1  4     6     24   200           200

Table 2: IIC denotes unsupervised clustering, IIC* denotes semi-supervised overclustering. n denotes batch size, h and r denote the number of sub-heads and sample repeats (see paper section 4.1), k_in denotes input channels (1 for greyscale, 2 for Sobel filtered, 4 for RGBIR, 5 for Sobel filtered with RGB), k_gt denotes the number of ground truth clusters, k denotes the number of output channels for overclustering. COCO-Stuff and COCO-Stuff-3 are scaled by 0.33 prior to cropping; cropped images are scaled to final input size with bilinear resampling.

3. Semi-supervised overclustering study

Paper fig. 6 contains accuracies normalised by dividing by the maximum accuracy for each series. The absolute accuracies are given in table 3 and table 4.

              STL10      CIFAR10    CIFAR100-20  CIFAR100    MNIST
  % of max k  k    ACC   k    ACC   k    ACC     k     ACC   k    ACC
  100         140  63.1  140  65.0  280  34.7    1000  20.3  100  98.6
  50          70   61.4  70   62.2  140  33.1    500   20.3  50   98.6
  25          35   59.7  35   60.5  70   30.0    250   19.1  25   98.7
  12.5        18   54.8  18   53.7  35   25.7    125   15.0  13   97.9

Table 3: Absolute accuracy for semi-supervised overclustering experiments in paper fig. 6-right.

  STL10       1.0        0.5        0.25       0.1       0.01
  % of max k  n_a  ACC   n_a  ACC   n_a  ACC   n_a  ACC  n_a  ACC
  100         5000 63.1  2500 61.0  1250 58.6  500 52.4  50   25.5
  50          5000 61.4  2500 59.8  1250 59.1  500 57.8  50   30.7
  25          5000 59.7  2500 59.2  1250 58.5  500 57.6  50   44.1
  12.5        5000 54.8  2500 54.8  1250 54.1  500 50.6  50   41.3

               1.0         0.5         0.25        0.1        0.01
               n_a   ACC   n_a   ACC   n_a   ACC   n_a   ACC  n_a  ACC
  STL10        5000  63.1  2500  61.0  1250  58.6  500  52.4  50   25.5
  CIFAR10      50000 62.9  25000 62.7  12500 62.6  5000 62.0  500  53.9
  CIFAR100-20  50000 34.5  25000 34.0  12500 33.6  5000 31.9  500  20.1
  CIFAR100     50000 20.3  25000 19.2  12500 17.9  5000 15.1  500  7.43
  MNIST        60000 98.9  30000 98.9  15000 98.9  6000 98.9  600  98.9

Table 4: Absolute accuracy for semi-supervised overclustering experiments in paper fig. 6-left (top) and fig. 6-center (bottom). n_a denotes the number of labels used to find the mapping from output k to k_gt for evaluation.

4. Rendering predictions

To generate the visualisation in paper fig. 3, the entire MNIST dataset was run through each network snapshot. The prediction for each image x, say z = \Phi(x) \in [0,1]^C for C classes (see paper section 3.1), was rendered as a point with coordinate position p:

  p = \left[ \sum_{c=1}^{C} z_c \sin\frac{2\pi c}{C},\ \sum_{c=1}^{C} z_c \cos\frac{2\pi c}{C} \right].

5. Optional entropy coefficient

Consider inserting a coefficient \lambda \geq 1 into the definition of mutual information (eq. 3, paper section 3.1):

  I_\lambda(z, z') = \sum_{c=1}^{C} \sum_{c'=1}^{C} P_{cc'} \ln \frac{P_{cc'}}{(P_c \cdot P_{c'})^\lambda}    (1)
                   = I_1(z, z') + (\lambda - 1)\left( H(z) + H(z') \right).    (2)

For \lambda = 1, this reduces to the standard mutual information definition. However, inserting an exponent of \lambda > 1 into the denominator of (1) translates into prioritising the maximisation of prediction entropy, per (2).

6. Expectation over all shifts t \in T

Recall that IIC for segmentation involves maximising mutual information between a patch and all its neighbours within the local box given by T (paper section 3.3). An alternative formulation of paper eq. (5) would involve bringing the expectation over T within the computation for information, as follows:

  \max_\Phi I(P), \quad P = \frac{1}{n|T||G||\Omega|} \sum_{i=1}^{n} \sum_{t \in T} \sum_{g \in G} \overbrace{\sum_{u \in \Omega} \Phi_u(x_i) \cdot \left[ g^{-1} \Phi(g x_i) \right]^{\top}_{u+t}}^{\text{convolution}}.

We found paper eq. (5) to work marginally but consistently better, for example by 0.1% for COCO-Stuff-3 and
0.02% for Potsdam-3. This is likely because closer neighbours are more informative than farther ones, and an external expectation avoids entangling the signal between close and far neighbours prior to computing mutual information.

7. Random transformations g

Horizontal flipping, random crops and random colour changes in hue, saturation and brightness constitute the g used in most of our experiments. We also tried random affine transforms but found our models performed better without them, as the presence of skew and scaling materially affected the network's ability to distill visual correspondences between pairs of images.

8. Dataset sizes

For the sizes of the training and testing sets used in our experiments, see table 5 and table 6.

                   STL10        CIFAR10      CIFAR100-20  MNIST
                   Train  Test  Train  Test  Train  Test  Train  Test
  IIC              113k   13k   60k    60k   60k    60k   70k    70k
  Semi-supervised  105k   8k    50k    10k   50k    10k   60k    10k

Table 5: Datasets for image clustering.

                   COCO-Stuff-15  COCO-Stuff-3  Potsdam-6   Potsdam-3
                   Train   Test   Train  Test   Train Test  Train Test
  IIC              51804   51804  36660  36660  8550  5400  8550  5400
  Semi-supervised  49629   2175   35228  1432   7695  855   7695  855

Table 6: Datasets for segmentation.

9. Baseline experiments

DeepCluster [1], also originally implemented in PyTorch, was adapted from the released image clustering code for both purely unsupervised image clustering and segmentation. Since this is not the intended task for the method, DeepCluster was used as a feature learner, with k-means performed on learned feature representations in order to obtain cluster assignments for evaluation. Data augmentation transforms are used as with IIC, the same base b as IIC is used for each model's feature representation, and the number of output clusters is set to 10 * k_gt as suggested by the paper. The feature descriptor lengths range from 4096 (image clustering) to 512 (segmentation). For image clustering, the k-means procedures at training and test time are both trained and evaluated on the full training and test sets respectively. For segmentation, since all descriptors for the training set cannot fit in RAM (needed not only for the implementation of k-means, but also for the PCA dimensionality reduction), it was necessary to use sampling for k-means both during computation of the pseudolabels for training, and during evaluation. This was done with 10M and 50M samples for the Potsdam* and COCO-Stuff* datasets respectively. Once the k-means centroids were obtained, training still occurred over the entire training set, with accuracy computed over the entire test set. For the semi-supervised experiment, finetuning of the learned representation was used, as with IIC.

ADC [4], originally implemented in TensorFlow, was adapted from the released code for image clustering only. For the fully unsupervised CIFAR100-20 experiment (paper table 1), since ADC was already implemented for CIFAR100, we adopted the existing architecture and training settings for CIFAR100 when training CIFAR100-20. Similarly, we adopted the existing architecture and settings included for STL10 for the semi-supervised experiment, training an SVM on top of fixed features, as this is the semi-supervised implementation provided in their code.

Triplets [8] was implemented as a representation learner by setting the positive example for each image to be its random transform, using the same transformations as the IIC experiments for fairness. The negative example for each image was set to a randomly selected image. K-means was run on the learned embeddings to obtain cluster assignments.

For the segmentation baselines Isola [6] and Doersch [3], which are unsupervised feature learning methods without segmentation code, we use our own implementation. Since both operate by predicting the spatial relationship between pairs of patches (spatial proximity and exact relative position respectively), we adapted them to segmentation by randomly sampling pairs from the dense features produced by b (which are either close or far for Isola, for example), using additional linear layers to predict the spatial relationship, minimising the distance between this prediction and known ground truth, and backpropagating gradients end-to-end.

Figure 1: Additional unsupervised clustering (IIC) results on STL10 (Plane, Bird, Car, Cat, Deer, Dog, Horse, Monkey, Ship, Truck). Predicted cluster probabilities shown as bars. Prediction corresponds to the tallest bar, ground truth is green, incorrectly predicted classes are red, and all others are blue. The bottom row shows failure cases.

Figure 2: Semi-supervised overclustering results on STL10. Predicted cluster probabilities shown as bars. Prediction corresponds to the tallest bar, ground truth is green, incorrectly predicted classes are red, and all others are blue. The bottom row shows failure cases.

Figure 3: Additional unsupervised segmentation (IIC) results on COCO-Stuff-3 (non-stuff pixels in black). Left to right for each triplet: image, prediction, ground truth.

Figure 4: Additional semi-supervised clustering for segmentation results on COCO-Stuff-3 (non-stuff pixels in black). Left to right for each triplet: image, prediction, ground truth.

Figure 5: Additional segmentation results for unsupervised IIC and semi-supervised overclustering on Potsdam-3. Left to right for each quadruplet: image, IIC prediction, semi-supervised overclustering prediction, ground truth.

10. Further discussion of evaluation

For image clustering we use h = 5 sub-heads, which are different randomly initialised instantiations of the same final layer, for increased robustness (paper section 4.1). We evaluate the accuracy of the model by identifying the sub-head with the lowest training loss on the main output head and reporting its test set accuracy. This evaluation procedure makes no use of labels at all, as the sub-head is selected using the unsupervised loss. In table 7, we additionally report the test set accuracy of the sub-head with the highest training set accuracy, which illustrates the maximum score achieved by the sub-heads. For IIC, since the training and test sets are the same (unlabelled data notwithstanding), the latter is equivalent to running an identical training procedure h times, selecting a different sub-head each time, and reporting the best performance out
of these runs.

                       STL10  CIFAR10  CIFAR100-20  MNIST
  IIC (best sub-head)  61.0   61.7     25.7         99.3

Table 7: Image clustering accuracy of the best sub-head.

11. Further discussion of avoiding degeneracy

Here we continue the discussion of how IIC avoids undesirable solutions (paper section 3.1). Recall that maximising mutual information entails maximising entropy and minimising conditional entropy. Consider the malevolent case of all images being assigned the ambivalent prediction [1/C, ..., 1/C] for C clusters. For example, for C = 2, if for both data points in all pairs the model predicts [0.5, 0.5], then P = [[0.25, 0.25], [0.25, 0.25]]^T. Conditional entropy H(z|z') would not be minimised to 0 in this case, as the predictions are not deterministic (one-hot) [2]. To achieve H(z|z') = 0, predictions would need to be [0, 1] or [1, 0]. Thus as IIC minimises H(z|z'), it avoids ambivalent solutions.

On the other hand, if all images are assigned to the same cluster i, the joint distribution would be all-zero except at P_ii = 1. Likewise, the marginals would be all-zero except at P_i = 1. Entropy H(z) would not be maximised, and indeed would be minimised to H(z) = 0, as:

  H(z) = -\sum_{c=1}^{C} P_c \ln P_c.    (3)

Thus as IIC maximises H(z), it also avoids this degenerate solution.

12. Additional examples

For more examples of image clustering and segmentation results for both unsupervised IIC and semi-supervised overclustering, see fig. 1, fig. 2, fig. 3, fig. 4 and fig. 5.

References

[1] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. arXiv preprint arXiv:1807.05520, 2018.
[2] Thomas M. Cover and Joy A. Thomas. Entropy, relative entropy and mutual information. Elements of Information Theory, 2:1-55, 1991.
[3] Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422-1430, 2015.
[4] Philip Haeusser, Johannes Plapp, Vladimir Golkov, Elie Aljalbout, and Daniel Cremers. Associative deep clustering: Training a classification network with no labels. In German Conference on Pattern Recognition, pages 18-32. Springer, 2018.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
[6] Phillip Isola, Daniel Zoran, Dilip Krishnan, and Edward H. Adelson. Learning visual groups from co-occurrences in space and time. arXiv preprint arXiv:1511.06811, 2015.
[7] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
[8] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems, pages 41-48, 2004.
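Appendix note: the quantities discussed in sections 5 and 11 reduce to one computation, forming the joint distribution P from paired predictions and evaluating the entropy-weighted mutual information of eq. (1)-(2). The released PyTorch code (github.com/xu-ji/IIC) is the reference implementation; the following is only a minimal, framework-free sketch of that computation, with the function name `iic_objective` our own invention:

```python
import math

def iic_objective(z, z_prime, lam=1.0, eps=1e-12):
    """Generalised mutual information I_lambda of eq. (1)/(2).

    z, z_prime: lists of length-C probability vectors (softmax outputs)
    for n paired samples x and gx. Returns the objective to maximise.
    """
    n, C = len(z), len(z[0])
    # Joint distribution over class pairs, averaged over the batch.
    P = [[sum(z[i][c] * z_prime[i][d] for i in range(n)) / n
          for d in range(C)] for c in range(C)]
    # Symmetrise: P <- (P + P^T) / 2, as in paper section 3.1.
    P = [[(P[c][d] + P[d][c]) / 2 for d in range(C)] for c in range(C)]
    Pc = [sum(P[c][d] for d in range(C)) for c in range(C)]  # marginal of z
    Pd = [sum(P[c][d] for c in range(C)) for d in range(C)]  # marginal of z'
    total = 0.0
    for c in range(C):
        for d in range(C):
            p = max(P[c][d], eps)  # clamp to avoid log(0)
            # p * ln( P_cc' / (P_c * P_c')^lam ), expanded into logs.
            total += p * (math.log(p)
                          - lam * math.log(max(Pc[c], eps))
                          - lam * math.log(max(Pd[d], eps)))
    return total
```

For lam = 1 this is the standard mutual information, and the degenerate cases of section 11 follow directly: identical one-hot predictions spread over 2 balanced clusters give ln 2, while the ambivalent prediction [0.5, 0.5] everywhere gives 0. Setting lam > 1 upweights the marginal entropy terms, matching eq. (2).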
