This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2954936, IEEE Access.

YOMO: You Only Move Once: An Efficient Convolutional Neural Network for Face Detection

JIE XU (Member, IEEE), YE TIAN, HAOYU WU, BAOWEN LUO, JINHONG GUO (Member, IEEE)
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China (e-mail: xuj@uestc.edu.cn; tianye222222@163.com; 847194217@qq.com; guojinhong@uestc.edu.cn)
Department of Computer Science, University of Bristol, Bristol, UK (e-mail: haoyuwu1996@gmail.com)
Corresponding author: Jinhong Guo (e-mail: guojinhong@uestc.edu.cn).
This work was supported by the National Key Research and Development Program (Grant No. 2016YFB0800105) and the Sichuan Province Scientific and Technological Support Project (Grant Nos. 2018GZ0255, 2019YFG0191).

ABSTRACT Our "You Only Move Once" (YOMO) detector, based on depthwise separable convolutions, is a single-stage face detector that balances accuracy and latency. YOMO achieves scale invariance by using a top-down architecture with feature agglomeration and multiple detection modules instead of an image pyramid approach. At the same time, we propose a semi-soft random cropping algorithm that lets each detection module be adequately trained by samples of a different scale. Several experiments are conducted on the FDDB dataset with discrete and continuous measures to demonstrate that the method is strongly competitive. After applying an ellipse regressor, the recall rates reach a satisfactory 97.59% and 83.66%, respectively. Notably, YOMO has only 21 million parameters and achieves superior performance at 51 frames per second (FPS) for a 544×544 input image on a GPU.
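The efficiency of the depthwise separable convolutions that the abstract highlights (the paper later quantifies this as "nearly eight times less computation" than standard convolutions) can be checked with back-of-the-envelope arithmetic. The sketch below is ours, not the authors'; the layer sizes are illustrative assumptions, not taken from the paper.

```python
# Multiply-add counts for one convolutional layer on a DF x DF feature map
# with DK x DK kernels, M input channels, and N output channels:
#   standard conv:            DK*DK * M * N * DF*DF
#   depthwise separable conv: DK*DK * M * DF*DF   (depthwise step)
#                           + M * N * DF*DF       (1x1 pointwise step)
def standard_conv_cost(dk, m, n, df):
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df):
    return dk * dk * m * df * df + m * n * df * df

# Illustrative (assumed) layer: 3x3 kernel, 512 -> 1024 channels, 17x17 map,
# roughly what a stride-32 layer sees for a 544x544 input.
std = standard_conv_cost(3, 512, 1024, 17)
sep = separable_conv_cost(3, 512, 1024, 17)
print(f"standard / separable = {std / sep:.1f}x")  # prints 8.9x
```

The ratio reduces to 1 / (1/N + 1/DK²), so for a 3×3 kernel with many output channels it approaches 9×, consistent with the paper's "nearly eight times" figure.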
INDEX TERMS deep learning, face detection, feature fusion, top-down architecture, semi-soft random cropping

I. INTRODUCTION
Face detection is generally a key component of the human-centered "smart city", relating to facial expression analysis, identification, individualized services, etc. Despite being widely researched, building real-time face detectors with high accuracy under natural conditions remains a difficult problem.

Traditional face detection relies on a manually designed classifier [1] [2]. However, the accuracy of such classifiers degrades as the application context changes, because the model is suboptimal. Recently, deep learning has demonstrated its potential and achieved noticeable success in face detection tasks; nevertheless, creating a context-unlimited, highly accurate face detection model remains a huge challenge.

Modern face detectors fall into two categories: two-stage and one-stage. Faster R-CNN [3], the two-stage model with the highest efficiency and accuracy among region-based convolutional neural network (R-CNN) models, uses a Region Proposal Network (RPN) to replace the selective-search algorithm and integrates bounding-box regression and classification into one network. However, the massive multi-scale proposals generated by the RPN and the large computational overhead of the complex network structure rule out real-time detection.

One-stage methods, including YOLO [4], treat detection as a regression problem and directly regress face bounding boxes from the features generated by the feature extraction network, without a proposal structure. The problem is that such methods always have lower accuracy than two-stage methods. SSD [5] employs deep features at different scales to jointly compute bounding-box regression and class probabilities. The multi-scale inference helps detect faces of variable size; however, none of its stages is specially trained to detect a specific scale range, as is done in YOMO. Despite their lower computational cost, single-stage face detectors do not handle small faces well.

There are two main ways to improve a model's multi-scale detection results. One is to use an image pyramid strategy to train multi-shot single-scale face detectors; in the HR detector [6], each of the detectors is trained to detect faces of a specific scale. However, because images of different scales must pass through a fairly deep network multiple times, this approach is very time-consuming. The other is to use a feature pyramid method ("Hypercolumns" [7]) to train a single-shot multi-scale face detector. For example, S3FD [8] follows the single-shot multi-scale approach for face detection; however, S3FD performs poorly on small faces, because the shallow layers it uses carry little semantic information about faces.

Inspired by FPN [9], YOMO uses a top-down architecture with lateral connections to create a feature pyramid that has strong semantics at all scales while naturally leveraging the pyramidal shape of the features. Thanks to the deconvolution operation, the resolution of the feature map used for prediction is gradually increased, capturing more fine-grained features and context information. To build a highly efficient network, depthwise separable convolutions are introduced into YOMO; they sacrifice a little precision in exchange for nearly eight times less computation than standard convolutions.

The main contributions of this paper are summarized as follows:
1) We propose a real-time face detector named YOMO, which consists of depthwise separable convolutions and contains multiple top-down feature fusion structures. Each detection module is responsible only for detecting faces within its corresponding scale range.
2) A random cropping strategy that is more consistent with multi-scale detection structures allows each detection module to be trained on a sufficient number of samples.
3) The proposed ellipse regressor greatly improves the detection recall rate under the continuous measure of FDDB.
4) YOMO has only 21 million parameters and achieves superior performance at 51 FPS for a 544×544 input image on a GPU.

The rest of this paper is organized as follows. Part 2 presents a literature review of relevant work in the face detection field. The proposed method is introduced in Part 3, followed by experimental details in Part 4. Part 5 concludes this work.

II. RELATED WORKS
There have been numerous research attempts at the face detection task, among which one remarkable achievement stands out: a cascade of binary face classifiers [2] trained using Haar-like features. However, such methods are often suboptimal and may yield disappointing results as the application scenario changes. With the popularity of deep learning techniques in the field of face detection, some methods attempt to combine hand-crafted features with features extracted by convolutional neural networks to detect human faces. Previous studies [10] [11] attempt to establish a connection between DPM and CNN; e.g., B. Yang et al. [12] combine a Boosting Forest model with a CNN. However, such methods are cumbersome to operate and do not contribute significant gains.

Using a cascading structure to detect faces effectively became the next research hotspot. Aiming to train a more robust multi-view face detector, [13] proposes a funnel-structured cascade (FuSt) detection module. In [14], a cascade of six CNNs is used to detect the face: three determine whether a face exists, and the remainder correct the borders of face regions. In [15], a cascade of five CNNs detects hair, eyes, nose, mouth, and neck, respectively. Cascading CNN-based methods [16]–[18] train a cascade of binary face classifiers and use a hard-example mining scheme. However, the CNNs inside a cascade structure are merely stacked, which separates the optimization of the cascaded classifiers from the optimization of the CNNs. Consequently, [19] proposes an inside-cascade structure that feeds different layers with different data. Still, this method leaves room for improvement in the accuracy of small-face detection.

Faster R-CNN [20] employs a two-stage detection scheme to achieve top performance on several face detection datasets. However, training the two stages is a tedious process and the inference phase is time-consuming; thus, practical application is restricted. In single-shot detection methods, such as YOLO [4] and FD-CNN [21], bounding boxes are directly predicted and regressed on the feature map produced by the feature extraction network. However, due to the lack of fine-grained information in the top-level feature map, these methods cannot robustly detect tiny faces. MS-FCN [22] exploits a feature pyramid to detect multi-scale faces. More recently, research indicates that multi-scale features from different layers perform better for tiny faces. Specifically, SSD [5], S3FD [8], MS-CNN [22], and SSH [23] predict boxes on multiple layers of the feature hierarchy. However, their detection modules are separated from each other, so the shallow feature maps lack semantic information while the middle-level feature maps lack fine-grained information. For this reason, ParseNet [24] and CMS-RCNN [25] fuse features from multiple layers to enhance feature discrimination. FPN [9] proposes a top-down architecture that propagates high-level semantic information to all scales. FPN-based methods, such as DSSD [26], FANet [27], and PyramidBox [28], achieve significant detection improvements. The YOMO network proposed in this paper therefore draws on the experience of FPN and adopts a top-down feature fusion method to form a feature pyramid.

III. PROPOSED METHOD
This section introduces our YOMO face detector, including the general architecture (Sec. III-A, Sec. III-B, Sec. III-C), the training methodology (Sec. III-D), the improved cropping strategy (Sec. III-E), and the ellipse regressor (Sec. III-F).

FIGURE 1. Network architecture of YOMO. YOMO detects multi-scale faces simultaneously from three different convolutional layers of the feature pyramid by using detection modules, in a single forward pass of the network.

A. GENERAL ARCHITECTURE
The framework of YOMO is illustrated in Fig. 1. All modules contain BatchNorm and LeakyReLU nonlinearities except for the detection modules. Our architecture uses MobileNet [29] as its backbone, truncated before the classification layers and extended with a depthwise separable convolution module outputting 1024 channels. The design rationale of YOMO's backbone is as follows:
1) The face detection module is separated from the feature extraction network, and the newly added depthwise separable convolution modules serve as buffer areas that prevent the gradients of the detection module from injecting large noise into the weights of the feature extraction network.
2) The fully connected layer in MobileNet can be computationally expensive, so it is replaced by a depthwise separable convolution module with 1024 channels, which increases the depth and width of the network and allows for more abstract semantic information.

B. SCALE-INVARIANCE DESIGN
Faces in images of unconstrained datasets appear at multiple scales. Employing an image pyramid strategy to train multi-shot single-scale face detectors, as in [6], is computationally expensive and quite slow. In contrast, YOMO detects multi-scale faces simultaneously from three different convolutional layers of the feature pyramid by using detection modules, in a single forward pass of the network. These modules have strides of 8, 16, and 32 and are responsible for detecting small, medium, and large faces, respectively.

FIGURE 2. The detailed parameters of each layer in YOMO.

TABLE 1. The scale boundaries of the faces that each detection module is responsible for are related to the stride size of the detection module.
Boundary    Small face    Medium face    Large face
MinScale    0             8×2^2         16×2^3
MaxScale    8×2^2         16×2^3        32×2^4

FIGURE 3. Bounding boxes with dimension anchors and location prediction. The dotted rectangle represents the anchor, and the blue rectangle represents the predicted bounding box.

As shown in Tab. 1, each detector is allocated a detection task within a target scale range during the training and inference stages. The inference result is constructed by joining together the predictions at different scales using Non-Maximum Suppression (NMS).

C. DETECTION MODULE
As shown in Fig. 3, the output layer adopts the regression strategy of YOLO [4], which predicts coordinates relative to the grid cell. This strategy bounds both the ground truth and the predictions to the range from 0 to 1 by a logistic function. The network generates four coordinates (t_x, t_y, t_w, t_h) for each bounding box as an offset to the anchor. The output is calculated from the anchor width and height (p_w, p_h) as well as the offset of the grid cell from the top-left corner (c_x, c_y) as follows:

    b_x = σ(t_x) + c_x
    b_y = σ(t_y) + c_y
    b_w = p_w · e^{t_w}
    b_h = p_h · e^{t_h}                                   (1)

D. TRAINING
The parameter configuration during training is shown in Table 2.

TABLE 2. Parameter settings in the solver file during training using the Caffe library.

base_lr    step_value    gamma    batch_size    iter_size    type       weight_decay    max_iter
0.001      40000         0.1      9             3            RMSProp    0.00005         200000

As discussed in Sec. III-B, three detection modules are placed on layers with different strides to detect faces of variable size, and each has five multi-task losses for the classification and regression sub-tasks. The multi-task loss function used by YOMO consists of five parts — a non-target loss, an anchor pre-training loss, a target positioning loss, a target confidence loss, and a target class loss — and is formulated as Eq. 2:

    loss = λ_noobj · Σ_{i,j,k} 1(MaxIoU_ijk < Thresh) · (0 − b_ijk^obj)²
         + λ_prior · 1(t < epoch_prior) · Σ_{i,j,k} Σ_{r∈{x,y,w,h}} (prior_k^r − b_ijk^r)²
         + Σ_{i,j,k} 1_ijk^truth · [ λ_coord · Σ_{r∈{x,y,w,h}} (truth^r − b_ijk^r)²
                                   + λ_obj · (IoU_ijk^truth − b_ijk^obj)²
                                   + λ_class · (1 − b_ijk^class)² ]        (2)

where i and j run over the width W and height H of the feature map, k runs over the A anchors, and t is the number of iterations. The sign function is defined as Eq. 4. λ_noobj, λ_prior, λ_coord, λ_obj, and λ_class are the weight values of each sub-task; these are, respectively, the non-target loss weight, the anchor pre-training loss weight, the coordinate loss weight, the target loss weight, and the category loss weight. 1_ijk^truth equals 1 if the bounding-box anchor overlaps a ground-truth face by more than any other bounding-box anchor. In order to adapt the network to the anchors as quickly as possible, the anchor pre-training loss weight is introduced in the early stage of training; the number of pre-training iterations epoch_prior is defined in YOMO for the pre-training period.

E. SEMI-SOFT RANDOM CROPPING ALGORITHM
The main idea of the random cropping algorithm used by [8], [27], [30] is to generate cropping bounding boxes of different scales by applying various clipping strategies to crop the training pictures as required. As a result of the randomness of this algorithm and the large number of small faces in the training set, small faces remain dominant in the cropped pictures. To address this problem, this paper proposes a semi-soft random cropping algorithm based on the random cropping algorithm. By statistically counting the number of faces at each scale in the cropped images, targeted cropping boxes are selected to balance the various face scales, allowing the detection modules to be adequately trained. The procedure of the semi-soft random cropping algorithm is as follows:
1) First, we use the random cropping algorithm in DSSD [26] to generate several cropping boxes Samples_d with an aspect ratio of 1. Second, the cropping boxes generated in the previous step, as well as the annotations, are scaled to the size required by the network. Finally, according to the scale ranges in Table 1, the number of faces in each scale category is counted as in Eq. 3:

    Num_i^c = Σ_f sign(MinScale_c ≤ scale(f) < MaxScale_c),  c = 1, ..., M;  i = 1, ..., N    (3)

where Num_i^c is the number of faces of the c-th scale category in the i-th cropping bounding box, the sum runs over the faces f in that box, and the sign function (Eq. 4) equals 1 when its condition holds and 0 otherwise. Here c is the index of the scale categories (in our experiments M = 3, so c equals 1, 2, and 3, representing small, medium, and large faces, respectively), and the range of the c-th scale category is [MinScale_c, MaxScale_c).
2) Rank the quantities Num_i^c of each cropping box in descending order; the leading scale category c differs from box to box. ActualNum_c, the number of faces of each scale category seen in a forward pass of the training phase, is likewise counted by Eq. 3 and ranked in ascending order.
3) Among all Samples_d, randomly select a cropping box SelectCrop whose most numerous scale category matches the currently least-represented category in ActualNum.
4) If no cropping box satisfying the requirement of step 3 is found, randomly select a SelectCrop satisfying the corresponding relation for the next least-represented category.
5) If no such cropping box is found either, randomly select SelectCrop from Samples_d.
6) Update ActualNum by counting Num, the number of faces in each scale category of SelectCrop, where the subscript s denotes the index of the selected cropping box: ActualNum_c ← ActualNum_c + Num_s^c, for c = 1, ..., M.

F. ELLIPSE REGRESSOR
Our models are trained on the WIDER FACE dataset and evaluated on the FDDB dataset. However, ground truths in FDDB are ellipses while YOMO predicts rectangles. This inconsistency in shape noticeably influences the continuous prediction score. In order to improve the recall rate on the continuous ROC, we train an ellipse regressor to convert rectangular bounding boxes to ellipses.

The predicted bounding box of YOMO is a vector X of four elements (xmin, ymin, width, height), denoting a box of size width × height with its top-left corner at (xmin, ymin). The ground-truth box in FDDB is an ellipse represented by a vector Y of five elements (r_a, r_b, θ, c_x, c_y), where r_a denotes the long semi-axis, r_b the short semi-axis, θ the angle, and (c_x, c_y) the center point coordinates. We use a multiple linear regression equation to convert X to Y, as in Eq. 8:

    Y = XW + ε                                            (8)

where W is the regression coefficient matrix (of dimensions 4×5) and ε stands for the random error; we set ε = 0 to simplify the training process. First, we calculate the mean U_X and standard deviation σ_X of the coordinate vector X of the predicted bounding box, and the mean U_Y and standard deviation σ_Y of the vector Y of the real box. Then we normalize the predicted and ground-truth coordinate vectors by Eq. 9 and Eq. 10:

    X0 = (X − U_X) / σ_X                                  (9)
    Y0 = (Y − U_Y) / σ_Y                                  (10)

Then, the mean squared error function is minimized by the least squares method to optimize the regression coefficient matrix, as in Eq. 11:

    J(W) = (Y0 − X0 W)^T (Y0 − X0 W)                      (11)

FIGURE 4. Evaluation results on the FDDB dataset: (a) FDDB discrete score; (b) FDDB continuous score. Performance is assessed on both continuous and discrete receiver operating characteristic (ROC) curves. Legends in both line figures represent the true positive rate (TPR) when the number of false positives (FP) equals 1000.

IV. EXPERIMENTS
A. EXPERIMENTAL SETUP
The experimental environment is based on a 64-bit Ubuntu 14.04 LTS system with 16 GB of memory and an 8-core Intel Core i7-7700K CPU with a single-core frequency of 4.20 GHz. All models are based on the Caffe framework and are trained on a single NVIDIA GeForce GTX 1080 Ti. The feature extraction network of YOMO is fine-tuned for 200K iterations starting from a pre-trained ImageNet classification network. The parameter configuration during training is shown in Table 2. The weights of the parts of the loss function are λ_noobj = 1, λ_prior = 1, λ_coord = 1, λ_obj = 5, λ_class = 1. We set the NMS threshold to 0.7 during training, whereas the threshold equals 0.45 in the inference phase. During training and testing, the pictures for all models in this paper were scaled so as to maintain the aspect ratio.

FIGURE 5. Impact of different methods on performance: (a) effect of the keep-ratio design; (b) effect of the activation function; (c) effect of the gradient descent method; (d) effect of the cropping algorithm.

TABLE 3. Inference time of different models.
Models       GPU                   Input size    FPS
FANet        NVIDIA GTX 1080 Ti    640×480       35.6
HR           NVIDIA Titan X        1920×1080     1.4
HR           NVIDIA Titan X        1280×720      3.1
ICC-CNN      NVIDIA Titan Black    640×480       40
ScaleFace    NVIDIA Titan X        1300×900      3.7
MTCNN        NVIDIA Titan Black    640×480       99
Our YOMO     NVIDIA GTX 1080 Ti    1088×1088     18
Our YOMO     NVIDIA GTX 1080 Ti    544×544       51
Our YOMO     NVIDIA GTX 1080 Ti    256×256       117

B. DATASETS
WIDER dataset: This dataset contains 32,203 images with 393,703 labeled faces and varies in target scale, human pose, and occlusion, as depicted in the sample images. WIDER FACE is divided into 61 event classes; each class is then randomly split into 40%, 10%, and 50% for the training, validation, and testing sets, respectively.

FDDB contains 2,845 images and 5,171 annotated faces. It presents certain difficulties, including occlusion, difficult postures, low resolution, and poor focus, as well as both black-and-white and color pictures. Unlike other face detection datasets, the ground-truth faces are ellipses rather than rectangles. We use this dataset only for evaluation.

C. FDDB DATASET RESULT
When evaluating on the FDDB dataset, all images are scaled to maintain the aspect ratio and embedded in a black background to avoid deformation. We compare YOMO with MTCNN [17], ScaleFace [31], HR [6], HR-ER [6], ICC-CNN [32], and FANet [27] on FDDB with discrete and continuous measures. In Fig. 4, YOMO-Fit is the result of applying the ellipse regressor to YOMO; it achieves competitive performance on both the discrete and continuous ROC curves, i.e., 97.7% and 83.6%, respectively, when the number of false positives equals 1000, ranking second only to FANet. Without OHEM, the context module, and the hierarchical loss strategy, the recall rate of YOMO-Fit is only 0.6% and 1.6% lower than FANet on the discrete and continuous ROC curves, respectively. YOMO-Fit outperforms HR-ER by 4.9% on the continuous ROC curve even though HR-ER is trained on the FDDB dataset in a 10-fold cross-validation fashion. Fig. 4 shows these results on the FDDB dataset.

D. TIMING
Thanks to the network feature pyramid and depthwise separable convolutions, YOMO has real-time detection capability. To measure the inference time, YOMO is tested on an NVIDIA GTX 1080 Ti GPU by averaging the runtime over 1000 images randomly sampled from the WIDER FACE dataset. Each image is embedded in a black background while maintaining its aspect ratio. As we can observe from Table 3, YOMO runs at 51 FPS at a resolution of 544×544, whereas FANet reaches only 35.6 FPS.

FIGURE 6. Qualitative results on the WIDER FACE and FDDB datasets: (a) qualitative results on the WIDER FACE dataset; (b) qualitative results on the FDDB dataset.

E. ABLATION STUDY: SPECIAL SETTINGS
During the training and testing stages, the images are resized while maintaining the aspect ratio and embedded in a black background. When embedded, the black edges on both sides of the image have the same width. Meanwhile, YOMO uses the LeakyReLU function and the RMSProp gradient optimization algorithm to improve the recall rate, as shown in Fig. 5.

TABLE 4. The proportion of faces of each scale generated by different models.
Models                   Small face ratio (%)    Medium face ratio (%)    Large face ratio (%)
YOMO-RMSProp             74.33                   19.91                    5.75
YOMO-DSSD-Sample         67.52                   24.60                    7.88
YOMO-Semi-Soft-Sample    56.03                   30.81                    13.16

TABLE 5. The impact of the cropping algorithm on performance.

Models                   FP      Recall, discrete (%)    Recall, continuous (%)
YOMO-RMSProp             1000    95.5493                 72.5193
YOMO-DSSD-Sample         1000    97.1766                 74.5425
YOMO-Semi-Soft-Sample    1000    97.6893                 75.004

F. CROPPING ALGORITHM
In a single training epoch, the proportion of faces of each scale generated by the two cropping algorithms is shown in Table 4, where YOMO-RMSProp does not use a cropping algorithm, and YOMO-DSSD-Sample and YOMO-Semi-Soft-Sample use the DSSD random cropping algorithm and the semi-soft random cropping algorithm, respectively. Compared with YOMO-DSSD-Sample, after applying the semi-soft random cropping algorithm the percentage of small-scale faces among all faces decreased from 67.52% to 56.03%, a decrease of 11.49 percentage points. As a result, the percentages of medium and large faces in YOMO-Semi-Soft-Sample increased by 6.21 and 5.28 percentage points, respectively. Because larger faces contain highly representative features that can distinguish faces with large appearance variations, they do not require a large number of training samples. Therefore, from the perspective that the face detection modules need sufficient training samples, the semi-soft random cropping algorithm is more conducive to the single-shot multi-scale approach. The recall results of the models trained with the different cropping strategies are shown in Fig. 5(d) and Table 5, where YOMO-Semi-Soft-Sample is equivalent to the YOMO model mentioned in Sec. IV-C. YOMO-Semi-Soft-Sample outperforms YOMO-RMSProp by 2.1% and 2.5% in discrete and continuous scores, respectively. Meanwhile, compared with YOMO-DSSD-Sample, YOMO-Semi-Soft-Sample improved recall by 0.51% and 0.46% in discrete and continuous scores, respectively.

G. QUALITATIVE RESULTS
Figures 6(a) and (b) show visualization results on sample images from the WIDER FACE and FDDB datasets. Red rectangles are the predictions of YOMO-Fit in Fig. 6(a). In Fig. 6(b), orange rectangles are the predictions, purple ellipses are the regression results, and green ellipses are the ground-truth faces provided by the FDDB dataset.

V. CONCLUSION
We propose a single-shot face detector named YOMO, which consists of depthwise separable convolutions and multiple top-down feature fusion structures, achieving a good trade-off between accuracy and latency. A semi-soft random cropping algorithm is proposed to improve multi-scale face detection accuracy and robustness to illumination and occlusion. For higher continuous scores, an ellipse regressor is trained to transform the predicted rectangular bounding boxes into ellipses. Through the above strategies, YOMO achieves superior performance at 51 FPS for 544×544 input images on a GPU.

ACKNOWLEDGMENT
This work was supported by the National Key Research and Development Program (Grant No. 2016YFB0800105) and the Sichuan Province Scientific and Technological Support Project (Grant Nos. 2018GZ0255, 2019YFG0191).

REFERENCES
[1] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 886–893.
[2] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision (IJCV), vol. 57, no. 2, pp. 137–154, 2004.
[3] H. Jiang and E. Learned-Miller, "Face detection with the Faster R-CNN," in Proc. 12th IEEE Int. Conf. Automatic Face & Gesture Recognition (FG 2017), 2017, pp. 650–657.
[4] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[5] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proc. European Conference on Computer Vision (ECCV), Springer, 2016, pp. 21–37.
[6] P. Hu and D. Ramanan, "Finding tiny faces," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 951–959.
[7] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, "Hypercolumns for object segmentation and fine-grained localization," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 447–456.
[8] S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, and S. Z. Li, "S3FD: Single shot scale-invariant face detector," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2017, pp. 192–201.
[9] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2117–2125.
[10] P.-A. Savalle, S. Tsogkas, G. Papandreou, and I. Kokkinos, "Deformable part models with CNN features," in Proc. European Conference on Computer Vision (ECCV), Parts and Attributes Workshop, 2014.
[11] R. Girshick, F. Iandola, T. Darrell, and J. Malik, "Deformable part models are convolutional neural networks," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 437–446.
[12] B. Yang, J. Yan, Z. Lei, and S. Z. Li, "Convolutional channel features," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2015.
[13] S. Wu, M. Kan, Z. He, S. Shan, and X. Chen, "Funnel-structured cascade for multi-view face detection with alignment-awareness," Neurocomputing, vol. 221, pp. 138–145, 2017.
[14] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, "A convolutional neural network cascade for face detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5325–5334.
[15] S. Yang, P. Luo, C.-C. Loy, and X. Tang, "From facial parts responses to face detection: A deep learning approach," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2015, pp. 3676–3684.
[16] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, "A convolutional neural network cascade for face detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5325–5334.
[17] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
[18] H. Qin, J. Yan, X. Li, and X. Hu, "Joint training of cascaded CNN for face detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3456–3465.
[19] K. Zhang, Z. Zhang, H. Wang, Z. Li, Y. Qiao, and W. Liu, "Detecting faces using inside cascaded contextual CNN," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2017, pp. 3171–3179.
[20] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[21] D. Triantafyllidou, P. Nousi, and A. Tefas, "Fast deep convolutional face detection in the wild exploiting hard sample mining," Big Data Research, vol. 11, pp. 65–76, 2018.
[22] Y. Bai and B. Ghanem, "Multi-scale fully convolutional network for face detection in the wild," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 132–.
[23] M. Najibi, P. Samangouei, R. Chellappa, and L. S. Davis, "SSH: Single stage headless face detector," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2017, pp. 4875–4884.
[24] W. Liu, A. Rabinovich, and A. C. Berg, "ParseNet: Looking wider to see better," International Conference on Learning Representations Workshop.
[25] C. Zhu, Y. Zheng, K. Luu, and M. Savvides, "CMS-RCNN: Contextual multi-scale region-based CNN for unconstrained face detection," in Deep Learning for Biometrics, Springer, 2017, pp. 57–79.
[26] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, "DSSD: Deconvolutional single shot detector," arXiv preprint arXiv:1701.06659, 2017.
[27] J. Zhang, X. Wu, J. Zhu, and S. C. Hoi, "Feature agglomeration networks for single stage face detection," arXiv preprint arXiv:1712.00721, 2017.
[28] X. Tang, D. K. Du, Z. He, and J. Liu, "PyramidBox: A context-assisted single shot face detector," in Proc. European Conference on Computer Vision (ECCV), 2018, pp. 797–813.
[29] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[30] J. Li, Y. Wang, C. Wang, Y. Tai, and F. Huang, "DSFD: Dual shot face detector," arXiv preprint arXiv:1810.10220, 2018.
[31] S. Yang, Y. Xiong, C. C. Loy, and X. Tang, "Face detection through scale-friendly deep convolutional networks," arXiv preprint arXiv:1706.02863.
[32] K. Zhang, Z. Zhang, H. Wang, Z. Li, Y. Qiao, and W. Liu, "Detecting faces using inside cascaded contextual CNN," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2017, pp. 3171–3179.

JIE XU received the Bachelor's degree in automation from Chongqing University, China, in 2003, the Master's degree in information and automation engineering from the University of Besançon, France, in 2004, and the Ph.D. degree in automatic systems from the National Institute of Applied Sciences (INSA-Toulouse), France, in 2008. He is an Associate Professor at the School of Information and Communication Engineering, University of Electronic Science and Technology of China. He has authored or coauthored nearly 40 publications in journals and conferences. His research interests include deep learning, face detection, image and video recognition, and network and information security.

YE TIAN received the B.S. degree in communication engineering from Southwest Jiaotong University, Kunming, China. He is currently working toward the M.S. degree in electronics and communication engineering at the University of Electronic Science and Technology of China, Chengdu, China. His current research interests include face detection and image recognition, computer vision, and deep learning.

HAOYU WU received his Bachelor's degree in communication engineering from the University of Electronic Science and Technology of China in 2018. He is expected to obtain his Master's degree in computer science in 2019 from the University of Bristol, UK. He has research interests in pattern recognition, deep learning, and object detection, and is currently working on generative adversarial networks.

BAOWEN LUO received the B.S. degree in communication engineering from the University of Electronic Science and Technology of China, Chengdu, China. He is currently working toward the M.S. degree in electronics and communication engineering at the University of Electronic Science and Technology of China, Chengdu, China. His current research interests include machine learning, deep learning, and network security.
JINHONG GUO received the Bachelor's degree in electronic engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2010, and the Ph.D. degree in biomedical engineering from Nanyang Technological University, Singapore, in 2014. He is currently a Full Professor with the School of Information and Communication Engineering, University of Electronic Science and Technology of China. After his doctoral studies, he was a Postdoctoral Fellow in the Pillar of Engineering Design with MIT-SUTD, Singapore, from 2014 to 2015. He then worked as a Visiting Professor with the School of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA, from January 2016 to July 2016. His current research focuses on electrochemical sensors and lab-on-a-chip devices for point-of-care testing toward clinical use. He has authored or coauthored more than 70 publications in top journals, such as the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, the IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, the IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, Analytical Chemistry, Biosensors and Bioelectronics, etc. He was the recipient of the China Sichuan Thousand Talents Plan for Scholars Award (2015) and the Chengdu Expert in Science and Technology Award (2015).
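As a closing illustration of the detection-module decoding described in Sec. III-C (Eq. 1): the sketch below is ours, not the authors' Caffe implementation. The grid cell, anchor size, and stride values are made-up examples; the conversion from grid units to pixels via the module stride is our assumption, since the excerpt leaves that step implicit.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h, stride):
    """Decode one predicted box following the YOLO-style rule of Eq. 1:
    the center offsets are squashed into (0, 1) relative to the grid cell
    (c_x, c_y); the exponentiated width/height scale the anchor (p_w, p_h)."""
    b_x = (sigmoid(t_x) + c_x) * stride  # center x in pixels
    b_y = (sigmoid(t_y) + c_y) * stride  # center y in pixels
    b_w = p_w * np.exp(t_w)              # width in pixels
    b_h = p_h * np.exp(t_h)              # height in pixels
    return b_x, b_y, b_w, b_h

# Example: a stride-8 module (small faces), grid cell (4, 7), a 32x32
# anchor, and zero predicted offsets -> a box centered in that cell.
bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0, 4, 7, 32.0, 32.0, 8)
print(bx, by, bw, bh)  # 36.0 60.0 32.0 32.0
```

Because the logistic function bounds the center offsets to (0, 1), each predicted center stays inside its own grid cell, which is what lets each detection module specialize in its assigned scale range (Table 1).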