Understanding and Promoting Micro-Finance Activities in Kiva.org

Jaegul Choo, Georgia Institute of Technology, jaegul.choo@cc.gatech.edu
Changhyun Lee, Georgia Institute of Technology, clee407@gatech.edu
Daniel Lee, Georgia Tech Research Institute, daniel.lee@gtri.gatech.edu
Hongyuan Zha, Georgia Institute of Technology, zha@cc.gatech.edu
Haesun Park, Georgia Institute of Technology, hpark@cc.gatech.edu

ABSTRACT

Non-profit micro-finance organizations provide loaning opportunities to eradicate poverty by financially equipping impoverished, yet skilled entrepreneurs who are in desperate need of an institution that lends to those who have little. Kiva.org, a widely-used crowd-funded micro-financial service, provides researchers with an extensive amount of publicly available data containing a rich set of heterogeneous information regarding micro-financial transactions. Our objective in this paper is to identify the key factors that encourage people to make micro-financing donations, and ultimately, to keep them actively involved. In our contribution to further promote a healthy micro-finance ecosystem, we detail our personalized loan recommendation system, which we formulate as a supervised learning problem where we try to predict how likely a given lender is to fund a new loan. We construct the features for each data item by utilizing the available connectivity relationships in order to integrate all the available Kiva data sources. For those lenders with no such relationships, e.g., first-time lenders, we propose a novel method of feature construction by computing joint nonnegative matrix factorizations. Utilizing gradient boosting tree methods, a state-of-the-art prediction model, we are able to achieve up to 0.92 AUC (area under the curve) value, which shows the potential of our methods for practical deployment. Finally, we point out several interesting phenomena on lenders' social behaviors in micro-finance activities.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: Information filtering; I.2.6 [Artificial Intelligence]: Learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
WSDM'14, February 24-28, 2014, New York, New York, USA.
Copyright 2014 ACM 978-1-4503-2351-2/14/02 ...$15.00.
http://dx.doi.org/10.1145/2556195.2556253.

Figure 1: An overview of how Kiva works. 1. A borrower requests a loan from a (field) partner, and a loan is disbursed. 2. The partner uploads a loan request to Kiva, and lenders fund the loan. 3. The borrower makes repayments through the partner, and Kiva then repays the lenders. They can make another loan, donate to Kiva, or withdraw the money to their PayPal account.

Keywords

Recommender systems; cold-start problem; microfinance; crowdfunding; joint matrix factorization; gradient boosting tree; heterogeneous data

1. INTRODUCTION

Kiva was founded by Matt Flannery and Jessica Jackley, who based their concept on the inspiration of Muhammad Yunus' lecture on the Grameen Bank. The Grameen Bank, which won the Nobel Peace Prize in 2006 for its impact in helping the impoverished, was founded by Yunus in 1977 to address the lack of practical credit available to the under-utilized, yet skillful entrepreneurs in impoverished countries [32]. In Yunus' book, he documented how he came up with the concept by noticing that the very poor could barely sustain themselves, let alone work their trade, since many times the poor were taking loans to buy the materials, only to sell their finished product back as repayment. In response, Yunus began his credit-loaning program, which provided loans without collateral and interest, and with an easy repayment plan. There are multitudes of success stories in Yunus' book as well as on the Kiva blog¹ that portray how micro-financing has given opportunity to change the lives of the borrowers, their businesses, and their local areas.

¹ http://pages.kiva.org/kivablog

Figure 2: Temporal lending patterns for different lender groups with a specific lending count $m$. (a) Number of days taken for a loan to be paid back; (b) number of days taken to fund the next loan, shown for $m = 5$, $m = 15$, and $m = 50$. The vertical axes give the number of loans.

Since its inception in 2005, Kiva and its generous lenders now impact the lives of courageous, hardworking borrowers across 72 countries. Kiva is a non-profit micro-financial organization which acts as an intermediary service to provide people with the opportunity to lend money to underprivileged entrepreneurs in developing countries. Kiva's lending model is based on a crowd-funding model in which any individual can fund a particular loan by contributing to a loan individually or as a part of a lender team. The Kiva loan process is summarized in Fig. 1.

Open public access to Kiva's data, provided through daily snapshots and an API, is a part of Kiva's charitable initiative to provide the working poor with an infrastructure that Kiva hopes will encourage life-changing lending. This level of transparency lies at the core of Kiva's successful growth, as Matt Flannery puts it: "Transparency in this next period will be our best weapon against the challenges of growth. This model thrives on information, not marketing" [13].

Kiva data contain a rich set of heterogeneous information about lenders, loans, lender teams, borrowers, and field partners. As of June 2013, the publicly available Kiva dataset contained over 1,100,000 lenders, 500,000 loans, and 150,000 journal entries for over 4,000,000 transactions that resulted in over 400,000,000 USD of loans issued. There are also multiple types of many-to-many relationships between each of the data entities. For example, lenders may be a part of multiple lender teams while lenders may choose any number of loans to participate in. Furthermore, borrowers may optionally update their progress to their field partner for entry into the Kiva website as a journal entry. This dataset includes geospatial, temporal, and free-text data along with a variety of other numerical and categorical information, consequently forming a fascinating set of data for many data mining and social media researchers.

Loan recommendation and diverse lender behaviors. Kiva as a non-profit organization encourages lending by promoting the idea that those in need can create better lives for themselves and their families when given the opportunity, i.e., capital. Thus, one can naturally realize that lenders, who are also regarded as donors due to the lack of any interest or reward they receive in return for their loan, are a pivotal component of the Kiva model. Consequently, one of the keys to a healthy Kiva ecosystem is keeping lenders interested in continuing their generous donations. This is where active recommendation can play a major role by matching the lender with loans that they would be sincerely interested in.

In addition, what makes loan recommendation an interesting problem is the diversity of lenders' behaviors. How do lenders differ in their lending behaviors and what are the major factors that drive these differences? Fig. 2 displays an example showing temporal lending patterns. For a particular loan to be fully paid, it usually takes from a half to a full year (Fig. 2(a)). In the case of passive lenders with a small number of lending experiences (a smaller $m$ in Fig. 2(b)), the time taken between two consecutive lending activities shows a relatively high correlation with the time required for a loan
to be paid, compared to the other cases. This behavior is most likely explained by the notion that some passive lenders participate in another loan when their initial loan is paid back, rather than contributing more money of their own. However, active lenders with more lending experiences continue their lending activities mainly within a short time interval, as shown by a strong peak with almost no tail in the examples with a larger $m$ in Fig. 2(b).

Challenges in loan recommendation. The problem of loan recommendation presents various challenges compared to other traditional recommendation problems.

The first is the transient nature of loans. Standard recommendation techniques based on collaborative filtering primarily utilize other similar users' ratings or preferences on the items for recommendation. The key notion is that the items being recommended, such as books or movies, are persistent and reusable, i.e., an item (or a copy thereof) can serve many users. Loans, on the other hand, are transient and a particular loan can only serve a single borrower. More importantly, loans are only available for a short amount of time until the loan request is fully met, often leaving little or no information available to utilize from previous lenders.

The second challenge is the binary rating structure. Most rating systems are composed of a multi-grade set of ratings from which a user can select, yet in Kiva, the only information available similar to a rating is whether or not s/he funded the loan.⁴ Furthermore, the fact that the funding did not happen may not necessarily mean that s/he did not like it. This challenge is often found in other settings where the recommendation relies only on previous purchases, viewing of item pages, etc. Such limited information and ambiguity require more than just standard collaborative filtering approaches.

⁴ Individual loan amount could be utilized similarly to rating information, but such information is not available from the Kiva API for lender privacy.

Finally, another challenge is the heterogeneity of data. The Kiva dataset comprises a variety of intertwined entities giving rise to a rich set of heterogeneous information. Merging and fusing this diverse set of information in a unified predictive framework for loan recommendation presents a non-trivial problem.

Overview of proposed approaches. In order to better handle these challenges and deeply analyze various lending patterns among Kiva users, we propose a supervised learning approach to tackle this unique loan recommendation problem. That is, we formulate it as a binary classification/regression problem, where, given a lender and loan pair, the trained model computes the score that represents the likelihood of funding. In order to train our model with all the available information, we propose two main feature generation methods: (1) graph-based feature integration (for lenders with previous loans) and (2) feature alignment via joint nonnegative matrix factorization (for lenders with no previous loans). The former provides us with a general framework for incorporating all the available heterogeneous information to represent a lender-loan pair. On the other hand, the latter alleviates the lack of information for newcomers, which is a well-known issue referred to as the cold-start problem in many recommendation applications.

Utilizing the proposed approaches along with a gradient boosting tree, a state-of-the-art learner model, we achieve a practically useful level of performance up to around 0.92 AUC (area under the curve) value. Furthermore, we present in-depth analysis of the resulting model and its output, revealing various interesting knowledge about lenders' social behaviors in micro-finance activities.

The rest of this paper is organized as follows. Section 2 describes our basic preprocessing steps to handle the heterogeneity of Kiva data; in addition, we have made the post-processed data readily available on the web for other researchers. Section 3 describes our main approaches for loan recommendation, and Section 4 presents the prediction performances as well as various findings from our analysis. Section 5 discusses related work. Finally, Section 6 concludes the paper and discusses future work.

2. BASIC DATA REPRESENTATION

The Kiva dataset is composed of various entities, each of which has its own set of rich information including unstructured data (e.g., text, image, and video) as well as structured data (e.g., geo-spatial, numerical, categorical, and ordinal data). Lender entities contain basic web profile data, i.e., profile image, registration timestamp, location, loan count, and other fields, in addition to links to various types of entities. For example, a lender will have links to loans that s/he has funded and to any number of lender teams with which s/he is affiliated. Field partners manage loans within their local region, while borrowers request loans from their local field partner owing to their lack of access to a computer with internet access.

Kiva provides a recent snapshot of its dataset in JSON and XML formats.⁵ For our work, we used a 2.9GB JSON snapshot which was collected on 5/31/2013. We preprocessed it to obtain the numerical representations of each
available field. Particularly, the preprocessing of temporal, categorical, and textual fields all required a nontrivial amount of work. For temporal data, such as the loan's posting date and lender's sign-up date, we converted it to a serial date number using Matlab's datenum function, which represents the whole and fractional number of days from a fixed preset date of January 0 in year 0000. For categorical data, such as a lender's gender and a loan's country code, we used a dummy encoding scheme which converts a variable with $m$ categories into an $m$-dimensional binary vector where only the values in the corresponding categories are set to ones. Finally, for textual data, we encoded each textual field separately as a bag-of-words vector where an individual dimension corresponds to a unique word. Afterwards we reduced the dimensionality using nonnegative matrix factorization⁶ (NMF) [21, 19] to 100 for each textual field. We performed dimension reduction for two reasons. First, although the encoded representations may be in sparse format, the entire dimension easily amounts to hundreds of thousands, requiring enormous computational time in learning our prediction model. Second, the reduced dimensions, which are composed of a group of words, are more semantically meaningful than individual term dimensions, and thus, they can be versatile for both good prediction performance and data/model understanding [10, 30]. The reduced dimension was set to 100 because larger values did not improve the prediction performances reported in Section 4.

⁵ http://build.kiva.org/docs/data/snapshots
⁶ http://www.cc.gatech.edu/~hpark/nmfsoftware.php

As a final preprocessing step we created mappings between entities from the different tables. For example, a lender entity found in the table containing metadata for lenders may have a different identifier in another table about the lender-loan graph, and even worse, it may exist in only one table, meaning that some information about it will be completely missing. The mappings we created allow these issues to be handled with ease. We made the processed formatted data as Matlab files available at edu/processed-kiva-data.

3. METHODOLOGY

In this section, we describe our methodology for promoting non-profit micro-finance activities in Kiva. We formulate this task as a binary classification/regression problem. That is, we consider a pair $(u, l)$ of a lender $u$⁷ and a loan $l$ as an individual data item, and given such a pair, we intend to predict how likely s/he is to fund the loan, which we denote as $f(u, l)$. The associated label is set to 1 if funding occurred for the pair and 0 otherwise. Once the learner model is trained based on a set of data items along with these labels, it can then predict the likelihood of funding for any given lender-loan pair. Such a capability is broadly applicable in various loan recommendation problems. For example, it allows one to identify the best matching lender for a particular loan by solving $\arg\max_u f(u, l)$ for a fixed $l$, as well as the most appropriate loan to recommend given a particular lender by solving $\arg\max_l f(u, l)$ for a fixed $u$.

⁷ We use an acronym $u$ by viewing a lender as a Kiva 'u'ser.

In this approach, the key procedure affecting the overall performance is feature generation, i.e., how we characterize and represent a particular lender-loan pair. This is especially challenging considering the complexity of the Kiva dataset, which involves heterogeneous entities, such as borrowers, field partners, loans, lenders, and lender teams, with their own various sets of information and complex relationships among them. To properly handle this issue, we act appropriately for two situations split by whether or not a lender has had previous funding experiences. In the following, we present our feature generation procedure for each case in detail.

Figure 3: A graph-based feature integration for a lender-loan pair (grey-colored).

3.1 Graph-based Feature Integration

When information about previous funding experiences of a particular lender is available, we utilize relationship links between different entities to take into account all the information available from the linked entities. As summarized in Fig. 3, given a lender-loan pair $(u, l)$, we first retrieve all the linked entities from both the lender and the loan. Specifically, a lender $u$ will contain links to the list of teams s/he is affiliated with, loans s/he funded previously, and partners and borrowers his/her previous loans were associated with. Similarly, a loan will contain the links to the associated partner and the lists of borrowers, lenders (excluding the lender of interest), and lender teams that lenders are affiliated with.

Lender- and loan-specific features. Each entity type, e.g., the $i$-th type among a borrower, a partner, a loan, a lender, and a lender team, composes the entity-type-wise feature (column) vectors, $v_i^u$ and $v_i^l$, to represent a lender $u$ and a loan $l$, respectively, which, in turn, form a lender-specific feature vector $v^u = [(v_1^u)^T \cdots (v_5^u)^T]^T$ and a loan-specific one $v^l = [(v_1^l)^T \cdots (v_5^l)^T]^T$ (circles in Fig. 3). In this process, one issue is that we may have a variable number of linked entities of the same type. For instance, one lender may have funded four loans in the past, yet another may have funded fifteen. To maintain a fixed number of dimensions for $v_i^u$ (or $v_i^l$) given a variable number of entities, we aggregate them into a single set of features by adding up all the feature vectors of individual entities. Suppose the $i$-th entity type is a loan and a lender $u$ is associated with a set of entities (loans) $\{(e_i^u)_j : j = 1, \cdots, n\}$ where an entity $(e_i^u)_j$ is represented as a feature vector $(v_i^u)_j$. The feature vector $v_i^u$ (of the $i$-th entity type) for $u$ is represented as

$$v_i^u = \sum_j (v_i^u)_j. \qquad (1)$$

For example, a loan's requested amount (in dollars) will correspond to the summation of the values from multiple loans, a single value indicating a total requested amount. For categorical variables, such as a lender's gender, which is represented as a binary vector in two dimensions, after summing up the feature vectors of lenders for a particular loan, the values corresponding to the two dimensions become the numbers of male and female lenders, respectively. The same idea can also be applied to textual features, which are nonnegative representations computed by NMF. In addition, even if there are no links to entities of a particular entity type, e.g., no associated loans for a particular lender, Eq. (1) still holds since it will produce an equal-dimensional feature vector containing all zeros.

Lender-loan matching features. We have described how we generate lender- and loan-specific features by including information from each of the linked entities. We note that although the resulting data include links to heterogeneous entity types, both a lender and a loan now have counterparts generated from the same entity type, which can be directly compared with each other. In other words, both lenders and loans will have all the feature sets associated with borrowers, field partners, loans, lenders, and lender teams.

Intuitively, if the entities from a lender side and a loan side are similar, our predictor $f(u, l)$ should give a high score for the likelihood of funding. To leverage this in our feature representation, we generate an additional set of features $v^{ul}$ that indicate how well the entities of the same type match at an individual feature level. To this end, we compute the product of individual features, referring to them as lender-loan matching features (hexagons in Fig. 3), i.e., $v^{ul} = v^u \circ v^l$, where $\circ$ represents an element-wise product. Given the nonnegativity of $v^u$ and $v^l$, $v^{ul}$ indicates how strongly the values of a particular dimension are represented in both the lender and the loan sides; this can be considered as the degree of matching at an individual feature level. These matching features, which are originally the second-order terms of existing features, may be inherently utilized in nonlinear or kernel models, but they are potentially critical information to many other models such as linear models and other tree-based models that deal with only one variable at a time, as will be described in Section 4.1.

Temporal features. Inspired by the analysis discussed earlier in Section 1, we generate additional features using temporal information about a lender and a loan. Available temporal information includes a lender's member_since and a loan's posted_date, funded_date, and paid_date. By considering the relative time differences between a loan $l$ and the most recent loan, $l_r$, that a lender funded in the past, we construct six temporal features having the form of $x - y$, where $x$ is one of $l$'s posted_date and funded_date and $y$ is one of $l_r$'s posted_date, funded_date, and paid_date. These features basically reflect the temporal patterns of consecutive lending activities.

3.2 Feature Alignment via Joint Nonnegative Matrix Factorization

Cold-start problem. The feature generation procedure described previously is quite general and flexible when incorporating all the information from each of the heterogeneous entities, but the main limitation of this approach arises when little or no relationship link between a lender and/or a loan exists. Although details may differ, this problem, which is often referred to as a cold-start problem, is common in many recommendation applications. For instance, suppose a new Kiva user considers funding a loan for the first time and we would like to recommend the most appropriate loan they would be likely to fund. It is very likely that they do not have any connections with lender teams, previous loans, and accordingly, any partners or borrowers. On the other hand, suppose a new loan web page has just launched on the Kiva website and it currently has not secured a lender. In this scenario we would not have any available links to lenders and their lender teams that can be utilized in the feature generation process on the loan's side. These cold-start problems make our loan recommendation task challenging since a number of feature blocks depicted in Fig. 3 would be zero vectors, leaving little information useful for recommendation.

Figure 4: An overview of how joint NMF works. Given a high-dimensional space of lenders' and loans' textual data along with their linked information (dashed lines), joint NMF generates a common aligned space where linked data points are closely placed. First-time lenders and fresh loans are then mapped to the aligned space so that the resulting representations reveal their hidden linked relationships (dotted ellipses).

How joint NMF works. As a way to alleviate this problem, we propose a novel feature generation approach based on joint nonnegative matrix factorization (NMF) for a first-time lender and a fresh loan that have no available link information. As shown in Fig. 4, the main idea behind this approach is to transform the features generated from heterogeneous sources, one of which comes from a lender's side and the other from a loan's side, into a common space where the vectors representing a lender and a loan with which it is linked can be placed close to each other. Once we obtain the vector representations of a lender and a loan in the resulting common space, one can also easily generate the corresponding lender-loan matching features, which would play a significant role in estimating the likelihood of funding.

Input matrices for joint NMF. To begin, we start with textual fields, e.g., a lender's loan_because and a lender's occupational_info, which a lender fills out when signing up at Kiva.org, and a loan's loan_description. As described in Section 2, each of these textual fields is initially represented as a bag-of-words vector based on its own vocabulary. Note that the vocabulary set of a particular textual field is independent of that of any other, so each field is represented in a separate space.

Now, we form two term-document matrices $A_u$ and $A_l$ using the textual field from a lender and a loan, respectively. That is, $A_u$ encodes either a lender's loan_because or occupational_info while $A_l$ encodes a loan's loan_description. Additionally, we assume the columns of $A_u$ and $A_l$ are aligned based on the linked relationships between lenders and loans. For example, the first column of $A_u$ and that of $A_l$ represent a lender and a loan, respectively, that have a link. Following this assumption, we exclude those lenders and loans that have no links when forming $A_u$ and $A_l$. When a particular loan, i.e., a column of $A_l$, has links to multiple lenders, we sum up the textual vectors of the corresponding lenders and put this single vector in the corresponding column of $A_u$. In this manner, we maintain a one-to-one mapping between the columns of $A_u$ and $A_l$.

Formulation. Given the two matrices $A_u \in \mathbb{R}_+^{m_u \times n}$ and $A_l \in \mathbb{R}_+^{m_l \times n}$, an integer $k$, and a parameter $\alpha$, joint NMF solves

$$\min_{W_u, H_u, W_l, H_l \ge 0} \left\| A_u - W_u H_u^T \right\|_F^2 + \left\| A_l - W_l H_l^T \right\|_F^2 + \alpha \left\| H_u - H_l \right\|_F^2, \qquad (2)$$

where $W_u \in \mathbb{R}_+^{m_u \times k}$, $H_u \in \mathbb{R}_+^{n \times k}$, $W_l \in \mathbb{R}_+^{m_l \times k}$, and $H_l \in \mathbb{R}_+^{n \times k}$ are nonnegative factors. In the above equation, the first and the second terms correspond to standard NMF formulations, but at the same time, the third term enforces $H_u$ and $H_l$ to be close to each other. As a result, the rows of $H_u$ and $H_l$ can be considered as new vector representations in a common $k$-dimensional space where the linked lender and loan vectors are closely placed.

Joint-NMF features for a first-time lender and a fresh loan. Up to now, we have computed joint NMF using the textual information of lenders and loans by using their linked relationships, leading to a common space where these relationships are revealed. However, we still need to represent a first-time lender and a fresh loan in the joint-NMF space. To achieve this task, we utilize the resulting factor matrices $W_u$ and $W_l$, which provide a mapping for an arbitrary bag-of-words representation in an original space to the joint-NMF space. In detail, given $a_u \in \mathbb{R}_+^{m_u \times 1}$ and $a_l \in \mathbb{R}_+^{m_l \times 1}$ corresponding to a first-time lender and a fresh loan, respectively, we solve the following nonnegativity-constrained least squares problems,

$$\min_{h_u \ge 0} \left\| a_u - W_u h_u^T \right\|_2^2 \quad \text{and} \quad \min_{h_l \ge 0} \left\| a_l - W_l h_l^T \right\|_2^2, \qquad (3)$$

where $h_u^T \in \mathbb{R}_+^{k \times 1}$ and $h_l^T \in \mathbb{R}_+^{k \times 1}$ are our new representations in the joint-NMF space, i.e., joint-NMF features.

These joint-NMF features have two main advantages. First, even though a first-time lender and a fresh loan have no explicit links, one can expect their joint-NMF features generated in this way to better reveal their proximity owing to the learnt factor matrices $W_u$ and $W_l$. Second, since they are considered to be in the common space, we can now generate their lender-loan matching features in a way similar to that presented in the previous subsection.

4. EXPERIMENTS AND FINDINGS

In this section, we present our experiments and analysis on two loan recommendation cases depending on whether a lender of interest has previous funding history.

4.1 Experimental Setup

Learner. Considering the heterogeneity of our data and the complexity of the problem, it is crucial to use the most suitable and powerful prediction model to date. To this end, we have chosen a gradient boosting tree (GB tree)⁸ [17, 14]. A GB tree is an ensemble method where an individual learner is a decision tree [6]. The reason for choosing a GB tree for our problem is as follows: First of all, an ensemble method is known for its superior generalization capability for unseen data. More importantly, a decision tree, our base learner, uses one variable at each node when it is trained/constructed as well as when it is applied to test data. This characteristic frees us from worrying about heterogeneity in the features we generated. The downside of other learners, such as logistic regression and support vector machines, is that heterogeneous features have to be normalized via, say, standardization of their distributions, which transforms each feature to have zero mean and unit variance. Such normalization does not always make sense for binary and integer features, and furthermore it removes the nonnegativity of our feature representation that offers intuitive interpretation of them.

⁸ The GB tree implementation we used is available at https://sites.google.com/site/carlosbecker/resources/gradient-boosting-boosted-trees

Figure 5: The ROC curve results for different lender groups with various numbers of previous loans $m$. Panels: (a) $m = 5$, (b) $m = 10$, (c) $m = 15$, (d) $m = 20$, (e) $m = 25$, (f) $m = 50$. Each panel plots the true positive rate against the false positive rate for the reference line $y = x$ and for cumulatively added feature groups: +Lender/loan text, +Lender/loan info, +Loan delinquency, +Partner, +Borrower, +Temporal, +Lender Team.

Lender groups and data selection. As previously highlighted, it is important to handle different user behaviors properly. Therefore, we first selected lenders that have a specific lending count $m$, where we varied $m$ from 5 to 50, indicating the degree of how actively lenders participated in loans. Then, we conducted our experiments separately on each of these lender groups. We felt that lenders within this range of $m$ contained the set of lenders neither too active nor too passive, and thus we expect them to be more significantly influenced when given a recommendation for an appropriate loan.

Next, we formed a lender-by-loan adjacency matrix where only the components whose corresponding lenders funded the corresponding loans are set to 1 and 0 otherwise. From this graph, we randomly selected 5,000 positive (1-valued components) and 5,000 negative (0-valued components) samples and generated their feature vectors, as described in Section 3.⁹ These samples are then used as our training and test sets under a 10-fold cross-validation setup.
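The data selection step above can be sketched in a few lines. The following is a minimal illustration in pure Python, not the authors' code (the paper's pipeline was built around Matlab files): the function names `sample_balanced_pairs` and `k_fold_splits`, the toy sizes, and the use of rejection sampling to draw 0-valued components from a sparse adjacency structure are all assumptions made for this example.

```python
import random


def sample_balanced_pairs(funded, n_lenders, n_loans, n_per_class, rng):
    """Draw n_per_class funded (label 1) and n_per_class unfunded (label 0) pairs.

    `funded` is a set of (lender_index, loan_index) tuples, i.e. the 1-valued
    components of the lender-by-loan adjacency matrix (hypothetical layout).
    """
    positives = rng.sample(sorted(funded), n_per_class)
    negatives = set()
    # Rejection sampling: the matrix is sparse, so a random cell is almost
    # always a 0-valued component; keep drawing until enough distinct ones.
    while len(negatives) < n_per_class:
        pair = (rng.randrange(n_lenders), rng.randrange(n_loans))
        if pair not in funded:
            negatives.add(pair)
    pairs = positives + sorted(negatives)
    labels = [1] * n_per_class + [0] * n_per_class
    return pairs, labels


def k_fold_splits(n_items, k, rng):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    order = list(range(n_items))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy stand-in for the real data: 200 lenders, 300 loans, about 2,000 links.
rng = random.Random(0)
n_lenders, n_loans = 200, 300
funded = {(rng.randrange(n_lenders), rng.randrange(n_loans)) for _ in range(2000)}

pairs, labels = sample_balanced_pairs(funded, n_lenders, n_loans, 50, rng)
splits = list(k_fold_splits(len(pairs), 10, rng))
```

In the paper's setting `n_per_class` would be 5,000 and each sampled pair would then be turned into a feature vector as in Section 3 before training; the sketch stops at the index level.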
9Notethatweusedbalanceddatasetsintermsofpositivevs.negativesampleswhileoriginaldataareseverelyunbal-anced.However,theROC-basedperformancemeasuredoesnotdependonthebalancedness[12].Featuregroups.Forlenderswithfundinghistory,weutilizedvariousfeaturespresentedinSection3.1andcon-structedseveralfeaturegroupsasfollows:(1) Loan/lender text (600 dimensions):Textualfeaturesfromalender'sloan becauseandaloan'sloan description,whosedimensionisreducedbyNMF(Section2).(2) Loan/lender info (183 dimensions):Featuresfromalender'sandaloan'svariouselds.(3) Loan delinquency (13 dimensions):Featuresindicatinghowmanypreviousloansforalenderhavebeennon-paidordelinquent.(4) Partner (33 dimensions):Featuresabouteldpartners.(5) Borrower (12 dimensions):Featuresaboutborrowers,e.g.,aborrower'sgenderandpictured.(6) Temporal (6 dimensions):Timedierencesbetweenanewloanandalender'smostrecentlyfundedloan(Section3.1).(7) Lender team (15 dimensions):Featuresaboutlenderteamsalenderisassociatedwith.Usingthisstructure,alender-loanpair,whichisourdataitem,isrepresentedasan862-dimensionalvector.Forlenderswithoutfundinghistory,manyofthesefea-turesarenotavailable.Thus,thetwosetsofjoint-NMFfeaturesdescribedinSection3.2weremainlyused:thosegeneratedfromaligning(1-a))alender'sloan becausever-susaloan'sloan description(300-dimensional)and(1-b))alender'soccupational infoversusaloan'sloan description(300-dimensional),respectively.Next,weincluded(2)loan/lenderinfo(61-dimensional),(3)partner(11-dimensional),and(4)borrower(4-dimensional)information.Performancemeasure.Althoughourexperimentalset-tingisabinaryclassication,thedesiredcapabilityfromlearningthefunctionf(u;l)byaGBtreeistocomputethelikelihoodoffunding,whichallowsustorankthemostap- Lender/loan text Lender/loan info Loan delinquency Partner Borrower Temporal Len
7 der Team 0 0.1 0.2 0.3 Var. importance
der Team 0 0.1 0.2 0.3 Var. importance m= 5 m=10 m=15 m=20 m=25 m=50 (a)TheAUCvalueimprovementover.5whenusingonlyaparticularfeaturegroup Lender/loan text Lender/loan info Loan delinquency Partner Borrower Temporal Lender Team 0 0.05 0.1 Feature groupsVar. importance (b)TheAUCvaluedegradationduetotheexclusionofaparticularfeaturegroupFigure6:TheanalysisonthevariableimportanceTable1:ThecumulativeAUCvalueinFig.5 Thenumberofpreviousloansm 5 10 15 20 25 50 Lender/loantext .6938 .5930 .5524 .5594 .5788 .6730 Lender/loaninfo .7010 .5974 .5601 .5572 .5793 .6679 Loandelinquency .8416 .7453 .6438 .6000 .6265 .6691 Partner .8646 .7610 .6600 .6222 .6391 .6778 Borrower .8879 .7852 .6760 .6275 .6415 .6909 Temporal .9179 .8415 .7736 .7675 .7449 .7802 Team .9209 .8420 .7839 .7923 .7900 .8318 propriateloansforaparticularlenderaswellasthemostappropriatelendersforaparticularloan.Therefore,weareinterestedinthequalityintermsoftheresultingrankingofagiventestsetoflender-loanpairs,ratherthantheclassi-cationaccuracy.Inthisrespect,wereportareceiverop-eratingcharacteristic(ROC)curveanditsareaunderthecurve(AUC)value,whichmeasureshowmuchhigherposi-tivesamplesarerankedthannegativesamples.4.2PredictivePerformance4.2.1LenderswithavailablefundinghistoryOverallperformance.Incaseswherepreviousfundinginformationofalenderisavailable,wegraduallyincorpo-ratedadditionalfeaturesdescribedinSection4.1fordier-entlendergroups.TheperformanceresultsareshownbytheROCcurvesinFig.5alongwiththeirAUCvaluessum-marizedinTable1.ThebestAUCvaluesrangedfrom.78to.92,whichisasignicantimprovementoverabaselinevalueof.5.Theseresultsweregenerallyachievedonlywhenusingallthefeaturesavailable,indicatingtheadvantageofourfeatureintegrationframework.Amongdierentlendergroups,lenderswith15m25werethemostdicultinpredictingtheirlikelyloanstofundwhilelenderswithalowerorhighermwererelativelyeasier.Analysisonfeaturegroups.Theanalysisonthevari-ableimportanceofeachfeaturegroup,asshowninFig.6,revealsvariousinterestingknowledgeaboutmicro-nanceactiv
ities,asfollows:(1) The relative time with respect to the last funded loan plays an important role.Temporalfeatures,whichcontainelapsedtimeinformationsincethemostrecentlyfundedloan,e.g.,whenitwaspostedand/orwhenitwasre-paid,consistentlyimprovetheperformancebyanon-trivialamountforallcases.(2) Loan delinquencies discourage passive lenders although they do not impact active lenders as much.Theperformanceincreaseduetotheloandelinquencyfea-turesissubstantialforlendergroupswithm15,butthatincreasedropssignicantlyforlendergroupswithm=50.Ourfurtherinvestigationshowedthesefeatureswerenega-tivelycorrelatedwiththelabels.Forexample,whenm=5,only36%ofthelenderswhopreviouslyexperiencedloandelinquencyhadpositivelabelswhile53%ofthelenderswithoutsuchexperienceshadpositivelabels.Ontheotherhand,whenm=50,thesetworatioswere49.7%and50.1%,respectively,showingalmostnocorrelation.(3) Lender teams exhibit greater in\ruence on active lenders.Theperformanceduetotheinclusionoflenderteamfeaturesimprovesasmincreases.Weconjecturethatitispartlybecausepassivelendersdidnotjointeamsyet.Infact,wefoundthattheaveragenumberofteamsofeachlenderwithm=50was.72whilethatwithm=5wasonly.25.Inaddition,fromFigs.5(e)(f),thesefeaturespulluptheROCcurvemainlyatthefalsepositiveratevalue(anxaxis)from.4to.7.Thisindicatesthattheyarehelpfulincorrectlyclassifyingthosesomewhatambiguouslender-loanpairs.4.2.2First-timelendersandfreshloansAbaselineapproach.Toevaluatetheeectivenessofjoint-NMFfeatures,wedesignedabaselineapproachtocom-pareourmethodagainst,asfollows.Inthebaselineap-proach,eachpairoftextualelds,(alender'sloan because,aloan'sloan description)and(alender'soccupational info,aloan'sloan 
description), has been aggregated into a single document corpus, which is encoded as a list of bag-of-words vectors based on a common vocabulary set. Next, we applied standard NMF in order to obtain their reduced-dimensional vectors. Note that, similar to the joint NMF approach, the resulting vector representations of lenders' and loans' textual data exist in a common space. Nonetheless, the main difference is that joint NMF utilizes additional link information and enforces linked lenders and loans to be close to each other in the common space (Fig. 4).

Performance comparison. For first-time lenders and fresh loans, Fig. 7 shows the comparisons in terms of AUC measures between the joint NMF and the baseline approaches. For each case, joint NMF was computed based on a different lender group and its associated loans depending on the value range of m. Note that all the training/test data in the supervised learning experiments have been selected only from first-time lenders and fresh loans and that loan delinquency and temporal features were excluded since they are not available for first-time lenders.

[Figure 7: The AUC values for first-time lenders and fresh loans when training joint NMF with lenders with various numbers of previous loans m and their associated loans. Panels: (a) 4 ≤ m ≤ 6, (b) 15 ≤ m ≤ 20, (c) 50 ≤ m ≤ 100. x-axis: feature groups (Text, +Lender/loan, +Partner, +Borrower); y-axis: AUC value; legend: Joint NMF, Baseline.]

Table 2: The representative keywords of two topic pairs aligned by joint NMF.

          A lender's occupational info                  A loan's loan description
Topic 1   teacher, preschool, math, librarian, school   children, school, family, married, husband
Topic 2   student, mba, college, graduate, university   business, activities, entrepreneur, revenue

In all the results, the joint-NMF approach shows significant improvement over the baseline approach, indicating that joint-NMF features are clearly helpful in revealing hidden links between first-time lenders and fresh loans. Combined with the other features available, the best AUC result, which is about .72, was found when using the active lender group with 50 ≤ m ≤ 100. This observation is somewhat counter-intuitive since first-time lenders would be expected to behave similarly to passive lenders with a smaller value of m. However, it can still be explained in the sense that active lenders likely provide detailed information about themselves in a lender's textual fields, which would have provided joint NMF with vital clues in learning the mapping between lenders and loans.

Aligned topics. The qualitative analysis of the resulting mapping of joint NMF offers a deeper understanding of lending behaviors. Table 2 shows examples of aligned topics between a lender's occupational info and a loan's loan description. These representative keywords were obtained as the most highly weighted terms in the corresponding columns of the two matrices W_l and W_u in Eqs. (2) and (3).

Both topics in Table 2 are related to lenders with school-oriented occupations. Lenders in Topic 1 are shown to have professional jobs in an educational environment, such as teachers and librarians, while those in Topic 2 mainly consist of students. By examining the associated topic keywords in a loan's loan
description, one can see that the former group tends to participate in family-related loans, e.g., helping children go to school and supporting a family and/or a husband. In contrast, the latter group (students) tends to lend to entrepreneurs with a particular business, such as running a restaurant.

4.3 Further Discussions

Our analysis on loan recommendation and lending behaviors suggests several important directions that Kiva should take to promote micro-financial activities.

First, as seen from the significant importance of temporal features, performing loan recommendation at the right time would be crucial in keeping lenders actively involved. As shown in Fig. 2, Kiva can give recommendations (1) soon after a lender has funded a loan as well as (2) when one's previous loans have been paid back. Otherwise, people tend to gradually lose interest in micro-finance activities as time goes on.

Second, Kiva should help lenders, especially passive or novice lenders, avoid potentially risky loans. From our analysis, non-paid and/or delinquent loans seem to be the major cause for passive lenders to stop their lending activities, and thus it would be important to lead them to loans with a high chance of repayment.

Finally, in order to secure as many active lenders as possible, Kiva should encourage passive lenders to join teams, since lender teams seem to be one of the driving factors for active lenders.

5. RELATED WORK

In this section, we mainly discuss related work about (1) recommender systems (relevant to Section 3.1), (2) manifold alignment (relevant to Section 3.2), and (3) analysis on micro-financial activities.

Recommender systems. Basically, a recommender system, an active information filtering system [5], aims at estimating the so-called utility function for a given item and a user, which is analogous to our funding likelihood function f(u, l). Recommender systems typically fall into two categories: content-based methods, which match users to products by matching a user's profile to the product's characteristics, and collaborative filtering methods, which recommend products that other users with similar preferences have chosen in the past [26, 1]. Numerous studies on recommender systems have focused on collaborative filtering approaches. These methods are generally categorized according to whether they are memory-based or model-based. For a comprehensive summary of collaborative filtering techniques, the reader is referred to the survey articles [28, 1].

Due to the challenges discussed in Section 1, which make collaborative filtering methods inapplicable, our work partly follows the content-based approach in that the proposed lender-specific features can be viewed as a user's profile while the loan-specific features represent the product content. However, the typical content-based approach, mainly originating from the information retrieval literature [4], focuses only on textual information. In order to integrate all the other information available, our approach extends it in the context of ad-hoc information retrieval [24], which treats various kinds of information as features and trains a learner model for predicting a relevance score of an item. These types of approaches are widely applicable in various novel applications including online dating systems [11].

Manifold alignment. This area has been actively studied recently in the context of image analysis [20] and cross-lingual information retrieval [29, 8, 9]. The problem setting is generally similar to that of our feature alignment where, given the different vector representations and/or relationships of the corresponding items, their new embedding in a common space is computed. Recently, from the perspective of multi-relational learning from multiple graphs or sources, several advanced methods based on joint matrix factorization have been proposed [33, 27]. In addition, a joint NMF-based approach has been proposed for multi-view clustering problems [22]. However, most of these methods focus on the best representations of existing data items, while our proposed approach focuses on a generalization capability, i.e., embedding of unseen data into a common space so that their hidden correspondences are properly revealed.

Analysis on micro-financial activities. Previous work related to the complex micro-finance lending behavioral patterns and Kiva's now-integral role in the crowd-sourced micro-financing movement has looked at the effects of the internet on micro-financing [7] and other peer-to-peer lending transactions [3]. Studies on micro-finance decision-making have discovered that lenders favor lending opportunities not only to entities similar to themselves but also to individuals in situations that trigger an emotional reaction [2, 15].

Kiva-related findings have suggested bias in lending decisions by showing that particular borrower features generate a higher level of attraction from the wider lending audience. In particular, women and more physically attractive individuals have a greater chance of securing charitable loan support, at least from lenders that constitute the set of first-time and less active lenders [18]. Other studies on Kiva have observed the nature of lending behavior by correlating the impact of group dynamics to lending participation [16, 23]. These studies provide a basis for our work, in which we extend similar decision-making processes through automation to support our lender-loan recommendation system.

All these studies have analyzed Kiva's data in a number of ways, yet there is a lack of research that has utilized statistical numerical analysis approaches, which are closer to our body of work. One such study manually defined a set of categories about the motivation of lending and applied machine learning techniques to train automatic text classifiers using a lender's loan because field [23]. It also incorporated several simple features, such as the loan count and team affiliations, in performing regression on lending frequency and amount. This work revealed various interesting findings about lending behavior, but the information and techniques used are relatively limited compared to our work.

To the best of our knowledge, our work is the first in-depth study to directly tackle the loan recommendation problem by incorporating all the heterogeneous information available from Kiva. As seen in Section 4, we achieve performance viable for practical application and reveal significant findings about lending behavior that have not been discussed in any previous work.

6. CONCLUSIONS AND FUTURE WORK

In this paper, we presented a novel application of loan recommendation in the non-profit micro-finance sector. Starting with an extensive data set from Kiva, a famous micro-finance service, we tackled the problem using a supervised learning approach. In order to represent any given lender-loan pair as a feature vector, which is a key procedure in this approach, we proposed two main methodologies: (1) graph-based feature integration to flexibly incorporate all the heterogeneous information available and (2) feature alignment via joint NMF to enhance the limited information of first-time lenders and fresh loans. Based on the proposed approaches combined with a gradient boosting tree, a state-of-the-art prediction model, we achieved up to .92 AUC value. Furthermore, we presented interesting phenomena about micro-financing behaviors of Kiva lenders from temporal and social aspects.

The importance of our work and the information-rich nature of the Kiva data open up various future research possibilities. We describe a few of them in the following.

Selecting negative instances. Although our experiments showed consistent results over multiple runs of different sets of random samples, it would be beneficial to choose negative samples with more care. That is, not all negative examples are truly negative. For example, a lender may not have funded a particular loan simply because he did not know about it, not because he decided not to fund it. Advanced techniques such as the one-class type approach [25] and the one leveraging the context of user-system interactions [31], which tackle these issues in other recommendation applications, could be adopted in our work.

Fraud detection. As seen in Section 4, non-paid and delinquent loans significantly impact further lending activities of novice lenders, and thus, it is critical to detect potentially fraudulent loans and discourage lenders from funding them. A fraudulent-loan detection problem can be formulated and solved in a similar way to the methods proposed in this paper. Eventually, integrating the resulting potential fraud score into our feature representation is expected to increase the loan recommendation performance even further.

7. ACKNOWLEDGMENTS

This work was supported in part by NSF IIS-1116886, NSF CCF-0808863, NSFC 61129001, and DARPA XDATA grant FA8750-12-2-0309. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of funding agencies. We also thank anonymous reviewers for their insightful comments and suggestions.

8. REFERENCES

[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering (TKDE), 17(6):734–749, 2005.
[2] J. Andreoni. Impure altruism and donations to public goods: A theory of warm-glow giving. The Economic Journal, 100(401):464–477, 1990.
[3] A. Ashta and D. Assadi. Do social cause and social technology meet? Impact of web 2.0 technologies on peer-to-peer lending transactions. Cahiers du CEREN, 29:177–192, 2009.
[4] R. Baeza-Yates and B. Ribeiro-Neto. Modern information retrieval. Addison-Wesley, 1999.
[5] N. J. Belkin and W. B. Croft. Information filtering and information retrieval: two sides of the same coin? Communications of the ACM, 35(12):29–38, 1992.
[6] L. Breiman. Classification and regression trees. CRC Press, 1993.
[7] T. Bruett. Cows, Kiva, and Prosper.com: How disintermediation and the internet are changing microfinance. Community Development Investment Review, 3(2):44–50, 2007.
[8] P. A. Chew, B. W. Bader, T. G. Kolda, and A. Abdelali. Cross-language information retrieval using PARAFAC2. In Proc. the 13th ACM international conference on Knowledge discovery and data mining (SIGKDD), pages 143–152, 2007.
[9] J. Choo, S. Bohn, G. Nakamura, A. White, and H. Park. Heterogeneous data fusion via space alignment using nonmetric multidimensional scaling. In Proc. the SIAM International Conference on Data Mining (SDM), pages 177–188, 2012.
[10] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the Society for Information Science, 41:391–407, 1990.
[11] F. Diaz, D. Metzler, and S. Amer-Yahia. Relevance and ranking in online dating systems. In Proc. the 33rd international ACM conference on Research and development in information retrieval (SIGIR), pages 66–73, 2010.
[12] T. Fawcett. ROC graphs: Notes and practical considerations for researchers. Machine Learning, 31:1–38, 2004.
[13] M. Flannery. Kiva and the birth of person-to-person microfinance. Innovations, 2(1-2):31–56, 2007.
[14] J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
[15] J. Galak, D. Small, and A. T. Stephen. Micro-finance decision making: A field study of prosocial lending. Journal of Marketing Research, 48(SPL):S130–S137, 2011.
[16] S. Hartley. Kiva.org: Crowd-sourced microfinance & cooperation in group lending. 2010.
[17] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
[18] C. Jenq, J. Pan, and W. Theseira. What do donors discriminate on? Evidence from kiva.org. 2012.
[19] H. Kim and H. Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23(12):1495–1502, 2007.
[20] S. Lafon, Y. Keller, and R. R. Coifman. Data fusion and multicue data matching by diffusion maps. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 28:1784–1797, 2006.
[21] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.
[22] J. Liu, C. Wang, J. Gao, and J. Han. Multi-view clustering via joint nonnegative matrix factorization. In Proc. the SIAM International Conference on Data Mining (SDM), pages 252–260, 2013.
[23] Y. Liu, R. Chen, Y. Chen, Q. Mei, and S. Salib. I loan because...: Understanding motivations for pro-social lending. In Proc. the 5th ACM International Conference on Web Search and Data Mining (WSDM), pages 503–512, 2012.
[24] C. D. Manning, P. Raghavan, and H. Schutze. Introduction to information retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
[25] R. Pan and M. Scholz. Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering. In Proc. the 15th ACM international conference on Knowledge discovery and data mining (SIGKDD), pages 667–676, 2009.
[26] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recommendation algorithms for e-commerce. In Proc. the 2nd ACM conference on Electronic commerce, pages 158–167, 2000.
[27] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In Proc. the 14th ACM international conference on Knowledge discovery and data mining (SIGKDD), pages 650–658, 2008.
[28] X. Su and T. M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009:4:2–4:2, 2009.
[29] C. Wang and S. Mahadevan. Manifold alignment using procrustes analysis. In Proc. the 25th International Conference on Machine Learning (ICML), pages 1120–1127, 2008.
[30] W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In Proc. the 26th international ACM conference on Research and development in information retrieval (SIGIR), pages 267–273, 2003.
[31] S.-H. Yang, B. Long, A. J. Smola, H. Zha, and Z. Zheng. Collaborative competitive filtering: learning recommender using context of user choice. In Proc. the 34th international ACM conference on Research and development in Information Retrieval (SIGIR), pages 295–304, 2011.
[32] M. Yunus. Banker to the Poor. Penguin Books India, 1998.
[33] D. Zhou, S. Zhu, K. Yu, X. Song, B. L. Tseng, H. Zha, and C. L. Giles. Learning multiple graphs for document recommendations. In Proc. the 17th international conference on World Wide Web (WWW), pages 141–150, 20