Inferring Social Ties across Heterogenous Networks

Jie Tang
Department of Computer Science, Tsinghua University, Beijing 100084, China
jietang@tsinghua.edu.cn

Tiancheng Lou
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
ltc08@tsinghua.edu.cn

Jon Kleinberg
Department of Computer Science, Cornell University, Ithaca NY 14853
kleinber@cs.cornell.edu

ABSTRACT
It is well known that different types of social ties have essentially different influence between people. However, users in online social networks rarely categorize their contacts into “family”, “colleagues”, or “classmates”. While a bulk of research has focused on inferring particular types of relationships in a specific social network, few publications systematically study the generalization of the problem of inferring social ties over multiple heterogeneous networks. In this work, we develop a framework for classifying the type of social relationships by learning across heterogeneous networks. The framework incorporates social theories into a machine learning model, which effectively improves the accuracy of inferring the type of social relationships in a target network, by borrowing knowledge from a different source network. Our empirical study on five different genres of networks validates the effectiveness of the proposed framework. For example, by leveraging information from a coauthor network with labeled advisor-advisee relationships, the proposed framework is able to obtain an F1-score of 90% (8-28% improvements over alternative methods) for inferring manager-subordinate relationships in an enterprise email network.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Text Mining; H.2.8 [Database Management]: Database Applications

General Terms
Algorithms, Experimentation

Keywords
Social network, Predictive model, Social influence

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

1. INTRODUCTION
Our real social networks are complex and consist of many overlapping parts. Nobody exists merely in one social network. People are connected via different types of social ties in different networks. For example, in an enterprise email network, where people are connected by sending/receiving emails to/from others, the relationships between people can be categorized as manager-subordinate, colleague, etc.; in a mobile communication network, the relationship types could include family, colleagues, and friends. It is well known that the different types of social ties have essentially different influence between people. A graduate's research topic may be mainly influenced by his or her advisor, while other parts of his or her everyday life will be more influenced by close friends. Awareness of these different types of social relationships can benefit many applications. For example, if we could extract friendships between users from a mobile communication network, we could leverage the friendships for a “word-of-mouth” promotion of a new product.

However, in most online networks (e.g., Facebook, Twitter, LinkedIn, YouTube, and Slashdot), such information (relationship type) is usually unavailable. Users may easily add links to others by clicking “friend request”, “follow” or “agree”, but do not often take the time to create labels and maintain their friend list. Indeed, one survey of mobile phone users in Europe shows that only 16% of users have created contact groups on their mobile phones [10, 27]; our preliminary statistics on LinkedIn data also show that more than 70% of the connections have not been well labeled.

A few efforts have been made to infer social ties. For example, Crandall et al. [4] investigate the problem of inferring friendships between people from co-occurrence in time and space. Wang et al. [30] aim to discover advisor-advisee relationships from the publication network. Diehl et al. [6] try to identify social ties (e.g., manager-subordinate) by learning a ranking function with predefined features. However, most of these works focus on mining particular types of relationships in a specific domain. For example, [30] defines two heuristic rules as constraints and tries to discover advisor-advisee relationships by propagating the constraints in a graphical model.
However, the method is difficult to extend to other domains. Another challenge is that different networks are very unbalanced. In some networks, such as Slashdot, it might be easy to collect some labeled relationships (e.g., trust/distrust relationships between users). However, in most other networks, it may be infeasible to obtain the labeled information and thus difficult to accurately infer the social relationships. One potential opportunity is that in the real world, different networks are intertwined with, instead of separated from, each other. Can we leverage the correlations between different networks to help infer the types of social ties?

Motivating Examples. To clearly illustrate the problem, Figure 1 gives an example of inferring social ties across a product reviewers' network and a mobile communication network. In Figure 1, the left sub-figure is the input to our problem: a reviewer network, which consists of reviewers and relationships between reviewers; and a mobile network, which is comprised of mobile users and their communication information (calling or texting message). The right sub-figure shows the output of our problem: the inferred social ties in the two networks. In the reviewer network, we infer the trust/distrust relationships and in the communication network, we identify friendships, colleagues, and families. The middle of Figure 1 is the component of knowledge transfer for inferring social ties in different networks. This is the key objective of this work. The fundamental challenge is how to bridge the available knowledge from different networks to help infer the different types of social relationships.

[Figure 1 panels. Left, "Heterogeneous Networks": a reviewer network (Adam, Bob, Chris, and Danny, linked by trust/distrust and by reviews of Product 1 and Product 2) and a communication network (calls annotated with place and time). Middle: "Knowledge Transfer for Inferring Social Ties". Right, "Inferred social ties in different networks": Family, Colleague, Friend.]
Figure 1: Example of inferring social ties across two heterogeneous networks: a reviewer network and a mobile communication network.

The problem is non-trivial and poses a set of unique challenges. First, what are the fundamental factors that form the structure of different networks? Second, how can we design a generalized framework to formalize the problem in a unified way? Third, as real social networks are getting larger with hundreds of millions of nodes, how can we scale up the model learning algorithm to adapt to the growth of large real networks?

Results. In this work, we aim to conduct a systematic investigation of the problem of inferring social ties across heterogeneous networks. We precisely define the problem and propose a transfer-based factor graph (TranFG) model. The model incorporates social theories into a semi-supervised learning framework, which can be used to transfer supervised information from a source network to help infer social ties in a target network.

We evaluate the proposed model on five different genres of networks: Epinions, Slashdot, Mobile, Coauthor, and Enron. We show that the proposed model can significantly improve the performance (+15% on average in terms of F1-Measure) for inferring social ties across different networks compared with several alternative methods. Our study also reveals several interesting phenomena for social science:

- Social balance is satisfied well on friendship (or trust) networks; but not (< 20% with a large variance) on user communication networks (e.g., mobile communication network).
- Users are more likely (+10%-+98% higher than chance) to have the same type of relationship with a user who spans a structural hole. Disconnected users have an even higher likelihood.
- It was validated that social status is satisfied in many networks. We further discover that several frequent forms of triads have a similar distribution in different networks (Coauthor and Enron).
- Opinion leaders are more likely (+71%-+84%) to have a higher social status than ordinary users.

Organization. Section 2 formulates the problem; Section 3 introduces the dataset and our observations over different networks. Section 4 explains the proposed model and describes the algorithm for learning the model; Section 5 gives the experimental setup and Section 6 presents the results; finally, Section 7 discusses related work and Section 8 concludes.

2. PROBLEM DEFINITION
In this section, we first give several necessary definitions and then present the problem formulation. To simplify the explanation, we frame the problem with two social networks: a source network and a target network, although the generalization of this framework to a multiple-network setting is straightforward.

Let $G = (V, E^L, E^U, X)$ denote a partially labeled social network, where $E^L$ is a set of labeled relationships and $E^U$ is a set of unlabeled relationships with $E^L \cup E^U = E$; $X$ is an $|E| \times d$ attribute matrix associated with edges in $E$, with each row corresponding to an edge, each column an attribute, and an element $x_{ij}$ denoting the value of the $j$-th attribute of edge $e_i$. The label of edge $e_i$ is denoted as $y_i \in \mathcal{Y}$, where $\mathcal{Y}$ is the possible space of the labels (e.g., family, colleague, classmate).

Input: The input to our problem consists of two partially labeled networks $G_S$ (source network) and $G_T$ (target network) with $|E^L_S| \gg |E^L_T|$. In other words, the number of labeled relationships in the source network is much larger than that of the target network, with an extreme case of $|E^L_T| = 0$.

In real social networks, the relationship could be undirected (e.g., friendships in a mobile network) or directed (e.g., manager-subordinate relationships in an enterprise email network). To keep things consistent, we will first introduce the problem in the context of undirected networks and then discuss how to extend the proposed framework to the directed ones. In addition, the label of a relationship may be static (e.g., the family-member relationship) or change over time (e.g., the manager-subordinate relationship). In this work, we focus on static relationships.

Learning Task: Given a source network $G_S$ with abundantly labeled relationships and a target network $G_T$ with a limited number of labeled relationships, the goal is to learn a predictive function $f: (G_T \mid G_S) \rightarrow Y_T$ for inferring the type of relationships in the target network by leveraging the supervised information (labeled relationships) from the source network.

Without loss of generality, we assume that for each possible type $y_i$ of relationship $e_i$, the predictive function will output a probability $p(y_i \mid e_i)$; thus our task can be viewed as obtaining a triple $(e_i, y_i, p(y_i \mid e_i))$ to characterize each link $e_i$ in the social network. There are several key issues that make our problem formulation different from existing works on social relationship mining [4, 6, 29, 30]. First, the source network and the target network may be very different, e.g., a coauthor network and an email network. What are the fundamental factors that form the structure of the networks? Second, the label of relationships in the target network and that of the source network could be different. How reliably can we infer the labels of relationships in the target network using the information provided by the source network? Third, as both the source and the target networks are partially labeled, the learning framework should consider the labeled information as well as the unlabeled information.

3. DATA AND OBSERVATIONS
3.1 Data Collection
We try to find a number of different types of networks to investigate the problem of inferring social ties across heterogeneous networks. In this study, we consider five different types of networks: Epinions, Slashdot, Mobile, Coauthor, and Enron. Table 1 lists statistics of the five networks. All datasets and code used in this work are publicly available.^1

Epinions is a network of product reviewers. Each user on the site can post a review on any product and other users would rate the review with trust or distrust. In this data, we created a network of reviewers connected with trust and distrust relationships. The dataset consists of 131,828 nodes (users) and 841,372 edges, of which about 85.0% are trust links. 80,668 users received at least one trust or distrust edge. Our goal on this dataset is to infer the trust relationships between users.

Slashdot is a network of friends. Slashdot is a site for sharing technology related news. In 2002, Slashdot introduced the Slashdot Zoo which allows users to tag each other as “friends” (like) or “foes” (dislike). The dataset is
comprised of 77,357 users and 516,575 edges of which 76.7% are “friend” relationships. Our goal on this dataset is to infer the “friend” relationships between users.

Mobile is a network of mobile users. The dataset is from [7]. It consists of the logs of calls, blue-tooth scanning data and cell tower IDs of 107 users during about ten months. If two users communicated (making a call or sending a text message) with each other or co-occurred in the same place, we create an edge between them. In total, the data contains 5,436 edges. Our goal is to infer whether two users have a friend relationship. For evaluation, all users are required to complete an online survey, in which 157 pairs of users are labeled as friends of each other.

Coauthor is a network of authors. The dataset, crawled from Arnetminer.org [28], is comprised of 815,946 authors and 2,792,833 coauthor relationships. In this dataset, we attempt to infer advisor-advisee relationships between coauthors. For evaluation, we created a smaller ground truth data in the following ways: (1) collecting the advisor-advisee information from the Mathematics Genealogy project^2 and the AI Genealogy project^3; (2) manually crawling the advisor-advisee information from researchers' homepages. Finally, we have created a dataset with 1,534 coauthor relationships, of which 514 are advisor-advisee relationships.

Enron is an email communication network. It consists of 136,329 emails between 151 Enron employees. Two types of relationships, i.e., manager-subordinate and colleague, were annotated between these employees. The dataset was provided by [6]. Our goal on this dataset is to infer manager-subordinate relationships between users. There are in total 3,572 edges, of which 133 are manager-subordinate relationships.

Please note that for the first three datasets (i.e., Epinions, Slashdot, and Mobile), our goal is to infer undirected relationships (friendships or trustful relationships); while for the other two datasets (i.e., Coauthor and Enron), our goal is to infer directed relationships (the source end has a higher social status than the target end, e.g., advisor-advisee relationships and manager-subordinate relationships).

^1 http://arnetminer.org/socialtie/
^2 http://www.genealogy.math.ndsu.nodak.edu
^3 http://aigp.eecs.umich.edu

Table 1: Statistics of five datasets.

Relationship          Dataset    #Nodes     #Edges
Trust                 Epinions   131,828    841,372
Friendship            Slashdot   77,357     516,575
Friendship            Mobile     107        5,436
Advisor-advisee       Coauthor   815,946    2,792,833
Manager-subordinate   Enron      151        3,572

[Figure 2: four triads over users A, B, and C, panels (A)-(D), with edges marked friend or non-friend.]
Figure 2: Illustration of structural balance theory. (A) and (B) are balanced, while (C) and (D) are not balanced.

3.2 Observations
As a first step, we engage in some high-level investigation of how different factors influence the formation of different social ties in different networks. Generally, if we consider inferring particular social ties in a specific network (e.g., mining advisor-advisee relationships from the publication network), we can define domain-specific features and learn a predictive model based on some training data. The problem becomes very different, when handling multiple heterogeneous networks, as the defined features in different networks may be significantly different. To solve this problem, we connect our problem to several basic social psychological theories and focus our analysis on the network based correlations via the following statistics:

1. Social balance [8]. How is the social balance property satisfied and correlated in different networks?
2. Structural hole [3]. Would structural holes have a similar behavior pattern in different networks?
3. Social status [5, 11, 20]. How do different networks satisfy the properties of social status?
4. “Two-step flow” [18]. How do different networks follow the “two-step flow” of information propagation?

Social Balance. Social balance theory suggests that people in a social network tend to form into a balanced network structure. Figure 2 shows such an example to illustrate the structural balance theory over triads, which is the simplest group structure to which balance theory applies. For a triad, the balance theory implies that either all three of these users are friends or only one pair of them are friends. Figure 3 shows the probabilities of balanced triads of the three undirected networks (Epinions, Slashdot, and Mobile). In each network, we compare the probability of balanced triads based on communication links and that based on friendships (or trust relationships). For example, in the Mobile network, the communication links include making a call or sending a message between users. We find it interesting that different networks have very different balance probabilities based on the communication links, e.g., the balance probability in the mobile network is nearly 7 times higher than that of the slashdot network, while based on friendships (or trustful relationships) the three networks have relatively similar balance probabilities (with a maximum of +28% difference).

[Figure 3: bar chart for Epinions, Slashdot, and Mobile; series: relationships, communication links; vertical axis from 0 to 1.]
Figure 3: Social balance. Probabilities of balanced triads in different networks based on communication links and friendships (or trustful relationships). Based on communication links, different networks have very different balance probabilities (e.g., the balance probability in the mobile network is nearly 7 times higher than that of the slashdot network). While based on friendships the three networks have relatively similar probabilities.

[Figure 4: bar chart for Epinions, Slashdot, and Mobile; series: Random, SH-not connected, SH-connected; vertical axis from 0 to 1.]
Figure 4: Structural hole. Probabilities that two connected (or disconnected) users (A and B) have the same type of relationship with user C, conditioned on whether user C spans a structural hole or not. It is clear that (1) users are more likely (on average +70% higher than chance) to have the same type of relationship with C if C spans a structural hole; and (2) disconnected users are more likely than connected users to have the same type of relationship with a user who spans a structural hole (except the mobile network).

Structural Hole. Roughly speaking, a person is said to span a structural hole in a social network if he or she is linked to people in parts of the network that are otherwise not well connected to one another [3]. Arguments based on structural holes suggest that there is an informational advantage to have friends in a network who do not know each other. A sales manager with a diverse range of connections can be considered as spanning a structural hole, with a number of potentially weak ties [9] to individuals in different communities. More generally, we can think about Web sites such as eBay as spanning structural holes, in that they facilitate economic interactions between people who would otherwise not be able to find each other. Our
idea here is to test if a structural hole tends to have the same type of relationship with the other users. We first employ a simple algorithm to identify structural hole users in a network. Following the informal description of structural holes [3], for each node, we count the number of pairs of neighbors who are not directly connected. All users are ranked based on the number of pairs and then the top 1% of users^4 with the highest numbers are viewed as structural holes in the network. Figure 4 shows the probabilities that two users (A and B) have the same type of relationship with another user (say C), conditioned on whether user C spans a structural hole or not. We have two interesting observations: (1) users are more likely (on average +70% higher than chance) to have the same type of relationship with C if C spans a structural hole; (2) disconnected users are more likely than connected users to have the same type of relationship with a user classified as spanning a structural hole. One exception is the mobile network, where most mobile users in the dataset are university students and thus friends frequently communicate with each other.

^4 This is based on the observation that less than 1% of the Twitter users produce 50% of its content [32].

[Figure 5: four directed, signed triads over nodes A, B, and C, panels (A)-(D).]
Figure 5: Illustration of status theory. (A) and (B) satisfy the status theory, while (C) and (D) do not satisfy the status theory. Here positive “+” denotes the target node has a higher status than the source node; and negative “-” denotes the target node has a lower status than the source node. In total there are 16 different cases.

[Figure 6: bar chart for Enron and Coauthor over triad forms 011, 101, 110, 100, 000; vertical axis from 0 to 0.5.]
Figure 6: Social status. Distribution of five most frequent formations of triads with social status. Given a triad (A, B, C), let us use 1 to denote the advisor-advisee relationship and 0 the colleague relationship. Thus the number 011 denotes that A and B are colleagues, B is C's advisor and A is C's advisor.

Social Status. Another social psychological theory is the theory of status [5, 11, 20]. This theory is based on the directed relationship network. Suppose each directed relationship is labeled by a positive sign “+” or a negative sign “-” (where sign “+”/“-” denotes the target node has a higher/lower status than the source node). Then status theory posits that if, in a triangle on three nodes (also called triad), we take each negative edge, reverse its direction, and flip its sign to positive, then the resulting triangle (with all positive edge signs) should be acyclic. Figure 5 illustrates four examples. The first two triangles satisfy the status ordering and the latter two do not satisfy it. We conducted an analysis on the Coauthor and the Enron networks, where we aim to find directed relationships (advisor-advisee and manager-subordinate). We found nearly 99% of triads in the two networks satisfy the social status theory, which was also validated in [20]. We investigate more by looking at the distribution of different forms of triads in the two networks. Specifically, there are in total 16 different forms of triads [20]. We select the five most frequent forms of triads in the two networks. For easy understanding, given a triad (A, B, C), we use 1 to denote the advisor-advisee relationship and 0 the colleague relationship, and three consecutive numbers 011 to denote that A and B are colleagues, B is C's advisor and A is C's advisor. It is striking that although the two networks (Coauthor and Enron) are totally different, they share a similar distribution on the five frequent forms of triads (as plotted in Figure 6).

[Figure 7: bar chart for Enron and Coauthor; series: from OU to OU, from OL to OU, from OU to OL, from OL to OL; vertical axis from 0 to 1.]
Figure 7: Opinion leader. OL - Opinion leader; OU - Ordinary user. Probability that two types of users have a directed relationship (from higher social status to lower status, i.e., manager-subordinate relationship in Enron and advisor-advisee relationship in Coauthor). It is clear that opinion leaders (detected by PageRank) are more likely to have a higher social status than ordinary users.

Opinion Leader. The two-step flow theory was first introduced in [18] and further elaborated in the literature [15, 14]. The theory suggests that ideas (innovations) usually flow first to opinion leaders, and then from them to a wider population. In the enterprise email network, for example, managers may act as opinion leaders to help spread information to subordinates. Our basic idea here is to examine whether “opinion leaders” are more likely to have a higher social status (manager or advisor) than ordinary users. To do this, we first categorize users into two groups (opinion leaders and ordinary users) by PageRank [26]^5. With PageRank, we estimate the importance of each user according to the network structure, and then select the top 1% of users with the highest PageRank scores as opinion leaders and the rest as ordinary users. Then, we examine the probabilities that two users (A and B) have a directed social relationship (from higher social-status user to lower social-status user) such as advisor-advisee relationship or manager-subordinate relationship. Figure 7 shows some interesting discoveries. First, in both the Enron and Coauthor networks, opinion leaders (detected by PageRank) are more likely (+71%-+84%) to have a higher social status than ordinary users. Second and also more interestingly, in Enron, it is likely that ordinary users have a higher social status than opinion leaders. Its average likelihood is much larger (30 times) than that in the Coauthor network. The reason might be that in the enterprise email network (Enron), some managers may be inactive, and most management-related communications were done by their assistants.

^5 PageRank is an algorithm to estimate the importance of each node in a network.

Summary. According to the statistics above, we have the following intuitions:

1. Probabilities of balanced triads based on communication links are very different in different networks, while the balance probabilities based on friendships (or trustful relationships) are similar to each other.
2. Users are more likely (+25%-+152% higher than chance) to have the same type of relationship with a user who spans a structural hole.
3. Most triads (nearly 99%) satisfy properties of the social status theory. For the five most frequent formations of triads, the Coauthor and the Enron networks have a similar distribution.
4. Opinion leaders are more likely (+71%-+84% higher than chance) to have a higher social status than ordinary users.

4. MODEL FRAMEWORK
We propose a transfer-based factor graph (TranFG) model for learning and predicting the type of social relationships across networks. We first describe the model in the context of a single network, and then explain how to transfer the supervised information provided by one network to another network.

4.1 The Predictive Model
Given a network $G = (V, E^L, E^U, X)$, each relationship (edge) $e_i$ is associated with an attribute vector $x_i$ and a label $y_i$ that indicates the type of the relationship. Let $X = \{x_i\}$ and $Y = \{y_i\}$. Then we have the following formulation:

$$P(Y \mid X, G) = \frac{P(X, G \mid Y)\, P(Y)}{P(X, G)} \qquad (1)$$

Here, $G$ denotes all forms of network information. This probabilistic formulation indicates that labels of edges depend on not only local attributes associated with each edge, but also the structure of the network. According to Bayes' rule, we have

$$P(Y \mid X, G) = \frac{P(X, G \mid Y)\, P(Y)}{P(X, G)} \propto P(X \mid Y) \cdot P(Y \mid G) \qquad (2)$$

where $P(Y \mid G)$ represents the probability of labels given the structure of the network and $P(X \mid Y)$ denotes the probability of generating attributes $X$ associated to all edges given their labels $Y$. We assume that the generative probability of attributes given the label of each edge is conditionally independent, thus we have

$$P(Y \mid X, G) \propto P(Y \mid G) \prod_i P(x_i \mid y_i) \qquad (3)$$

where $P(x_i \mid y_i)$ is the probability of generating attributes $x_i$ given the label $y_i$. Now, the problem is how to instantiate the probabilities $P(Y \mid G)$ and $P(x_i \mid y_i)$. In principle, they can be instantiated in different ways, for example according to the Bayesian theory or Markov random fields. In this work, we choose the latter. Thus by the Hammersley-Clifford theorem [12], the two probabilities can be defined as:

$$P(x_i \mid y_i) = \frac{1}{Z_1} \exp\Big\{ \sum_{j=1}^{d} \alpha_j g_j(x_{ij}, y_i) \Big\} \qquad (4)$$

$$P(Y \mid G) = \frac{1}{Z_2} \exp\Big\{ \sum_{c} \sum_{k} \mu_k h_k(Y_c) \Big\} \qquad (5)$$

where $Z_1$ and $Z_2$ are normalization factors. Eq. 4 indicates that we define a feature function $g_j(x_{ij}, y_i)$ for each attribute $x_{ij}$ associated with edge $e_i$, and $\alpha_j$ is the weight of the $j$-th attribute. It can be defined as either a binary function or a real-valued function. For example, for inferring advisor-advisee relationships from the publication network, we can define a real-valued feature function as the difference of years when authors $v_i$ and $v_j$ respectively published their first papers. Such a feature definition is often used in Conditional Random Fields [17] and the Maximum Entropy model [23]. Eq. 5 represents that we define a set of correlation feature functions $\{h_k(Y_c)\}_k$ over each clique $Y_c$ in the network. Here $\mu_k$ is the weight of the $k$-th correlation feature function. The simplest clique is an edge, thus a feature function $h_k(y_i, y_j)$ can be defined as the correlation between two edges $(e_i, e_j)$, if the two edges share a common end node. We also consider triads as cliques in the TranFG model, in that several social theories we discussed in §3 are based on triads.

If we are given a single network $G$ with labeled information $Y$, learning the predictive model is to estimate a parameter configuration $\theta = (\{\alpha\}, \{\mu\})$ to maximize the log-likelihood objective function $\mathcal{O}(\theta) = \log P_\theta(Y \mid X, G)$, i.e.,

$$\theta^\star = \arg\max \mathcal{O}(\theta) \qquad (6)$$

4.2 Learning across Heterogeneous Networks
We now turn to discuss how to learn the predictive model with two heterogeneous networks (a source network $G_S$ and a target network $G_T$). Straightforwardly, we can define two separate objective functions for the two networks. The challenge is then how to bridge the two networks, so that we can transfer the labeled information from the source network to the target network. As the source and target networks may be from arbitrary domains, it is difficult to define correlations between them based on prior knowledge.

To this end, we propose a transfer-based factor graph (TranFG) model. Our idea is based on the fact that the social theories we discussed in §3 are general over all networks. Intuitively, we can leverage the correlation in the extent to which different networks satisfy the different social theories to transfer the knowledge across networks. In particular, for social balance, we define triad based features to denote the proportion of different balanced triangles in a network; for structural hole, we define edge correlation based features, i.e., correlation between two relationships $e_i$ and $e_j$; for social status, we define features over triads to respectively represent the probabilities of the seven most frequent formations of triads; for opinion leaders, we define features over each edge. Finally, by incorporating the social theories into our predictive model, we define the following log-likelihood objective function over the source and the target networks:

$$\begin{aligned}
\mathcal{O}(\alpha, \beta, \mu) &= \mathcal{O}_S(\alpha, \mu) + \mathcal{O}_T(\beta, \mu) \\
&= \sum_{i=1}^{|V_S|} \sum_{j=1}^{d} \alpha_j g_j(x^S_{ij}, y^S_i) + \sum_{i=1}^{|V_T|} \sum_{j=1}^{d'} \beta_j g'_j(x^T_{ij}, y^T_i) \\
&\quad + \sum_{k} \mu_k \Big( \sum_{c \in G_S} h_k(Y^S_c) + \sum_{c \in G_T} h_k(Y^T_c) \Big) - \log Z
\end{aligned} \qquad (7)$$

where $d$ and $d'$ are numbers of attributes in the source network and the target network respectively. In this objective function, the first term and the second term define the likelihood respectively over the source network and the target network; while the third term defines the likelihood over all common features defined in the two networks. The common feature functions are defined according to the social theories. Such a definition implies that attributes of the two networks can be entirely different as they are optimized with different parameters $\{\alpha\}$ and $\{\beta\}$, while the information transferred from the source network to the target network is the importance of common features that are defined according to the social theories.

Finally, we define four (real-valued) balance based features, seven (real-valued) status based features, four (binary) features for opinion leader and six (real-valued) correlation features for structural hole. More details about the feature functions are given in the Appendix.

Model Learning and Inferring. The last issue is how to learn the TranFG model and how to infer the type of unknown relationships in the target network. Learning the TranFG model is to estimate a parameter configuration $\theta = (\{\alpha\}, \{\beta\}, \{\mu\})$ to maximize the log-likelihood objective function $\mathcal{O}(\alpha, \beta, \mu)$. We use a gradient descent method (or a Newton-Raphson method) to solve the objective function. We use $\mu$ as the example to explain how we learn the parameters. Specifically, we first write the gradient of each $\mu_k$ with regard to the objective function:

$$\frac{\partial \mathcal{O}(\theta)}{\partial \mu_k} = \mathbb{E}\big[h_k(Y^S_c) + h_k(Y^T_c)\big] - \mathbb{E}_{P_{\mu_k}(Y_c \mid X^S, X^T, G_S, G_T)}\big[h_k(Y^S_c) + h_k(Y^T_c)\big] \qquad (8)$$

where $\mathbb{E}[h_k(Y^S_c) + h_k(Y^T_c)]$ is the expectation of factor function $h_k(Y^S_c) + h_k(Y^T_c)$ given the data distribution (i.e., the average value of the factor function $h_k(Y_c)$ over all triads in the source and the target networks); and the second term $\mathbb{E}_{P_{\mu_k}(Y_c \mid X^S, X^T, G_S, G_T)}[\cdot]$ is the expectation under the distribution $P_{\mu_k}(Y_c \mid X^S, X^T, G_S, G_T)$ given by the estimated model. Similar gradients can be derived for parameters $\alpha_j$ and $\beta_j$.

As the graphical structure can be arbitrary and may contain cycles, we use loopy belief propagation (LBP) [24] to approximate the gradients. It is worth noting that to leverage the unlabeled relationships, we need to perform the LBP process twice in each iteration, one time for estimating the marginal distribution of unknown variables $y_i = ?$ and the other time for the marginal distribution over all cliques. Finally with the gradient, we update each parameter with a learning rate $\eta$. The learning algorithm is summarized in Algorithm 1. We see that in the learning process, the algorithm uses an additional loopy belief propagation to infer the label of unknown relationships. After learning, all unknown relationships are assigned with labels that maximize the marginal probabilities.

Algorithm 1: Learning algorithm for TranFG.

Input: a source network $G_S$, a target network $G_T$, and the learning rate $\eta$
Output: estimated parameters $\theta = (\{\alpha\}, \{\beta\}, \{\mu\})$

Initialize $\theta \leftarrow 0$;
Perform statistics according to social theories;
Construct social theories based features $h_k(Y_c)$;
repeat
    Step 1: Perform LBP to calculate the marginal distribution of unknown variables in the source network, $P(y_i \mid x_i, G_S)$;
    Step 2: Perform LBP to calculate the marginal distribution of unknown variables in the target network, $P(y_i \mid x_i, G_T)$;
    Step 3: Perform LBP to calculate the marginal distribution of clique $c$, i.e., $P(y_c \mid X^S_c, X^T_c, G_S, G_T)$;
    Step 4: Calculate the gradient of $\mu_k$ according to Eq. 8 (for $\alpha_j$ and $\beta_j$ with a similar formula);
    Step 5: Update parameter $\theta$ with the learning rate $\eta$: $\theta_{new} = \theta_{old} + \eta \cdot \frac{\partial \mathcal{O}(\theta)}{\partial \theta}$
until Convergence;

5. EXPERIMENTAL SETUP
The proposed framework is very general and can be applied to many different networks. For experiments, we consider five different types of networks: Epinions, Slashdot, Mobile, Coauthor, and Enron. On the first three networks (Epinions, Slashdot, and Mobile), our goal is to infer undirected relationships (e.g., friendships), while on the other two networks (Coauthor and Enron), the goal is to infer directed relationships (e.g., advisor-advisee relationships).

Comparison Methods. We compare the following methods for inferring the type of social relationships.

SVM: similar to the logistic regression model [19], SVM uses attributes associated with each edge as features to train a classification model and then employs the classification model to predict edges' labels in the test dataset. For SVM, we employ SVM-light.

CRF: it trains a conditional random field [17] with attributes associated with each edge and correlations between edges.

PFG: the method is also based on CRF, but it employs the unlabeled data to help learn the predictive model. The method is proposed in [29].

TranFG: the proposed approach, which leverages the label information from the source network to help infer the type of relationship in the target network.

We also compare with the method TPFG proposed in [30] for mining advisor-advisee relationships in the publication network. This method is domain-specific and thus we only compare with it on the Coauthor network.

In all experiments, we use the same feature definitions for all methods. On the Coauthor network, we do not consider some domain-specific correlation features^6.

^6 We conducted experiments, but found that those features can lead to overfitting.

Table 2: Performance comparison of different methods for inferring friendships (or trustful relationships). (S) indicates the source network and (T) the target network. For the target network, we use 40% of the labeled data in training and the rest for test.

Data Set                              Method   Prec.    Rec.     F1-score
Epinions (S) to Slashdot (T) (40%)    SVM      0.7157   0.9733   0.8249
                                      CRF      0.8919   0.6710   0.7658
                                      PFG      0.9300   0.6436   0.7607
                                      TranFG   0.9414   0.9446   0.9430
Slashdot (S) to Epinions (T) (40%)    SVM      0.9132   0.9925   0.9512
                                      CRF      0.8923   0.9911   0.9393
                                      PFG      0.9954   0.9787   0.9870
                                      TranFG   0.9954   0.9787   0.9870
Epinions (S) to Mobile (T) (40%)      SVM      0.8983   0.5955   0.7162
                                      CRF      0.9455   0.5417   0.6887
                                      PFG      1.0000   0.5924   0.7440
                                      TranFG   0.8239   0.8344   0.8291
Slashdot (S) to Mobile (T) (40%)      SVM      0.8983   0.5955   0.7162
                                      CRF      0.9455   0.5417   0.6887
                                      PFG      1.0000   0.5924   0.7440
                                      TranFG   0.7258   0.8599   0.7872

Evaluation Measures. To quantitatively evaluate the performance of inferring the type of social relationships, we conducted experiments with different pairs of (source and target) networks, and evaluated the approaches in terms of Precision, Recall and F1-Measure.

All code was implemented in C++, and all experiments were performed on a PC running Windows 7 with Intel(R) Core(TM) 2 CPU 6600 (2.4GHz and 2.39GHz) and 4GB memory. The efficiency of the proposed TranFG model is acceptable. For example, it took about five minutes to train a TranFG model over the Epinions and the Slashdot networks.

6. RESULTS AND ANALYSIS
In this section, we first evaluate the performance of the proposed approach and the comparison methods. Next, we analyze how social theories can help improve the prediction performance. Finally, we give a qualitative case study to further demonstrate the effectiveness of the proposed approach.

Table 3: Performance comparison of different methods for inferring directed relationships (the source end has a higher social status than the target end). (S) indicates the source network and (T) the target network. For the target network, we use 40% of labeled data in training and the rest for test.

Data Set                             Method   Prec.    Rec.     F1-score
Coauthor (S) to Enron (T) (40%)      SVM      0.9524   0.5556   0.7018
                                     CRF      0.9565   0.5366   0.6875
                                     PFG      0.9730   0.6545   0.7826
                                     TranFG   0.9556   0.7818   0.8600
Enron (S) to Coauthor (T) (40%)      SVM      0.6910   0.3727   0.4842
                                     CRF      1.0000   0.3043   0.4666
                                     PFG      0.9916   0.4591   0.6277
                                     TPFG     0.5936   0.7611   0.6669
                                     TranFG   0.9793   0.5525   0.7065

6.1 Performance Analysis
We compare the performance of the four methods for inferring friendships (or trustful relationships) on four pairs of networks: Epinions (S) to Slashdot (T), Slashdot (S) to Epinions (T), Epinions (S) to Mobile (T), and Slashdot (S) to Mobile (T).^7 In all experiments, we use 40% of the labeled data in the target network for training and the rest for test. For transfer, we consider the labeled information in the source network. Table 2 lists the performance of the different methods on the four test cases. Our approach shows better performance than the three alternative methods. We conducted sign tests for each result, which show that all the improvements of our approach TranFG over the three methods are statistically significant ($p \ll 0.01$).

Table 3 shows the performance of the four methods for inferring directed relationships (the source end has a higher social status than the target end) on two pairs of networks: Coauthor (S) to Enron (T) and Enron (S) to Coauthor (T). We use the same experimental setting as that for inferring friendships on the four pairs of networks, i.e., taking 40% of the labeled data in the target network for training and the rest for test, while for transfer, analogously, we consider the labeled information from the source network. We see that by leveraging the supervised information from the source network, our method clearly improves the performance (about 15% by F1-score on Enron and 10% on Coauthor). The met
hodPFGcanbeviewedasanon-transferablecounter-partofourmethod,whichdoesnotconsiderthelabeledinforma-tionfromthesourcenetwork.FrombothTable2andTable3,wecanseethatwiththetransferredinformation,ourmethodcanclearlyimprovetherelationshipcategorizationperformance.An-otherphenomenonisthatPFGhasabetterperformancethantheothertwomethods(SVMandCRF)inmostcases.PFGcouldleveragetheunlabeledinformationinthetargetnetwork,thusim-provestheperformance.TheonlyexceptionisthecaseofEpinions(S)toSlashdot(T),whereitseemsthatusersinSlashdothavearelativelyconsistentpatternandmerelywithsomegeneralfeaturessuchasin-degree,out-degree,andnumberofcommonneighbors,aclassicationbasedmethod(SVM)canachieveveryhighperfor-mance.FactorcontributionanalysisWenowanalyzehowdifferentso-cialtheories(socialbalance,socialstatus,structuralhole,andtwo- 7WedidtrytouseMobileasthesourcenetworkandSlash-dot/Epinionsasthetargetnetwork.HoweverasthesizeofMobileismuchsmallerthantheothertwonetworks,theperformancewasconsiderablyworse. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.4 0.5 0.6 0.7 0.8 0.9 1 percentageF1-Measure SVM CRF PFG TranFG-SB TranFG-SH TranFG (a)Epinions-to-Slashdot 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.4 0.5 0.6 0.7 0.8 0.9 1 percentageF1-Measure SVM CRF PFG TranFG-SB TranFG-SH TranFG (b)Slashdot-to-Epinions 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.4 0.5 0.6 0.7 0.8 0.9 1 percentageF1-Measure SVM CRF PFG TranFG-SB TranFG-SH TranFG (c)Epinions-to-Mobile 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.4 0.5 0.6 0.7 0.8 0.9 1 percentageF1-Measure SVM CRF PFG TranFG-SB TranFG-SH TranFG (d)Slashdot-to-MobileFigure9:Performanceofinferringfriendshipswithandw/othebalancebasedtransferbyvaryingthepercentoflabeleddatainthetargetnetwork. 
0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 F1-Measure TranFG TranFG - SH TranFG - SB TranFG - ALL (a)Friendship 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 F1-Measure TranFG TranFG - OL TranFG - SS TranFG - ALL (b)DirectedFigure8:Factorcontributionanalysis.TranFG-SHdenotesourTranFGmodelbyignoringthestructuralholebasedtransfer.TranFG-SBstandsforignoringthestructuralbalancebasedtransfer.TranFG-OLstandsforignoringtheopinionleaderbasedtransferandTranFG-SSstandsforignoringsocialstatusbasedtransfer. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.2 0.4 0.6 0.8 1 percentageF1-Measure SVM CRF PFG TranFG-SS TranFG-OL TranFG (a)Coauthor-to-Enron 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.2 0.4 0.6 0.8 1 percentageF1-Measure SVM CRF PFG TranFG-SS TranFG-OL TranFG (b)Enron-to-CoauthorFigure10:Performanceofinferringdirectedrelationshipwithandw/othestatusbasedtransferbyvaryingthepercentofla-beleddatainthetargetnetwork.stepow(opinionleader))canhelpinfersocialties.Forinfer-ringfriendships,weconsidersocialbalance(SB)andstructuralhole(SH)basedtransferandforinferringdirectedfriendships,weconsidersocialstatus(SS)andopinionleader(OL)basedtransfer.HereweexaminethecontributionofthedifferentfactorsdenedinourTranFGmodel.Figure8showstheaverageF1-Measurescoreoverthedifferentnetworks,obtainedbytheTranFGmodelforinferringfriendshipsanddirectedrelationships.Inparticular,TranFG-SBrepresentsthatweremovesocialbalancebasedtrans-ferfeaturesfromourmodelandTranFG-Alldenotesthatwere-moveallthetransferfeatures.Itcanbeclearlyobservedthattheperformancedropswhenignoringeachofthefactors.Wecanalsoseethatforinferringfriendshipsthesocialbalanceisabitmoreusefulthanstructuralhole,andforinferringdirectedrelationshipsthesocialstatusfactorismoreimportantthanthefactorofopin-ionleader.Theanalysisalsoconrmsthatourmethodworkswell(furtherimprovementisobtained)whencombiningdifferentsocialtheories.Socialbalanceandstructuralholebasedtransfer.Wepresentanin-depthanalysisonhowthesocialbalanceandstructuralholebasedtransfercanhelpbyvaryingthepercentoflabeledtrainingdatai
nthetargetnetwork.WeseethatinallcasesexceptSlashdot-to-Epinions,clearimprovementscanbeobtainedbyusingtheso-cialbalanceandstructuralholebasedtransfer,whenthelabeleddatainthetargetnetworkislimited(50%).Indeed,insomecasesuchasEpinions-to-Slashdot,merelywith10%ofthelabeledrela-tionshipsinSlashdot,ourmethodcanobtainagoodperformance(88%byF1-score).Withouttransfer,thebestperformanceisonly70%(obtainedbySVM).Wealsondthatstructuralbalancebasedtransferismorehelpfulthanstructuralholdbasedtransferforin-ferringfriendshipsinmostcaseswithvariouspercentsoflabeledrelationships.Thisresultisconsistentwiththatobtainedinthefac-torcontributionanalysis.AdifferentphenomenonisfoundinthecaseofSlashdot-to-Epinions,whereallmethodscanobtainaF1-scoreof94%withonly10%ofthelabeleddata.Theknowledgetransferseemsnothelpful.Byacarefulinvestigation,wefoundsimplywiththosefeatures(Cf.Appendixfordetails)denedontheedges,wecouldachieveahighperformance(about90%).Thestructureinforma-tionindeedhelps,butthegainedimprovementislimited.Socialstatusandopinionleaderbasedtransfer.Figure10showsananalysisforinferringdirectedrelationshipsonthetwocases(Enron-to-CoauthorandCoauthor-to-Enron).Here,wefocusontestinghowsocialstatusandopinionleaderbasedtransfercanhelpinferthetypeofrelationshipsbyvaryingthepercentofla-beledrelationshipsinthetargetnetwork.Inbothcases(Coauthor-to-EnronandEnron-to-Coauthor),theTranFGmodelachievescon-sistentimprovements.Forexample,whenthereisonly10%ofla-beledadvisor-adviseerelationshipsintheCoauthornetwork,with-outconsideringthestatusandopinionleaderbasedtransfer,theF1-scoreisonly24%.Byleveragingthestatusandopinionleaderbasedtransferfromtheemailnetwork(Enron),thescoreisdou-bled(47%).Moreover,wendthatthesocialstatusbasedtransferismorehelpfulthantheopinionleaderbasedtransferwithvariouspercentsofthelabeleddata.6.2CaseStudyNowwepresentacasestudytodemonstratetheeffectivenessoftheproposedmodel.Figure11showsanexamplegeneratedfromourexperiments.ItrepresentsaportionoftheCoauthornetwork.Blackedgesandarrowsresp
ectivelydenotelabeledcolleaguere-lationshipsandadvisor-adviseerelationshipsinthetrainingdata.Coloredarrowsandedgesindicateadvisor-adviseeandcolleaguesrelationshipsdetectedbythreemethods:SVM,PFGandTranFG,withredcolorindicatingmistakeones.Thenumbersassociatedwitheachauthorrespectivelydenotethenumberofpapersandthescoreofh-index.Weinvestigatemorebylookingataspecicexample.SVMmis-takenlyclassiesthreeadvisor-adviseerelationshipandtwocol- /XLJL/DXUD $PRV)DLW /HDK(SVWHLQ *LRUJLR$XVLHOOR $UL)UHXQG -XOLD&KX]KR\ 6WHIDQR/HRQDUGL-RVHSK1DRU (VWHEDQ)HXHUVWHLQ 2GHG5HJHY /LDQH/HZLQ(\WDQ  6HUJH$ELWHERXO (a)SVM /XLJL/DXUD $PRV)DLW /HDK(SVWHLQ *LRUJLR$XVLHOOR $UL)UHXQG -XOLD&KX]KR\ 6WHIDQR/HRQDUGL-RVHSK1DRU (VWHEDQ)HXHUVWHLQ 2GHG5HJHY /LDQH/HZLQ(\WDQ  6HUJH$ELWHERXO (b)PFG /XLJL/DXUD $PRV)DLW /HDK(SVWHLQ *LRUJLR$XVLHOOR $UL)UHXQG -XOLD&KX]KR\ 6WHIDQR/HRQDUGL-RVHSK1DRU (VWHEDQ)HXHUVWHLQ 2GHG5HJHY /LDQH/HZLQ(\WDQ  6HUJH$ELWHERXO (c)Ourapproach(TranFG)Figure11:Casestudy.Illustrationofinferringadvisor-adviseerelationshipsontheCoauthornetwork.Directededgesindicateadvisorrelation-ships,andundirectedonesindicatecoauthorrelationships.Blackedgesindicatelabeleddata.Redcolorededgesindicateswrongpredictions.leaguerelationships.SVMtrainsalocalclassicationmodelwith-outconsideringthenetworkinformation.PFGconsidersthenet-workinformationaswellastheunlabeleddata,thusobtainsabetterresult.OurproposedTranFGmodelfurthercorrectstwomistakes(“Fait-Leonardi”and“Ausiello-Laura”)byleveragingpropertiesofsocialstatusandopinionleader.Forexample,theresultsobtainedbyPFGamong“Azar”,“Amos”and“Leonardi”formatriadof(“011”).Althoughitsatisesthepropertyofsocialstatus,theprob-abilityofsuchtriadismuchlower(0:4%vs.24:6%)thantheform(“100”).However,thelimitationofthetrainingdataleadsPFGtoresultinabiasmistake(5:8%vs.12:6%).TranFGsmoothestheinferringresultsbytransferringknowledgefromthesource(Enron)network.7.RELATEDWORKInferringsocialtiesisanimportantprobleminsocialnetworkanalysis.Liben-Nowelletal.[21]presentaunsupervisedmethodforl
inkprediction.Xiangetal.[33]developalatentvariablemodeltoestimaterelationshipstrengthfrominteractionactivityandusersimilarity.Backstrometal.[2]proposeasupervisedrandomwalkalgorithmtoestimatethestrengthofsociallinks.Leskovecetal.[19]employalogisticregressionmodeltopredictpositiveandneg-ativelinksinonlinesocialnetworks.Hopcroftetal.[13]studytheextenttowhichtheformationofareciprocalrelationshipcanbepredictedinadynamicnetwork.However,mostexistingworksfo-cusonpredictingandrecommendingunknownlinksinsocialnet-works,butignorethetypeofrelationships.Recently,thereareseveralworksoninferringthemeaningsofsocialrelationships.Diehletal.[6]trytoidentifythemanager-subordinaterelationshipsbylearningarankingfunction.Wangetal.[30]proposeanunsupervisedprobabilisticmodelformin-ingtheadvisor-adviseerelationshipsfromthepublicationnetwork.Crandalletal.[4]investigatetheproblemofinferringfriendshipbetweenpeoplefromco-occurrenceintimeandspace.Eagleetal.[7]presentseveralpatternsdiscoveredinmobilephonedata,andtrytousethesepatternstoinferthefriendshipnetwork.How-ever,thesealgorithmsmainlyfocusonaspecicdomain,whileourmodelisgeneralandcanbeappliedtodifferentdomains.Moreim-portantly,ourworktakestherststeptoincorporatesocialtheoriesforinferringsocialtiesacrossheterogeneousnetworks.Ourworkisrelatedwithlinkprediction,whichisoneofthecoretasksinsocialnetworks.Existingworkonlinkpredictioncanbebroadlygroupedintotwocategoriesbasedonthelearningmethodsemployed:unsupervisedlinkpredictionandsupervisedlinkpredic-tion.Unsupervisedlinkpredictionsusuallyassignscorestopoten-tiallinksbasedontheintuition-themoresimilarthepairofusersare,themorelikelytheyarelinked.Varioussimilaritymeasuresofusersareconsidered,suchastheAdamicandAdarmeasure[1],thepreferentialattachment[25],andtheKatzmeasure[16].Asur-veyofunsupervisedlinkpredictioncanbefoundin[21].Recently,[22]designsaowbasedmethodforlinkprediction.Therearealsoafewworkswhichemploysupervisedapproachestopredictlinksinsocialnetworks,suchas[31,2,19].Themaindifferencebetweenexi
stingworkonlinkpredictionandoureffortliesinthatexistingworkmainlyfocusesonspecicdomains,whileourpro-posedmodelcombinessocialtheories(suchasstructuralbalance,structuralhole,andsocialstatus)intoatransferlearningframeworkandcanbeeasilyappliedtodifferentdomains.8.CONCLUSIONInthispaper,westudythenovelproblemofinferringsocialtiesacrossheterogeneousnetworks.Wepreciselydenetheproblemandproposeatransfer-basedfactorgraph(TranFG)model.Themodelincorporatessocialtheoriesintoasemi-supervisedlearningframework,whichisusedtotransfersupervisedinformationfromthesourcenetworktohelpinfersocialtiesinthetargetnetwork.Weevaluatetheproposedmodelonvedifferentgenresofnet-works.Weshowthattheproposedmodelcansignicantlyimprovetheperformanceforinferringsocialtiesacrossdifferentnetworkscomparingwithseveralalternativemethods.Ourstudyalsorevealsseveralinterestingphenomena.Thegeneralproblemofinferringsocialtiesrepresentsanewandinterestingresearchdirectioninsocialnetworkanalysis.Therearemanypotentialfuturedirectionsofthiswork.First,someotherso-cialtheoriescanbefurtherexploredandvalidatedforanalyzingtheformationofdifferenttypesofsocialrelationships.Next,itisalsointerestingtostudyhowtofurthercorrecttheinferringmistakesbyinvolvingusersintothelearningprocess(e.g.,viaactivelearning).Anotherpotentialissueistovalidatetheproposedmodelonsomeothersocialnetworks. 
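The evaluation reports Precision, Recall and F1-Measure for every method. As a small worked check, the sketch below recomputes F1 as the harmonic mean of precision and recall for three TranFG rows copied from Tables 2 and 3. The function name `f1_score`, the dictionary layout, and the tolerance are illustrative assumptions and are not part of the authors' C++ implementation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for TranFG, copied from Tables 2 and 3.
reported = {
    "Epinions (S) to Slashdot (T)": (0.9414, 0.9446, 0.9430),
    "Coauthor (S) to Enron (T)": (0.9556, 0.7818, 0.8600),
    "Enron (S) to Coauthor (T)": (0.9793, 0.5525, 0.7065),
}

# Absolute gap between the recomputed and the reported F1 per test case.
# Small gaps are expected because the table values are rounded to four digits.
gaps = {
    name: abs(f1_score(p, r) - f1)
    for name, (p, r, f1) in reported.items()
}

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: |recomputed - reported| = {gap:.5f}")
```

The recomputed values agree with the reported F1-scores up to the rounding of the published four-digit figures, which is all this check is meant to show.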
9. REFERENCES

[1] L. A. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25:211–230, 2001.
[2] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM'11, pages 635–644, 2011.
[3] R. S. Burt. Structural Holes: The Social Structure of Competition. Cambridge, Mass.: Harvard University Press, 1995.
[4] D. J. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. PNAS, 107:22436–22441, Dec. 2010.
[5] J. A. Davis and S. Leinhardt. The structure of positive interpersonal relations in small groups. In J. Berger, editor, Sociological Theories in Progress, volume 2, pages 218–251. Houghton Mifflin, 1972.
[6] C. P. Diehl, G. Namata, and L. Getoor. Relationship identification for social network discovery. In AAAI, pages 546–552, 2007.
[7] N. Eagle, A. S. Pentland, and D. Lazer. Inferring social network structure using mobile phone data. PNAS, 106(36), 2009.
[8] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World. Cambridge University Press, 2010.
[9] M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360–1380, 1973.
[10] R. Grob, M. Kuhn, R. Wattenhofer, and M. Wirz. Cluestr: Mobile social networking for enhanced group communication. In GROUP'09, 2009.
[11] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In WWW'04, pages 403–412, 2004.
[12] J. M. Hammersley and P. Clifford. Markov field on finite graphs and lattices. Unpublished manuscript, 1971.
[13] J. E. Hopcroft, T. Lou, and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM'11, 2011.
[14] E. Katz. The two-step flow of communication: an up-to-date report of an hypothesis. In Enis and Cox (eds.), Marketing Classics, pages 175–193, 1973.
[15] E. Katz and P. F. Lazarsfeld. Personal Influence. The Free Press, New York, USA, 1955.
[16] L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
[17] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML'01, pages 282–289, 2001.
[18] P. F. Lazarsfeld, B. Berelson, and H. Gaudet. The people's choice: How the voter makes up his mind in a presidential campaign. Columbia University Press, New York, USA, 1944.
[19] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW'10, pages 641–650, 2010.
[20] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in social media. In CHI'10, pages 1361–1370, 2010.
[21] D. Liben-Nowell and J. M. Kleinberg. The link-prediction problem for social networks. JASIST, 58(7):1019–1031, 2007.
[22] R. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in link prediction. In KDD'10, pages 243–252, 2010.
[23] A. McCallum, D. Freitag, and F. C. N. Pereira. Maximum entropy Markov models for information extraction and segmentation. In ICML'00, pages 591–598, 2000.
[24] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In UAI'99, pages 467–475, 1999.
[25] M. E. J. Newman. Clustering and preferential attachment in growing networks. Phys. Rev. E, 64(2):025102, 2001.
[26] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999-0120, Stanford University, 1999.
[27] M. Roth, A. Ben-David, D. Deutscher, G. Flysher, I. Horn, A. Leichtberg, N. Leiser, Y. Matias, and R. Merom. Suggesting friends using the implicit social graph. In KDD'10, 2010.
[28] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. ArnetMiner: Extraction and mining of academic social networks. In KDD'08, pages 990–998, 2008.
[29] W. Tang, H. Zhuang, and J. Tang. Learning to infer social relationships in large networks. In ECML/PKDD'11, 2011.
[30] C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu, and J. Guo. Mining advisor-advisee relationships from research publication networks. In KDD'10, pages 203–212, 2010.
[31] C. Wang, V. Satuluri, and S. Parthasarathy. Local probabilistic models for link prediction. In ICDM'07, pages 322–331, 2007.
[32] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on Twitter. In WWW'11, 2011.
[33] R. Xiang, J. Neville, and M. Rogati. Modeling relationship strength in online social networks. In WWW'10, pages 981–990, 2010.

Appendix

There are two categories of features. The first category includes local features defined for each specific network, and the second includes transfer features defined based on the social theories. Tables 4-7 give a summary of local feature definitions for the five networks. For a more detailed description of the feature definitions, please refer to literature [6, 7, 19, 29]. For the transfer features, in Epinions, Slashdot and Mobile, we define four (real-valued) balance triad based features and six (real-valued) structural hole based features. In the Coauthor and Enron, we define seven (real-valued) social status based features and four (binary) opinion leader based features.

Table 4: Features defined for edge (vi, vj) in Epinions/Slashdot.

Feature             Description
in-degree           d_in(vi), d_in(vj)
out-degree          d_out(vi), d_out(vj)
total-degree        d_in(vi) + d_out(vi), d_in(vj) + d_out(vj)
common neighbors    the total number of common neighbors of vi and vj in an undirected sense.

Table 5: Features defined for edge (vi, vj) in Mobile.

Feature               Description
Total Proximity       Number of proximity events between a and b
In-Role               Number of proximity events at working place in daytime from Monday to Friday
Extra-Role            Number of proximity events at home or elsewhere at night of weekends
Total Communication   Number of communication logs between a and b
Night Call Ratio      The ratio of communication logs at night

Table 6: Features defined for edge (vi, vj) in Coauthor. P_i denotes a set of papers published by author vi.

Feature               Description
paper count           |P_i|, |P_j|
paper ratio           |P_i| / |P_j|
coauthor ratio        |P_i ∩ P_j| / |P_i|, |P_i ∩ P_j| / |P_j|
conference coverage   The proportion of the conferences which both vi and vj attended among conferences vj attended.
first-pub-year diff   The difference in year of the first earliest publication of vi and vj.

Table 7: Features defined for edge (vi, vj) in Enron.

From   Sent-To + CC      From   Sent-To + CC
vi     vj                vj     vi
vi     vk and not vj     vj     vk and not vi
vk     vi and not vj     vk     vj and not vi
vk     vi and vj
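As a rough illustration of the local features in Table 4, the sketch below computes the in-degree, out-degree, total-degree and undirected common-neighbor count for a candidate edge (vi, vj) from a plain list of directed edges. The function names, the dictionary output, and the toy graph are assumptions made for this example; the paper does not describe its C++ feature extraction at this level of detail, and degrees here count distinct neighbors.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return (successors, predecessors) maps for a list of directed edges."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def edge_features(edges, vi, vj):
    """Table 4 style local features for the candidate edge (vi, vj)."""
    succ, pred = build_adjacency(edges)

    def neighbors(v):
        # Neighbors in the undirected sense: ignore edge direction.
        return succ[v] | pred[v]

    features = {}
    for name, v in (("vi", vi), ("vj", vj)):
        d_in, d_out = len(pred[v]), len(succ[v])
        features[f"in_degree_{name}"] = d_in
        features[f"out_degree_{name}"] = d_out
        features[f"total_degree_{name}"] = d_in + d_out
    # Exclude the two endpoints themselves from the common-neighbor count.
    common = (neighbors(vi) & neighbors(vj)) - {vi, vj}
    features["common_neighbors"] = len(common)
    return features


# Toy directed graph (hypothetical data, not taken from Epinions or Slashdot).
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a"), ("d", "b")]
toy_features = edge_features(toy_edges, "a", "b")

if __name__ == "__main__":
    for key, value in toy_features.items():
        print(key, value)
```

Feature vectors of this kind could then be fed to any of the compared classifiers; the transfer features based on balance triads and structural holes would need additional definitions that are not reproduced here.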
