Type Inference on Noisy RDF Data
Heiko Paulheim and Christian Bizer
University of Mannheim

Abstract. Type information is very valuable in knowledge bases. However, most large open knowledge bases are incomplete with respect to type information, and at the same time contain noisy and incorrect data. That makes classic type inference [...]


The rest of this paper is structured as follows. Section 2 motivates our work by showing typical problems of reasoning on real-world datasets. Section 3 introduces the SDType approach, which is evaluated in Sect. 4 in different experimental settings. In Sect. 5, we show how SDType can be applied to solve a real-world problem, i.e., the completion of missing type information in DBpedia. We conclude our paper with a review of related work in Sect. 6, and a summary and an outlook on future work.

2 Problems with Type Inference on Real-world Datasets

A standard way to infer type information in the Semantic Web is the use of reasoning, e.g., standard RDFS reasoning via entailment rules [20]. To illustrate the problems that can occur with that approach, we have conducted an experiment with the DBpedia knowledge base [2]. We have used the following subset of entailment rules:

- ?x a ?t1. ?t1 rdfs:subClassOf ?t2 entails ?x a ?t2
- ?x ?r ?y. ?r rdfs:domain ?t entails ?x a ?t
- ?y ?r ?x. ?r rdfs:range ?t entails ?x a ?t

We have applied these three rules to the instance dbpedia:Germany. In total, these rules induce 23 types for dbpedia:Germany, only three of which are correct. The list of inferred types contains, among others, the types award, city, sports team, mountain, stadium, record label, person, and military conflict.

A reasoner requires only one false statement to come to a wrong conclusion. In the example of dbpedia:Germany, at most 20 wrong statements are enough to make a reasoner infer 20 wrong types. However, there are more than 38,000 statements about dbpedia:Germany, i.e., an error rate of only 0.0005 is enough to end up with such a completely nonsensical reasoning result. In other words: even with a knowledge base that is 99.9% correct, an RDFS reasoner will not provide meaningful results. However, a correctness of 99.9% is difficult, if not impossible, to achieve with real-world datasets populated either (semi-)automatically, e.g., by information extraction from documents, or by the crowd.

In the example above, the class Mountain is induced from a single wrong statement among the 38,000 statements about dbpedia:Germany, which is dbpedia:Mze dbpedia-owl:sourceMountain dbpedia:Germany. Likewise, the class MilitaryConflict is induced from a single wrong statement, i.e., dbpedia:XII_Corps_(United_Kingdom) dbpedia-owl:battle dbpedia:Germany.

These problems exist because traditional reasoning is only useful if a) both the knowledge base and the schema do not contain any errors and b) the schema is only used in ways foreseen by its creator [4]. Both assumptions are not realistic for large and open knowledge bases. This shows that, although reasoning seems the straightforward approach to tackle the problem of completing missing types, it is (at least in its standard form) not applicable for large, open knowledge bases, since they are unlikely to have correct enough data for reasoning to [...]

[...] property. Furthermore, each property is assigned a certain weight $w_p$, which reflects its capability of predicting the type (see below). With those elements, we can compute the confidence for a resource $r$ having a type $t$ as

$$\mathrm{conf}(T(r)) := \frac{1}{N} \sum_{\text{all properties } p \text{ of } r} P\big(T(r) \mid (\exists p.\top)(r)\big), \qquad (1)$$

where $N$ is the number of properties that connect a resource to another one. By using the average probabilities of each type, we address the problem of faulty links, since they do not contribute too much to the overall probability.

In the example with dbpedia:Germany used above, the class Mountain was inferred due to one wrong statement out of 38,000. With the above definition, that relation would only be weighted with 1/38,000; thus, the type Mountain would receive a comparably small overall confidence.

By looking at the actual distribution of types co-occurring with a property, instead of the defined domains and ranges, properties which are "abused", i.e., used differently than conceived by the schema creator, do not cause any problems for SDType. As long as a property is used more or less consistently throughout the knowledge base, the inferences will always be consistent as well. Single inconsistent usages, just like single wrong statements, do not contribute too much to the overall result. Furthermore, when looking at the actual usage of a schema, the results can be more fine-grained than when using the schema only. For example, on the MusicBrainz dataset,² foaf:name is always used as a property of mo:MusicArtist. RDFS entailment rules, in contrast, could not infer any specific type from the foaf:name property, since it has no explicit domain defined.³

While using the actual distribution instead of defined domains and ranges eliminates those problems, it can induce new ones when a dataset is heavily skewed, i.e., the extensions of some classes are several orders of magnitude larger than others. This is a problem in particular with general purpose properties, such as rdfs:label or owl:sameAs, which are rather equally distributed in the overall knowledge base. If that knowledge base is heavily skewed (e.g., a database about cities and countries which contains 10,000 cities per country on average), and it contains many such general purpose properties, there is a danger of overrating the more frequent types. Thus, we define a weight $w_p$ for each property (note that $p$ and $p^{-1}$ are treated independently and are each assigned an individual weight), which measures the deviation of that property from the a priori distribution of all types:

$$w_p := \sum_{\text{all types } t} \big(P(t) - P(t \mid \exists p.\top)\big)^2 \qquad (2)$$

With those weights, we can refine the above definition to

$$\mathrm{conf}(T(r)) := \sum_{\text{all properties } p \text{ of } r} w_p \cdot P\big(T(r) \mid (\exists p.\top)(r)\big), \qquad (3)$$

² http://dbtune.org/musicbrainz/
³ The defined domain of foaf:name is owl:Thing, see http://xmlns.com/foaf/spec/

[...] as in Table 1, definition (1) would yield a confidence score for :x a dbpedia-owl:Person and :x a dbpedia-owl:Place of 0.14 and 0.60, respectively.⁴

When using weights, the numbers are different. In our example from DBpedia, the obtained weights for dbpedia-owl:location and foaf:name are 0.77 and 0.17; hence, the overall confidence scores for :x a dbpedia-owl:Person and :x a dbpedia-owl:Place in that example, using definition (3), are 0.05 and 0.78, respectively. This shows that the weights help reduce the influence of general purpose properties and thus assign more sensible scores to the types that are found by SDType, and in the end help reduce wrong results coming from skewed datasets.

In summary, we are capable of computing a score for each pair of a resource and a type. Given a reasonable cutoff threshold, we can thus infer missing types at arbitrary levels of quality: thresholds between 0.4 and 0.6 typically yield statements at a precision between 0.95 and 0.99.

3.2 Implementation

SDType has been implemented based on a relational database, as shown in Fig. 1. The input data consists of two tables, one containing all direct property assertions between instances, the other containing all direct type assertions. From these input files, basic statistics and aggregations are computed: the number of each type of relation for all resources, and the a priori probability of all types, i.e., the percentage of instances that are of that type. Each of those tables can be computed with one pass over the input tables or their join.

The basic statistic tables serve as intermediate results for computing the weights and conditional probabilities used in the formulas above. Once again, those weights and conditional probabilities can be computed with one pass over the intermediate tables or their joins.

In a final step, new types can be materialized including the confidence scores. This can be done for all instances, or implemented as a service, which types an instance on demand. Since each of the steps requires one pass over the database, the overall complexity is linear in the number of statements in the knowledge base.

4 Evaluation

To evaluate the validity of our approach, we use the existing type information in two large datasets, i.e., DBpedia [2] and OpenCyc [9], as a gold standard,⁵ and let SDType reproduce that information, allowing us to evaluate recall, precision, and F-measure.

⁴ The actual numbers for DBpedia are: P(Person|foaf#name) = 0.273941, P(Place|foaf#name) = 0.314562, P(Person|dbpedia#location) = 0.000236836, P(Place|dbpedia#location) = 0.876949.
⁵ In the case of DBpedia, the dataset is rather a silver standard. However, it provides the possibility of a larger-scale evaluation. A finer-grained evaluation with manual validation of the results by an expert can be found in Sect. 5.
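The worked example from Sect. 3.1 can be retraced with a few lines of Python. The following is only a minimal sketch, not the authors' relational-database implementation: the function and variable names are hypothetical, the example resource is assumed to have exactly one dbpedia-owl:location link and one foaf:name link, and the conditional probabilities and weights are the rounded values quoted in the text and in footnote 4. One further assumption is labeled in the code: the weighted sum is divided by the sum of the weights. That normalization does not appear in the transcript of definition (3), but it is what reproduces the reported score of 0.78 for Place.

```python
from typing import Dict, Mapping

# Conditional type distributions P(type | property), as quoted in footnote 4.
CONDITIONAL: Dict[str, Dict[str, float]] = {
    "dbpedia-owl:location": {"Person": 0.000236836, "Place": 0.876949},
    "foaf:name": {"Person": 0.273941, "Place": 0.314562},
}

# Property weights as quoted in the text for the DBpedia example.
WEIGHTS: Dict[str, float] = {"dbpedia-owl:location": 0.77, "foaf:name": 0.17}


def property_weight(prior: Mapping[str, float],
                    conditional: Mapping[str, float]) -> float:
    """Definition (2): squared deviation of P(t | property) from the prior P(t)."""
    types = set(prior) | set(conditional)
    return sum((prior.get(t, 0.0) - conditional.get(t, 0.0)) ** 2 for t in types)


def average_confidence(conditional: Mapping[str, Mapping[str, float]],
                       type_name: str) -> float:
    """Definition (1): unweighted mean of P(type | property) over all properties."""
    probs = [dist.get(type_name, 0.0) for dist in conditional.values()]
    return sum(probs) / len(probs)


def weighted_confidence(conditional: Mapping[str, Mapping[str, float]],
                        weights: Mapping[str, float],
                        type_name: str) -> float:
    """Weighted variant of definition (1), following definition (3)."""
    weighted_sum = sum(weights[p] * dist.get(type_name, 0.0)
                       for p, dist in conditional.items())
    # ASSUMPTION: normalize by the sum of the weights so that scores stay in [0, 1].
    return weighted_sum / sum(weights[p] for p in conditional)


if __name__ == "__main__":
    for type_name in ("Person", "Place"):
        print(type_name,
              round(average_confidence(CONDITIONAL, type_name), 2),
              round(weighted_confidence(CONDITIONAL, WEIGHTS, type_name), 2))
```

Under these assumptions, the unweighted scores round to 0.14 (Person) and 0.60 (Place), and the weighted scores round to 0.05 and 0.78, matching the numbers given in the text. The function property_weight is included only to illustrate definition (2); the a priori type probabilities for DBpedia are not given here, so the quoted weights are used directly.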
Fig. 2. Precision/recall curves of SDType on DBpedia, for instances with at least one, at least 10, and at least 25 incoming links

[...] least one ingoing link), achieving an F-measure of 88.5%, the results are slightly better on instances that have at least 10 or 25 ingoing links, with an F-measure of 88.9% and 89.9%, respectively. The differences show more significantly in the precision@95% (i.e., the precision that can be achieved at 95% recall), which is 0.69 (minimum one link), 0.75 (minimum ten links), and 0.82 (minimum 25 links), respectively.

Figure 3 depicts the corresponding results for OpenCyc. The first observation is that the overall results are not as good as on DBpedia, achieving a maximum F-measure of 60.1% (60.3% and 60.4% when restricting to instances that have at least 10 or 25 ingoing links). The second observation is that the results for instances with different numbers of ingoing properties do not differ much; in fact, most of the differences are too small to be visible in the figure. While 95% recall cannot be reached on OpenCyc with SDType, the precision@90% is 0.18 (minimum one link) and 0.23 (minimum ten and 25 links), respectively.

The strong divergence of the results between DBpedia and OpenCyc, as discussed above, was to be expected, since OpenCyc has on the one hand more (and more specific) types per instance, and on the other hand less evidence per instance, since the number of properties connecting instances is smaller.

As the diagrams show, looking at instances with more links improves the results on DBpedia, but not on OpenCyc (apart from a small improvement in precision at a recall of around 0.9). The reason for that is that DBpedia, with its stronger focus on coverage than on correctness, contains more faulty statements. When more links are present, the influence of each individual statement is reduced, which allows for correcting errors. OpenCyc, on the other hand, with its stronger focus on precision, benefits less from that error correction mechanism.

Since we assume that it is more difficult to predict more specific types (such as Heavy Metal Band) than to predict more general ones (like Band or even Organization), we have additionally examined the best F-measure that can be achieved when restricting the approach to a certain maximum class hierarchy depth. The results are depicted in Fig. 4. It can be observed that SDType in fact works better on more general types (achieving an F-measure of up to 97.0% on DBpedia and 71.6% on OpenCyc when restricting the approach to predicting only top-level classes). However, the effects are weaker than we expected.

Fig. 3. Precision/recall curves of SDType on OpenCyc, taking into account only incoming, only outgoing, and both incoming and outgoing properties

5 Application: Completing Missing Types in DBpedia

In the following, we apply SDType to infer missing type information in DBpedia. While DBpedia has a quite large coverage, there are millions of missing type statements. To infer those missing types, we have combined the approach sketched above with a preclassification step separating typeable from untypeable resources in order to reduce false inferences.

5.1 Estimating Type Completeness in DBpedia

Aside from the type information in DBpedia using the DBpedia ontology, which is generated using Wikipedia infoboxes, resources in DBpedia are also mapped to the YAGO ontology [18]. Those mappings are generated from Wikipedia page categories. Thus, they are complementary to DBpedia types: an article may have a correct infobox, but missing category information, or vice versa. Both methods of generating type information are prone to (different types of) errors. However, looking at the overlaps and differences of type statements created by both methods may provide some approximate estimates about the completeness of DBpedia types.

To estimate the completeness of type information in DBpedia, we used a partial mapping between the YAGO ontology [18] and the DBpedia ontology.⁸ Assuming that the YAGO types are at least more or less correct, we can estimate the completeness of a DBpedia type dbpedia#t using the mapped YAGO type yago#t by looking at the relation of all instances of dbpedia#t and all instances that have at least one of the types dbpedia#t and yago#t:

$$\mathrm{completeness}(\mathit{dbpedia\#t}) \leq \frac{|\mathit{dbpedia\#t}|}{|\mathit{dbpedia\#t} \cup \mathit{yago\#t}|} \qquad (5)$$

⁸ http://www.netestate.de/De/Loesungen/DBpedia-YAGO-Ontology-Matching

Fig. 4. Maximum achievable F-measure by maximum class depth for DBpedia and OpenCyc. The graph depicts the maximum F-measure that can be achieved when restricting the approach to finding classes of a maximum hierarchy depth of 1, 2, etc.

The denominator denotes an estimate of all instances that should have the type dbpedia#t. Since the actual number of resources that should have that type can be larger than that (i.e., neither the DBpedia nor the YAGO type is set), the completeness can be smaller than the fraction, hence the inequality.

Calculating the sum across all types, we observe that DBpedia types are at most 63.7% complete, with at least 2.7 million missing type statements (while YAGO types, which can be assessed accordingly, are at most 53.3% complete). The classes with the most missing type statements are shown in Fig. 5.

Classes that are very incomplete include

- dbpedia-owl:Actor (completeness 4%), with 57,000 instances missing the type, including, e.g., Brad Pitt and Tom Hanks
- dbpedia-owl:Game (completeness 7%), with 17,000 instances missing the type, including Tetris and Sim City
- dbpedia-owl:Sports (completeness 5.3%), with 3,300 instances missing the type, including Beach Volleyball and Biathlon

A similar experiment using the classes dbpedia-owl:Person and foaf:Person (assuming that each person should have both types) yielded that the class dbpedia-owl:Person is at most 40% complete. These examples show that the problem of missing types in DBpedia is large, and that it does not only affect marginally important instances.

Fig. 5. Largest number of (estimated) missing type statements per class

In DBpedia, common reasons for missing type statements are

- Missing infoboxes: an article without an infobox is not assigned any type.
- Too general infoboxes: if an article about an actor uses a person infobox instead of the more specific actor infobox, the instance is assigned the type dbpedia-owl:Person, but not dbpedia-owl:Actor.
- Wrong infobox mappings: e.g., the videogame infobox is mapped to dbpedia-owl:VideoGame, not dbpedia-owl:Game, and dbpedia-owl:VideoGame is not a subclass of dbpedia-owl:Game in the DBpedia ontology.
- Unclear semantics: some DBpedia ontology classes do not have clear semantics. For example, there is a class dbpedia-owl:College, but it is not clear which notion of college is denoted by that class. The term college, according to different usages, e.g., in British and US English, can denote private secondary schools, universities, or institutions within universities.⁹

⁹ see http://oxforddictionaries.com/definition/english/college

5.2 Typing Untyped Instances in DBpedia

In our second experiment, we have analyzed how well SDType is suited for adding type information to untyped resources. As discussed above, resources may be missing a type because they use no infobox, use an infobox not mapped to a type, or are derived from a Wikipedia red link. In the last case in particular, the only usable information is the incoming properties.

Simply typing all untyped resources with SDType would lead to many errors, since there are quite a few resources that should not have a type, as discussed in [1]. Examples are resources derived from list pages,¹⁰ pages about a category rather than an individual,¹¹ or general articles.¹²

In order to address that problem, we have manually labeled 500 untyped resources as typeable or non-typeable. For those resources, we have created features using the FeGeLOD framework [13], and learned a rule set for classifying typeable and non-typeable resources using the Ripper rule learner [3]. The resulting rule set has an accuracy of 91.8% (evaluated using 10-fold cross validation).

Of all 550,048 untyped resources in DBpedia, this classifier identifies 519,900 (94.5%) as typeable. We have generated types for those resources and evaluated them manually on a sample of 100 random resources. The results for various thresholds are depicted in Fig. 6. It can be observed that 3.1 types per instance can be generated with a precision of 0.99 at a threshold of 0.6, 4.0 types with a precision of 0.97 at a threshold of 0.5, and 4.8 types with a precision of 0.95 at a threshold of 0.4.¹³ In contrast, RDFS reasoning on the test dataset generates 3.0 types per instance with a precision of 0.96, which shows that SDType is better in both precision and productivity.

With the thresholds of 0.4 and 0.6, we can generate a total of 2,426,552 and 1,682,704 type statements, respectively, as depicted in Table 3. It can be observed that with the higher threshold guaranteeing higher precision, more general types are generated, while more specific types, such as Athlete or Artist, are rarely found. In most cases, the generated types are consistent, i.e., an Artist is also a Person, while contradicting predictions (e.g., Organization and Person for the same instance) are rather rare.

6 Related Work

The problem of inference on noisy data in the Semantic Web has been identified, e.g., in [16] and [8]. While general-purpose reasoning on noisy data is still actively researched, solutions have been proposed in the recent past for the specific problem of type inference in (general or particular) RDF datasets, using strategies such as machine learning, statistical methods, and exploitation of external knowledge such as links to other data sources or textual information.

The authors of [11] use an approach similar to ours, but on a different problem: they try to predict possible predicates for resources based on co-occurrence of properties. They report an F-measure of 0.85 at linear runtime complexity.

Many ontology learning algorithms are capable of dealing with noisy data [19]. However, when using the learned ontologies for inferring missing information using a reasoner, the same problems as with manually created ontologies occur.
¹⁰ e.g., http://dbpedia.org/resource/Lists_of_writers
¹¹ e.g., http://dbpedia.org/resource/Writer
¹² e.g., http://dbpedia.org/resource/History_of_writing
¹³ A web service for DBpedia type completion, as well as the code used to produce the additional types, is available at http://wifo5-21.informatik.uni-mannheim.de:8080/DBpediaTypeCompletionService/

Fig. 6. Precision and average number of type statements per resource generated on untyped resources in DBpedia

[...] DOLCE ontologies in order to find appropriate types. The authors report an overall recall of 0.74, a precision of 0.76, and an F-measure of 0.75.

The authors of [7] exploit types of resources derived from linked resources, where links between Wikipedia pages are used to find linked resources (which are potentially more than the resources actually linked in DBpedia). For each resource, they use the classes of related resources as features, and use k nearest neighbors for predicting types based on those features. The authors report a recall of 0.86, a precision of 0.52, and hence an F-measure of 0.65.

The approach discussed in [15] addresses a slightly different problem, i.e., the mapping of DBpedia entities to the category system of OpenCyc. They use different indicators (infoboxes, textual descriptions, Wikipedia categories, and instance-level links to OpenCyc) and apply an a posteriori consistency check using Cyc's own consistency checking mechanism. The authors report a recall of 0.78, a precision of 0.93, and hence an F-measure of 0.85.

The approaches discussed above, except for [12], use features specific to DBpedia. In contrast, SDType is agnostic to the dataset and can be applied to any RDF knowledge base. Furthermore, none of the approaches discussed above reaches the quality level of SDType (i.e., an F-measure of 88.5% on the DBpedia dataset).

With respect to DBpedia, it is further noteworthy that SDType is also capable of typing resources derived from Wikipedia pages with very sparse information (i.e., no infoboxes, no categories, etc.). As an extreme case, we are also capable of typing instances derived from Wikipedia red links only by using information from the ingoing links.

7 Conclusion and Outlook

In this paper, we have discussed the SDType approach for heuristically completing types in large, cross-domain databases, based on statistical distributions. Unlike traditional reasoning, our approach is capable of dealing with noisy data as well as faulty schemas or unforeseen usage of schemas.

The evaluation has shown that SDType can predict type information with an F-measure of up to 88.9% on DBpedia and 63.7% on OpenCyc, and can be applied to virtually any cross-domain dataset. For DBpedia, we have furthermore enhanced SDType to produce valid types only for untyped resources. To that end, we have used a trained preclassifier telling typeable from non-typeable instances at an accuracy of 91.8%, and are able to predict 2.4 million missing type statements at a precision of 0.95, or 1.7 million missing type statements at a precision of 0.99, respectively. We have shown that with these numbers, we outperform traditional RDFS reasoning both in precision and productivity.

The results show that SDType is good at predicting higher-level classes (such as Band), while predicting more fine-grained classes (such as Heavy Metal Band) is much more difficult. One strategy to overcome this limitation would be to use qualified relations instead of only relation information, i.e., a combination of the relation and the type of related objects. For example, links from a music group to an instance of Heavy Metal Album could indicate that this music group is to be classified as a Heavy Metal Band. However, using such features results in a much larger feature space [13] and thus creates new challenges with respect to the scalability of SDType.

The type statements created by SDType are provided in a web service interface, which allows for building applications and services at a user-defined trade-off of recall and precision, as sketched in [14].

The statistical measures used in this paper can be used not only for predicting missing types. Other options we want to explore in the future include the validation of existing types and links. Just as each link can be an indicator for a type that does not exist in the knowledge base, it may also be an indicator that an existing type (or the link itself) is wrong.

In summary, we have shown an approach that is capable of making type inference heuristically on noisy data, which significantly outperforms previous approaches addressing this problem, and which works on large-scale datasets such as DBpedia. The resulting high precision types for DBpedia have been added to the DBpedia 3.9 release and are thus publicly usable via the DBpedia services.

Acknowledgements. The authors would like to thank Christian Meilicke for his valuable feedback on this paper.

References

1. Alessio Palmero Aprosio, Claudio Giuliano, and Alberto Lavelli. Automatic expansion of DBpedia exploiting Wikipedia cross-language information. In 10th Extended Semantic Web Conference (ESWC 2013), 2013.
2. Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. DBpedia - A crystallization point for the Web of Data. Web Semantics, 7(3):154-165, 2009.
3. William W. Cohen. Fast effective rule induction. In 12th International Conference on Machine Learning, 1995.