N. Calzolari, 2nd KYOTO Workshop, Gifu, Japan, January 2011


Nicoletta Calzolari, Istituto di Linguistica Computazionale, CNR, Pisa (glottolo@ilc.cnr.it). The Future of KYOTO, with some historical notes to show a path along an evolving vision.

Presentation Transcript

Slide1

N. Calzolari

1

2nd KYOTO Workshop, Gifu, Japan, January 2011

Nicoletta Calzolari
Istituto di Linguistica Computazionale – CNR – Pisa
glottolo@ilc.cnr.it

The Future of KYOTO

… with some historical notes to show a path along an evolving vision

in today’s EU context: META-SHARE, ...

Slide2

Why are such needed LRs lacking after 30 years of R&D in the field?

1) Because the main trend until the mid-’80s was to privilege the processing of so-called “critical” phenomena, studied by the dominating linguistic theories, rather than focusing on the deep analysis of the real uses of a language
As a result CL was focusing on:
few examples, often artificially built
lexicons made of few entries (toy lexicons)
grammars with poor coverage

2) Because large-scale LRs are costly & their production requires a big organizing effort

Old slide with Antonio Zampolli (’80s/early ‘90s)

Slide3

… back from the early ‘80s

Historical notes
Pioneering Research

Automatic acquisition of lexical information from MRDs
Was my first research & became central in the Pisa group (ACQUILEX)
And also Amsler, Briscoe, Boguraev, Wilks’ group, IBM, then Japanese groups, …
The trend was: “large-scale computational methods for the transformation of machine readable dictionaries into machine tractable dictionaries”
Instead of relying on linguists’ introspection

It became evident that:
Part of the results of meaning extraction, e.g. many meaning distinctions, which could be generalised over lexicographic definitions and automatically captured, were unmanageable at the formal representation level, and had to be blurred into unique features and values
Unfortunately, it is still today difficult to constrain word-meanings within a rigorously defined organization: by their very nature they tend to evade any strict boundaries

Slide4

… back from the late ‘80s

Historical notes

After acquisition from MRDs, automatic acquisition of info from texts:
This trend has become today a consolidated & pervasive fact
From acquisition of “linguistic information”
To acquisition of “general knowledge”, with more data intensive, robust, reliable methods

Slide5

Looking into the past

All started with the situation we had in the late ‘80s – early ‘90s
With all the Xxx-LEX projects: MultiLex, GeneLex, AcquiLex, Xxx-Lex
A. Zampolli: Let’s be coherent: Xxx-Lex

After the “Grosseto Workshop” (1985): a turning point
EAGLES
ISLE
Standards, Best Practices, ...

Slide6

ISO LMF: Lexical Markup Framework

Structural skeleton, with the basic hierarchy of information in a lexical entry + various extensions
Modular framework
LMF specs comply with modelling UML principles
an XML DTD allows implementation
Builds on EAGLES/ISLE

The field is mature

New initiatives (diagram): NEDO Asian Languages, NICT Language-Grid Service Ontology, ICT KYOTO, LIRICS, LexInfo

Slide7

KYOTO
A search environment using semantic technologies
A “compass” for the web2.0

Interdisciplinarity: scientific community (LRT, web technologies, knowledge engineers), companies, domain experts

Multilingualism: 7 languages (2 Asian languages)

Kyoto Core System is open & free

Slide8

Annotation Format (KAF)

Multi-level Annotation Format
stand-off annotation
uniform representation for 7 languages

Shared through the languages:
Text: tokenisation, sentences, paragraphs with reference to the sources
Terms: words & multi-words, parts-of-speech, etc.
Chunks: constituents & syntagmatic realization
Dependencies: grammatical functions

L1 – Semantic modules: Multiword tagging, Sense Tagging, Named Entity Recognition, OntoTagging
L2 – Semantic module: event/fact extraction

from Piek Vossen
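
The stand-off, multi-level idea on this slide can be illustrated with a minimal Python sketch. It is not the KAF schema or any KYOTO API: the class names, fields and ids (`Token`, `Term`, `w1`, `t1`, the sense label) are assumptions made only for illustration. What it tries to show is that each higher layer points to the layer below by id, so the source text itself is never modified and later semantic modules can add information without touching earlier layers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    """Text layer: one word form, located in the source by character offset."""
    wid: str      # hypothetical token id, e.g. "w1"
    text: str
    offset: int   # character offset into the unmodified source text


@dataclass
class Term:
    """Term layer: a word or multi-word that refers to tokens by id only."""
    tid: str
    lemma: str
    pos: str
    span: List[str]                                   # ids of covered tokens
    senses: List[str] = field(default_factory=list)   # filled by a later module


def tokenize(text: str) -> List[Token]:
    """Whitespace tokeniser that records offsets (a stand-in for real tokenisation)."""
    tokens, cursor = [], 0
    for i, word in enumerate(text.split(), start=1):
        start = text.index(word, cursor)
        tokens.append(Token(f"w{i}", word, start))
        cursor = start + len(word)
    return tokens


def term_text(term: Term, tokens_by_id: Dict[str, Token]) -> str:
    """Recover the surface form of a term by following its token references."""
    return " ".join(tokens_by_id[w].text for w in term.span)


text = "Climate change threatens wetlands"
tokens = tokenize(text)
by_id = {t.wid: t for t in tokens}

terms = [
    Term("t1", "climate change", "N", ["w1", "w2"]),  # multi-word term
    Term("t2", "threaten", "V", ["w3"]),
    Term("t3", "wetland", "N", ["w4"]),
]

# A later semantic module adds a sense tag without altering the lower layers.
terms[2].senses.append("hypothetical-synset-id")

print(term_text(terms[0], by_id))
```

Because terms hold only token ids, the same pattern extends upward: chunks and dependencies would refer to term ids, and semantic layers to whatever sits below them.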

Slide9

KYOTO System & Adoption of Standards

Architecture diagram (from Piek Vossen), recoverable labels: Source Documents; Linear MAF/SYNAF; Linear SEMAF; Term extraction, Tybot; Generic TMF; Semantic annotation; Linear and Generic FACTAF; Fact extraction, Kybot; Domain editing, Wikyoto; Wordnet; Domain Wordnet; LMF API; Ontology; Domain ontology; OWL API; Concept User; Fact User

Could be at the basis of a new standard?

Slide10

A common representation format for WordNets

Seven WordNets, similar but not identical → hampered interoperability

WnIT, WnEN, WnEU, WnNL, WnJP, WnCH, WnES

→ endow WordNet with a representation format allowing easy access, integration & interoperability among resources
to be accessed both intra- and inter-linguistically
to support easier integration

Slide11

A common representation format: WordNet-LMF

UML class diagram, recoverable class names: LexicalResource, GlobalInformation, Lexicon, LexicalEntry, Lemma, Sense, MonolingualExternalRefs, MonolingualExternalRef, Synset, Definition, Statement, SynsetRelations, SynsetRelation, SenseAxes, SenseAxis, InterlingualExternalRefs, InterlingualExternalRef, Meta
Associations carry the cardinalities 1..1, 1..*, 0..1 and 0..*

Data Categories

from Monica Monachini
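
A rough way to read the class names in the diagram is as a containment hierarchy. The Python sketch below models a small part of it with dataclasses. It is an illustrative reading, not the WordNet-LMF specification: the attribute names, the choice of which classes to include, and the ids (`le_fuoco`, `syn_fire`) are assumptions, and the cardinalities from the diagram are not enforced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SynsetRelation:
    target: str     # id of the related synset
    rel_type: str   # relation label; value set is an assumption here


@dataclass
class Synset:
    id: str
    definition: Optional[str] = None
    relations: List[SynsetRelation] = field(default_factory=list)


@dataclass
class Sense:
    id: str
    synset: str     # id of the synset this sense belongs to


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    pos: str
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)
    synsets: List[Synset] = field(default_factory=list)

    def lemmas_of(self, synset_id: str) -> List[str]:
        """Intra-lingual access: all lemmas with a sense in the given synset."""
        return [e.lemma for e in self.entries
                if any(s.synset == synset_id for s in e.senses)]


@dataclass
class LexicalResource:
    global_information: str
    lexicons: List[Lexicon] = field(default_factory=list)


# Hypothetical Italian fragment; the ids are invented for the example.
italian = Lexicon(
    language="it",
    entries=[
        LexicalEntry("le_fuoco", "fuoco", "n", [Sense("fuoco_1", "syn_fire")]),
        LexicalEntry("le_fiamma", "fiamma", "n", [Sense("fiamma_1", "syn_fire")]),
    ],
    synsets=[Synset("syn_fire")],
)
resource = LexicalResource("illustrative resource", [italian])

print(italian.lemmas_of("syn_fire"))
```

Keeping every wordnet in the same shape is what would let one function such as `lemmas_of` work unchanged across all seven lexicons.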

Slide12

Towards a Centralized WordNet DC Registry

A list of 85 semantic relations as a result of a mapping of the KYOTO WordNet grid

Inter-WN
Intra-WN

Slide13

WordNet-LMF Multilingual level - Cross-lingual Relations

SWN <fuego_3, llama_1> 09686541-n
IWN <fuoco_1, fiamma_1> 00001251-n
WN3.0 <fire_1 flame_1 flaming_1> 13480848-n

<!ELEMENT SenseAxes (SenseAxis+)>
<!ELEMENT SenseAxis (Meta?, Target+, InterlingualExternalRefs?)>
<!ATTLIST SenseAxis
  id ID #REQUIRED
  relType CDATA #REQUIRED>
<!ELEMENT Target EMPTY>
<!ATTLIST Target
  ID CDATA #REQUIRED>
<!ELEMENT InterlingualExternalRefs (InterlingualExternalRef+)>
<!ELEMENT InterlingualExternalRef (Meta?)>
<!ATTLIST InterlingualExternalRef
  externalSystem CDATA #REQUIRED
  externalReference CDATA #REQUIRED
  relType (at|plus|equal) #IMPLIED>

SenseAxis: groups monolingual synsets corresponding to each other and sharing the same relations to English
InterlingualExternalRefs: link to ontology/(ies)
relType: specifies the type of correspondence

from Monica Monachini
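
To make the DTD fragment more concrete, here is a hedged Python sketch that builds one `SenseAxis` element with the standard library's `xml.etree.ElementTree`. The element and attribute names follow the DTD on the slide; everything else is assumed for illustration: the helper `make_sense_axis`, the axis id `sa_1`, the `eq_synonym` value for the axis `relType`, the wordnet prefixes added to the synset ids, and the ontology name and concept. `ElementTree` does not validate against a DTD, so the two checks in the helper only mirror the `Target+` and `(at|plus|equal)` constraints by hand.

```python
import xml.etree.ElementTree as ET

# Allowed relType values for InterlingualExternalRef, as listed in the DTD.
REF_REL_TYPES = {"at", "plus", "equal"}


def make_sense_axis(axis_id, rel_type, target_ids, external_refs=()):
    """Build a SenseAxis element; hypothetical helper, not part of any KYOTO tool."""
    if not target_ids:
        # The DTD requires Target+ (one or more targets).
        raise ValueError("a SenseAxis needs at least one Target")
    axis = ET.Element("SenseAxis", {"id": axis_id, "relType": rel_type})
    for target_id in target_ids:
        ET.SubElement(axis, "Target", {"ID": target_id})
    if external_refs:
        refs = ET.SubElement(axis, "InterlingualExternalRefs")
        for system, reference, rel in external_refs:
            attrs = {"externalSystem": system, "externalReference": reference}
            if rel is not None:  # relType is #IMPLIED, so it may be omitted
                if rel not in REF_REL_TYPES:
                    raise ValueError(f"relType must be one of {sorted(REF_REL_TYPES)}")
                attrs["relType"] = rel
            ET.SubElement(refs, "InterlingualExternalRef", attrs)
    return axis


# Synset numbers are taken from the slide; the prefixes are an assumption.
axis = make_sense_axis(
    "sa_1",
    "eq_synonym",
    ["SWN-09686541-n", "IWN-00001251-n", "WN3.0-13480848-n"],
    [("ExampleOntology", "Fire", "equal")],  # hypothetical ontology link
)

print(ET.tostring(axis, encoding="unicode"))
```

A real pipeline would validate the serialised document against the actual WordNet-LMF DTD with a validating parser rather than rely on hand-written checks like these.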

Slide14

Complex picture!
Is there anything we need to do for Interoperability?

Work within ISO:
LMF: abstract meta-model for lexical representation
Ontology Group or more Groups?
Language Resource Ontologies: ontology of data categories

Real life:
Lexicons (e.g. WordNets) that are called Ontologies
Lexicons linked to Ontologies: to be used in applications, in multilingual systems, domains, …
Work on “ontologising” Lexicons: to allow exploiting various relations, to make inferences, …
Semantic Lexicons, with many types of relations among semantic units: these are often of “conceptual/world-knowledge” nature. Do we want DCs for these?

ISO SC 4/WG 4 – Lexicon-Ontology relations
New work item: PWI 24622

KYOTO can contribute

Slide15

To explore the need of doing something within ISO about the relations between Lexicon and Ontology
Do we/ISO need to address another (lexical) layer?
How lexicons and ontologies are linked and information mapped from one to the other
The ontological layer in a/connected to a lexicon

Possible issues/questions:
Is LMF enough to represent Ontological links?
How to connect work being done in ISO Lexical group and ISO Ontology groups?
Lexicon and Ontologies: separation? or lexicalised ontologies? or ontologies lexicons?
Lexicon, Ontologies and Domains
On a very different dimension: Ontology of lexical/semantic/conceptual categories? Standardised semantic categories, ontology labels?
Relation to multilinguality ...

KYOTO can contribute

Slide16

Input to Multilingual Web
http://www.multilingualweb.eu/

The MultilingualWeb project is exploring standards and best practices that support the creation, localization and use of multilingual web-based information
It aims to raise the visibility of existing best practices and standards and identify gaps
The core vehicle for this is a series of four workshops, for networking across communities that span the various aspects involved

Next workshop on best practices aimed at development of Content for the Web, including creation of content ranging from personal authoring for blogs and social networking sites to development of large corporate or organizational enterprises: “Content on the Multilingual Web”
4-5 April 2011, Pisa, Italy

KYOTO can contribute

Slide17

A new paradigm of R&D in LRs & LT

For a few years now:
Adopting the paradigm of accumulation of knowledge, so successful in more mature disciplines, based on sharing LRs, tools & results
Ability to build on each other’s achievements, allowing controlled & effective cooperation of many groups on common tasks (see Human Genome Project)
e.g. initiatives to achieve international consensus on annotation guidelines

Slide18

Some steps for a “new generation” of LRs


Slide19

Lexical WEB

Diagram, recoverable labels: ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y, Bio Lexicon, SIMPLE-WEB, Global WordNet GRID, LMF, with intelligent agents

Standards for Content Interoperability
Standards: Enough??

Slide20

(Distributed) Language Services


A scenario implying:

Enabling:

Can KYOTO contribute?

Slide21

Which Communities?
Language Resources
Language Technologies
Standardisation
Content/Ontologies
System developers
Integrators
SSH

EC
National funding agencies
Industry

Many applications/domains: MT, CLIR, …, e-government, content industry, intelligence, e-culture, e-health, domotics, …

Many LRs & LTs exist, but a global vision, policy & strategy is needed

Today: CLARIN for SSH, FLaReNet Network, META-NET NoE (other diagram labels: core, EU Forum)

Need to consider together technical, organisational, strategic, economic, social, cultural, legal, political issues wrt LRs & LTs

Slide22

Fostering Language Resources Network
FLaReNet at a glance
http://www.flarenet.eu

An international Forum to facilitate interaction, to:
Overcome the fragmentation in LR & LT & recreate a community
Anticipate the needs of new types of LR & LT & Language Infrastructures
Create a shared policy for the next years
Foster a European strategy for consolidating the sector

Essential Community mobilisation (also to prepare the ground for a RI)
A “roadmap”: a plan of actions as input to policy development
A (EU) model for the LRs/LTs area of the next years

Ambitious!

Slide23

Results from Vienna & Barcelona Forum: Shaping the Future of the Multilingual Digital Europe

Access to LRs is critical & should involve all the community
Need to create the means to plug together different LR & LT, in a web-based resource and technology “grid”
Create a shared repository of data formats, annotations, etc. as a major help to achieve standardisation
Common repositories for tools & language data should be established that are universally and easily accessible by everyone
Coordinate input to ISO/W3C standardisation work

Slide24

2nd Blueprint
http://www.flarenet.eu/sites/default/files/D8.2b.pdf

Result of a permanent and cyclical consultation
Inside the community it represents
Outside it, through connections with neighbouring projects, associations, initiatives, funding agencies

Organised along three main “directions”:
Infrastructural Aspects
Research and Development
Political and Strategic Issues

Reflect three major development factors that can boost or hinder the growth of the field of LRT

Slide25

Sources: many meetings


Slide26

3rd FLaReNet Forum
The European Language Resources and Technologies Forum:
Important role in defining recommendations
In Barcelona: 120 Participants from 22 Countries
Define final recommendations
Previous Proceedings & Reports on the web
Blueprint will be discussed
Also for adoption & endorsement by FLaReNet Institutional Members

Slide27

Infrastructural Aspects

Issue: Metadata
Challenge: Interoperability of Metadata sets
Recommended Actions: Set up a global infrastructure of common and uniform and/or interoperable metadata sets
Challenge: Metadata usable both by humans and by machines
Recommended Actions: Create machine-understandable metadata with formal syntax and clear semantics; Automate the process of metadata creation; Develop structured metadata

Issue: Documentation
Challenge: Reliable documentation of LRs according to common best practices
Recommended Actions: Collect all possible and existing LR documentation; Devise and adopt a widely agreed standard documentation template for all types of resources
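
As a small illustration of what “machine-understandable metadata with formal syntax” could mean in practice, the Python sketch below defines a structured record, checks it against a list of required fields, and serialises it to JSON. It does not follow any existing metadata schema: the field names, the `REQUIRED` list and the example values are all assumptions chosen for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical minimal field set; a real schema would be agreed by the community.
REQUIRED = ("name", "resource_type", "languages", "licence")


@dataclass
class ResourceMetadata:
    name: str
    resource_type: str        # e.g. "lexicon", "corpus", "tool"
    languages: List[str]      # language codes
    licence: str
    documentation_url: str = ""


def missing_fields(record: dict) -> List[str]:
    """Return the required fields that are absent or empty in a record."""
    return [key for key in REQUIRED if not record.get(key)]


record = asdict(ResourceMetadata(
    name="Example domain wordnet",
    resource_type="lexicon",
    languages=["it", "en"],
    licence="to-be-decided",
))

# The same record is readable by people (as JSON text) and by programs.
serialised = json.dumps(record, indent=2, sort_keys=True)
print(serialised)
print(missing_fields(record))
```

A check of this kind is also one route towards the “automate the process of metadata creation” action: incomplete records can be flagged at submission time instead of being discovered later.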

Slide28

Political and Strategic dimensions

Issue: Funding Agencies policies
Challenge: Devise models to allow different types of players easy access to resources
Recommended Actions: Ensure that publicly funded resources are publicly available either free of charge or at a small distribution cost; Encourage/enforce use of best practices or standards in LR production projects; Make sustainability and sharing/distribution plans mandatory in projects concerning LR production

Issue: LR citation
Challenge: Appropriate citation of Language Resources like traditional publications
Recommended Actions: Develop a standard protocol for citing language resources

KYOTO can be an example

Slide29

LRE Map: Why??

Problem: Lack of information & documentation about resources is, in the e-science paradigm, a very critical issue
Non documented resources don’t exist!!

The Map as an answer to start to fill this gap, but also:
To encourage the needed “change in culture”

A collective enterprise:
Each researcher must become aware of the importance of his/her personal engagement in documenting resources
A task as important as creating new resources and not an accessory to be disregarded
As the necessary service to the whole community

Will become an essential instrument to monitor the field

www.resourcebook.eu

Slide30

How many LRs & Types at LREC?

How many LRs & Types at COLING?

Languages: 170!

Slide31

Languages: 170!!

But obviously …

image courtesy of Wordle (http://www.wordle.net)

Slide32

Availability

Freely available!

Chart comparing LREC and COLING; values shown: 54%, 3%, 15%, 25%, 57%

Slide33

The Project META-NET

META-NET is a Network of Excellence (coord. Hans Uszkoreit) dedicated to fostering the technological foundations of the European multilingual information society

Objectives:
Prepare the ground for a large-scale concerted effort by building a strategic alliance of national and international research programmes, corporate users and commercial technology providers and language communities
Strengthen the European research community through research networking and by creating new schemes and structures for sharing resources and efforts
Build bridges by approaching open problems in collaboration with other research fields such as machine learning, social computing, cognitive systems, knowledge technologies and multimedia content

Final goal: META – The Multilingual Europe Technology Alliance

Slide34

The META Alliance

Diagram, recoverable labels:
language communities
policy makers and funding bodies
user industries
provider industries
language technology community
machine learning community
semantic technologies community
cognitive systems community
multimedia content technologies

Slide35

Founding Members

Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Germany
Barcelona Media – Centre d'Innovació, Spain
Consiglio Nazionale Ricerche, Istituto di Linguistica Computazionale “Antonio Zampolli”, Italy
Institute for Language and Speech Processing, R.C. “Athena”, Greece
Charles University in Prague, Czech Republic
Centre National de la Recherche Scientifique, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
Universiteit Utrecht, The Netherlands
Aalto University, Finland
Fondazione Bruno Kessler, Italy
Dublin City University, Ireland
Rheinisch Westfälische Technische Hochschule Aachen, Germany
Jožef Stefan Institute, Slovenia
Evaluations and Language Resources Distribution Agency, France

Slide36

Three Lines of Action

The META-NET objectives translate into three lines of action:


Slide37

The Process

Visions
Strategic Research Agenda
Roadmap

2010, 2011, 2012

communication within META-NET (META-VISION)
communication in the wider LT community and among other stakeholders
communication to policy makers, funding bodies, public

Slide38

Data has become a key factor in LT R&D

A few indicators:
Increasing size & importance of LREC conference, corpora mailing list, etc.
Citation ranks of publications on language resources
Language research and language technology belong to the Data Intensive Sciences
Expensive data become valuable through sharing

However, the long demanded and well-contemplated instruments for managing and sharing this data are still missing

Slide39

META-SHARE: Key Features

META-SHARE is an open, integrated, secure, interoperable exchange infrastructure (resp. Stelios Piperidis) for language data & tools for the Human Language Technologies domain
ever-evolving, scalable, including free and for-a-fee LRs/LTs and services
including legacy, contemporary and emerging datasets, tools and technologies

A marketplace where language data & tools are documented, uploaded and stored in repositories, catalogued and announced, downloaded, exchanged, aiming to support a data economy (includes free and for-a-fee LRs/LTs and also services)

Standards-compliant, overcoming format, terminological and semantic differences

Based on distributed networked repositories accessible through common interfaces

Slide40

What we’re offering

A channel to share and distribute language data and tools
Technical solutions for building your own repositories
Protocols and mechanisms for making the descriptions of your resources (and the actual resources) harvestable
Guidelines and recommendations on standards used in the LR production and documentation processes
Recommendations on data and tools licensing issues
Access to large catalogues of documented, high-quality resources, as well as the actual data and tools

KYOTO can be among the first

Slide41

Features

Single Sign-On
Easy Administration
Metadata Harvesting
Persistent Identifiers (PIDs)
Intuitive Search
Open Source
Service-Oriented
Distributed
Replication/Backup
Reporting & Statistics

Slide42

v0 architecture

Slide43

On the communication/mobilisation side

Challenges

A change of culture
Convincing arguments that data assets and their value do not necessarily grow if locked in the drawer
Incentives and models that can convince data holders that there is life after the announcement of data existence and/or sharing (share does not necessarily mean for free, nor for unbridled use)
Interoperability, common metadata, formats, etc.

In other words we need to create/reinforce a data economy based on widely agreed principles and rules, mutual understanding, sustainable and adaptive models, simplified copyright rules and licensing models

The present time window seems appropriate

KYOTO can be a “model” for other projects to follow

N. Calzolari, Multilingual Web, Madrid, 2010

Slide44

LR building as collaborative “common shared task”

New methodology of work
Assemble a comprehensive “map of language data and mechanisms” for the planet’s languages (LRE Map)
Interoperability acquires even more value
Needs consensual planning of common strategies towards shared objectives
Not just the sum of many individual efforts
But an organised, well-structured, collective enterprise
Similar to more mature sciences: physicists’/astronomers’ experiments … of X,000 people working on the same big enterprise

META-SHARE is a big step that needs a real Paradigm shift

Slide45

Preamble
We wanted more & more data ...
We experience today a sort of statistical “intoxication”!
It started as a new strategy, a revolution maybe? But it has turned to tactics. Stuck with it? In a narrow loop of small advances, not linked to each other
Can we add also a new strategy? and hopefully a vision?

Main Statement
We tend to forget about “language” & the need to understand its properties & complexities
Where do we (try to) encode what we know about language properties? In annotations

BUT
Is there any theoretical knowledge of, or any serious methodology of studying and exploiting, the interactions among the various annotation layers?

Vision
Like the big Genome project, ... a large Language initiative

Slide46

Strategy

MANY (parallel) texts for MANY languages
With ALL possible annotation layers
Similar to more mature sciences, e.g. physics, … of thousands of people working together on the same big experiment

Create a sustainable infrastructure for a large Language repository plan,
Where we accumulate all the knowledge we have about language &
Encourage analysis of linguistic interrelations

Means a change of mentality: going beyond “individual” research interests
From “my approach” to some “compromise” allowing to go for big amounts/integration/building on each other/…

Slide47

From no infrastructure ... To many infrastructures/networks

We were complaining there was no infrastructure ...
Have we been too successful??
Now many infrastructural/networking initiatives
Very good opportunity
But only if we are able to act in a coordinated & coherent way
Otherwise we spoil & confuse the field