Veni, vidi, CLARIN! Darja Fišer - PowerPoint Presentation

Uploaded On 2018-09-19

Presentation Transcript

Slide1

Veni, vidi, CLARIN!

Darja Fišer

DARIAH Day @ UZH

Zurich, 18 December 2017

CC-BY 4.0

Slide2

Overview

Intro to CLARIN
CLARIN data architecture
CLARIN for data science

Slide3

Intro to CLARIN

Slide4

CLARIN in seven bullets

CLARIN is the Common Language Resources and Technology Infrastructure
ESFRI ERIC status since 2012, Landmark since 2016
that provides easy and sustainable access for scholars in the humanities and social sciences and beyond
to digital language data (in written, spoken, video or multimodal form)
and advanced tools to discover, explore, exploit, annotate, analyse or combine them, wherever they are located
through a single sign-on environment
and that serves as an ecosystem for knowledge sharing.

Slide5

CLARIN ERIC in members and centres

A consortium of:
  19 members: AT, BG, CZ, DE, DK, DLU, EE, FI, GR, HU, IT, LT, LV, NL, NO, PL, PT, SE, SI
  2 observers: FR, UK
  >40 centres

Slide6

What CLARIN Centres offer

Repository
  library of linguistic data and tools
  search for data and tools and easily use them online or download them
  deposit your data and be sure it is safely stored, everyone can find it, and correctly cite it
Federated single sign-on
  log in once with your existing institutional credentials
  get access to protected resources
Metadata
  describe content, provenance and formats of linguistic data and tools
  facilitate preservation and dissemination of linguistic data and tools
Persistent Identifier (PID or handle)
  a special permanent URL that provides a permanent link to linguistic data and tools
  will resolve correctly even if in some distant future the data is moved
  should be used as URL in citations
Licensing
  Public
  Academic
  Restricted
Preservation (Data Seal of Approval)
  committed to long-term care of items in the repository
  ensure the archived data can be found, understood and used in the future
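
The advice to cite a PID as a URL can be illustrated with a minimal Python sketch. It assumes the PID is a handle of the form prefix/suffix and that it is resolved through the public Handle proxy at hdl.handle.net. The function name and the sample handle are hypothetical and are not taken from the slides.

```python
from urllib.parse import quote, urlparse

# Public proxy of the Handle System; a handle "prefix/suffix" is resolved
# by appending it to this base URL.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_citation_url(pid: str) -> str:
    """Normalise a handle into a resolver URL suitable for citations.

    Accepts 'prefix/suffix', 'hdl:prefix/suffix', or an existing
    hdl.handle.net URL. (Hypothetical helper, for illustration only.)
    """
    pid = pid.strip()
    lowered = pid.lower()
    if lowered.startswith("hdl:"):
        pid = pid[4:]
    elif lowered.startswith(("http://", "https://")):
        parsed = urlparse(pid)
        if parsed.netloc.lower() != "hdl.handle.net":
            raise ValueError(f"not a handle proxy URL: {pid}")
        pid = parsed.path.lstrip("/")
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected 'prefix/suffix', got: {pid!r}")
    # Percent-encode characters in the suffix that are unsafe in a URL path.
    return HANDLE_PROXY + prefix + "/" + quote(suffix, safe="/")


# Made-up handle, used only to show the call.
EXAMPLE_URL = handle_to_citation_url("hdl:12345/example-corpus")
```

Because the citation points at the resolver and not at the repository's current address, the link can keep working after the data is moved, which is the property the slide describes.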

Slide7

CLARIN data types and user communities

Data types
  Newspaper archives
  Literary texts
  Parliamentary records
  Historical letters
  Broadcast archives
  Oral History data
  Social Media data

User communities
  Digital humanities
  Linguistics and Philology
  Translation and Lexicography
  Literary Studies
  History
  Political and Social Sciences
  Media Studies
  Culture, Folklore, Anthropology
  Speech therapy
  Teachers
  General Public

Slide8

CLARIN data architecture

Slide9

Repositories

* slides by Dieter Van Uytvanck

Slide10

Harvesting

Slide11

Processing

Slide12

Content search

Slide13

Workflows

Slide14

CLARIN for data science

Slide15

CLARIN and data science (1)

Text and speech as social and cultural data
Contribution to the development of new methodological frameworks for the integrated processing of multiple data types, and multidisciplinary research agendas
Europe’s multilinguality as a basis for comparative research of societal and cultural phenomena that are reflected in language use:
  Migration patterns
  Intellectual history
  Language variation across period and region
  Dynamics in mental health conditions
  Parliamentary discourse

Slide16

Parliamentary records

great potential for reuse and re-purposing within many fields of study in the humanities and social sciences (and beyond):
  suited for both close reading and distant reading
  Humanities: history, language change, discourse analysis ...
  Social sciences: social and cultural dynamics, political sciences, economics ...
considered a rich data type
  apart from linguistic content, rich in metadata (speaker, party affiliation, age, sex, education, origin, duration of speech)
  apart from linguistic content, rich in extralinguistic clues (interruptions, voting results)
made easily available under the Freedom of Information acts in over 100 countries all around the world to enable informed participation by the public and improve effective functioning of democratic systems
but also
  often presenting itself as messy or noisy data calling for links with data in other modalities than text and speech
  created under specific circumstances that need to be well understood before strong conclusions can be drawn
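
To make the "rich in metadata" point concrete, here is a small Python sketch of how a speech record with some of the metadata listed above could be represented and aggregated. The record layout, field names and sample values are all assumptions made for illustration. They do not reflect the schema of any actual CLARIN corpus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Speech:
    """One parliamentary speech with a few metadata fields (hypothetical schema)."""
    speaker: str
    party: str
    duration_s: int      # duration of the speech in seconds
    interruptions: int   # extralinguistic clue: number of interruptions
    text: str


def speaking_time_by_party(speeches):
    """Sum speech durations (in seconds) per party affiliation."""
    totals = defaultdict(int)
    for speech in speeches:
        totals[speech.party] += speech.duration_s
    return dict(totals)


def interrupted(speeches):
    """Return only the speeches that were interrupted at least once."""
    return [s for s in speeches if s.interruptions > 0]


# Invented sample records, not real parliamentary data.
SAMPLE = [
    Speech("Speaker A", "Party X", 120, 0, "..."),
    Speech("Speaker B", "Party Y", 300, 2, "..."),
    Speech("Speaker C", "Party X", 60, 1, "..."),
]

TOTALS = speaking_time_by_party(SAMPLE)
```

Even this toy layout shows why the caveats above matter: missing or inconsistent metadata values would silently distort such aggregates, so the circumstances under which the records were created need to be understood first.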

Slide17

Corpora of parliamentary records

Coverage
  exist for 18 countries
Size (in tokens)
  largest: UK (1.6 billion)
  smallest: Portuguese (1 million)
Periods covered by the corpus
  mostly 2nd half of 20th century and 21st century, Dutch and British corpora from early 19th century
Availability
  For download (7)
    at, cz [CPM], dk, de [sample only], no [ToN], pt, lv
  For on-line searching (7)
    Finnish (KORP)
    CzechParl (SketchEngine)
    Latvian (noSketchEngine)
    Bulgarian (CLaRK)
    Hungarian (HNC, registration required)
    Proceedings of Norwegian Parliamentary Debates (Corpuscle)
  Both for download and on-line searching (5)
    Dutch (Political Mashup)
    Estonian (Keeleveeb)
    Swedish (KORP)
    Slovenian (noSketchEngine)
    Polish (NKJP)
Full overview available here

Slide18

CLARIN’s Parliamentary data for many disciplines

Perspective of curators and researchers:

  Historical perspective: the specifics of the diachronic perspective; time dynamics per topic, etc.
  Political science perspective: political activity of parties and politicians; the role of the various public political bodies; policy comparison; language differences as indicators of differing political views, etc.
  Sociological perspective: conflicts in parliament; attitudes of politicians to critical issues; trending topics; patterns of language use reflecting societal dynamics, models of parliamentary communication, control, commissions, etc.
  Psychological and language perspective: language portraits of politicians; semantic differences of political terms; gestures; behavior in parliament, etc.
Developers' perspective:
  Design of parliamentary speech corpora: annotations, visualization, etc.
  Text analytics, semantic processing and linking of parliamentary data
  Searches and information extraction from parliamentary corpora
  Multilinguality issues in parliamentary data

Slide19
Slide20

ParlaCLARIN @ LREC 2018

Background
  Need for better harmonization, interoperability and comparability of the resources and tools relevant for the study of parliamentary discussions and decisions, not only in Europe but worldwide
Aim
  Bring together researchers interested in compiling, annotating, structuring, linking and visualising parliamentary records that are suitable for research in a wide range of disciplines in the Humanities and Social Sciences
Paper submission deadline
  10 January 2018
More info
  https://www.clarin.eu/ParlaCLARIN

Slide21

Veni, vidi, CLARIN!

darja.fiser@ff.uni-lj.si