Uploaded On 2015-11-24

Presentation Transcript

Slide1

Creating JATS XML from Japanese language articles and automatic typesetting using XSLT

Hidehiko Nakanishi 1, Toshiyuki Naganawa 2, Soichi Tokizane 3, Tsuyoshi Yamamoto 1

1 Nakanishi Printing Co., Ltd., Kyoto, Japan
2 Antenna House Inc., Tokyo, Japan
3 The University of Tokyo

Slide2

Contents

Introduction
Creating Japanese XML articles in JATS
Creating PDF using AH Formatter
Challenges of Applying JATS to Japanese language texts
Future
Conclusion

Slide3

Introduction

Slide4

Many countries use non-Latin scripts

Slide5

Not all research articles are written in English.

Many articles do not even use the Latin alphabet.

Slide6

What languages are used in articles written in Japan?

Articles published in J-STAGE, the e-journal platform operated by the Japan Science and Technology Agency (JST).

University journal articles indexed in NDL-OPAC, all areas.

Slide7

We wanted a schema applicable to Japanese

Electronic articles are essential, even for Japanese-language articles.

We were looking for a schema for Japanese-language articles.

Such a schema had to accept English as well.

Slide8

JATS multi-language support

In 2011, JATS 0.4 made it possible to express Japanese-language articles in XML.

J-STAGE supported JATS 0.4 immediately.

We started creating JATS XML for Japanese-language articles.

Before that

Slide9

I am from Kyoto, Japan

Map labels: Bethesda, Kyoto, East Asia Kanji cultural zone

Slide10

Kyoto is a former capital

It is where my company, Nakanishi Printing, is located.

Slide11

Our Tradition

Founded in 1865 by our ancestor.
A 150-year-old family business.
One of the oldest printers.

Photo captions: Former building of Nakanishi Printing in the Taisho era (1912-1926); current building of Nakanishi Printing.

Slide12

Our history

A brazier made from a woodcut print plate, 19th century

Type picker, 1960s

Today

Slide13

This is a Japanese e-journal

The Japanese Journal of Gastroenterological Surgery

Slide14

Same page expressed in English

Slide15

Slide16

Expressing Multiple Languages

Alternate expressions for a single object are necessary.

Simple repetition of a tag can be confusing:
Two name expressions of the same person?
Or two different persons?

JATS introduced “alternatives” tags for such cases.

Slide17

“Alternatives” Tags

Two name expressions of a single person:

<name-alternatives>
  <name name-style="eastern" xml:lang="ja-Jpan">
    <surname>中西</surname>
    <given-name>秀彦</given-name>
  </name>
  <name name-style="western" xml:lang="en">
    <surname>Nakanishi</surname>
    <given-name>Hidehiko</given-name>
  </name>
</name-alternatives>

Slide18
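As a rough illustration of why the grouping matters to software, the sketch below reads a <name-alternatives> element like the one in the example above and picks the form for a requested language. It uses Python's standard xml.etree.ElementTree. The function name display_name and the prefix-matching rule for xml:lang are assumptions made for this example, not part of JATS.

```python
import xml.etree.ElementTree as ET

# Qualified attribute name under which ElementTree exposes xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<name-alternatives>
  <name name-style="eastern" xml:lang="ja-Jpan">
    <surname>中西</surname>
    <given-name>秀彦</given-name>
  </name>
  <name name-style="western" xml:lang="en">
    <surname>Nakanishi</surname>
    <given-name>Hidehiko</given-name>
  </name>
</name-alternatives>
"""


def display_name(name_alternatives, lang):
    """Return a display string for the first <name> whose xml:lang starts with lang.

    Returns None when no alternative matches (hypothetical helper).
    """
    for name in name_alternatives.findall("name"):
        if name.get(XML_LANG, "").startswith(lang):
            surname = name.findtext("surname", default="")
            given = name.findtext("given-name", default="")
            if name.get("name-style") == "eastern":
                # Eastern style: surname first, no separating space.
                return surname + given
            # Otherwise: given name first, separated by a space.
            return f"{given} {surname}"
    return None


root = ET.fromstring(SAMPLE)
print(display_name(root, "ja"))
print(display_name(root, "en"))
```

Because both forms sit inside one wrapper, the code can treat them as one person; with two sibling <name> elements and no wrapper it could not tell one author from two.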

“Alternatives” tags

Slide19

How multiple languages can be expressed in JATS

Element name       Multi-language tag       Note
article title      <trans-title>
article subtitle   <trans-subtitle>
names              <name-alternatives>
affiliations       <aff-alternatives>
collaborators      <collab-alternatives>
abstract           <abstract>               <abstract> is repeatable with different "xml:lang" values. <trans-abstract> is for articles translated later.
keyword group      <kwd-group>              <kwd-group> is repeatable with different "xml:lang" values.
generic            <alternatives>           Any component that needs multi-language data.

Slide20

Creating Japanese XML articles in JATS

Slide21

Creating XML articles in JATS

We do not have readily available tools for creating Japanese XML files.

Our method:
(1) Convert Microsoft Word to Microsoft Office Open XML
(2) Convert Microsoft Office Open XML to JATS XML
(3) Validate the XML

Slide22

(1) Converting Microsoft Word to Microsoft Office Open XML
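The slides do not show how the Word file is exported, so the following is only a sketch of one possible route. It assumes the manuscript is a .docx package, which is a ZIP archive whose main body is stored as word/document.xml, and pulls out the paragraph text with Python's standard library. The helper names are hypothetical, and the tiny in-memory package exists only to make the example self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx package.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_from_docx(docx_file):
    """Return the text of each paragraph (w:p) in a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is spread over runs (w:r), each holding w:t elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs


def tiny_docx():
    """Build a minimal in-memory package holding only word/document.xml (demo only)."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>中西</w:t></w:r><w:r><w:t>秀彦</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Nakanishi Hidehiko</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


print(paragraphs_from_docx(tiny_docx()))
```

Note that the first paragraph arrives as one unbroken string even though Word stored it in two runs; run boundaries reflect formatting, not word boundaries.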

MS Open XML tags

Slide23

(2) Converting Microsoft Office Open XML to JATS XML

Through XSLT, removing unnecessary tags.

Perl program processing.

We faced the difficulty of agglutinative languages:
A word connects to the next word without a space.
A computer cannot tell where one word ends and the next begins.
This is true even for the separation of given name and surname.

Slide24

Agglutinative languages

Typical in East Asian languages.
No separating spaces between words.

Slide25

One sentence, one character string

Japanese: an agglutinative language using ideographs

日本語: 表意文字を用いた膠着語

Slide26

Agglutinative languages

In the old days, not even punctuation was used, i.e. multiple sentences in one character string!

Slide27

Inserting word separators

We insert separators manually.

The surname "中西" and the given name "秀彦" are attached as "中西秀彦" in an article.

It is separated as "中西@秀彦".

Possible alternatives are "中@西秀彦" and "中西秀@彦", but only a human can eliminate them.

There is no algorithm to determine it correctly.

Slide28
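A minimal sketch of how a manually marked name could be consumed downstream, assuming "@" is the separator as in the "中西@秀彦" example above. The function name_element and its error handling are hypothetical; the authors' actual processing used XSLT and Perl and is not shown in the slides.

```python
import xml.etree.ElementTree as ET

SEPARATOR = "@"  # manual marker, as in "中西@秀彦"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def name_element(marked_name, lang="ja-Jpan"):
    """Build an eastern-style <name> element from a 'surname@given-name' string."""
    surname, sep, given = marked_name.partition(SEPARATOR)
    if not sep or not surname or not given or SEPARATOR in given:
        raise ValueError(
            f"expected exactly one {SEPARATOR!r} between surname and given name"
        )
    name = ET.Element("name", {"name-style": "eastern", XML_LANG: lang})
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-name").text = given
    return name


element = name_element("中西@秀彦")
print(ET.tostring(element, encoding="unicode"))
```

The code only splits where a person has already placed the marker. It makes no attempt to guess the boundary in an unmarked string and raises an error instead.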

(3) Validating XML

Use the Oxygen XML editor.
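Oxygen validates a file against the JATS DTD or schema. The sketch below covers only the weaker first step, checking that a file is well-formed XML at all, using Python's standard library. The function name is an assumption for this example, and the check is not a substitute for schema validation.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML, otherwise the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


good = '<name xml:lang="en"><surname>Nakanishi</surname></name>'
bad = '<name xml;lang="en"><surname>Nakanishi</surname></name>'  # ';' typed for ':'

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

A one-character slip in an attribute name is enough to make the whole file unparseable, which is why this step comes before upload.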

The final JATS XML is obtained and uploaded to J-STAGE.

Slide29

PDF is still necessary

For paper publishing.
For readability.

Slide30

Slide31

Creating PDF using AH Formatter

Slide32

Antenna House AH Formatter

Slide33

XSLT

The XSLT converts a JATS file into XSL-FO, which expresses the page model for the PDF.

Slide34
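The authors' stylesheet is not reproduced in the slides. As a stand-in, this sketch performs a similar mapping in Python rather than XSLT: it reads the article title from a JATS document and wraps it in a minimal XSL-FO page. The page size, margin, font size and function name are arbitrary assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the conventional fo: prefix


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def jats_title_to_fo(jats_xml):
    """Map the <article-title> of a JATS document to a one-block XSL-FO document."""
    article = ET.fromstring(jats_xml)
    title = article.findtext(".//article-title", default="")

    root = ET.Element(fo("root"))
    # Page model: one A4 page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {"master-name": "A4", "page-width": "210mm", "page-height": "297mm"},
    )
    ET.SubElement(page, fo("region-body"), {"margin": "20mm"})
    # Content: a page sequence whose flow fills the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, fo("block"), {"font-size": "14pt"})
    block.text = title
    return ET.tostring(root, encoding="unicode")


JATS_SAMPLE = (
    "<article><front><article-meta><title-group>"
    "<article-title>日本語論文のJATS XML</article-title>"
    "</title-group></article-meta></front></article>"
)
print(jats_title_to_fo(JATS_SAMPLE))
```

A real stylesheet would handle every JATS element and the Japanese-specific layout properties; the point here is only the shape of the output that a formatter consumes.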

For Japanese rendering

AH Formatter extension

Slide35

Using Formatter for STM articles

There are no major problems.

The basic style of writing STM papers does not differ greatly between Western countries and Japan.

Word separators should be inserted in the XML in advance.

Slide36

Challenges of Applying JATS to Japanese language texts

But in Japan, exquisite typesetting is requested.

Automatic typesetting by AH Formatter may not be sufficient.

Slide37

Avoiding Line-Top Punctuation

Punctuation marks shall not come at the top of a line ⇒ also a rule in English.

「ッ」 (which marks a geminate consonant) does not come at the head of a line ⇒ Japanese rule.

AH Formatter can handle these rules.

Slide38
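To make the line-top rule above concrete, here is a deliberately naive line breaker that refuses to start a line with a prohibited character by pulling that character onto the preceding line. The character set is a small illustrative subset, and the approach is an assumption made for this example, not a description of how AH Formatter works.

```python
# Characters that must not start a line. A small illustrative subset only.
NO_LINE_START = set("、。，．」』）っッゃゅょャュョ")


def break_lines(text, width):
    """Split text into lines of about `width` characters.

    A line is extended past `width` when the next line would otherwise
    begin with a character from NO_LINE_START.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        # Pull prohibited characters back onto the current line.
        while end < len(text) and text[end] in NO_LINE_START:
            end += 1
        lines.append(text[start:end])
        start = end
    return lines


print(break_lines("あいう。えお", 3))
print(break_lines("ロケット", 2))
```

In the first call the full stop stays with "あいう" instead of opening the second line; in the second, the small ッ stays attached to "ロケ".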

Avoiding Word Breakup

Some words, such as personal names, shall not be broken up between lines.

We use the "Zero Width Joiner" code (&#x200D;), e.g. 中&#x200D;西

Slide39
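The joiner technique described for personal names can be sketched as follows, assuming the joiner is placed between every pair of adjacent characters in a protected word, as in the 中西 example. The helper names are hypothetical, and whether a particular formatter treats U+200D as a no-break hint should be checked against its documentation.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER, written &#x200D; as an XML character reference


def join_characters(word):
    """Insert a zero width joiner between every pair of adjacent characters."""
    return ZWJ.join(word)


def to_xml_source(text):
    """Show each joiner as its character reference, as it would be typed in XML source."""
    return text.replace(ZWJ, "&#x200D;")


protected = join_characters("中西")
print(len(protected))
print(to_xml_source(protected))
```

The joiner is invisible when rendered, so converting it back to the character reference is mainly useful for inspecting or hand-editing the XML.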

Positioning Figures/Tables

Figures and tables should be positioned on the SAME page as the corresponding text.

This requires customized XSLT, sometimes for each figure and table.

This increases cost.

Slide40

Positioning Figures/Tables

Every article needs these XSLTs.

Slide41

Future

What is to be done next:
Vertical writing
Emphasis or “Kenten”
Warichu

Slide42

Vertical writing

Traditionally, Japanese (like Chinese and Korean) is written from top to bottom.

Slide43

Vertical Writing

Vertical writing causes some interesting problems, such as the orientation of Arabic numerals and Latin letters.

A new element for direction is necessary, such as <writing-direction="vertical">.

Slide44

Emphasis

Emphasis or “Kenten”

It is like boldface and italics in English.

We use <styled-content> and an AH Formatter extension to express this today.

We need a generic tag, <emphasis>.

Slide45

Warichu

Vertically written texts sometimes contain notes called “Warichu”.

Warichu uses 2 lines within a parent line.

Slide46

Warichu

Historical document example

Slide47

Suggestion

Additional tags for:
Vertical writing
Emphasis or “Kenten”
Warichu

Slide48

Conclusion

JATS opened a new horizon in processing Japanese-language articles.

No major difficulties.

UTF-8, an encoding for XML, also makes it possible to express most Japanese characters correctly.

Slide49

Conclusion

There are still remaining issues in processing non-Latin, agglutinative languages such as Japanese.

Challenges:
Word separators have to be inserted manually
Line break issues
Positioning figures and tables correctly

Slide50

Heaven/Earth/Man

http://artnews.blog.so-net.ne.jp/2011-04-22

Slide51

Structure vs. Expression

In pictographic/ideographic writing systems, authors and publishers care more about appearance and layout than those in the Western world.

Calligraphy

We sometimes need to describe such looks/layouts in XML.

This may or may not be solved by extending JATS.

Slide52

Is JATS applicable?

“Kaitai shinsho”, the first translation of a Western medical book (1774).

Slide53

Is JATS applicable?

Amma tebiki, an Eastern medical textbook (1835).

Slide54

Thank you