Slide1
Investigating the Effects of Data Migration
Samuel Ranta, 23.11.2015
Slide2
Ingredients of archival disaster: multiple platforms, multiple formats, no compatibility
Who controls the past controls the future. Who controls the present controls the past.
George Orwell, Nineteen Eighty-Four, 1949
Slide3
Doomsday prophecies
“…we are moving into an era where much of what we know today, much of what is coded and written electronically, will be lost forever. We are, to my mind, living in the midst of digital Dark Ages…”
Terry Kuny, 1997
Slide4
Claims made
“Similarly, existing translation software available for the migration and translation of document formats illustrates that the problems are significant and the results are often less than satisfactory. The simple case of conversion between the most popular document formats, MS-Word and WordPerfect provides ample illustration of the challenges that are faced…”
Terry Kuny, 1997
To my surprise, the very first attempt to migrate old manuscripts of a medical paper caused no problems at all. The result was perfect: every keystroke was migrated correctly.
Slide5
Claims made
“Digital documents are vulnerable to loss via the decay and obsolescence of the media on which they are stored, and they become inaccessible and unreadable when the software needed to interpret them, or the hardware on which that software runs, becomes obsolete and is lost.”
-Jeff Rothenberg, 1999
The very first attempt to read an old VAX tape from 1985 was successful. “It took less than a minute, no problem at all,” said the representative of the company that did the job for me.
Slide6
Claims made
“…the nearly universal experience has been that migration is labor-intensive, time-consuming, expensive, error-prone, and fraught with the danger of losing or corrupting information.”
-Jeff Rothenberg, 1999
In our as yet unpublished article (unpublished mostly for lack of reviewers and of a suitable publishing forum), we migrated 41 files through 12 versions of Photoshop, from 1993 to 2012. After eliminating changes produced by lossy formats, we were left with only two cases where migration produced errors. The errors were not severe: the images have to be zoomed in to spot the changes.
The changes produced at the binary level are tremendous (migration to the next generation can produce 76679 changes), but they all occur in the metadata; the actual image data is intact, contrary to what was claimed.
Slide7
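The binary-level comparison mentioned above can be illustrated with a minimal Python sketch. It is not the method used in the study (the screenshot on this slide comes from ExamDiff Pro); the function names `byte_differences` and `file_differences` are hypothetical, and the comparison is purely positional, so a single inserted byte shifts everything after it and inflates the count.

```python
from pathlib import Path


def byte_differences(a: bytes, b: bytes) -> dict:
    """Compare two byte strings position by position.

    Naive positional comparison: no alignment is attempted, so this is
    only a rough indicator of how much a migration step changed a file.
    """
    common = min(len(a), len(b))
    # Offsets (within the shared length) where the two inputs disagree.
    offsets = [i for i in range(common) if a[i] != b[i]]
    return {
        "changed": len(offsets),
        "length_delta": abs(len(a) - len(b)),
        "first_offset": offsets[0] if offsets else None,
        "last_offset": offsets[-1] if offsets else None,
    }


def file_differences(path_a: str, path_b: str) -> dict:
    """Convenience wrapper: read both files fully and compare them."""
    return byte_differences(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

If the first and last differing offsets fall inside the region where a format keeps its metadata, that hints that the image data itself was left untouched. Confirming it requires decoding both files and comparing the pixels.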
Image data (intact)
Metadata (changed)
Screenshot from ExamDiff Pro
Slide8
It has been claimed this happens to images during the migration process:
This seems to happen only if warnings about quality loss or known facts about the nature of the image formats are completely ignored.
Slide9
But what about the “universal experience”? We all have it!
So far, my experiments with office applications have shown that they are much more compatible than generally thought, but when they are tested with .docx files filled with charts, tables, shapes and SmartArt objects that are then migrated to another application, traces of the universal experience start to appear.
When a .docx document filled with the whole repertoire of objects is opened in LibreOffice and NeoOffice…
Slide10
Original:
Slide11
Migrated to LibreOffice Vanilla (Mac):
Slide12
Migrated to LibreOffice 5 (Lubuntu PC):
Slide13
Very complex dynamic documents differ greatly from images and other static data.
Our experience probably originates from all the problems we have had with them, from some data transfer errors, and from rare cases where an application, for some reason, writes the file in a form that is inaccessible to some other applications.
I noticed an incompatibility issue between Notepad 10.0 (Windows 10) and Leafpad 0.81 (Lubuntu):
Slide14
Should be:
Slide15
In Leafpad:
Slide16
Difference between non-working and working:
Slide17
Reason found: Unicode
Slide18
I accidentally saved the same file with two different encodings.
Leafpad doesn’t support Unicode (or supports it inadequately).
Migration can cause errors if the process is not controlled from start to end. The question is, what is the role of metadata in the future?
What about program code? How precisely can we migrate that?
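One way to keep a text migration controlled with respect to encoding, as in the Notepad and Leafpad case above, is to check the byte order mark (BOM) before opening the file elsewhere. The Python sketch below assumes that the "Unicode" save option wrote UTF-16 with a BOM and that files without a BOM are either UTF-8 or Windows-1252; neither assumption is confirmed by the slides. The names `sniff_bom` and `to_utf8` are hypothetical.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a codec name if the data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Re-encode text as UTF-8 without a BOM.

    Assumption: data without a BOM is UTF-8 if it decodes cleanly,
    otherwise it is treated as the fallback single-byte encoding.
    """
    encoding = sniff_bom(data)
    if encoding is None:
        try:
            data.decode("utf-8")
            return data  # already valid UTF-8 (or plain ASCII)
        except UnicodeDecodeError:
            encoding = fallback
    return data.decode(encoding).encode("utf-8")
```

Normalising every file to one encoding before migration would remove this particular source of incompatibility, but the fallback guess can be wrong for text saved in other legacy code pages.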