Slide1
A study of delta sync and other optimisations in HTTP/WebDAV synchronisation protocols
Do we need changes in the ownCloud protocol?
Wojciech Jarosz
AGH University of Science and Technology / CERN
Slide2
Introduction
ownCloud protocol, CERNBox service
Enhancing the current protocol
Investigation of the following enhancements: bundling, delta-syncing, compression, chunk size adjustment
Context: scientific environment at CERN
CS3 Zurich, January 2016
Slide3
Introduction
Data from CERNBox FS and network logs
Slide4
CERNBox
Distinguishing features:
Integrated with 80 PB of physics data
Future: easy and effective sharing of experiment results
Future: focus on scientific usage
Currently: a mix of scientific and personal use
Slide5
CERNBox as of Oct 15
~31 TB of data
~3700 users
~24 million files in ~3 million directories
Average file size: ~1.3 MB, median file size: < 100 kB
200k file uploads / downloads per day
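The figures above can be cross-checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 TB = 10**12 bytes) and an even spread of transfers over the day, neither of which the slide states.

```python
# Figures quoted on the slide; decimal units (1 TB = 10**12 bytes) are an assumption.
total_bytes = 31 * 10**12
file_count = 24 * 10**6
transfers_per_day = 200_000

# Mean file size in MB, to compare with the quoted ~1.3 MB average.
average_mb = total_bytes / file_count / 10**6

# Mean request rate if transfers were spread evenly over 24 hours (they are not).
transfers_per_second = transfers_per_day / (24 * 60 * 60)

print(f"average file size: {average_mb:.2f} MB")
print(f"mean transfer rate: {transfers_per_second:.1f} per second")
```

The computed mean of about 1.29 MB agrees with the quoted ~1.3 MB. The gap between that mean and a median below 100 kB indicates that most files are small and a few large ones dominate the volume.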
Slide6
File sizes
Slide7
File count and size
Chart label: No extension
Slide8
Where are the transfers coming from?
Slide9
Downloads vs uploads
Slide10
Protocol - chunking
Could be used for: partial upload, delta-sync, deduplication
Is the chunk size chosen correctly?
Most of the files are small
Modern protocols should use network-aware chunking
Currently only ~0.15% of all PUTs are chunked
Is dynamic chunking a viable option?
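One way to picture network-aware (dynamic) chunking is to size each chunk so that it takes roughly a fixed time to send at the measured throughput. The sketch below is illustrative only: the function names, the 5-second target and the 1 MiB / 100 MiB bounds are assumptions made for this example, not values taken from the ownCloud client or from CERNBox.

```python
MIB = 1024 * 1024


def pick_chunk_size(throughput_bytes_per_s, target_seconds=5.0,
                    min_chunk=1 * MIB, max_chunk=100 * MIB):
    """Return a chunk size in bytes that should take about target_seconds to send.

    All defaults are hypothetical. The result is clamped to [min_chunk, max_chunk].
    """
    if throughput_bytes_per_s <= 0:
        # No usable measurement yet: fall back to the smallest chunk.
        return min_chunk
    ideal = int(throughput_bytes_per_s * target_seconds)
    return max(min_chunk, min(ideal, max_chunk))


def plan_chunks(file_size, chunk_size):
    """Split a file of file_size bytes into (offset, length) pairs."""
    return [(offset, min(chunk_size, file_size - offset))
            for offset in range(0, file_size, chunk_size)]


if __name__ == "__main__":
    size = pick_chunk_size(2 * MIB)      # link measured at 2 MiB/s
    print(size, len(plan_chunks(35 * MIB, size)))
```

A file smaller than the chosen chunk size yields a single chunk, which fits the observation that most files are small and only a small fraction of PUTs is chunked.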
Slide11
Enhancements to the current ownCloud protocol
Focus on bundling, delta-sync and compression
Slide12
Bundling
Typically users are active only a few days a month
Slide13
Bundling
Even power users work in cycles
Slide14
Bundling
Typically users are active only a few days a month
Often over 2000 requests in 10 minutes
Small file size
Implementation?
Simple bundling: tarball?
Choose the right bundle size
Send chunks in parallel
Error reporting
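The tarball idea can be sketched with Python's standard tarfile module: small files are grouped into bundles up to a size limit, each bundle is packed into one tar archive on the client, and the receiving side unpacks it and reports per-file problems. The function names and the bundle limit are hypothetical, and this is not how any existing ownCloud client or server works.

```python
import io
import tarfile


def make_bundles(files, max_bundle_bytes):
    """Group (name, data) pairs into bundles whose payload stays under the limit.

    A single file larger than the limit ends up in a bundle of its own.
    """
    bundles, current, current_size = [], [], 0
    for name, data in files:
        if current and current_size + len(data) > max_bundle_bytes:
            bundles.append(current)
            current, current_size = [], 0
        current.append((name, data))
        current_size += len(data)
    if current:
        bundles.append(current)
    return bundles


def pack(bundle):
    """Client side: serialise one bundle as an uncompressed in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in bundle:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack(blob):
    """Receiving side: return ({name: data}, [names that could not be read])."""
    files, errors = {}, []
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is None:       # not a regular file
                errors.append(member.name)
            else:
                files[member.name] = extracted.read()
    return files, errors
```

Each bundle would travel in one request instead of one request per file. The error list is a stand-in for the error reporting the slide mentions, since a failure then has to be attributed to a file inside the bundle rather than to the request as a whole.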
Diagram labels: tar, untar
Slide15
Bundling
Reduce TCP slow-start effect
DROPBOX [1]
                    Before bundling    After bundling
Median flow size    16.2 kB            42.4 kB
Throughput PUT      358 kbit/s         552.92 kbit/s
Throughput GET      783 kbit/s         1294 kbit/s
CERNBOX *
                    Before bundling    After bundling
Throughput PUT      ~3600 kbit/s       Up to 400 Mbit/s ?
Throughput GET      ~7653 kbit/s       Up to 500 Mbit/s ?
[1] I. Drago, M. Mellia, M. M. Munafò, A. Sperotto, R. Sadre, and A. Pras. Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM Internet Measurement Conference, IMC'12, pages 481–494, 2012.
* Based on users inside CERN and affiliated institutions
Slide16
Extensions and file sizes
Slide17
Delta-sync
About 7.8% of the files are versions
Typically files are modified the same day
Usually small files
Slide18
ROOT files
Scientific software framework
Complex file structure
Already compressed
Small changes scattered throughout the file
Slide19
Delta-sync
Possible implementations: chunk-based, byte-range request
More data and simulation needed
It might not be worth implementing
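A minimal sketch of the chunk-based variant, assuming fixed-size chunks compared by SHA-256 digest: only chunks whose digest differs would need to be re-sent. The function names and the chunk size are hypothetical, and the scheme shown is a generic illustration rather than a proposal from the talk.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Hash each fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def changed_chunks(old, new, chunk_size):
    """Return indices of chunks in new that differ from old or did not exist in old."""
    old_digests = chunk_digests(old, chunk_size)
    new_digests = chunk_digests(new, chunk_size)
    return [i for i, digest in enumerate(new_digests)
            if i >= len(old_digests) or digest != old_digests[i]]


if __name__ == "__main__":
    old = bytes(1000)

    local_edit = bytearray(old)
    local_edit[5] = 1                      # one change, confined to the first chunk
    print(changed_chunks(old, bytes(local_edit), 100))

    scattered = bytearray(old)
    for position in range(50, 1000, 100):  # one small change in every chunk
        scattered[position] = 1
    print(changed_chunks(old, bytes(scattered), 100))
```

A localised edit touches one chunk, while small changes scattered throughout the file touch every chunk, so nothing is saved. That is the pattern the ROOT files slide describes. Fixed offsets also break down when bytes are inserted, because every later chunk shifts.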
Slide20
Compression
Of the top 20 extensions (by size), only .txt will compress well
Compression can be slow, but almost all requests are executed from desktop clients
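Why already-compressed formats gain little can be illustrated with zlib from the Python standard library. The sample data below is synthetic: a repeated text line stands in for text-like files, and random bytes stand in for content that is already compressed. Neither is taken from CERNBox.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Compressed size divided by original size (lower means better compression)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


# Synthetic stand-ins, not real CERNBox content.
text_like = b"timestamp=0 value=42 status=ok\n" * 2000
incompressible = os.urandom(50_000)

if __name__ == "__main__":
    print(f"text-like:      {compression_ratio(text_like):.3f}")
    print(f"incompressible: {compression_ratio(incompressible):.3f}")
```

The text-like sample shrinks to a small fraction of its size, while the random sample stays at roughly its original size, so the CPU time spent compressing it would be wasted.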
Slide21
Future
Slide22
Future - service
CERNBox fully exposed to a very large scientific repository (ATLAS, LHCb, CMS…)
FUSE mount to underlying CERNBox storage available everywhere at CERN
Will users use CERNBox in new ways?
Slide23
Conclusion
ownCloud protocol is simple, but is it enough?
Understand before implementation
Work in progress!
MSc at AGH
Slide24
Conclusion
Bundling looks like the most viable enhancement
Further research is needed for delta-sync and dynamic chunking
Compression is less likely to enhance the current protocol
Slide25
Contact details
Wojciech Jarosz
Wojciech.Jarosz@cern.ch
+41 22 76 75970
Opinions / questions most welcome!
How does the usage compare to your system?
How to implement the new features?
Feedback, ideas, comments…