Slide 1: Reserver Class(es) for Background Tasks in Ceph
Ankit Jain, Prerak Mall, Sukanya Venkataraman
Slide 2: Problem and Motivation
Problem Statement:
Develop a new reserver class for background work in Ceph
Motivation:
3 types of background work (Scrubbing, Backfill, Recovery) are handled separately and performed according to hard-coded priorities (with preemption)
Efficiently and effectively use cluster resources to reduce the response time of background work
Slide 3: Quick Overview of Current System
OSDs have 2 reserver objects
Local – Requests background tasks to be carried out
Remote – Carries out background tasks
Placement Groups are governed by a state machine that requests/cancels local/remote reservations based on its states
Scheduling of scrubbing is separate from recovery/backfill
Scheduling is based on a multi-level priority queue with preemption
Slide 4: Our Solution
Split the AsyncReserver class into 2 types – Local, Remote
New scheduling algorithms
Local Reserver: Uniformly distributed, OSD-state aware, load distributed
Remote Reserver: OSD-state aware, load distributed, network efficient
Slide 5: States in Ceph (Backfilling)
[State diagram]
States: Activating, WaitLocalBackfillReserved, WaitRemoteBackfillReserved, Backfilling, Not Backfilling, Recovered
Transition labels: Request Backfill, Local Backfill Reserved, All Backfill Reserved, Backfilled
Failure: 1. UnfoundBackfill 2. Defer Backfill 3. RemoteReservationRevokedTooFull
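The states and events named on this slide can be read as a table mapping (state, event) to the next state. The sketch below is only an illustration in Python: the exact arrows, in particular that each of the three failure events leads from Backfilling to Not Backfilling, are an assumption based on the order of the labels, and the `run` helper is hypothetical rather than part of Ceph.

```python
# Hypothetical transition table built from the labels on the slide.
TRANSITIONS = {
    ("Activating", "RequestBackfill"): "WaitLocalBackfillReserved",
    ("WaitLocalBackfillReserved", "LocalBackfillReserved"): "WaitRemoteBackfillReserved",
    ("WaitRemoteBackfillReserved", "AllBackfillReserved"): "Backfilling",
    ("Backfilling", "Backfilled"): "Recovered",
    # Failure events (assumed here to leave the Backfilling state).
    ("Backfilling", "UnfoundBackfill"): "NotBackfilling",
    ("Backfilling", "DeferBackfill"): "NotBackfilling",
    ("Backfilling", "RemoteReservationRevokedTooFull"): "NotBackfilling",
}


def run(events, state="Activating"):
    """Apply events in order; raise if the current state does not accept an event."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state!r}")
        state = TRANSITIONS[key]
    return state


happy_path = ["RequestBackfill", "LocalBackfillReserved",
              "AllBackfillReserved", "Backfilled"]
print(run(happy_path))                          # Recovered
print(run(happy_path[:3] + ["DeferBackfill"]))  # NotBackfilling
```

Keeping the transitions in one table makes it easy to see which reservation (local or remote) is being waited on at each step.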
Slide 6: The details – Ceph Current Flow
[Flow diagram. Components: Placement Group, AsyncReserver, Finisher, Primary OSD’s Local Reserver, Target OSD’s Remote Reserver]
Slide 7: AsyncReserver
3 main queues –
in_progress – Current tasks that are being carried out by the Finisher
preempt_queue – Tasks in the in_progress queue that can be preempted
all_queues – Tasks waiting to be either scheduled for a remote reservation request, or scheduled to be performed
Scheduling algorithm –
Same for local, remote reservations
Schedule based on priority
If number of tasks in in_progress exceeds maximum background tasks allowed in an OSD (configurable), preempt tasks from preempt_queue
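A toy model can make the interaction of the three queues concrete. The Python sketch below is not the Ceph C++ class: the name `ReserverSketch`, its method signatures, and the rule that only a strictly lower-priority task is preempted are assumptions made for illustration.

```python
from collections import deque


class ReserverSketch:
    """Toy model of the three queues described above (hypothetical, not Ceph code)."""

    def __init__(self, max_allowed):
        self.max_allowed = max_allowed  # configurable per-OSD limit
        self.all_queues = {}            # priority -> deque of waiting (task, preemptible)
        self.in_progress = {}           # granted task -> its priority
        self.preempt_queue = set()      # (priority, task) for granted, preemptible tasks
        self.preempted = []             # record of tasks that lost their reservation

    def request(self, task, priority, preemptible=True):
        self.all_queues.setdefault(priority, deque()).append((task, preemptible))
        self.do_queues()

    def release(self, task):
        priority = self.in_progress.pop(task)
        self.preempt_queue.discard((priority, task))
        self.do_queues()

    def do_queues(self):
        # Serve waiting tasks from the highest priority downwards.
        for priority in sorted(self.all_queues, reverse=True):
            queue = self.all_queues[priority]
            while queue:
                if len(self.in_progress) >= self.max_allowed:
                    victim = min(self.preempt_queue, default=None)
                    if victim is None or victim[0] >= priority:
                        return  # full, and nothing of lower priority to preempt
                    self.preempt_queue.remove(victim)
                    del self.in_progress[victim[1]]
                    self.preempted.append(victim[1])
                task, preemptible = queue.popleft()
                self.in_progress[task] = priority
                if preemptible:
                    self.preempt_queue.add((priority, task))


r = ReserverSketch(max_allowed=2)
r.request("scrub-pg1", priority=1)
r.request("backfill-pg2", priority=5)
r.request("recovery-pg3", priority=10)   # OSD is full: preempts scrub-pg1
print(sorted(r.in_progress), r.preempted)
```

In this sketch a preempted task is only recorded; a fuller model would notify the owner so that it can request the reservation again.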
Slide 8: Our Solution
Split the AsyncReserver into AsyncReserverLocal and AsyncReserverRemote
Local Reserver scheduling algorithms – Select what tasks to schedule on which OSDs
Uniform Distribution: Maintain a Least Recently Used OSD queue to uniformly distribute load amongst OSDs. Assume no knowledge of state of other OSDs
OSD load-based distribution: Take the least loaded OSD amongst all target OSDs (average out in case of multiple targets)
Remote Reserver scheduling algorithms – Select what tasks to schedule from which OSDs
Uniform Selection: Maintain a Least Recently Used OSD queue to uniformly select tasks from different OSDs
Favor Long Connections: Select tasks from those OSDs which already have an established connection
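The selection policies listed on this slide can each be sketched in a few lines of Python. Everything below is illustrative: the names `LruOsdPicker`, `pick_least_loaded_task`, and `pick_connected_first`, and the shape of their inputs (OSD ids as integers, load as a count of running tasks), are assumptions and do not come from the Ceph source.

```python
from collections import OrderedDict


class LruOsdPicker:
    """Uniform distribution/selection: pick the least recently used OSD."""

    def __init__(self, osd_ids):
        self._order = OrderedDict((osd, None) for osd in osd_ids)  # oldest use first

    def pick(self, candidates):
        osd = next((o for o in self._order if o in candidates), None)
        if osd is None:
            raise LookupError("no candidate OSD is known to the picker")
        self._order.move_to_end(osd)  # mark as most recently used
        return osd


def pick_least_loaded_task(tasks, osd_load):
    """Load-based: tasks maps task -> target OSD ids; average load over multiple targets."""
    def average_load(task):
        targets = tasks[task]
        return sum(osd_load[t] for t in targets) / len(targets)
    return min(tasks, key=average_load)


def pick_connected_first(pending, connected):
    """Favor connections: pending is (task, source OSD) in arrival order."""
    for task, osd in pending:
        if osd in connected:
            return task
    return pending[0][0] if pending else None  # fall back to arrival order


picker = LruOsdPicker([0, 1, 2])
print([picker.pick({0, 1, 2}) for _ in range(4)])                          # [0, 1, 2, 0]
print(pick_least_loaded_task({"pgA": [0, 1], "pgB": [2]}, {0: 3, 1: 1, 2: 1}))  # pgB
print(pick_connected_first([("t1", 4), ("t2", 7)], connected={7}))         # t2
```

The LRU picker needs no knowledge of other OSDs' state, while the load-based picker depends on a load table that some other component would have to keep current.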
Slide 9: Simulation (in Python)
[Diagram. Labels: Workload Thread Pool; Task 1, Task 2, Task 3; Local Reserver (x3), each issuing Request Reservation; RR TPool (x3); RR req; Success / Failure outcomes. Legend: RR = Remote Reserver]
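A simulation of this shape can be approximated with the standard-library thread pool. The sketch below is not the authors' simulator: it assumes a single remote reserver with a fixed number of slots that are never released, and the class and function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RemoteReserver:
    """Grants at most max_allowed reservations; every later request fails."""

    def __init__(self, max_allowed):
        self._lock = threading.Lock()
        self._free = max_allowed

    def request(self, task):
        with self._lock:  # requests arrive from several worker threads
            if self._free == 0:
                return (task, "Failure")
            self._free -= 1
            return (task, "Success")


class LocalReserver:
    """Forwards each task's reservation request to the remote reserver."""

    def __init__(self, remote):
        self._remote = remote

    def request_reservation(self, task):
        return self._remote.request(task)


def simulate(num_tasks, num_local, remote_slots):
    remote = RemoteReserver(remote_slots)
    local_reservers = [LocalReserver(remote) for _ in range(num_local)]
    # The executor plays the role of the workload thread pool.
    with ThreadPoolExecutor(max_workers=num_local) as pool:
        futures = [
            pool.submit(local_reservers[i % num_local].request_reservation, f"Task {i + 1}")
            for i in range(num_tasks)
        ]
        return [f.result() for f in futures]


results = simulate(num_tasks=5, num_local=3, remote_slots=2)
successes = sum(1 for _, outcome in results if outcome == "Success")
print(successes, len(results) - successes)  # 2 successes, 3 failures
```

Which tasks succeed depends on thread timing, but because slots are never released in this sketch the counts of successes and failures are fixed.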
Slide 10: Integration with Ceph
Current Implementation: /src/common/AsyncReserver.h
Replaced original AsyncReserver with AsyncReserverLocal and AsyncReserverRemote
Defined new API to select reservation for uniform load in AsyncReserverLocal
Main functionality in do_queues()
Slide 11: OSD.h
Slide 12: Current do_queues() implementation
AsyncReserver
Added uniform_load API
AsyncReserverLocal
Slide 13: get_reservation_index_for_uniform_load()
Slide 14: Results – 5 Tasks
Slide 15: Results – 25 Tasks
Slide 16: Summary
Local and remote reservers have different kinds of tasks and behaviours, and hence need different algorithms
Scheduling based on the overall state of the OSDs in the cluster reduces response time
Slide 17: Conclusion
This is not yet extensively tested for various workloads in Ceph
Improvements/Alternatives:
Dynamically change priorities
Dynamically change maximum allowed background tasks per OSD
Integrate replica deletion with backfilling – currently separate, even though related
Slide 18: THANK YOU