CSA — Paper Review (I)

Reggie Hsu
5 min read · Oct 11, 2019

Hi, I’m Reggie. This is Part One of a review of the ICPP 2019 paper Cooperative Job Scheduling and Data Allocation for Busy Data-Intensive Parallel Computing Clusters, published by Guoxin Liu et al.

This actually started as my midterm report for the Operating Systems course taught by Prof. Farn Wang at National Taiwan University, where we review papers published in top-tier OS conferences.

While writing the report, I thought: why not post it on Medium? But it turned out to be much more difficult than I first thought to conjure my poor vocabulary into a somewhat readable post, not to mention explaining technical details in simple terms.

That’s when I noticed that many Medium posts explain technical details with analogies, written in a somewhat less academic tone. (I might write another post about my observations.) So this is the output of my three-day effort to turn the topic at hand into a Medium-ish article.

Bear with my writing, but still enjoy!

The Textile Mill Analogy

Imagine yourself managing a huge textile mill with thousands of workers. Each day, new design layouts, each with multiple patterns, are sent in at different times of the day, and your workers’ job is to weave the layouts onto specific types of fabric before a stated deadline.

Since weaving a single pattern cannot be interrupted, the workers are not allowed to suspend their work once a pattern is in progress. (In programming terms, we say that weaving a single pattern is atomic.)

Your job is to assign these layouts to the workers and to manage the logistics of the fabrics they need to accomplish their jobs. (Since you have a large factory, distributing hefty fabrics to the workers is a major problem.)

At first, you were satisfied with a scheme where you assigned jobs, along with the fabrics they needed, to idle workers whenever a new order arrived. Yet as the number of orders grew, some newly arrived orders had much earlier deadlines than older ones, and since the workers’ hands were tied up with atomic patterns that could not be suspended, any order that no worker could pick up in time was unfortunately dropped.

In view of this, you started to gather information in advance about when orders would arrive and to preschedule the jobs a day ahead. Now, aside from a few unexpected orders, you managed to fit all the jobs among the workers within a day.

Sounds nice, right?

Not quite. You now find many of your workers idle between arrivals of orders, and since the fabrics are allocated far and wide, sharing fabrics among jobs is impractical. Time and resources are wasted.

You suspect there might be a better scheme, one where you could schedule the jobs and allocate the fabrics in a more organized fashion, with as few dropped orders as possible.

But how?

Edit: I found the analogy getting more inhumane as I tried to fit more constraints into it, so for sanity’s sake I left out the minor limitations, e.g., that the workers are grounded. But anyways!

Data-Intensive Parallel Computing Clusters

Aside from the fact that you sound like an inhumanely demanding boss, the textile mill scenario is pretty much a coarse analogy of Facebook’s distributed database system, where the jobs are queries to be processed (this could be implemented with MapReduce or other architectures), and the workers are servers processing distributed data (the fabrics) under time limits.

As in the textile mill scenario, jobs in a distributed/parallel database system are scheduled by a job scheduler (that’s you in the analogy), which assigns jobs to the server nodes that host the data those jobs require.

While scheduling, certain factors may be taken into account to lower monetary costs and to improve energy efficiency: keeping up as few servers as possible, maintaining minimum network load, or finishing as many jobs in time as possible. We can hence define performance metrics subject to these constraints and design scheduling algorithms to optimize them.
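To make this concrete, here is a minimal sketch in Python of what evaluating such metrics might look like. The job model, field layout, and all the names here are my own illustrative assumptions, not anything from the paper.

```python
# A toy job assignment, purely illustrative — none of these names
# or fields come from the paper.
jobs = [
    # (job_id, assigned_server, finish_time, deadline, data_moved_gb)
    ("q1", "s1", 40, 60, 2.0),
    ("q2", "s1", 55, 50, 0.0),   # misses its deadline
    ("q3", "s2", 30, 90, 5.5),
]

# The metrics mentioned above: servers kept up, network load, jobs in time.
active_servers = {server for _, server, _, _, _ in jobs}
network_load_gb = sum(moved for *_, moved in jobs)
jobs_in_time = sum(1 for _, _, finish, deadline, _ in jobs if finish <= deadline)

print(f"servers kept up:   {len(active_servers)}")
print(f"network load (GB): {network_load_gb}")
print(f"jobs in time:      {jobs_in_time}/{len(jobs)}")
```

A scheduler would then search over possible assignments to optimize some weighted combination of numbers like these.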

Cooperative vs Preemptive Scheduling

Before digging deeper, let’s go back and take a look at a constraint we imposed earlier but left unexplained. In the textile mill analogy, we assumed that weaving a pattern is an uninterruptible operation; this is analogous to the parts of a computer process that are atomic. Between atomic operations, processes are divisible, hence their execution can interleave.

If processes manage their own lifecycles, i.e., halt at their own request and keep their own states, we say the scheduling follows a cooperative scheme. Otherwise, if processes are forced to halt by the scheduler, the scheduling is preemptive.

[Figure: cooperative scheduling vs. preemptive scheduling]

Preemptive scheduling lets the scheduler actively rearrange jobs in a more flexible manner, but as with most mechanisms in computing, it is a matter of trade-offs. Since context switches can happen at arbitrary points, the process’s state must be saved and restored on every switch, so the scheduling overhead is generally much larger.

In the following discussion (within the context of the paper), we are only interested in the cooperative scheme.
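To make the cooperative scheme concrete, here is a minimal sketch using Python generators. It is my own toy illustration, not the paper’s scheduler: each job hands control back only between its atomic steps, and the scheduler never interrupts a step in progress.

```python
from collections import deque

def job(name, steps):
    # Each loop iteration is one "atomic" piece of work; the yield in
    # between is where the job voluntarily hands control back.
    for i in range(steps):
        print(f"{name}: atomic step {i + 1}/{steps}")
        yield

def run(ready_jobs):
    ready = deque(ready_jobs)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run exactly one atomic step
            ready.append(current)  # the job cooperates: requeue it
        except StopIteration:
            pass                   # the job has finished all its steps

run([job("weave-A", 2), job("weave-B", 3)])
```

Running this interleaves the two jobs step by step, but only at the yield points — just like the weavers, who switch tasks only between patterns, never in the middle of one.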

With some background on the problem in place, we’ll dive deeper in the next post and take a closer look at how the objectives are modeled in mathematical terms. We’ll also see how the authors come up with novel heuristic solutions to the problem.
