From: Petr Mladek <pmladek@suse.com>
To: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Josh Triplett <josh@joshtriplett.org>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@suse.de>,
Michal Hocko <mhocko@suse.cz>,
linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.cz>,
live-patching@vger.kernel.org, linux-api@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC v2 07/18] kthread: Allow to cancel kthread work
Date: Fri, 2 Oct 2015 17:43:36 +0200
Message-ID: <20151002154336.GC3122@pathway.suse.cz>
In-Reply-To: <20150928170314.GF2589@mtj.duckdns.org>

On Mon 2015-09-28 13:03:14, Tejun Heo wrote:
> Hello, Petr.
>
> On Fri, Sep 25, 2015 at 01:26:17PM +0200, Petr Mladek wrote:
> > 1) PENDING state plus -EAGAIN/busy loop cycle
> > ---------------------------------------------
> >
> > IMHO, we want to use the timer because it is an elegant solution.
> > Then we must release the lock when the timer is running. The lock
> > must be taken by the timer->function(). And there is a small window
> > when the timer is no longer pending but timer->function is not yet running:
> >
> >   CPU0                                  CPU1
> >
> >   run_timer_softirq()
> >     __run_timers()
> >       detach_expired_timer()
> >         detach_timer()
> >           #clear_pending
> >
> >                                         try_to_grab_pending_kthread_work()
> >                                           del_timer()
> >                                             # fails because not pending
> >
> >                                           test_and_set_bit(KTHREAD_WORK_PENDING_BIT)
> >                                             # fails because already set
> >
> >                                           if (!list_empty(&work->node))
> >                                             # fails because still not queued
> >
> >           !!! problematic window !!!
> >
> >     call_timer_fn()
> >       queue_kthread_work()
>
> Let's say each work item has a state variable which is protected by a
> lock and the state can be one of IDLE, PENDING, CANCELING. Let's also
> assume that all cancelers synchronize with each other via mutex, so we
> only have to worry about a single canceler. Wouldn't something like
> the following work while being a lot simpler?
>
> Delayed queueing and execution.
>
> 1. Lock and check whether state is IDLE. If not, nothing to do.
>
> 2. Set state to PENDING and schedule the timer and unlock.
>
> 3. On expiration, timer_fn grabs the lock and see whether state is
> still PENDING. If so, schedule the work item for execution;
> otherwise, nothing to do.
>
> 4. After dequeueing from execution queue with lock held, the worker is
> marked as executing the work item and state is reset to IDLE.
>
> Canceling
>
> 1. Lock, dequeue and set the state to CANCELING.
>
> 2. Unlock and perform del_timer_sync().
>
> 3. Flush the work item.
>
> 4. Lock and reset the state to IDLE and unlock.
>
>
> > 2) CANCEL state plus custom waitqueue
> > -------------------------------------
> >
> > cancel_kthread_work_sync() has to wait for the running work. It might take
> > quite some time. Therefore we cannot block others with a spinlock.
> > Also, others cannot busy-wait on the spinlock.
>
> Hmmm? Cancelers can synchronize amongst them using a mutex and the
> actual work item wait can use flushing.
>
> > IMHO, the proposed and rather complex solutions are needed in both cases.
> >
> > Or did I miss a possible trick, please?
>
> I probably have missed something in the above and it is not completely
> correct but I do think it can be way simpler than how workqueue does
> it.

I have played with this idea, but it opens a can of worms of locking
problems and ends up looking even more complicated.

Let me show it on a snippet of code:

struct kthread_worker {
	spinlock_t lock;
	struct list_head work_list;
	struct kthread_work *current_work;
};

enum {
	KTHREAD_WORK_IDLE,
	KTHREAD_WORK_PENDING,
	KTHREAD_WORK_CANCELING,
};

struct kthread_work {
	unsigned int flags;
	spinlock_t lock;
	struct list_head node;
	kthread_work_func_t func;
	struct kthread_worker *worker;
};

/* the main kthread worker cycle */
int kthread_worker_fn(void *worker_ptr)
{
	struct kthread_worker *worker = worker_ptr;
	struct kthread_work *work;

repeat:
	work = NULL;
	spin_lock_irq(&worker->lock);
	if (!list_empty(&worker->work_list)) {
		work = list_first_entry(&worker->work_list,
					struct kthread_work, node);
		/*
		 * worker->lock is taken first, then work->lock.
		 * This defines the lock order that everyone else
		 * has to follow.
		 */
		spin_lock(&work->lock);
		list_del_init(&work->node);
		work->flags = KTHREAD_WORK_IDLE;
		spin_unlock(&work->lock);
	}
	worker->current_work = work;
	spin_unlock_irq(&worker->lock);

	if (work) {
		__set_current_state(TASK_RUNNING);
		work->func(work);
	} else if (!freezing(current))
		schedule();

	goto repeat;
}
EXPORT_SYMBOL_GPL(kthread_worker_fn);

static void __queue_kthread_work(struct kthread_worker *worker,
				 struct kthread_work *work)
{
	list_add_tail(&work->node, &worker->work_list);
	/* PENDING until the worker takes the work off the list. */
	work->flags = KTHREAD_WORK_PENDING;
	work->worker = worker;
}

bool queue_kthread_work(struct kthread_worker *worker,
			struct kthread_work *work)
{
	bool ret = false;
	unsigned long flags;

	/*
	 * Lock the worker first to keep the lock order used by
	 * kthread_worker_fn() and avoid an ABBA deadlock.
	 * But what if the work is already queued to another worker?
	 */
	spin_lock_irqsave(&worker->lock, flags);
	spin_lock(&work->lock);

	if (work->flags != KTHREAD_WORK_IDLE)
		goto unlock_out;

	__queue_kthread_work(worker, work);
	ret = true;

unlock_out:
	spin_unlock(&work->lock);
	spin_unlock_irqrestore(&worker->lock, flags);
	return ret;
}

bool cancel_kthread_work_sync(struct kthread_work *work)
{
	struct kthread_worker *worker;
	bool flush = true;
	unsigned long flags;

	/*
	 * cancel_waitq and cwait stand for the workqueue-style cancel
	 * waitqueue: a global wait queue head plus a per-caller wait
	 * entry whose wake function matches the canceled work. Their
	 * declarations are left out of this sketch.
	 */
again:
	worker = ACCESS_ONCE(work->worker);

	if (worker)
		spin_lock_irqsave(&worker->lock, flags);
	else
		local_irq_save(flags);
	spin_lock(&work->lock);

	/* The work might have been (re)queued in the meantime. */
	if (worker && worker != work->worker) {
		spin_unlock(&work->lock);
		spin_unlock_irqrestore(&worker->lock, flags);
		goto again;
	}

	switch (work->flags) {
	case KTHREAD_WORK_PENDING:
		list_del_init(&work->node);
		/* fall through */
	case KTHREAD_WORK_IDLE:
		work->flags = KTHREAD_WORK_CANCELING;
		break;
	case KTHREAD_WORK_CANCELING:
		/*
		 * Some other caller is canceling. Let's wait in a
		 * custom waitqueue until the work is flushed.
		 */
		prepare_to_wait_exclusive(&cancel_waitq, &cwait.wait,
					  TASK_UNINTERRUPTIBLE);
		flush = false;
		break;
	}

	spin_unlock(&work->lock);
	if (worker)
		spin_unlock_irqrestore(&worker->lock, flags);
	else
		local_irq_restore(flags);

	if (flush) {
		kthread_work_flush(work);
		/*
		 * We are the cancel leader. Nobody else could manipulate
		 * the work.
		 */
		work->flags = KTHREAD_WORK_IDLE;
		/*
		 * Paired with prepare_to_wait() above so that either
		 * waitqueue_active() is visible here or the CANCELING
		 * state is visible there.
		 */
		smp_mb();
		if (waitqueue_active(&cancel_waitq))
			__wake_up(&cancel_waitq, TASK_NORMAL, 1, work);
	} else {
		/* Sleep until the cancel leader has flushed the work. */
		if (READ_ONCE(work->flags) == KTHREAD_WORK_CANCELING)
			schedule();
		finish_wait(&cancel_waitq, &cwait.wait);
	}

	/* True when this caller did the cancel and flush itself. */
	return flush;
}

IMHO, we need both locks. The worker manipulates several works and
therefore needs its own lock. We also need a work-specific lock because
the work might be assigned to different workers over its lifetime and
we need to be sure that the operations on it, e.g. queuing, are really
serialized.

Now, kthread_worker_fn() needs to get the first work from the worker
list and then manipulate the work => it needs to take worker->lock
first and then work->lock.

It means that most other functions have to take both locks in this
order to avoid an ABBA deadlock. This causes quite a few complications,
e.g. in cancel_kthread_work_sync() above.

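To spell out the deadlock that this lock order avoids, here is the same
kind of CPU timeline as above. It is only an illustration of what would
happen if, say, the cancel path took work->lock first:

   CPU0: kthread_worker_fn()            CPU1: cancel

   spin_lock(&worker->lock)
                                        spin_lock(&work->lock)
   spin_lock(&work->lock)
     # waits for CPU1
                                        spin_lock(&worker->lock)
                                          # waits for CPU0 => ABBA deadlock
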
There might be even more problems if we tried to queue the same work
into more workers over its lifetime, because we would start mixing the
locks of different workers.

I do not know. It is possible that you prefer this solution to the
atomic bit operations. It is also possible that there is a trick that
would make it easier.

I would personally prefer to use the tricks from the workqueues. They
are proven to work and they make the code faster; I think that we might
need to queue the work from a sensitive fast path.

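To make this concrete, below is a rough sketch of the test_and_set_bit()
fast path I have in mind, modeled on workqueue's WORK_STRUCT_PENDING_BIT.
The names (KTHREAD_WORK_PENDING_BIT, worker->task) are only illustrative,
and work->flags would have to become an unsigned long for the bitops:

enum {
	KTHREAD_WORK_PENDING_BIT = 0,	/* work is pending execution */
};

bool queue_kthread_work(struct kthread_worker *worker,
			struct kthread_work *work)
{
	unsigned long flags;

	/*
	 * Fast path: an already pending work is detected with a single
	 * atomic operation, without touching worker->lock at all.
	 */
	if (test_and_set_bit(KTHREAD_WORK_PENDING_BIT, &work->flags))
		return false;

	spin_lock_irqsave(&worker->lock, flags);
	list_add_tail(&work->node, &worker->work_list);
	work->worker = worker;
	wake_up_process(worker->task);
	spin_unlock_irqrestore(&worker->lock, flags);

	/*
	 * The worker clears the bit again when it takes the work off
	 * the list, just before calling work->func().
	 */
	return true;
}

Canceling would then grab the PENDING bit in a try_to_grab_pending()-like
loop, which is where the -EAGAIN/busy-loop discussion from the first
quoted section comes from.
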
IMHO, the code is already much simpler than workqueues because there is
no pool of kthreads behind each worker.

Best Regards,
Petr