Date: Fri, 2 Oct 2015 17:43:36 +0200
From: Petr Mladek
Subject: Re: [RFC v2 07/18] kthread: Allow to cancel kthread work
Message-ID: <20151002154336.GC3122@pathway.suse.cz>
References: <1442840639-6963-1-git-send-email-pmladek@suse.com>
 <1442840639-6963-8-git-send-email-pmladek@suse.com>
 <20150922193513.GE17659@mtj.duckdns.org>
 <20150925112617.GA3122@pathway.suse.cz>
 <20150928170314.GF2589@mtj.duckdns.org>
In-Reply-To: <20150928170314.GF2589@mtj.duckdns.org>
To: Tejun Heo
Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
 Steven Rostedt, "Paul E. McKenney", Josh Triplett, Thomas Gleixner,
 Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
 linux-mm@kvack.org, Vlastimil Babka, live-patching@vger.kernel.org,
 linux-api@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon 2015-09-28 13:03:14, Tejun Heo wrote:
> Hello, Petr.
>
> On Fri, Sep 25, 2015 at 01:26:17PM +0200, Petr Mladek wrote:
> > 1) PENDING state plus -EAGAIN/busy loop cycle
> > ---------------------------------------------
> >
> > IMHO, we want to use the timer because it is an elegant solution.
> > Then we must release the lock while the timer is running. The lock
> > must be taken by the timer->function(). And there is a small window
> > when the timer is no longer pending but timer->function() is not yet
> > running:
> >
> > CPU0					CPU1
> >
> > run_timer_softirq()
> >   __run_timers()
> >     detach_expired_timer()
> >       detach_timer()
> >         #clear_pending
> >
> >					try_to_grab_pending_kthread_work()
> >					  del_timer()
> >					    # fails because not pending
> >
> >					  test_and_set_bit(KTHREAD_WORK_PENDING_BIT)
> >					    # fails because already set
> >
> >					  if (!list_empty(&work->node))
> >					    # fails because still not queued
> >
> >			!!! problematic window !!!
> >
> >     call_timer_fn()
> >       queue_kthread_work()
>
> Let's say each work item has a state variable which is protected by a
> lock and the state can be one of IDLE, PENDING, CANCELING. Let's also
> assume that all cancelers synchronize with each other via a mutex, so
> we only have to worry about a single canceler. Wouldn't something like
> the following work while being a lot simpler?
>
> Delayed queueing and execution.
>
> 1. Lock and check whether the state is IDLE. If not, nothing to do.
>
> 2. Set the state to PENDING, schedule the timer, and unlock.
>
> 3. On expiration, timer_fn grabs the lock and sees whether the state
>    is still PENDING. If so, schedule the work item for execution;
>    otherwise, nothing to do.
>
> 4. After dequeueing from the execution queue with the lock held, the
>    worker is marked as executing the work item and the state is reset
>    to IDLE.
>
> Canceling
>
> 1. Lock, dequeue, and set the state to CANCELING.
>
> 2. Unlock and perform del_timer_sync().
>
> 3. Flush the work item.
>
> 4. Lock, reset the state to IDLE, and unlock.
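Just to be sure that we are talking about the same thing, here is how I
read the above. It is only a rough sketch, all the names are made up,
and the worker side, the flush, and the timer setup are omitted:

/* possible states of a work item; all transitions happen under work->lock */
enum { KTW_IDLE, KTW_PENDING, KTW_CANCELING };

struct kthread_work {
	spinlock_t		lock;	/* protects state and node */
	int			state;
	struct timer_list	timer;	/* assume setup_timer(..., ktw_timer_fn, ...) */
	struct list_head	node;
	kthread_work_func_t	func;
};

/* all cancelers synchronize here, so only one canceler is active at a time */
static DEFINE_MUTEX(ktw_cancel_mutex);

static bool ktw_queue_delayed(struct kthread_work *work, unsigned long delay)
{
	bool queued = false;

	spin_lock(&work->lock);
	/* 1. nothing to do unless the work is idle */
	if (work->state == KTW_IDLE) {
		/* 2. mark it pending and arm the timer */
		work->state = KTW_PENDING;
		mod_timer(&work->timer, jiffies + delay);
		queued = true;
	}
	spin_unlock(&work->lock);

	return queued;
}

static void ktw_timer_fn(unsigned long data)
{
	struct kthread_work *work = (struct kthread_work *)data;

	spin_lock(&work->lock);
	/* 3. queue for execution only when nobody canceled it in the meantime */
	if (work->state == KTW_PENDING) {
		/* add work->node to the worker list and wake the worker */
	}
	spin_unlock(&work->lock);
}

static void ktw_cancel_sync(struct kthread_work *work)
{
	mutex_lock(&ktw_cancel_mutex);

	/* 1. dequeue and block further queueing */
	spin_lock(&work->lock);
	list_del_init(&work->node);
	work->state = KTW_CANCELING;
	spin_unlock(&work->lock);

	/* 2. make sure the timer is gone */
	del_timer_sync(&work->timer);

	/* 3. flush the work item, i.e. wait until the worker finishes it */

	/* 4. allow queueing again */
	spin_lock(&work->lock);
	work->state = KTW_IDLE;
	spin_unlock(&work->lock);

	mutex_unlock(&ktw_cancel_mutex);
}

The devil seems to be in the "queue for execution" step and in the
flush, where the worker list and its lock come into play; see the
snippet below.
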
> > 2) CANCEL state plus custom waitqueue
> > -------------------------------------
> >
> > cancel_kthread_work_sync() has to wait for a running work. It might take
> > quite some time. Therefore we could not block others by a spinlock.
> > Also others could not wait for the spin lock in a busy wait.
>
> Hmmm? Cancelers can synchronize amongst themselves using a mutex and
> the actual work item wait can use flushing.
>
> > IMHO, the proposed and rather complex solutions are needed in both cases.
> >
> > Or did I miss a possible trick, please?
>
> I probably have missed something in the above and it is not completely
> correct, but I do think it can be way simpler than how workqueue does
> it.

I have played with this idea, and it opens a can of worms with locking
problems; it ends up looking even more complicated.

Let me show this on a snippet of code:

struct kthread_worker {
	spinlock_t		lock;
	struct list_head	work_list;
	struct kthread_work	*current_work;
};

enum {
	KTHREAD_WORK_IDLE,
	KTHREAD_WORK_PENDING,
	KTHREAD_WORK_CANCELING,
};

struct kthread_work {
	unsigned int		flags;
	spinlock_t		lock;
	struct list_head	node;
	kthread_work_func_t	func;
	struct kthread_worker	*worker;
};

/* the main kthread worker cycle */
int kthread_worker_fn(void *worker_ptr)
{
	struct kthread_worker *worker = worker_ptr;
	struct kthread_work *work;

repeat:
	work = NULL;
	spin_lock_irq(&worker->lock);
	if (!list_empty(&worker->work_list)) {
		work = list_first_entry(&worker->work_list,
					struct kthread_work, node);
		spin_lock(&work->lock);
		list_del_init(&work->node);
		work->flags = KTHREAD_WORK_IDLE;
		spin_unlock(&work->lock);
	}
	worker->current_work = work;
	spin_unlock_irq(&worker->lock);

	if (work) {
		__set_current_state(TASK_RUNNING);
		work->func(work);
	} else if (!freezing(current))
		schedule();

	goto repeat;
}
EXPORT_SYMBOL_GPL(kthread_worker_fn);

static void __queue_kthread_work(struct kthread_worker *worker,
				 struct kthread_work *work)
{
	list_add_tail(&work->node, &worker->work_list);
	work->flags = KTHREAD_WORK_PENDING;
	work->worker = worker;
}

bool queue_kthread_work(struct kthread_worker *worker,
			struct kthread_work *work)
{
	bool ret = false;
	unsigned long flags;

	/*
	 * Lock the worker first to avoid ABBA deadlock.
	 * What if the work is already queued to another worker?
	 */
	spin_lock_irqsave(&worker->lock, flags);
	spin_lock(&work->lock);

	if (work->flags != KTHREAD_WORK_IDLE)
		goto unlock_out;

	__queue_kthread_work(worker, work);
	ret = true;

unlock_out:
	spin_unlock(&work->lock);
	spin_unlock_irqrestore(&worker->lock, flags);

	return ret;
}

/* cancelers that lose the race wait here until the work is flushed */
static DECLARE_WAIT_QUEUE_HEAD(cancel_waitq);

bool cancel_kthread_work_sync(struct kthread_work *work)
{
	struct kthread_worker *worker;
	bool flush = true;
	unsigned long flags;

again:
	worker = ACCESS_ONCE(work->worker);

	if (worker)
		spin_lock_irqsave(&worker->lock, flags);
	else
		local_irq_save(flags);

	spin_lock(&work->lock);

	/* the work might have been re-queued to another worker in the meantime */
	if (worker && worker != work->worker) {
		spin_unlock(&work->lock);
		spin_unlock_irqrestore(&worker->lock, flags);
		goto again;
	}

	switch (work->flags) {
	case KTHREAD_WORK_PENDING:
		list_del_init(&work->node);
		/* fall through */
	case KTHREAD_WORK_IDLE:
		work->flags = KTHREAD_WORK_CANCELING;
		break;
	case KTHREAD_WORK_CANCELING:
		/*
		 * Some other task is canceling. Let's wait in a
		 * custom waitqueue until the work is flushed.
		 * cwait would be a custom wait entry (not shown), similar
		 * to the one used by __cancel_work_timer() in workqueue.c.
		 */
		prepare_to_wait_exclusive(&cancel_waitq, &cwait.wait,
					  TASK_UNINTERRUPTIBLE);
		flush = false;
		break;
	}

	spin_unlock(&work->lock);
	if (worker)
		spin_unlock_irqrestore(&worker->lock, flags);
	else
		local_irq_restore(flags);

	if (flush) {
		kthread_work_flush(work);
		/*
		 * We are the cancel leader. Nobody else could manipulate the
		 * work.
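		 *
		 * Queueing bails out when it sees a non-IDLE state, other
		 * cancelers wait on cancel_waitq when they see CANCELING,
		 * and the worker no longer has the work on its list, so it
		 * is safe to reset the state below.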
		 */
		work->flags = KTHREAD_WORK_IDLE;

		/*
		 * Paired with prepare_to_wait() above so that either
		 * waitqueue_active() is visible here or the CANCELING state
		 * is visible there.
		 */
		smp_mb();

		if (waitqueue_active(&cancel_waitq))
			__wake_up(&cancel_waitq, TASK_NORMAL, 1, work);
	} else {
		if (READ_ONCE(work->flags) == KTHREAD_WORK_CANCELING)
			schedule();
		finish_wait(&cancel_waitq, &cwait.wait);
	}

	return flush;
}

IMHO, we need both locks. The worker manipulates several works and
therefore needs its own lock. We also need the work-specific lock
because a work might be assigned to different workers over time and we
need to be sure that operations on it, e.g. queueing, are really
serialized.

Now, kthread_worker_fn() needs to get the first work from the worker
list and then manipulate the work, so it has to take worker->lock
first and work->lock second. It means that most other functions have
to take both locks in this order to avoid an ABBA deadlock. This
causes quite some complications, e.g. in cancel_kthread_work_sync().
There might be even more problems if we tried to queue the same work
to different workers and started mixing the locks from different
workers.

I do not know. It is possible that you prefer this solution to the
atomic bit operations. Also it is possible that there is a trick that
would make it easier.

I would personally prefer to use the tricks from workqueues. They are
proven to work and they make the code faster. I think that we might
need to queue works from a performance-sensitive fast path.

IMHO, the code is already much easier than workqueues because there is
no pool of kthreads behind each worker.

Best Regards,
Petr