From: Tejun Heo <tj@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>, Michal Hocko <mhocko@suse.cz>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Lai Jiangshan <jiangshanlai@gmail.com>, Shaohua Li <shli@fb.com>,
linux-mm <linux-mm@kvack.org>
Subject: Re: [GIT PULL] workqueue fixes for v4.3-rc5
Date: Wed, 14 Oct 2015 15:02:59 -0400 [thread overview]
Message-ID: <20151014190259.GC12799@mtj.duckdns.org> (raw)
In-Reply-To: <CA+55aFzhHF0KMFvebegBnwHqXekfRRd-qczCtJXKpf3XvOCW=A@mail.gmail.com>
Hello, Linus.
On Wed, Oct 14, 2015 at 10:36:50AM -0700, Linus Torvalds wrote:
> > For both delayed and !delayed work items on per-cpu workqueues,
> > queueing without specifying a specific CPU always meant queueing on
> > the local CPU.
>
> That's just not true, as far as I can tell.
>
> I went back in the history to 2012, and it does
>
> if (unlikely(cpu != WORK_CPU_UNBOUND))
> add_timer_on(timer, cpu);
> else
> add_timer(timer);
>
> so this whole WORK_CPU_UNBOUND means "add_timer()" without specifying
> a CPU has been true for at least the last several years.
>
> So the documentation, the code, and the name all agree:
But wasn't add_timer() always CPU-local at the time? add_timer()
allowing cross-cpu migrations came way after that.
commit 597d0275736dad9c3bda6f0a00a1c477dc0f37b1
Author: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Date: Thu Apr 16 12:13:26 2009 +0530
timers: Framework for identifying pinned timers
> WORK_CPU_UNBOUND does *not* mean that it's guaranteed to be the local
> CPU. The documentation says "preferred", the code clearly doesn't
> specify the CPU, and the name says "not bound to a particular CPU".
The name and documentation were mostly me being too hopeful and then
being lazy after realizing it wouldn't work.
> > I wish this were an ad-hoc thing but this has been the guaranteed
> > behavior all along. Switching the specific call-site to
> > queue_work_on() would still be a good idea tho.
>
> I really don't see who you say that it has been guaranteed behavior all along.
The behavior was just like that from the beginning. Workqueues were
always CPU-affine and timers too and we never properly distinguished
CPU affinity for the purpose of correctness from optimization.
> It clearly has not at all been guaranteed behavior. The fact that you
> had to change the code to do that should have made it clear.
It wasn't like that before the timer change.
> The code has *always* done that non-cpu-specific "add_timer()", as far
> as I can tell. Even back when that non-bound CPU was indicated by a
> negative CPU number, and the code did
>
> if (unlikely(cpu >= 0))
> add_timer_on(timer, cpu);
> else
> add_timer(timer);
>
> (that's from 2007, btw).
At the time, it was just an alternative way to writing.
if (cpu < 0)
cpu = raw_smp_processor_id();
add_timer_on(timer, cpu);
> So I really don't see your "guaranteed behavior" argument. It seems to
> be downright pure bullshit. The lack of a specific CPU has _always_
> (where "always" means "at least since 2007") meant "non-specific cpu",
> rather than "local cpu".
>
> If some particular interface ended up then actually using "local cpu"
> instead, that was neither guaranteed nor implied - it was just a
> random implementation detail, and shouldn't be something we guarantee
> at all.
>
> We strive to maintain user-space ABI issues even in the face of
> unintentional bugs and misfeatures. But we do *not* keep broken random
> in-kernel interfaces alive. We fix breakage and clean up code rather
> than say "some random in-kernel user expects broken behavior".
>
> And it seems very clear that WORK_CPU_UNBOUND does *not* mean "local
> cpu", and code that depends on it meaning local cpu is broken.
>
> Now, there are reasons why the *implementation* might want to choose
> the local cpu for things - avoiding waking up other cpu's
> unnecessarily with cross-cpu calls etc - but at the same time I think
> it's quite clear that mm/vmstat.c is simply broken in using a
> non-bound interface and then expecting/assuming a particular CPU.
I don't think I'm confused about the timer behavior. At any rate, for
!delayed work items, the failure to distinguish bound for correctness
and optimization has been for quite a while. I'd be happy to ditch
that but I don't think it'd be a good idea to do this at -rc5 w/o
auditing or debug mechanisms to detect errors.
> >> (That said, it's not obvious to me why we don't just specify the cpu
> >> in the work structure itself, and just get rid of the "use different
> >> functions to schedule the work" model. I think it's somewhat fragile
> >> how you can end up using the same work in different "CPU boundedness"
> >> models like this).
> >
> > Hmmm... you mean associating a delayed work item with the target
> > pool_workqueue on queueing and sharing the queueing paths for both
> > delayed and !delayed work items?
>
> I wouldn't necessarily even go that far. That's more of an
> implementation detail.
>
> I just wonder if we perhaps should add the CPU boundedness to the init
> stage, and hide it away in the work structure. So if you just want to
> do work (delayed or not), you'd continue to do
>
> INIT_DELAYED_WORK(&work, ...);
>
> schedule_delayed_work(&work, delay);
>
> but if you want a percpu thing, you'd do
>
> INIT_DELAYED_WORK_CPU(&work, cpu, ...);
>
> schedule_delayed_work(&work, delay);
>
> rather than have that "..work_on(cpu, &work, )" interface. The actual
> *implementation* could stay the same, we'd just hide the CPU bound in
> the work struct and use it as an implicit argument.
>
> But it's not a big deal. I'm just saying that the current interface
> obviously allows that confusion about whether a work is percpu or not,
> on a per-call-site basis.
I was thinking more along the line of introducing
queue_work_on_local() and friends but yeah we definitely need to
explicitly distinguish affinity for correctness and for performance.
Thanks.
--
tejun
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-10-14 19:03 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20151013214952.GB23106@mtj.duckdns.org>
2015-10-14 16:30 ` Linus Torvalds
2015-10-14 16:57 ` Tejun Heo
2015-10-14 17:36 ` Linus Torvalds
2015-10-14 17:57 ` Christoph Lameter
2015-10-14 18:37 ` Linus Torvalds
2015-10-14 18:58 ` Christoph Lameter
2015-10-14 19:01 ` Linus Torvalds
2015-10-14 19:02 ` Tejun Heo [this message]
2015-10-14 19:16 ` Linus Torvalds
2015-10-14 19:38 ` Tejun Heo
2015-10-14 20:10 ` Linus Torvalds
2015-10-14 20:24 ` Tejun Heo
2015-10-19 3:51 ` Mike Galbraith
2015-10-16 19:51 ` [PATCH] vmstat_update: ensure work remains on the same core Chris Metcalf
2015-10-16 19:54 ` Linus Torvalds
2015-10-14 18:03 ` [GIT PULL] workqueue fixes for v4.3-rc5 Christoph Lameter
2015-10-14 18:40 ` Linus Torvalds
2015-10-14 18:59 ` Christoph Lameter
2015-10-14 19:10 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151014190259.GC12799@mtj.duckdns.org \
--to=tj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=jiangshanlai@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=shli@fb.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox