linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>,
	Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vasily Averin <vasily.averin@linux.dev>,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org
Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock
Date: Wed, 29 Mar 2023 10:00:50 -1000	[thread overview]
Message-ID: <ZCSY8l/jVwszF6iA@slm.duckdns.org> (raw)
In-Reply-To: <f9b6410-ee17-635f-a35d-559fa0191dc3@google.com>

Hello, Hugh. How have you been?

On Wed, Mar 29, 2023 at 12:22:24PM -0700, Hugh Dickins wrote:
> Hi Tejun,
> Butting in here, I'm fascinated.  This is certainly not my area, I know
> nothing about rstat, but this is the first time I ever heard someone
> arguing for more disabling of interrupts rather than less.
> 
> An interrupt coming in while holding a contended resource can certainly
> add to latencies, that I accept of course.  But until now, I thought it
> was agreed best practice to disable irqs only regretfully, when strictly
> necessary.
> 
> If that has changed, I for one want to know about it.  How should we
> now judge which spinlocks should disable interrupts and which should not?
> Page table locks are currently my main interest - should those be changed?

For rstat, it's a simple case because the global lock here wraps around
per-cpu locks which have to be irq-safe, so the only difference we get
between making the global irq-unsafe and keeping it so but releasing
inbetween is:

 Global lock held: G
 IRQ disabled: I
 Percpu lock held: P
 
1. IRQ unsafe

 GGGGGGGGGGGGGGG~~GGGGG
 IIII IIII IIII ~~ IIII
 PPPP PPPP PPPP ~~ PPPP

2. IRQ safe released inbetween cpus

 GGGG GGGG GGGG ~~ GGGG
 IIII IIII IIII ~~ IIII
 PPPP PPPP PPPP ~~ PPPP

#2 seems like the obvious thing to do here given how the lock is used and
each P section may take a bit of time.

So, in the rstat case, the choice is, at least to me, obvious, but even for
more generic cases where the bulk of actual work isn't done w/ irq disabled,
I don't think the picture is as simple as "use the least protected variant
possible" anymore because the underlying hardware changed.

For an SMP kernel running on an UP system, "the least protected variant" is
the obvious choice to make because you don't lose anything by holding a
spinlock longer than necessary. However, as you increase the number of CPUs,
there rises a tradeoff between local irq servicing latency and global lock
contention.

Imagine a, say, 128 cpu system with a few cores servicing relatively high
frequency interrupts. Let's say there's a mildly hot lock. Usually, it shows
up in the system profile but only just. Let's say something happens and the
irq rate on those cores went up for some reason to the point where it
becomes a rather common occurrence when the lock is held on one of those
cpus, irqs are likely to intervene lengthening how long the lock is held,
sometimes, signficantly. Now because the lock is on average held for much
longer, it become a lot hotter as more CPUs would stall on it and depending
on luck or lack thereof these stalls can span many CPUs on the system for
quite a while. This is actually something we saw in production.

So, in general, there's a trade off between local irq service latency and
inducing global lock contention when using unprotected locks. With more and
more CPUs, the balance keeps shifting. The balance still very much depends
on the specifics of a given lock but yeah I think it's something we need to
be a lot more careful about now.

Thanks.

-- 
tejun


  reply	other threads:[~2023-03-29 20:00 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-23  4:00 [RFC PATCH 0/7] Make rstat flushing IRQ and sleep friendly Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock Yosry Ahmed
2023-03-23  4:29   ` Shakeel Butt
2023-03-23  5:15     ` Yosry Ahmed
2023-03-23  6:33       ` Shakeel Butt
2023-03-23 13:35         ` Yosry Ahmed
2023-03-23 15:40           ` Shakeel Butt
2023-03-23 15:42             ` Yosry Ahmed
2023-03-23 15:46               ` Shakeel Butt
2023-03-23 16:09                 ` Shakeel Butt
2023-03-23 16:17                   ` Yosry Ahmed
2023-03-23 16:29                     ` Shakeel Butt
2023-03-23 16:36                       ` Yosry Ahmed
2023-03-23 16:45                         ` Shakeel Butt
2023-03-23 16:51                           ` Yosry Ahmed
2023-03-23 19:09                             ` Shakeel Butt
2023-03-23 17:33                     ` Johannes Weiner
2023-03-23 18:09                       ` Yosry Ahmed
2023-03-23 18:19                         ` Johannes Weiner
2023-03-24  1:39   ` Tejun Heo
2023-03-24  7:22     ` Yosry Ahmed
2023-03-24 14:12       ` Waiman Long
2023-03-24 22:50         ` Yosry Ahmed
2023-03-25  1:54       ` Tejun Heo
2023-03-25  2:17         ` Yosry Ahmed
2023-03-25  4:30           ` Shakeel Butt
2023-03-25  4:37             ` Yosry Ahmed
2023-03-25  4:46               ` Shakeel Butt
2023-03-27 23:23                 ` Yosry Ahmed
2023-03-29 18:53                   ` Tejun Heo
2023-03-29 19:22                     ` Hugh Dickins
2023-03-29 20:00                       ` Tejun Heo [this message]
2023-03-29 20:38                         ` Hugh Dickins
2023-03-30  4:26                           ` Yosry Ahmed
2023-03-31  1:51                           ` Tejun Heo
2023-03-23  4:00 ` [RFC PATCH 2/7] memcg: do not disable interrupts when holding stats_flush_lock Yosry Ahmed
2023-03-23  4:32   ` Shakeel Butt
2023-03-23  5:16     ` Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 3/7] cgroup: rstat: remove cgroup_rstat_flush_irqsafe() Yosry Ahmed
2023-03-23 15:43   ` Johannes Weiner
2023-03-23 15:45     ` Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts Yosry Ahmed
2023-03-23 15:56   ` Johannes Weiner
2023-03-23 16:01     ` Yosry Ahmed
2023-03-23 17:27       ` Johannes Weiner
2023-03-23 18:07         ` Yosry Ahmed
2023-03-23 19:35           ` Shakeel Butt
2023-03-23  4:00 ` [RFC PATCH 5/7] vmscan: memcg: sleep when flushing stats during reclaim Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 6/7] workingset: memcg: sleep when flushing stats in workingset_refault() Yosry Ahmed
2023-03-23 15:50   ` Johannes Weiner
2023-03-23 16:02     ` Yosry Ahmed
2023-03-23 16:00   ` Johannes Weiner
2023-03-23 16:02     ` Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 7/7] memcg: do not modify rstat tree for zero updates Yosry Ahmed
2023-03-23  4:10 ` [RFC PATCH 0/7] Make rstat flushing IRQ and sleep friendly Shakeel Butt
2023-03-23  5:07   ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZCSY8l/jVwszF6iA@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=bpf@vger.kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=vasily.averin@linux.dev \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox