linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Marco Elver <elver@google.com>, Yue Zhao <findns94@gmail.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, mhocko@kernel.org, muchun.song@linux.dev,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: change memcg->oom_group access with atomic operations
Date: Tue, 21 Feb 2023 16:37:59 -0800	[thread overview]
Message-ID: <20230222003759.GO2948950@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <Y/VahsJO3xNXi4CG@P9FQF9L96D.corp.robot.car>

On Tue, Feb 21, 2023 at 03:57:58PM -0800, Roman Gushchin wrote:
> On Tue, Feb 21, 2023 at 03:38:24PM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 21, 2023 at 03:13:36PM -0800, Shakeel Butt wrote:
> > > On Tue, Feb 21, 2023 at 2:38 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Tue, Feb 21, 2023 at 02:23:31PM -0800, Roman Gushchin wrote:
> > > > > On Tue, Feb 21, 2023 at 10:23:59AM -0800, Paul E. McKenney wrote:
> > > > > > On Tue, Feb 21, 2023 at 08:56:59AM -0800, Shakeel Butt wrote:
> > > > > > > +Paul & Marco
> > > > > > >
> > > > > > > On Tue, Feb 21, 2023 at 5:51 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, Feb 20, 2023 at 10:52:10PM -0800, Shakeel Butt wrote:
> > > > > > > > > On Mon, Feb 20, 2023 at 9:17 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > > > > > > > > > On Feb 20, 2023, at 3:06 PM, Shakeel Butt <shakeelb@google.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 20, 2023 at 01:09:44PM -0800, Roman Gushchin wrote:
> > > > > > > > > > >>> On Mon, Feb 20, 2023 at 11:16:38PM +0800, Yue Zhao wrote:
> > > > > > > > > > >>> The knob for cgroup v2 memory controller: memory.oom.group
> > > > > > > > > > >>> will be read and written simultaneously by user space
> > > > > > > > > > >>> programs, thus we'd better change memcg->oom_group access
> > > > > > > > > > >>> with atomic operations to avoid concurrency problems.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Signed-off-by: Yue Zhao <findns94@gmail.com>
> > > > > > > > > > >>
> > > > > > > > > > >> Hi Yue!
> > > > > > > > > > >>
> > > > > > > > > > >> I'm curious, have any seen any real issues which your patch is solving?
> > > > > > > > > > >> Can you, please, provide a bit more details.
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > > IMHO such details are not needed. oom_group is being accessed
> > > > > > > > > > > concurrently and one of them can be a write access. At least
> > > > > > > > > > > READ_ONCE/WRITE_ONCE is needed here.
> > > > > > > > > >
> > > > > > > > > > Needed for what?
> > > > > > > > >
> > > > > > > > > For this particular case, documenting such an access. Though I don't
> > > > > > > > > think there are any architectures which may tear a one byte read/write
> > > > > > > > > and merging/refetching is not an issue for this.
> > > > > > > >
> > > > > > > > Wouldn't a compiler be within its rights to implement a one byte store as:
> > > > > > > >
> > > > > > > >         load-word
> > > > > > > >         modify-byte-in-word
> > > > > > > >         store-word
> > > > > > > >
> > > > > > > > and if this is a lockless store to a word which has an adjacent byte also
> > > > > > > > being modified by another CPU, one of those CPUs can lose its store?
> > > > > > > > And WRITE_ONCE would prevent the compiler from implementing the store
> > > > > > > > in that way.
> > > > > > >
> > > > > > > Thanks Willy for pointing this out. If the compiler can really do this
> > > > > > > then [READ|WRITE]_ONCE are required here. I always have big bad
> > > > > > > compiler lwn article open in a tab. I couldn't map this transformation
> > > > > > > to ones mentioned in that article. Do we have name of this one?
> > > > > >
> > > > > > No, recent compilers are absolutely forbidden from doing this sort of
> > > > > > thing except under very special circumstances.
> > > > > >
> > > > > > Before C11, compilers could and in fact did do things like this.  This is
> > > > > > after all a great way to keep the CPU's vector unit from getting bored.
> > > > > > Unfortunately for those who prize optimization above all else, doing
> > > > > > this can introduce data races, for example:
> > > > > >
> > > > > >     char a;
> > > > > >     char b;
> > > > > >     spin_lock la;
> > > > > >     spin_lock lb;
> > > > > >
> > > > > >     void change_a(char new_a)
> > > > > >     {
> > > > > >             spin_lock(&la);
> > > > > >             a = new_a;
> > > > > >             spin_unlock(&la);
> > > > > >     }
> > > > > >
> > > > > >     void change_b(char new_b)
> > > > > >     {
> > > > > >             spin_lock(&lb);
> > > > > >             b = new_b;
> > > > > >             spin_unlock(&lb);
> > > > > >     }
> > > > > >
> > > > > > If the compiler "optimized" that "a = new_a" so as to produce a non-atomic
> > > > > > read-modify-write sequence, it would be introducing a data race.
> > > > > > And since C11, the compiler is absolutely forbidden from introducing
> > > > > > data races.  So, again, no, the compiler cannot invent writes to
> > > > > > variables.
> > > > > >
> > > > > > What are those very special circumstances?
> > > > > >
> > > > > > 1.  The other variables were going to be written to anyway, and
> > > > > >     none of the writes was non-volatile and there was no ordering
> > > > > >     directive between any of those writes.
> > > > > >
> > > > > > 2.  The other variables are dead, as in there are no subsequent
> > > > > >     reads from them anywhere in the program.  Of course in that case,
> > > > > >     there is no need to read the prior values of those variables.
> > > > > >
> > > > > > 3.  All accesses to all of the variables are visible to the compiler,
> > > > > >     and the compiler can prove that there are no concurrent accesses
> > > > > >     to any of them.  For example, all of the variables are on-stack
> > > > > >     variables whose addresses are never taken.
> > > > > >
> > > > > > Does that help, or am I misunderstanding the question?
> > > > >
> > > > > Thank you, Paul!
> > > > >
> > > > > So it seems like READ_ONCE()/WRITE_ONCE() are totally useless here.
> > > > > Or I still miss something?
> > > >
> > > > Yes, given that the compiler will already avoid inventing data-race-prone
> > > > C-language accesses to shared variables, so if that was the only reason
> > > > that you were using READ_ONCE() or WRITE_ONCE(), then READ_ONCE() and
> > > > WRITE_ONCE() won't be helping you.
> > > >
> > > > Or perhaps better to put it a different way...  The fact that the compiler
> > > > is not permitted to invent data-racy reads and writes is exactly why
> > > > you do not normally need READ_ONCE() and WRITE_ONCE() for accesses in
> > > > lock-based critical sections.  Instead, you only need READ_ONCE() and
> > > > WRITE_ONCE() when you have lockless accesses to the same shared variables.
> > > 
> > > This is lockless access to memcg->oom_group potentially from multiple
> > > CPUs, so, READ_ONCE() and WRITE_ONCE() are needed, right?
> > 
> > Agreed, lockless concurrent accesses should use READ_ONCE() and WRITE_ONCE().
> > And if either conflicting access is lockless, it is lockless.  ;-)
> 
> Now I'm confused, why we should use it here?
> Writing is happening from a separate syscall (a single write from a syscall),
> reading is happening from a oom context. The variable is boolean, it's either
> 0 or 1. What difference READ_ONCE()/WRITE_ONCE() will make here?
> Thanks!

In practice, not much difference other than documenting shared accesses.
Which can be valuable.

In theory, when you do a normal C-language store, the compiler is within
its rights to use the variable for temporary storage between the time
of the last read from that variable and the next write to that variable.
Back to practice, I have not heard of this happening for shared variables.
On the other hand, compilers really do this for on-stack variables whose
addresses are not taken, which is one of the reasons that gdb might say
that the variable is optimized out when you try to look at its value.

So the potential is there, and if it was my code, I would therefore use
READ_ONCE() and WRITE_ONCE().

							Thanx, Paul


  reply	other threads:[~2023-02-22  0:38 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-20 15:16 Yue Zhao
2023-02-20 21:09 ` Roman Gushchin
2023-02-20 23:06   ` Shakeel Butt
2023-02-21  5:17     ` Roman Gushchin
2023-02-21  6:52       ` Shakeel Butt
2023-02-21 13:51         ` Matthew Wilcox
2023-02-21 16:56           ` Shakeel Butt
2023-02-21 18:23             ` Paul E. McKenney
2023-02-21 22:23               ` Roman Gushchin
2023-02-21 22:38                 ` Paul E. McKenney
2023-02-21 23:13                   ` Shakeel Butt
2023-02-21 23:38                     ` Paul E. McKenney
2023-02-21 23:57                       ` Roman Gushchin
2023-02-22  0:37                         ` Paul E. McKenney [this message]
2023-02-22  4:28                           ` Roman Gushchin
2023-02-21 17:47           ` Roman Gushchin
2023-02-21 18:15             ` Shakeel Butt
2023-02-21 18:18             ` Matthew Wilcox
2023-02-22  9:01           ` David Laight
2023-02-21 17:00         ` Martin Zhao
2023-02-21  7:22       ` Muchun Song
2023-02-21 17:48         ` Roman Gushchin
2023-02-21 17:00       ` Martin Zhao
2023-02-21 18:02         ` Roman Gushchin
2023-02-21  8:26     ` Michal Hocko
2023-02-21 17:00       ` Martin Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230222003759.GO2948950@paulmck-ThinkPad-P17-Gen-1 \
    --to=paulmck@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=elver@google.com \
    --cc=findns94@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox