From: Muchun Song <songmuchun@bytedance.com>
To: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Shakeel Butt <shakeelb@google.com>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Xiongchun duan <duanxiongchun@bytedance.com>,
fam.zheng@bytedance.com
Subject: Re: [External] Re: [PATCH] mm: memcontrol: fix root_mem_cgroup charging
Date: Fri, 23 Apr 2021 16:20:32 +0800 [thread overview]
Message-ID: <CAMZfGtU2tWsprvoS0W0+EG534gFqgtfWy5pgEXxNmZZJmm4dYw@mail.gmail.com> (raw)
In-Reply-To: <YIHGJyhUT0o2vtFZ@carbon.dhcp.thefacebook.com>
On Fri, Apr 23, 2021 at 2:53 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Apr 22, 2021 at 11:47:05AM +0800, Muchun Song wrote:
> > On Thu, Apr 22, 2021 at 8:57 AM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Wed, Apr 21, 2021 at 09:39:03PM +0800, Muchun Song wrote:
> > > > On Wed, Apr 21, 2021 at 9:03 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Wed 21-04-21 17:50:06, Muchun Song wrote:
> > > > > > On Wed, Apr 21, 2021 at 3:34 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > >
> > > > > > > On Wed 21-04-21 14:26:44, Muchun Song wrote:
> > > > > > > > The below scenario can cause the page counters of the root_mem_cgroup
> > > > > > > > to be out of balance.
> > > > > > > >
> > > > > > > > CPU0: CPU1:
> > > > > > > >
> > > > > > > > objcg = get_obj_cgroup_from_current()
> > > > > > > > obj_cgroup_charge_pages(objcg)
> > > > > > > > memcg_reparent_objcgs()
> > > > > > > > // reparent to root_mem_cgroup
> > > > > > > > WRITE_ONCE(iter->memcg, parent)
> > > > > > > > // memcg == root_mem_cgroup
> > > > > > > > memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > > > // do not charge to the root_mem_cgroup
> > > > > > > > try_charge(memcg)
> > > > > > > >
> > > > > > > > obj_cgroup_uncharge_pages(objcg)
> > > > > > > > memcg = get_mem_cgroup_from_objcg(objcg)
> > > > > > > > // uncharge from the root_mem_cgroup
> > > > > > > > page_counter_uncharge(&memcg->memory)
> > > > > > > >
> > > > > > > > This can cause the page counter to be less than the actual value,
> > > > > > > > Although we do not display the value (mem_cgroup_usage) so there
> > > > > > > > shouldn't be any actual problem, but there is a WARN_ON_ONCE in
> > > > > > > > the page_counter_cancel(). Who knows if it will trigger? So it
> > > > > > > > is better to fix it.
> > > > > > >
> > > > > > > The changelog doesn't explain the fix and why you have chosen to charge
> > > > > > > kmem objects to root memcg and left all other try_charge users intact.
> > > > > >
> > > > > > The object cgroup is special (because the page can reparent). Only the
> > > > > > user of objcg APIs should be fixed.
> > > > > >
> > > > > > > The reason is likely that those are not reparented now but that just
> > > > > > > adds an inconsistency.
> > > > > > >
> > > > > > > Is there any reason you haven't simply matched obj_cgroup_uncharge_pages
> > > > > > > to check for the root memcg and bail out early?
> > > > > >
> > > > > > Because obj_cgroup_uncharge_pages() uncharges pages from the
> > > > > > root memcg unconditionally. Why? Because some pages can be
> > > > > > reparented to root memcg, in order to ensure the correctness of
> > > > > > page counter of root memcg. We have to uncharge pages from
> > > > > > root memcg. So we do not check whether the page belongs to
> > > > > > the root memcg when it uncharges.
> > > > >
> > > > > I am not sure I follow. Let me ask differently. Wouldn't you
> > > > > achieve the same if you simply didn't uncharge root memcg in
> > > > > obj_cgroup_charge_pages?
> > > >
> > > > I'm afraid not. Some pages should uncharge root memcg, some
> > > > pages should not uncharge root memcg. But all those pages belong
> > > > to the root memcg. We cannot distinguish between the two.
> > > >
> > > > I believe Roman is very familiar with this mechanism (objcg APIs).
> > > >
> > > > Hi Roman,
> > > >
> > > > Any thoughts on this?
> > >
> > > First, unfortunately we do export the root's counter on cgroup v1:
> > > /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
> > > But we don't ignore these counters for the root mem cgroup, so there
> > > are no bugs here. (Otherwise, please, reproduce it). So it's all about
> > > the potential warning in page_counter_cancel().
> >
> > Right.
> >
> > >
> > > The patch looks technically correct to me. Not sure about __try_charge()
> > > naming, we never use "__" prefix to do something with the root_mem_cgroup.
> > >
> > > The commit message should be more clear and mention the following:
> > > get_obj_cgroup_from_current() never returns a root_mem_cgroup's objcg,
> > > so we never explicitly charge the root_mem_cgroup. And it's not
> > > going to change.
> > > It's all about a race when we got an obj_cgroup pointing at some non-root
> > > memcg, but before we were able to charge it, the cgroup was gone, objcg was
> > > reparented to the root and so we're skipping the charging. Then we store the
> > > objcg pointer and later use to uncharge the root_mem_cgroup.
> >
> > Very clear. Thanks.
> >
> > >
> > > But honestly I'm not sure the problem is worth the time spent on the fix
> > > and the discussion. It's a small race and it's generally hard to trigger
> > > a kernel allocation racing with a cgroup deletion and then you need *a lot*
> > > of such races and then maybe there will be a single warning printed without
> > > *any* other consequences.
> >
> > I agree the race is very small. Since the fix is easy, but a little confusing
> > to someone. I want to hear other people's suggestions on whether to fix it.
>
> I'm not opposing the idea to fix this issue. But, __please__, make sure you
> include all necessary information into the commit log.
Got it. Thanks Roman.
>
> Thanks!
next prev parent reply other threads:[~2021-04-23 8:21 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-21 6:26 Muchun Song
2021-04-21 7:34 ` Michal Hocko
2021-04-21 9:50 ` [External] " Muchun Song
2021-04-21 13:03 ` Michal Hocko
2021-04-21 13:39 ` Muchun Song
2021-04-22 0:57 ` Roman Gushchin
2021-04-22 3:47 ` Muchun Song
2021-04-22 18:53 ` Roman Gushchin
2021-04-23 8:20 ` Muchun Song [this message]
2021-04-22 8:44 ` Michal Hocko
2021-04-22 18:37 ` Roman Gushchin
-- strict thread matches above, loose matches on Subject: below --
2021-03-02 8:18 Muchun Song
[not found] ` <YD6K3HghLy5glOgi@carbon.dhcp.thefacebook.com>
2021-03-03 3:12 ` [External] " Muchun Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMZfGtU2tWsprvoS0W0+EG534gFqgtfWy5pgEXxNmZZJmm4dYw@mail.gmail.com \
--to=songmuchun@bytedance.com \
--cc=akpm@linux-foundation.org \
--cc=duanxiongchun@bytedance.com \
--cc=fam.zheng@bytedance.com \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=shakeelb@google.com \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox