From: Shakeel Butt <shakeelb@google.com>
To: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Muchun Song" <songmuchun@bytedance.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Roman Gushchin" <guro@fb.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Hillf Danton" <hdanton@sina.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Cgroups <cgroups@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 2/2] memcg: infrastructure to flush memcg stats
Date: Fri, 16 Jul 2021 08:14:23 -0700
Message-ID: <CALvZod5SONQ6=ewesLhMSampu=sxbA3iDS3f+rsHkEUY5G2Cyg@mail.gmail.com>
In-Reply-To: <78005c4c-9233-7bc8-d50e-e3fe11f30b5d@samsung.com>

Hi Marek

On Fri, Jul 16, 2021 at 8:03 AM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Hi,
>
> On 14.07.2021 03:39, Shakeel Butt wrote:
> > At the moment memcg stats are read in four contexts:
> >
> > 1. memcg stat user interfaces
> > 2. dirty throttling
> > 3. page fault
> > 4. memory reclaim
> >
> > Currently the kernel flushes the stats for the first two cases. Flushing
> > the stats for the remaining two cases may have a performance impact.
> > Always flushing the memcg stats on the page fault code path may
> > negatively impact the performance of applications. In addition, flushing
> > in the memory reclaim code path, though treated as a slowpath, can become
> > a source of contention for the global lock taken for stat flushing,
> > because when the system or a memcg is under memory pressure, many tasks
> > may enter the reclaim path.
> >
> > This patch uses the following mechanisms to solve these challenges:
> >
> > 1. Periodically flush the stats from the root memcg every 2 seconds.
> > This bounds how long the stats can stay out of sync.
> >
> > 2. Asynchronously flush the stats after a fixed number of stat updates.
> > In the worst case the stats can be out of sync by O(nr_cpus * BATCH) for
> > 2 seconds.
> >
> > 3. To avoid a thundering herd of flushers, particularly from the memory
> > reclaim context, introduce a memcg-local spinlock and let only one
> > flusher be active at a time. This could have been done through
> > cgroup_rstat_lock, but that lock is used by other subsystems and by
> > userspace reads of memcg stats. So it is better to keep the flushers
> > introduced by this patch decoupled from cgroup_rstat_lock.
> >
> > Signed-off-by: Shakeel Butt <shakeelb@google.com>
>
> This patch landed in today's linux-next (next-20210716) as commit
> 42265e014ac7 ("memcg: infrastructure to flush memcg stats"). On my test
> systems I found that it triggers a kernel BUG on all ARM64 boards:
>
>   BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:200
>   in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 7, name: kworker/u8:0
>   3 locks held by kworker/u8:0/7:
>    #0: ffff00004000c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x200/0x718
>    #1: ffff80001334bdd0 ((stats_flush_dwork).work){+.+.}-{0:0}, at: process_one_work+0x200/0x718
>    #2: ffff8000124f6d40 (stats_flush_lock){+.+.}-{2:2}, at: mem_cgroup_flush_stats+0x20/0x48
>   CPU: 2 PID: 7 Comm: kworker/u8:0 Tainted: G        W 5.14.0-rc1+ #3713
>   Hardware name: Raspberry Pi 4 Model B (DT)
>   Workqueue: events_unbound flush_memcg_stats_dwork
>   Call trace:
>    dump_backtrace+0x0/0x1d0
>    show_stack+0x14/0x20
>    dump_stack_lvl+0x88/0xb0
>    dump_stack+0x14/0x2c
>    ___might_sleep+0x1dc/0x200
>    __might_sleep+0x4c/0x88
>    cgroup_rstat_flush+0x2c/0x58
>    mem_cgroup_flush_stats+0x34/0x48
>    flush_memcg_stats_dwork+0xc/0x38
>    process_one_work+0x2a8/0x718
>    worker_thread+0x48/0x460
>    kthread+0x12c/0x160
>    ret_from_fork+0x10/0x18
>
> This can also be reproduced with QEMU. Please let me know if I can help
> with fixing this issue.
>

Thanks for the report. The issue can be fixed by changing
cgroup_rstat_flush() to cgroup_rstat_flush_irqsafe() in
mem_cgroup_flush_stats(). I will send out the updated patch in a
couple of hours after a bit more testing.
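
For illustration only, here is a rough userspace sketch of why the swap
should silence the splat. It is not kernel code: every model_* name and
count_violations() are hypothetical stand-ins made up for this example.
The one assumption carried over from the trace above is that
cgroup_rstat_flush() performs a might_sleep() check while
mem_cgroup_flush_stats() is holding the stats_flush_lock spinlock, and
that cgroup_rstat_flush_irqsafe() performs no such check.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these helpers exist in the kernel. */
static int atomic_depth;            /* > 0 models in_atomic() */
static int might_sleep_violations;  /* counts "sleeping function" splats */

/* Taking a spinlock puts the caller in atomic context. */
static void model_spin_lock(void)   { atomic_depth++; }
static void model_spin_unlock(void) { atomic_depth--; }

/* Models might_sleep(): complain when called from atomic context. */
static void model_might_sleep(void)
{
	if (atomic_depth > 0)
		might_sleep_violations++;
}

/* Models cgroup_rstat_flush(): may sleep, so it checks the context. */
static void model_rstat_flush(void)
{
	model_might_sleep();
}

/* Models cgroup_rstat_flush_irqsafe(): no sleep check. */
static void model_rstat_flush_irqsafe(void)
{
}

/* Models mem_cgroup_flush_stats(): flush under stats_flush_lock. */
static void model_flush_stats(bool use_irqsafe)
{
	model_spin_lock();
	if (use_irqsafe)
		model_rstat_flush_irqsafe();
	else
		model_rstat_flush();
	model_spin_unlock();
}

/* Returns how many violations a single flush triggers. */
int count_violations(bool use_irqsafe)
{
	might_sleep_violations = 0;
	model_flush_stats(use_irqsafe);
	return might_sleep_violations;
}
```

In this model, count_violations(false) reports one violation, matching
the reported BUG, while count_violations(true) reports none.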

