From: Yang Shi <shy828301@gmail.com>
To: 台运方 <yunfangtai09@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Hugh Dickins <hughd@google.com>, Tejun Heo <tj@kernel.org>,
vdavydov@parallels.com, Cgroups <cgroups@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>
Subject: Re: [BUG] The usage of memory cgroup is not consistent with processes when using THP
Date: Mon, 27 Sep 2021 10:28:10 -0700 [thread overview]
Message-ID: <CAHbLzkpBCQp7UGK_WPJ-akdQ7HqkOEMtE6+9qX5ciu3DU-ZVrg@mail.gmail.com> (raw)
In-Reply-To: <CAHKqYaa7H=M4E-=ObO0ecj+NE2KwZN5d7QSz4_b6tXz2vOo+VA@mail.gmail.com>
On Sun, Sep 26, 2021 at 12:35 AM 台运方 <yunfangtai09@gmail.com> wrote:
>
> Hi folks,
> We found that the usage counter of memory cgroup v1 containers is
> not consistent with the memory usage of their processes when THP is
> enabled.
>
> It was introduced by upstream commit 0a31bc97c80 and still exists in
> Linux 5.14.5.
> The root cause is that mem_cgroup_uncharge was moved to the final
> put_page(). When parts of huge pages are freed under THP, the memory
> usage of a process is updated when the PTE is unmapped, but the
> usage counter of the memory cgroup is only updated when the huge
> page is split in deferred_split_scan. This causes the inconsistency,
> and we could see more than 30 GB of difference in our daily usage.
IMHO this is not a bug. The disparity reflects a difference in how
the page's life cycle is viewed by the process and by the cgroup. The
usage of a process comes from the rss_counter of its mm, which tracks
per-process mapped memory, so it is updated as soon as the page is
zapped.
But from the cgroup's point of view, the page is charged when it is
allocated and uncharged when it is freed. The page may be zapped by
one process while other users still pin it to prevent it from being
freed. Such a pin may be very transient or may last indefinitely. THP
is one of these pins: it goes away when the THP is split, but due to
deferred split that may happen long after the page is zapped.
>
> It can be reproduced with the following program and script.
> The program "eat_memory_release" allocates memory in 8 MB chunks and
> releases the last 1 MB of each chunk using madvise.
> The script "test_thp.sh" creates a memory cgroup, runs
> "eat_memory_release 500" in it, and loops the process 10 times. The
> output shows the change in memory usage, which in theory should be
> about 500 MB less.
> The outputs vary randomly when THP is used, while adding "echo 2 >
> /proc/sys/vm/drop_caches" before accounting avoids this.
>
> Are there any patches to fix it or is it normal by design?
>
> Thanks,
> Yunfang Tai
Thread overview: 5+ messages
2021-09-26 7:35 台运方
2021-09-27 17:28 ` Yang Shi [this message]
2021-09-28 7:15 ` 台运方
2021-09-28 22:14 ` Yang Shi
2021-09-29 3:25 ` Yunfang Tai