linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: akpm@linux-foundation.org, riel@surriel.com, hannes@cmpxchg.org,
	roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
	shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com,
	linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller
Date: Mon, 2 Oct 2023 15:43:19 +0200	[thread overview]
Message-ID: <ZRrI90KcRBwVZn/r@dhcp22.suse.cz> (raw)
In-Reply-To: <20230928005723.1709119-2-nphamcs@gmail.com>

On Wed 27-09-23 17:57:22, Nhat Pham wrote:
> Currently, hugetlb memory usage is not acounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
> 
> This patch rectifies this issue by charging the memcg when the hugetlb
> folio is allocated, and uncharging when the folio is freed (analogous to
> the hugetlb controller).

This changelog is missing a lot of information. Both about the usecase
(we do not want to fish that out from archives in the future) and the
actual implementation and the reasoning behind that.

AFAICS you have decided to charge on the hugetlb use rather than hugetlb
allocation to the pool. I suspect the underlying reasoning is that pool
pages do not belong to anybody. This is a deliberate decision and it
should be documented as such.

It is also very important do describe subtle behavior properties that
might be rather unintuitive to users. Most notably 
- there is no hugetlb pool management involved in the memcg
  controller. One has to use hugetlb controller for that purpose.
  Also the pre allocated pool as such doesn't belong to anybody so the
  memcg host overcommit management has to consider it when configuring
  hard limits.
- memcg limit reclaim doesn't assist hugetlb pages allocation when
  hugetlb overcommit is configured (i.e. pages are not consumed from the
  pool) which means that the page allocation might disrupt workloads
  from other memcgs.
- failure to charge a hugetlb page results in SIGBUS rather
  than memcg oom killer. That could be the case even if the
  hugetlb pool still has pages available and there is
  reclaimable memory in the memcg.
- hugetlb pages are contributing to memory reclaim protection 
  implicitly. This means that the low,min limits tunning has to consider
  hugetlb memory as well.

I suspect there is more than the above. To be completely honest I am
still not convinced this is a good idea.

I do recognize that this might work in a very limited environments but
hugetlb management is quite challenging on its own and this just adds
another layer of complexity which is really hard to see through without
an intimate understanding of both memcg and hugetlb. The reason that
hugetlb has been living outside of the core MM (and memcg) is not just
because we like it that way. And yes I do fully understand that users
shouldn't really care about that because this is just a memory to them.

We should also consider the global control for this functionality. I am
especially worried about setups where a mixed bag of workloads
(containers) is executed. While some of them will be ready for the new
accounting mode many will leave in their own world without ever being
modified. How do we deal with that situation?

All that being said, I am not going to ack nor nack this but I really do
prefer to be much more explicit about the motivation and current
implementation specifics so that we can forward users to something
they can digest.

> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
[...]

a minor implementation detail below. I couldn't spot anything obviously
broken with the rest of the hugetlb specific code. restore_reserve_on_memcg_failure
is rather clumsy and potentially error prone but I will leave that out
to Mike as he is much more familiar with that behavior than me.

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index de220e3ff8be..ff88ea4df11a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
[...]
> @@ -3119,6 +3121,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  			hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
>  					pages_per_huge_page(h), folio);
>  	}
> +
> +	/* undo allocation if memory controller disallows it. */
> +	if (mem_cgroup_hugetlb_charge_folio(folio, GFP_KERNEL)) {

htlb_alloc_mask(h) rather than GFP_KERNEL. Ideally with
__GFP_RETRY_MAYFAIL which is a default allocation policy.

> +		if (restore_reserve_on_memcg_failure)
> +			restore_reserve_on_error(h, vma, addr, folio);
> +		folio_put(folio);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
>  	return folio;
>  
>  out_uncharge_cgroup:

-- 
Michal Hocko
SUSE Labs


  parent reply	other threads:[~2023-10-02 13:43 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-28  0:57 [PATCH v2 0/2] hugetlb memcg accounting Nhat Pham
2023-09-28  0:57 ` [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller Nhat Pham
2023-09-28 22:59   ` Frank van der Linden
2023-09-29  0:33     ` Nhat Pham
2023-09-29  0:38   ` Yosry Ahmed
2023-09-29  0:58     ` Nhat Pham
2023-09-29  1:07       ` Nhat Pham
2023-09-29  1:18         ` Yosry Ahmed
2023-09-29  1:25           ` Nhat Pham
2023-09-29 15:08           ` Johannes Weiner
2023-09-29 15:11             ` Yosry Ahmed
2023-09-29 17:42               ` Johannes Weiner
2023-09-29 17:48                 ` Nhat Pham
2023-09-29 18:07                 ` Frank van der Linden
2023-10-02 12:18                 ` Michal Hocko
2023-09-29 18:17   ` [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller (fix) Nhat Pham
2023-09-29 18:19     ` Nhat Pham
2023-10-02 13:43   ` Michal Hocko [this message]
2023-10-02 14:50     ` [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller Johannes Weiner
2023-10-02 15:08       ` Michal Hocko
2023-10-02 15:25         ` Johannes Weiner
2023-10-02 17:32           ` Nhat Pham
2023-10-03  9:17           ` Michal Hocko
2023-10-02 16:21     ` Johannes Weiner
2023-10-02 17:28     ` Nhat Pham
2023-09-28  0:57 ` [PATCH v2 2/2] selftests: add a selftest to verify hugetlb usage in memcg Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRrI90KcRBwVZn/r@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=riel@surriel.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tj@kernel.org \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox