From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	akpm@linux-foundation.org, riel@surriel.com, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
	shuah@kernel.org, yosryahmed@google.com, fvdl@google.com,
	linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller
Date: Tue, 3 Oct 2023 15:09:36 -0700
Message-ID: <CAKEwX=O8X+ZRNEzzs+NYBAh9QJLObxKPOpmFo-ci4Aw=MR=0+Q@mail.gmail.com>
In-Reply-To: <20231003183928.GC20979@cmpxchg.org>

On Tue, Oct 3, 2023 at 11:39 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Oct 03, 2023 at 11:01:24AM -0700, Nhat Pham wrote:
> > On Tue, Oct 3, 2023 at 10:13 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >
> > > On 10/02/23 17:18, Nhat Pham wrote:
> > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > > index de220e3ff8be..74472e911b0a 100644
> > > > --- a/mm/hugetlb.c
> > > > +++ b/mm/hugetlb.c
> > > > @@ -1902,6 +1902,7 @@ void free_huge_folio(struct folio *folio)
> > > >                                    pages_per_huge_page(h), folio);
> > > >       hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> > > >                                         pages_per_huge_page(h), folio);
> > > > +     mem_cgroup_uncharge(folio);
> > > >       if (restore_reserve)
> > > >               h->resv_huge_pages++;
> > > >
> > > > @@ -3009,11 +3010,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >       struct hugepage_subpool *spool = subpool_vma(vma);
> > > >       struct hstate *h = hstate_vma(vma);
> > > >       struct folio *folio;
> > > > -     long map_chg, map_commit;
> > > > +     long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
> > > >       long gbl_chg;
> > > > -     int ret, idx;
> > > > +     int memcg_charge_ret, ret, idx;
> > > >       struct hugetlb_cgroup *h_cg = NULL;
> > > > +     struct mem_cgroup *memcg;
> > > >       bool deferred_reserve;
> > > > +     gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
> > > > +
> > > > +     memcg = get_mem_cgroup_from_current();
> > > > +     memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
> > > > +     if (memcg_charge_ret == -ENOMEM) {
> > > > +             mem_cgroup_put(memcg);
> > > > +             return ERR_PTR(-ENOMEM);
> > > > +     }
> > > >
> > > >       idx = hstate_index(h);
> > > >       /*
> > > > @@ -3022,8 +3032,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >        * code of zero indicates a reservation exists (no change).
> > > >        */
> > > >       map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
> > > > -     if (map_chg < 0)
> > > > +     if (map_chg < 0) {
> > > > +             if (!memcg_charge_ret)
> > > > +                     mem_cgroup_cancel_charge(memcg, nr_pages);
> > > > +             mem_cgroup_put(memcg);
> > > >               return ERR_PTR(-ENOMEM);
> > > > +     }
> > > >
> > > >       /*
> > > >        * Processes that did not create the mapping will have no
> > > > @@ -3034,10 +3048,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >        */
> > > >       if (map_chg || avoid_reserve) {
> > > >               gbl_chg = hugepage_subpool_get_pages(spool, 1);
> > > > -             if (gbl_chg < 0) {
> > > > -                     vma_end_reservation(h, vma, addr);
> > > > -                     return ERR_PTR(-ENOSPC);
> > > > -             }
> > > > +             if (gbl_chg < 0)
> > > > +                     goto out_end_reservation;
> > > >
> > > >               /*
> > > >                * Even though there was no reservation in the region/reserve
> > > > @@ -3119,6 +3131,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >                       hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> > > >                                       pages_per_huge_page(h), folio);
> > > >       }
> > > > +
> > > > +     if (!memcg_charge_ret)
> > > > +             mem_cgroup_commit_charge(folio, memcg);
> > > > +     mem_cgroup_put(memcg);
> > > > +
> > > >       return folio;
> > > >
> > > >  out_uncharge_cgroup:
> > > > @@ -3130,7 +3147,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >  out_subpool_put:
> > > >       if (map_chg || avoid_reserve)
> > > >               hugepage_subpool_put_pages(spool, 1);
> > > > +out_end_reservation:
> > > >       vma_end_reservation(h, vma, addr);
> > > > +     if (!memcg_charge_ret)
> > > > +             mem_cgroup_cancel_charge(memcg, nr_pages);
> > > > +     mem_cgroup_put(memcg);
> > > >       return ERR_PTR(-ENOSPC);
> > > >  }
> > > >
> > >
> > > IIUC, huge page usage is charged in alloc_hugetlb_folio and uncharged in
> > > free_huge_folio.  During migration, huge pages are allocated via
> > > alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio.  So, there is no
> > > charging for the migration target page and we uncharge the source page.
> > > It looks like there will be no charge for the huge page after migration?
> > >
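
To spell out the path Mike is describing, here is a simplified sketch
(from memory, so double-check against the tree) of the migration target
allocator in mm/migrate.c: hugetlb folios take a dedicated branch that
never goes through alloc_hugetlb_folio(), so the new memcg charge above
is never reached.

struct folio *alloc_migration_target(struct folio *src, unsigned long private)
{
	struct migration_target_control *mtc = (void *)private;
	gfp_t gfp_mask = mtc->gfp_mask;
	int nid = mtc->nid;

	if (nid == NUMA_NO_NODE)
		nid = folio_nid(src);

	if (folio_test_hugetlb(src)) {
		struct hstate *h = folio_hstate(src);

		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
		/*
		 * Dequeues a free folio, or falls back to
		 * alloc_migrate_hugetlb_folio(); neither path
		 * charges the memcg.
		 */
		return alloc_hugetlb_folio_nodemask(h, nid, mtc->nmask,
						    gfp_mask);
	}
	...
}
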
> >
> > Ah I see! This is a bit subtle indeed.
> >
> > For the hugetlb controller, it looks like they update the cgroup info
> > inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate()
> > to transfer the hugetlb cgroup info to the destination folio.
> >
> > Perhaps we can do something analogous here.
> >
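
For reference, hugetlb_cgroup_migrate() is roughly the following
(paraphrased from memory of mm/hugetlb_cgroup.c, so treat it as a
sketch). Note that it only moves the ownership pointers under
hugetlb_lock; the charge counters are never touched, which is the
property we would want on the memcg side as well:

void hugetlb_cgroup_migrate(struct folio *old_folio, struct folio *new_folio)
{
	struct hugetlb_cgroup *h_cg;
	struct hugetlb_cgroup *h_cg_rsvd;
	struct hstate *h = folio_hstate(old_folio);

	if (hugetlb_cgroup_disabled())
		return;

	spin_lock_irq(&hugetlb_lock);
	h_cg = hugetlb_cgroup_from_folio(old_folio);
	h_cg_rsvd = hugetlb_cgroup_from_folio_rsvd(old_folio);
	set_hugetlb_cgroup(old_folio, NULL);
	set_hugetlb_cgroup_rsvd(old_folio, NULL);

	/* hand the hugetlb cgroup info over to the destination folio */
	set_hugetlb_cgroup(new_folio, h_cg);
	set_hugetlb_cgroup_rsvd(new_folio, h_cg_rsvd);
	list_move(&new_folio->lru, &h->hugepage_activelist);
	spin_unlock_irq(&hugetlb_lock);
}
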
> > > If my analysis above is correct, then we may need to be careful about
> > > this accounting.  We may not want both source and target pages to be
> > > charged at the same time.
> >
> > We can create a variant of mem_cgroup_migrate() that does not double
> > charge, but instead just copies the mem_cgroup information to the new
> > folio and then clears that info from the old folio. That way the memory
> > usage counters are untouched.
> >
> > Somebody with more expertise on migration should fact-check me,
> > of course :)
>
> The only reason mem_cgroup_migrate() double charges right now is
> because it's used by replace_page_cache_folio(). In that context, the
> isolation of the old page isn't quite as thorough as with migration,
> so it cannot transfer and uncharge directly. This goes back a long
> time: commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API").
>
> If you rename the current implementation to mem_cgroup_replace_page()
> for that one caller, you can add a mem_cgroup_migrate() variant which
> is charge-neutral and clears old->memcg_data. This can be used for
> regular and hugetlb page migration. Something like this (totally
> untested):
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a4d3282493b6..17ec45bf3653 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7226,29 +7226,14 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
>         if (mem_cgroup_disabled())
>                 return;
>
> -       /* Page cache replacement: new folio already charged? */
> -       if (folio_memcg(new))
> -               return;
> -
>         memcg = folio_memcg(old);
>         VM_WARN_ON_ONCE_FOLIO(!memcg, old);
>         if (!memcg)
>                 return;
>
> -       /* Force-charge the new page. The old one will be freed soon */
> -       if (!mem_cgroup_is_root(memcg)) {
> -               page_counter_charge(&memcg->memory, nr_pages);
> -               if (do_memsw_account())
> -                       page_counter_charge(&memcg->memsw, nr_pages);
> -       }
> -
> -       css_get(&memcg->css);
> +       /* Transfer the charge and the css ref */
>         commit_charge(new, memcg);
> -
> -       local_irq_save(flags);
> -       mem_cgroup_charge_statistics(memcg, nr_pages);
> -       memcg_check_events(memcg, folio_nid(new));
> -       local_irq_restore(flags);
> +       old->memcg_data = 0;
>  }
>
>  DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
>

Ah, I like this. Will send a fixlet based on this :)
I was scratching my head trying to figure out why we were
doing the double charging in the first place. Thanks for the context,
Johannes!
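
Concretely, I imagine the fixlet will hook the charge-neutral
mem_cgroup_migrate() into the hugetlb migration path next to the
existing hugetlb_cgroup_migrate() call. Untested sketch, and the exact
call site may well end up elsewhere:

void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio,
			int reason)
{
	hugetlb_cgroup_migrate(old_folio, new_folio);
	/*
	 * Transfer the memcg charge and the css reference to the
	 * destination folio; the usage counters stay untouched.
	 */
	mem_cgroup_migrate(old_folio, new_folio);
	...
}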



Thread overview: 26+ messages
2023-10-03  0:18 [PATCH v3 0/3] hugetlb memcg accounting Nhat Pham
2023-10-03  0:18 ` [PATCH v3 1/3] memcontrol: add helpers for hugetlb memcg accounting Nhat Pham
2023-10-03 11:50   ` Michal Hocko
2023-10-03 12:47   ` Johannes Weiner
2023-10-03  0:18 ` [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller Nhat Pham
2023-10-03  0:26   ` Nhat Pham
2023-10-03 12:54   ` Johannes Weiner
2023-10-03 12:58   ` Michal Hocko
2023-10-03 15:59     ` Johannes Weiner
2023-10-03 17:13   ` Mike Kravetz
2023-10-03 18:01     ` Nhat Pham
2023-10-03 18:39       ` Johannes Weiner
2023-10-03 22:09         ` Nhat Pham [this message]
2023-10-03 22:42           ` Mike Kravetz
2023-10-03 23:26             ` Nhat Pham
2023-10-03 23:14     ` [PATCH] memcontrol: only transfer the memcg data for migration Nhat Pham
2023-10-03 23:22       ` Yosry Ahmed
2023-10-03 23:31         ` Nhat Pham
2023-10-03 23:54           ` Yosry Ahmed
2023-10-04  0:02             ` Nhat Pham
2023-10-04  0:02       ` Nhat Pham
2023-10-04 14:17       ` Johannes Weiner
2023-10-04 19:45   ` [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller (fix) Nhat Pham
2023-10-06 17:25     ` Andrew Morton
2023-10-06 18:23       ` Nhat Pham
2023-10-03  0:18 ` [PATCH v3 3/3] selftests: add a selftest to verify hugetlb usage in memcg Nhat Pham
