From: Peter Xu <peterx@redhat.com>
To: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Peter Xu <peterx@redhat.com>,
Rongwei Wang <rongwei.wang@linux.alibaba.com>,
SeongJae Park <sj@kernel.org>, Song Liu <songliubraving@fb.com>,
Vlastimil Babka <vbabka@suse.cz>, Yang Shi <shy828301@gmail.com>,
Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Arnd Bergmann <arnd@arndb.de>,
Axel Rasmussen <axelrasmussen@google.com>,
Chris Kennelly <ckennelly@google.com>,
Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
Hugh Dickins <hughd@google.com>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
Jens Axboe <axboe@kernel.dk>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matt Turner <mattst88@gmail.com>,
Max Filippov <jcmvbkbc@gmail.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Minchan Kim <minchan@kernel.org>,
Patrick Xia <patrickx@google.com>,
Pavel Begunkov <asml.silence@gmail.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Re: [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled
Date: Wed, 29 Jun 2022 22:32:39 -0400 [thread overview]
Message-ID: <Yr0LR5bBmOEOudMw@xz-m1.local> (raw)
In-Reply-To: <CAAa6QmRXD5KboM8=ZZRPThOmcLEPtxzf0XyjkCeY_vgR7VOPqg@mail.gmail.com>
On Wed, Jun 29, 2022 at 06:42:25PM -0700, Zach O'Keefe wrote:
> On Jun 29 19:21, Peter Xu wrote:
> > On Fri, Jun 03, 2022 at 05:39:57PM -0700, Zach O'Keefe wrote:
> > > Add enforce_thp_enabled flag to struct collapse_control that allows context
> > > to ignore constraints imposed by /sys/kernel/transparent_hugepage/enabled.
> > >
> > > This flag is set in khugepaged collapse context to preserve existing
> > > khugepaged behavior.
> > >
> > > This flag will be used (unset) when introducing madvise collapse
> > > context since the desired THP semantics of MADV_COLLAPSE aren't coupled
> > > to sysfs THP settings. Most notably, for the purpose of eventual
> > > madvise_collapse(2) support, this allows userspace to trigger THP collapse
> > > on behalf of another processes, without adding support to meddle with
> > > the VMA flags of said process, or change sysfs THP settings.
> > >
> > > For now, limit this flag to /sys/kernel/transparent_hugepage/enabled,
> > > but it can be expanded to include
> > > /sys/kernel/transparent_hugepage/shmem_enabled later.
> > >
> > > Link: https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/
> > >
> > > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > > ---
> > > mm/khugepaged.c | 34 +++++++++++++++++++++++++++-------
> > > 1 file changed, 27 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index c3589b3e238d..4ad04f552347 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -94,6 +94,11 @@ struct collapse_control {
> > > */
> > > bool enforce_page_heuristics;
> > >
> > > + /* Enforce constraints of
> > > + * /sys/kernel/mm/transparent_hugepage/enabled
> > > + */
> > > + bool enforce_thp_enabled;
> >
> > Small nitpick that we could have merged the two booleans if they always
> > match, but no strong opinions if you think these two are clearer. Or maybe
> > there's other plan of using them?
> >
> > > +
> > > /* Num pages scanned per node */
> > > int node_load[MAX_NUMNODES];
> > >
> > > @@ -893,10 +898,12 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> > > */
> > >
> > > static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > - struct vm_area_struct **vmap)
> > > + struct vm_area_struct **vmap,
> > > + struct collapse_control *cc)
> > > {
> > > struct vm_area_struct *vma;
> > > unsigned long hstart, hend;
> > > + unsigned long vma_flags;
> > >
> > > if (unlikely(khugepaged_test_exit(mm)))
> > > return SCAN_ANY_PROCESS;
> > > @@ -909,7 +916,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > hend = vma->vm_end & HPAGE_PMD_MASK;
> > > if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > return SCAN_ADDRESS_RANGE;
> > > - if (!hugepage_vma_check(vma, vma->vm_flags))
> > > +
> > > + /*
> > > + * If !cc->enforce_thp_enabled, set VM_HUGEPAGE so that
> > > + * hugepage_vma_check() can pass even if
> > > + * TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG is set (i.e. "madvise" mode).
> > > + * Note that hugepage_vma_check() doesn't enforce that
> > > + * TRANSPARENT_HUGEPAGE_FLAG or TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG
> > > + * must be set (i.e. "never" mode).
> > > + */
> > > + vma_flags = cc->enforce_thp_enabled ? vma->vm_flags
> > > + : vma->vm_flags | VM_HUGEPAGE;
> >
> > Another nitpick..
> >
> > We could get a weird vm_flags when VM_NOHUGEPAGE is set. I don't think
> > it'll go wrong since hugepage_vma_check() checks NOHUGEPAGE first, but IMHO
> > we shouldn't rely on that as it seems error prone (e.g. when accidentally
> > moved things around).
> >
> > So maybe nicer to only apply VM_HUGEPAGE if !VM_NOHUGEPAGE? Or pass over
> > "enforce_thp_enabled" into hugepage_vma_check() should work too, iiuc.
> > Passing in the boolean has one benefit that we don't really need the
> > complicated comment above since the code should be able to explain itself.
>
> Hey Peter, thanks again for taking the time to review.
>
> Answering both of the above at the time:
>
> As in this series so far, I've tried to keep context functionally-declarative -
> specifying the intended behavior (e.g. "enforce_page_heuristics") rather than
> adding "if (khugepaged) { .. } else if (madv_collapse) { .. } else if { .. }"
> around the code which, IMO, makes it difficult to follow. Unfortunately, I've
> ran into the 2 problems you've stated here:
>
> 1) *Right now* all the behavior knobs are either off/on at the same time
> 2) For hugepage_vma_check() (now in mm/huge_memory.c and acting as the central
> authority on THP eligibility), things are complicated enough that I
> couldn't find a clean way to describe the parameters of the context without
> explicitly mentioning the caller.
>
> For (2), instead of adding another arg to specify MADV_COLLAPSE's behavior,
> I think we need to package these contexts into a single set of flags:
>
> enum thp_ctx_flags {
> THP_CTX_ANON_FAULT = 1 << 1,
> THP_CTX_KHUGEPAGED = 1 << 2,
> THP_CTX_SMAPS = 1 << 3,
> THP_CTX_MADVISE_COLLAPSE = 1 << 4,
> };
>
> That will avoid hacking vma flags passed to hugepage_vma_check().
>
> And, if we have these anyways, I might as well do away with some of the
> (semantically meaningful but functionally redundant) flags in
> struct collapse_control and just specify a single .thp_ctx_flags member. I'm
> not entirely happy with it - but that's what I'm planning.
>
> WDYT?
Firstly I think I wrongly sent previous email privately.. :( Let me try to
add the list back..
IMHO we don't need to worry too much on the "if... else if... else",
because they shouldn't be more complicated than when you spread the
meanings into multiple flags, or how could it be? :) IMHO it should
literally be as simple as applying:
s/enforce_{A|B|C|...}/khugepaged_initiated/g
Throughout the patches, then we squash the patches introducing enforce_X.
If you worry it's not clear on "what does khugepaged_initiated mean", we
could add whatever comment above the variable explaining A/B/C/D will be
covered when this is set, and we could postpone to do the flag split only
until there're real user.
Adding these flags could add unnecessary bit-and instructions into the code
generated at last, and if it's only about readability issue that's really
what comment is for?
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2022-06-30 2:32 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-04 0:39 [PATCH v6 00/15] mm: userspace hugepage collapse Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 01/15] mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA Zach O'Keefe
2022-06-06 18:25 ` Yang Shi
2022-06-29 20:49 ` Peter Xu
2022-06-30 1:15 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 02/15] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP Zach O'Keefe
2022-06-06 20:45 ` Yang Shi
2022-06-07 16:01 ` Zach O'Keefe
2022-06-07 19:32 ` Zach O'Keefe
2022-06-07 21:27 ` Yang Shi
2022-06-08 0:27 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 03/15] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-06-06 2:41 ` kernel test robot
2022-06-06 16:40 ` Zach O'Keefe
2022-06-06 20:20 ` Yang Shi
2022-06-06 21:22 ` Yang Shi
2022-06-06 22:23 ` Andrew Morton
2022-06-06 23:53 ` Yang Shi
2022-06-08 0:42 ` Zach O'Keefe
2022-06-08 1:00 ` Yang Shi
2022-06-08 1:06 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 04/15] mm/khugepaged: dedup and simplify hugepage alloc and charging Zach O'Keefe
2022-06-06 20:50 ` Yang Shi
2022-06-29 21:58 ` Peter Xu
2022-06-30 20:14 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 05/15] mm/khugepaged: make allocation semantics context-specific Zach O'Keefe
2022-06-06 20:58 ` Yang Shi
2022-06-07 19:56 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 06/15] mm/khugepaged: pipe enum scan_result codes back to callers Zach O'Keefe
2022-06-06 22:39 ` Yang Shi
2022-06-07 0:17 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 07/15] mm/khugepaged: add flag to ignore khugepaged heuristics Zach O'Keefe
2022-06-06 22:51 ` Yang Shi
2022-06-04 0:39 ` [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled Zach O'Keefe
2022-06-06 23:02 ` Yang Shi
[not found] ` <YrzehlUoo2iMMLC2@xz-m1.local>
[not found] ` <CAAa6QmRXD5KboM8=ZZRPThOmcLEPtxzf0XyjkCeY_vgR7VOPqg@mail.gmail.com>
2022-06-30 2:32 ` Peter Xu [this message]
2022-06-30 14:17 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 09/15] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-06-06 23:53 ` Yang Shi
2022-06-07 22:48 ` Zach O'Keefe
2022-06-08 0:39 ` Yang Shi
2022-06-09 17:35 ` Zach O'Keefe
2022-06-09 18:51 ` Yang Shi
2022-06-10 14:51 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 10/15] mm/khugepaged: rename prefix of shared collapse functions Zach O'Keefe
2022-06-06 23:56 ` Yang Shi
2022-06-07 0:31 ` Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 11/15] mm/madvise: add MADV_COLLAPSE to process_madvise() Zach O'Keefe
2022-06-07 19:14 ` Yang Shi
2022-06-04 0:40 ` [PATCH v6 12/15] selftests/vm: modularize collapse selftests Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 13/15] selftests/vm: add MADV_COLLAPSE collapse context to selftests Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 14/15] selftests/vm: add selftest to verify recollapse of THPs Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 15/15] tools headers uapi: add MADV_COLLAPSE madvise mode to tools Zach O'Keefe
2022-06-06 23:58 ` Yang Shi
2022-06-07 0:24 ` Zach O'Keefe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yr0LR5bBmOEOudMw@xz-m1.local \
--to=peterx@redhat.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.shi@linux.alibaba.com \
--cc=arnd@arndb.de \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=axelrasmussen@google.com \
--cc=chris@zankel.net \
--cc=ckennelly@google.com \
--cc=david@redhat.com \
--cc=deller@gmx.de \
--cc=hughd@google.com \
--cc=ink@jurassic.park.msu.ru \
--cc=jcmvbkbc@gmail.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linmiaohe@huawei.com \
--cc=linux-mm@kvack.org \
--cc=mattst88@gmail.com \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=pasha.tatashin@soleen.com \
--cc=patrickx@google.com \
--cc=rientjes@google.com \
--cc=rongwei.wang@linux.alibaba.com \
--cc=shy828301@gmail.com \
--cc=sj@kernel.org \
--cc=songliubraving@fb.com \
--cc=tsbogend@alpha.franken.de \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
--cc=zokeefe@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox