From: Peter Xu <peterx@redhat.com>
To: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
SeongJae Park <sj@kernel.org>, Song Liu <songliubraving@fb.com>,
Vlastimil Babka <vbabka@suse.cz>, Yang Shi <shy828301@gmail.com>,
Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Arnd Bergmann <arnd@arndb.de>,
Axel Rasmussen <axelrasmussen@google.com>,
Chris Kennelly <ckennelly@google.com>,
Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
Hugh Dickins <hughd@google.com>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
Jens Axboe <axboe@kernel.dk>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matt Turner <mattst88@gmail.com>,
Max Filippov <jcmvbkbc@gmail.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Minchan Kim <minchan@kernel.org>,
Patrick Xia <patrickx@google.com>,
Pavel Begunkov <asml.silence@gmail.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
kernel test robot <lkp@intel.com>
Subject: Re: [PATCH v3 03/12] mm/khugepaged: make hugepage allocation context-specific
Date: Thu, 28 Apr 2022 11:52:21 -0400 [thread overview]
Message-ID: <Ymq4NUXQim753UWC@xz-m1.local> (raw)
In-Reply-To: <CAAa6QmTDHNFvhrAaoR93M53cNLaSPGZwVBNgU_RjQgwgnKUpWQ@mail.gmail.com>
On Thu, Apr 28, 2022 at 08:37:06AM -0700, Zach O'Keefe wrote:
> On Thu, Apr 28, 2022 at 7:51 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Apr 27, 2022 at 05:51:53PM -0700, Zach O'Keefe wrote:
> > > > > @@ -1110,13 +1115,14 @@ static void collapse_huge_page(struct mm_struct *mm,
> > > > > */
> > > > > mmap_read_unlock(mm);
> > > > >
> > > > > + node = khugepaged_find_target_node(cc);
> > > > > /* sched to specified node before huage page memory copy */
> > > > > if (task_node(current) != node) {
> > > > > cpumask = cpumask_of_node(node);
> > > > > if (!cpumask_empty(cpumask))
> > > > > set_cpus_allowed_ptr(current, cpumask);
> > > > > }
> > > > > - new_page = khugepaged_alloc_page(hpage, gfp, node);
> > > > > + new_page = cc->alloc_hpage(cc, gfp, node);
> > > >
> > > > AFAICT you removed all references of khugepaged_alloc_page() in this patch,
> > > > then you'd better drop the function for both NUMA and UMA.
> > > >
> > >
> > > Sorry, I'm not sure I follow. In khugepaged context, logic WRT
> > > khugepaged_alloc_page() is unchanged - it's just called indirectly
> > > through ->alloc_hpage().
> >
> > Ah you're right, sorry for the confusion.
> >
> > >
> > > > Said that, I think it's actually better to keep them, as they do things
> > > > useful. For example,AFAICT this ->alloc_hpage() change can leak the hpage
> > > > alloacted for UMA already so that's why I think keeping them makes sense,
> > > > then iiuc the BUG_ON would trigger with UMA already.
> > > >
> > > > I saw that you've moved khugepaged_find_target_node() into this function
> > > > which looks definitely good, but if we could keep khugepaged_alloc_page()
> > > > and then IMHO we could even move khugepaged_find_target_node() into that
> > > > and drop the "node" parameter in khugepaged_alloc_page().
> > > >
> > >
> > > I actually had done this, until commit 4272063db38c ("mm/khugepaged:
> > > sched to numa node when collapse huge page") which forced me to keep
> > > "node" visible in this function.
> >
> > Right, I was looking at my local tree and that patch seems to be very
> > lately added into -next.
> >
> > I'm curious why it wasn't applying to file thps too if it is worthwhile,
> > since if so that's also a suitable candidate to be moved into the same
> > hook. I've asked in that thread instead.
> >
> > Before that, feel free to keep your code as is.
> >
>
> Thanks for checking on that. On second thought, it seems like we would
> actually want to move this sched logic into the khupaged hook impl
> since we probably don't want to be moving around the calling process
> if in madvise context.
Sounds correct, if it's moved over it must only be in the new helper (if
you're going to introduce it, like below :) that was only for khugepaged.
Actually I really start to question that idea more.. e.g. even for
khugepaged the target node can be changing rapidly depending on the owner
of pages planned to collapse.
Whether does it make sense to bounce khugepaged thread itself seems still
questionable to me, and definitely not a good idea for madvise() context.
The only proof in that patch was a whole testbench result but I'm not sure
how reliable that'll be when applied to real world scenarios.
>
> > >
> > > > I'd even consider moving cc->gfp() all into it if possible, since the gfp
> > > > seems to be always with __GFP_THISNODE anyways.
> > > >
> > >
> > > I would have liked to do this, but the gfp flags are referenced again
> > > when calling mem_cgroup_charge(), so I couldn't quite get rid of them
> > > from here.
> >
> > Yeah, maybe we could move mem_cgroup_charge() into the hook too? As below
> > codes are duplicated between file & anon and IMHO they're good candidate to
> > a new helper already anyway:
> >
> > /* Only allocate from the target node */
> > gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
> >
> > new_page = khugepaged_alloc_page(hpage, gfp, node);
> > if (!new_page) {
> > result = SCAN_ALLOC_HUGE_PAGE_FAIL;
> > goto out;
> > }
> >
> > if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
> > result = SCAN_CGROUP_CHARGE_FAIL;
> > goto out;
> > }
> > count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
> >
> > If we want to generalize it maybe we want to return the "result" instead of
> > the new page though, so the newpage can be passed in as a pointer.
> >
> > There's one mmap_read_unlock(mm) for the anonymous code but I think that
> > can simply be moved above the chunk.
> >
> > No strong opinion here, please feel free to think about what's the best
> > approach for landing this series.
> >
>
> Thanks for the suggestion. I'll play around with it a little and see
> what looks good.
Sounds good!
If the new helper would be welcomed then it can be a standalone
pre-requisite patch to clean things up first. Let's also see how it goes
with the other patch, because there's chance you can also cleanup
khugepaged_find_target_node() in the same patch along with all above.
--
Peter Xu
next prev parent reply other threads:[~2022-04-28 15:52 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-26 14:44 [PATCH v3 00/12] mm: userspace hugepage collapse Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 01/12] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP Zach O'Keefe
2022-04-27 0:26 ` Peter Xu
2022-04-27 15:48 ` Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 02/12] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-04-27 19:49 ` Peter Xu
2022-04-28 0:19 ` Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 03/12] mm/khugepaged: make hugepage allocation context-specific Zach O'Keefe
2022-04-27 20:04 ` Peter Xu
2022-04-28 0:51 ` Zach O'Keefe
2022-04-28 14:51 ` Peter Xu
2022-04-28 15:37 ` Zach O'Keefe
2022-04-28 15:52 ` Peter Xu [this message]
2022-04-26 14:44 ` [PATCH v3 04/12] mm/khugepaged: add struct collapse_result Zach O'Keefe
2022-04-27 20:47 ` Peter Xu
2022-04-28 21:59 ` Zach O'Keefe
2022-04-28 23:21 ` Peter Xu
2022-04-29 16:01 ` Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 05/12] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 06/12] mm/khugepaged: remove khugepaged prefix from shared collapse functions Zach O'Keefe
2022-04-27 21:10 ` Peter Xu
2022-04-28 22:51 ` Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 07/12] mm/khugepaged: add flag to ignore khugepaged_max_ptes_* Zach O'Keefe
2022-04-27 21:12 ` Peter Xu
2022-04-29 14:26 ` Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 08/12] mm/khugepaged: add flag to ignore page young/referenced requirement Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 09/12] mm/madvise: add MADV_COLLAPSE to process_madvise() Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 10/12] selftests/vm: modularize collapse selftests Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 11/12] selftests/vm: add MADV_COLLAPSE collapse context to selftests Zach O'Keefe
2022-04-26 14:44 ` [PATCH v3 12/12] selftests/vm: add test to verify recollapse of THPs Zach O'Keefe
2022-04-26 20:23 ` [PATCH v3 00/12] mm: userspace hugepage collapse Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ymq4NUXQim753UWC@xz-m1.local \
--to=peterx@redhat.com \
--cc=James.Bottomley@hansenpartnership.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.shi@linux.alibaba.com \
--cc=arnd@arndb.de \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=axelrasmussen@google.com \
--cc=chris@zankel.net \
--cc=ckennelly@google.com \
--cc=david@redhat.com \
--cc=deller@gmx.de \
--cc=hughd@google.com \
--cc=ink@jurassic.park.msu.ru \
--cc=jcmvbkbc@gmail.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linmiaohe@huawei.com \
--cc=linux-mm@kvack.org \
--cc=lkp@intel.com \
--cc=mattst88@gmail.com \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=pasha.tatashin@soleen.com \
--cc=patrickx@google.com \
--cc=rientjes@google.com \
--cc=shy828301@gmail.com \
--cc=sj@kernel.org \
--cc=songliubraving@fb.com \
--cc=tsbogend@alpha.franken.de \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
--cc=zokeefe@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox