From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Nick Piggin <npiggin@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Arnd Bergmann <arnd@arndb.de>,
David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Yin Fengwei <fengwei.yin@intel.com>,
Yang Shi <shy828301@gmail.com>,
"Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 4/5] mm: Refector release_pages()
Date: Fri, 1 Sep 2023 05:36:12 +0100 [thread overview]
Message-ID: <ZPFqPPqW3MYQFCSb@casper.infradead.org> (raw)
In-Reply-To: <31aa40aa-1471-44ca-9cc8-3f5476fbde7e@arm.com>
On Thu, Aug 31, 2023 at 08:57:51PM +0100, Ryan Roberts wrote:
> On 30/08/2023 20:11, Matthew Wilcox wrote:
> > On Wed, Aug 30, 2023 at 10:50:10AM +0100, Ryan Roberts wrote:
> >> In preparation for implementing folios_put_refs() in the next patch,
> >> refactor release_pages() into a set of helper functions, which can be
> >> reused. The primary difference between release_pages() and
> >> folios_put_refs() is how they iterate over the set of folios. The
> >> per-folio actions are identical.
> >
> > As you noted, we have colliding patchsets. I'm not hugely happy with
> > how patch 4 turned out,
>
> Could you describe the issues as you see them? I'm keen not to repeat the same
> bad patterns in future.
I think you absolutely had the right idea, and I'd probably have done the
same in your circumstances. It's just that I'd done this patch series
getting rid of / simplifying a bunch of the code you were modifying, and
I thought it'd make your 4/5 irrelevant and 5/5 simpler.
> > void release_unref_folios(struct folio_batch *folios)
> > {
> >         struct lruvec *lruvec = NULL;
> >         unsigned long flags = 0;
> >         int i;
> >
> >         for (i = 0; i < folios->nr; i++) {
> >                 struct folio *folio = folios->folios[i];
> >                 free_swap_cache(folio);
>
> Agree this can't go here - would put it in pfn_range_put(). But I would not
> want it in folios_put() as you suggested in the other email - that would surely
> change the behaviour of folios_put()?
yes; perhaps there are times when we want to put a batch of folios and
_don't_ want to rip them out of the swapcache. I haven't thought about
this in much detail, and we're definitely venturing into areas of the
MM where I get myself in trouble (ie anonymous memory).
> >                 __page_cache_release(folio, &lruvec, &flags);
> >         }
>
> I think you would need to add:
>
>         if (lruvec)
>                 unlock_page_lruvec_irqrestore(lruvec, flags);
Indeed.
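(To make the shape of that fix concrete: the lruvec lock is acquired lazily, at most once, inside the loop, so it must be restored exactly once after the loop iff it was ever taken. Here's a minimal userspace model of that idiom -- the names and counters are hypothetical, not kernel code.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the lazily-acquired lruvec lock:
 * NULL means "not yet locked"; the loop locks on demand, and the
 * caller restores once at the end only if a lock was taken. */
struct lruvec { int dummy; };

static struct lruvec the_lruvec;
static int lock_balance;        /* +1 on lock, -1 on unlock; must end at 0 */

static void lock_lruvec(struct lruvec **lruvec)
{
        if (!*lruvec) {                 /* take the lock at most once */
                *lruvec = &the_lruvec;
                lock_balance++;
        }
}

static void unlock_lruvec(struct lruvec *lruvec)
{
        lock_balance--;
}

static void process_items(int nr, int nr_needing_lock)
{
        struct lruvec *lruvec = NULL;
        int i;

        for (i = 0; i < nr; i++) {
                if (i < nr_needing_lock)
                        lock_lruvec(&lruvec);
        }
        if (lruvec)                     /* the step the patch was missing */
                unlock_lruvec(lruvec);
}
```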
> >         mem_cgroup_uncharge_folios(folios);
> >         free_unref_folios(folios);
> > }
> >
> > then this becomes:
> >
> > void folios_put(struct folio_batch *folios)
> > {
> >         int i, j;
> >
> >         for (i = 0, j = 0; i < folios->nr; i++) {
> >                 struct folio *folio = folios->folios[i];
> >
> >                 if (is_huge_zero_page(&folio->page))
> >                         continue;
> >                 if (folio_is_zone_device(folio)) {
> >                         if (put_devmap_managed_page(&folio->page))
> >                                 continue;
> >                         if (folio_put_testzero(folio))
> >                                 free_zone_device_page(&folio->page);
> >                         continue;
> >                 }
> >
> >                 if (!folio_put_testzero(folio))
> >                         continue;
> >                 if (folio_test_hugetlb(folio)) {
> >                         free_huge_folio(folio);
> >                         continue;
> >                 }
> >
> >                 if (j != i)
> >                         folios->folios[j] = folio;
> >                 j++;
> >         }
> >
> >         folios->nr = j;
> >         if (!j)
> >                 return;
> >         release_unref_folios(folios);
> > }
> >
> > and pfn_range_put() also becomes shorter and loses all the lruvec work.
>
> something like this?
>
> void pfn_range_put(struct pfn_range *range, unsigned int nr)
> {
>         struct folio_batch folios;
>         unsigned int i;
>
>         folio_batch_init(&folios);
>         for (i = 0; i < nr; i++) {
>                 struct folio *folio = pfn_folio(range[i].start);
>                 unsigned int refs = range[i].end - range[i].start;
>
>                 free_swap_cache(folio); // <<<<< HERE? To be equivalent to
>                                         //       free_pages_and_swap_cache()
>
>                 if (is_huge_zero_page(&folio->page))
>                         continue;
>
>                 if (folio_is_zone_device(folio)) {
>                         if (put_devmap_managed_page_refs(&folio->page, refs))
>                                 continue;
>                         if (folio_ref_sub_and_test(folio, refs))
>                                 free_zone_device_page(&folio->page);
>                         continue;
>                 }
>
>                 if (!folio_ref_sub_and_test(folio, refs))
>                         continue;
>
>                 /* hugetlb has its own memcg */
>                 if (folio_test_hugetlb(folio)) {
>                         free_huge_folio(folio);
>                         continue;
>                 }
>
>                 if (folio_batch_add(&folios, folio) == 0)
>                         release_unref_folios(&folios);
>         }
>
>         if (folios.nr)
>                 release_unref_folios(&folios);
> }
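(The control flow at the end of pfn_range_put() above is the standard batch-and-flush pattern: folio_batch_add() returns the number of free slots remaining, so a return of 0 means the batch just filled and must be flushed, and any partial batch left over is flushed after the loop. A minimal userspace model of that pattern, with hypothetical names and a counter standing in for the actual release work:)

```c
#include <assert.h>

#define BATCH_SIZE 15

struct batch {
        unsigned int nr;
        int items[BATCH_SIZE];
};

static unsigned int flushed;    /* counts items released by flushing */

static void batch_flush(struct batch *b)
{
        flushed += b->nr;       /* stand-in for release_unref_folios() */
        b->nr = 0;
}

/* Mirrors folio_batch_add()'s convention: returns the number of
 * slots still free after the add; 0 means "batch is now full". */
static unsigned int batch_add(struct batch *b, int item)
{
        b->items[b->nr++] = item;
        return BATCH_SIZE - b->nr;
}

void release_all(const int *items, unsigned int nr)
{
        struct batch b = { .nr = 0 };
        unsigned int i;

        for (i = 0; i < nr; i++) {
                if (batch_add(&b, items[i]) == 0)
                        batch_flush(&b);        /* flush when full */
        }
        if (b.nr)
                batch_flush(&b);                /* flush the remainder */
}
```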
>
> >
> > Thoughts?
>
> Looks like it's getting there to me. Although the bulk of the logic inside the
> loop is still common between folios_put() and pfn_range_put(); perhaps we can
> have a common helper for that, which would just need to take (folio, refs)?
>
> So by adding free_swap_cache() above, we can call pfn_range_put() direct and can
> remove free_pages_and_swap_cache() entirely.
Yes. Maybe this works? I'd like it to!
> What's the best way to move forward here? Two options as I see it:
>
> - I drop patch 4 and create a version of pfn_range_put() that has the same
> semantic as above but essentially just copies the old release_pages() (similar
> to my v1). That keeps my series independent. Then you can replace with the new
> pfn_range_put() as part of your series.
>
> - We can just hook my patches up to the end of your series and do it all in one go.
>
> Opinion?
I'm reluctant to tell you "hey, delay until after this series of mine".
We don't have good data yet on whether my patch series helps or hurts.
Plus I'm about to take a few days off (I'll be back for next Wednesday's
meeting). I think your first option is better (and do feel free to use
a different name from pfn_range_put()) just to keep us from colliding
in ways that ultimately don't matter.