From: Peter Xu <peterx@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH 1/5] mm: Free non-hugetlb large folios in a batch
Date: Thu, 25 Apr 2024 11:00:39 -0400 [thread overview]
Message-ID: <ZipvVIKasc4b4q0r@x1n> (raw)
In-Reply-To: <ZinQYl3Vz37PLC8_@casper.infradead.org>
On Thu, Apr 25, 2024 at 04:39:14AM +0100, Matthew Wilcox wrote:
> On Wed, Apr 24, 2024 at 11:20:28AM -0400, Peter Xu wrote:
> > On Fri, Apr 05, 2024 at 04:32:23PM +0100, Matthew Wilcox (Oracle) wrote:
> > > free_unref_folios() can now handle non-hugetlb large folios, so keep
> > > normal large folios in the batch. hugetlb folios still need to be
> > > handled specially. I believe that folios freed using put_pages_list()
> > > cannot be accounted to a memcg (or the small folios would trip the "page
> > > still charged to cgroup" warning), but put an assertion in to check that.
> >
> > There's such user, iommu uses put_pages_list() to free IOMMU pgtables, and
> > they can be memcg accounted; since 2023 iommu_map switched to use
> > GFP_KERNEL_ACCOUNT.
> >
> > I hit below panic when testing my local branch over mm-everthing when
> > running some VFIO workloads.
> >
> > For this specific vfio use case, see 160912fc3d4a ("vfio/type1: account
> > iommu allocations").
> >
> > I think we should remove the VM_BUG_ON_FOLIO() line, as the memcg will then
> > be properly taken care of later in free_pages_prepare(). Fixup attached at
> > the end that will fix this crash for me.
>
> Yes, I think you're right.
>
> I was concerned about the deferred split list / memcg charge problem,
> but (a) page table pages can't ever be on the deferred split list, (b)
> just passing them through to free_unref_folios() works fine. The problem
> was that folios_put_refs() was uncharging a batch before passing them
> to free_unref_folios().
>
> That does bring up the question though ... should we be uncharging
> these folios as a batch for better performance? Do you have a workload
> which frees a lot of page tables? Presumably an exit would do that.
> If so, does adding a call to mem_cgroup_uncharge_folios() before calling
> free_unref_folios() improve performance in any noticable way?
Looks like something worth trying indeed.
The trace I hit was an exit path, but we can double check whether it can
even happen in some iommu hot paths too like unmap, so maybe such change
would justify better in that case?
AFAIU based on my reading to the current iommu pgtable mgmt it's more
aggresive than cpu pgtable on freeing pgtable pages, so it looks like such
batched release can happen during an iommu unmap too rather than exit only:
intel_iommu_unmap
domain_unmap
dma_pte_clear_level
dma_pte_list_pagetables
But still worth checking in a test, perhaps the easiest way is to use
ioctl(VFIO_IOMMU_[UN]MAP_DMA).
>
> In the meantime, this patch:
>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> although I think Andrew will just fold it into
> "mm: free non-hugetlb large folios in a batch"
>
> Andrew, if you do do that, please also edit out the last couple of
> sentences from the commit message:
>
> free_unref_folios() can now handle non-hugetlb large folios, so keep
> normal large folios in the batch. hugetlb folios still need to be handled
> - specially. I believe that folios freed using put_pages_list() cannot be
> - accounted to a memcg (or the small folios would trip the "page still
> - charged to cgroup" warning), but put an assertion in to check that.
> + specially.
Yes, we should drop these lines too.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2024-04-25 15:01 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-05 15:32 [PATCH 0/5] Clean up __folio_put() Matthew Wilcox (Oracle)
2024-04-05 15:32 ` [PATCH 1/5] mm: Free non-hugetlb large folios in a batch Matthew Wilcox (Oracle)
2024-04-24 15:20 ` Peter Xu
2024-04-25 3:39 ` Matthew Wilcox
2024-04-25 15:00 ` Peter Xu [this message]
2024-04-25 20:54 ` Andrew Morton
2024-04-05 15:32 ` [PATCH 2/5] mm: Combine free_the_page() and free_unref_page() Matthew Wilcox (Oracle)
2024-04-05 15:32 ` [PATCH 3/5] mm: Inline destroy_large_folio() into __folio_put_large() Matthew Wilcox (Oracle)
2024-04-05 15:32 ` [PATCH 4/5] mm: Combine __folio_put_small, __folio_put_large and __folio_put Matthew Wilcox (Oracle)
2024-04-05 15:32 ` [PATCH 5/5] mm: Convert free_zone_device_page to free_zone_device_folio Matthew Wilcox (Oracle)
2024-04-05 16:34 ` [PATCH 0/5] Clean up __folio_put() Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZipvVIKasc4b4q0r@x1n \
--to=peterx@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=linux-mm@kvack.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox