From: Alistair Popple <apopple@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, dan.j.williams@intel.com,
linux-mm@kvack.org, lina@asahilina.net, zhang.lyra@gmail.com,
gerald.schaefer@linux.ibm.com, vishal.l.verma@intel.com,
dave.jiang@intel.com, logang@deltatee.com, bhelgaas@google.com,
jack@suse.cz, jgg@ziepe.ca, catalin.marinas@arm.com,
will@kernel.org, mpe@ellerman.id.au, npiggin@gmail.com,
dave.hansen@linux.intel.com, ira.weiny@intel.com,
willy@infradead.org, djwong@kernel.org, tytso@mit.edu,
linmiaohe@huawei.com, peterx@redhat.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
jhubbard@nvidia.com, hch@lst.de, david@fromorbit.com
Subject: Re: [PATCH v4 14/25] rmap: Add support for PUD sized mappings to rmap
Date: Thu, 19 Dec 2024 09:55:35 +1100
Message-ID: <volhyxjxlbsflldgs36ghzartel2tu625ubz3kfed2gdwrsamt@cpfsfhdpc4rp>
In-Reply-To: <4b5768b7-96e0-4864-9dbe-88fd1f0e87b8@redhat.com>
On Tue, Dec 17, 2024 at 11:27:13PM +0100, David Hildenbrand wrote:
> On 17.12.24 06:12, Alistair Popple wrote:
> > The rmap doesn't currently support adding a PUD mapping of a
> > folio. This patch adds support for entire PUD mappings of folios,
> > primarily to allow for more standard refcounting of device DAX
> > folios. Currently DAX is the only user of this and it doesn't require
> > support for partially mapped PUD-sized folios so we don't support
> > that for now.
> >
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> >
> > ---
> >
> > David - Thanks for your previous comments, I'm less familiar with the
> > rmap code so I would appreciate you taking another look. In particular
> > I haven't added a stat for PUD mapped folios as it seemed like
> > overkill for just the device DAX case but let me know if you think
> > otherwise.
> >
> > Changes for v4:
> >
> > - New for v4, split out rmap changes as suggested by David.
> > ---
> > include/linux/rmap.h | 15 ++++++++++++-
> > mm/rmap.c | 56 +++++++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 71 insertions(+)
> >
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 683a040..7043914 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -192,6 +192,7 @@ typedef int __bitwise rmap_t;
> > enum rmap_level {
> > RMAP_LEVEL_PTE = 0,
> > RMAP_LEVEL_PMD,
> > + RMAP_LEVEL_PUD,
> > };
> > static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> > @@ -228,6 +229,14 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> > VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
> > VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
> > break;
> > + case RMAP_LEVEL_PUD:
> > + /*
> > + * Assume that we are creating a single "entire" mapping of the
> > + * folio.
> > + */
> > + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PUD_NR, folio);
> > + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PUD_NR, folio);
> > + break;
> > default:
> > VM_WARN_ON_ONCE(true);
> > }
> > @@ -251,12 +260,16 @@ void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> > folio_add_file_rmap_ptes(folio, page, 1, vma)
> > void folio_add_file_rmap_pmd(struct folio *, struct page *,
> > struct vm_area_struct *);
> > +void folio_add_file_rmap_pud(struct folio *, struct page *,
> > + struct vm_area_struct *);
> > void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
> > struct vm_area_struct *);
> > #define folio_remove_rmap_pte(folio, page, vma) \
> > folio_remove_rmap_ptes(folio, page, 1, vma)
> > void folio_remove_rmap_pmd(struct folio *, struct page *,
> > struct vm_area_struct *);
> > +void folio_remove_rmap_pud(struct folio *, struct page *,
> > + struct vm_area_struct *);
> > void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> > unsigned long address, rmap_t flags);
> > @@ -341,6 +354,7 @@ static __always_inline void __folio_dup_file_rmap(struct folio *folio,
> > atomic_add(orig_nr_pages, &folio->_large_mapcount);
> > break;
> > case RMAP_LEVEL_PMD:
> > + case RMAP_LEVEL_PUD:
> > atomic_inc(&folio->_entire_mapcount);
> > atomic_inc(&folio->_large_mapcount);
> > break;
> > @@ -437,6 +451,7 @@ static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
> > atomic_add(orig_nr_pages, &folio->_large_mapcount);
> > break;
> > case RMAP_LEVEL_PMD:
> > + case RMAP_LEVEL_PUD:
> > if (PageAnonExclusive(page)) {
> > if (unlikely(maybe_pinned))
> > return -EBUSY;
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index c6c4d4e..39d0439 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1203,6 +1203,11 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
> > }
> > atomic_inc(&folio->_large_mapcount);
> > break;
> > + case RMAP_LEVEL_PUD:
> > + /* We only support entire mappings of PUD sized folios in rmap */
> > + atomic_inc(&folio->_entire_mapcount);
> > + atomic_inc(&folio->_large_mapcount);
> > + break;
>
>
> This way you don't account the pages at all as mapped, whereby PTE-mapping it
> would? And IIRC, these PUD-sized pages can be either mapped using PTEs or
> using a single PUD.
Oh, good point. I was thinking that because we don't account PUD mappings today
it would be fine to ignore them. But of course this series means we start
accounting them if mapped with PTEs, so I agree we should be consistent.
> I suspect what you want is to
Yes, I think so. Thanks for the hint. I will be out over the Christmas break but
will do a respin to incorporate this before then.
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c6c4d4ea29a7e..1477028d3a176 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1187,12 +1187,19 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
> atomic_add(orig_nr_pages, &folio->_large_mapcount);
> break;
> case RMAP_LEVEL_PMD:
> + case RMAP_LEVEL_PUD:
> first = atomic_inc_and_test(&folio->_entire_mapcount);
> if (first) {
> nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped);
> if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) {
> - *nr_pmdmapped = folio_nr_pages(folio);
> - nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> + nr_pages = folio_nr_pages(folio);
> + /*
> + * We only track PMD mappings of PMD-sized
> + * folios separately.
> + */
> + if (level == RMAP_LEVEL_PMD)
> + *nr_pmdmapped = nr_pages;
> + nr = nr_pages - (nr & FOLIO_PAGES_MAPPED);
> /* Raced ahead of a remove and another add? */
> if (unlikely(nr < 0))
> nr = 0;
>
> Similar on the removal path.
>
>
> --
> Cheers,
>
> David / dhildenb
>
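
To make the accounting above easier to follow, here is a rough userspace
sketch of the "entire mapping" add and remove paths, written with C11 atomics.
It is not the kernel code: struct toy_folio, toy_folio_init(),
toy_add_entire() and toy_remove_entire() are hypothetical names, the
ENTIRELY_MAPPED value is assumed to match mm/internal.h, and _large_mapcount
is left out. It only illustrates that a PUD mapping accounts its pages as
mapped in the same way a PMD mapping does, while leaving the PMD-mapped
counter alone.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed to mirror the kernel's constants; check mm/internal.h. */
#define ENTIRELY_MAPPED    0x800000
#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)

enum rmap_level { RMAP_LEVEL_PTE = 0, RMAP_LEVEL_PMD, RMAP_LEVEL_PUD };

/* Hypothetical stand-in for the few struct folio fields used here. */
struct toy_folio {
	int nr_pages;
	atomic_int entire_mapcount;	/* starts at -1: no entire mapping */
	atomic_int nr_pages_mapped;	/* PTE-mapped pages + ENTIRELY_MAPPED */
};

static void toy_folio_init(struct toy_folio *f, int nr_pages)
{
	f->nr_pages = nr_pages;
	atomic_init(&f->entire_mapcount, -1);
	atomic_init(&f->nr_pages_mapped, 0);
}

/* Returns how many pages became newly mapped by this entire mapping. */
static int toy_add_entire(struct toy_folio *f, enum rmap_level level,
			  int *nr_pmdmapped)
{
	int nr;

	assert(level == RMAP_LEVEL_PMD || level == RMAP_LEVEL_PUD);

	/* Like atomic_inc_and_test(): only the -1 -> 0 transition counts. */
	if (atomic_fetch_add(&f->entire_mapcount, 1) + 1 != 0)
		return 0;

	nr = atomic_fetch_add_explicit(&f->nr_pages_mapped, ENTIRELY_MAPPED,
				       memory_order_relaxed) + ENTIRELY_MAPPED;
	if (nr >= ENTIRELY_MAPPED + ENTIRELY_MAPPED)
		return 0;	/* flag was already set; nothing new to account */

	/* Only PMD mappings feed the separate PMD-mapped counter. */
	if (level == RMAP_LEVEL_PMD)
		*nr_pmdmapped = f->nr_pages;

	/* Pages already mapped by PTEs were accounted when they were added. */
	nr = f->nr_pages - (nr & FOLIO_PAGES_MAPPED);
	return nr < 0 ? 0 : nr;
}

/* Returns how many pages became unmapped by dropping an entire mapping. */
static int toy_remove_entire(struct toy_folio *f, enum rmap_level level,
			     int *nr_pmdmapped)
{
	int nr;

	assert(level == RMAP_LEVEL_PMD || level == RMAP_LEVEL_PUD);

	/* Like atomic_add_negative(-1, ...): only the 0 -> -1 transition. */
	if (atomic_fetch_sub(&f->entire_mapcount, 1) - 1 >= 0)
		return 0;

	nr = atomic_fetch_sub_explicit(&f->nr_pages_mapped, ENTIRELY_MAPPED,
				       memory_order_relaxed) - ENTIRELY_MAPPED;
	if (nr >= ENTIRELY_MAPPED)
		return 0;	/* another entire mapping is still accounted */

	if (level == RMAP_LEVEL_PMD)
		*nr_pmdmapped = f->nr_pages;

	/* Pages that are still PTE-mapped stay accounted. */
	nr = f->nr_pages - (nr & FOLIO_PAGES_MAPPED);
	return nr < 0 ? 0 : nr;
}
```

With a folio of 512 * 512 pages (a 1GiB PUD on x86-64 with 4KiB pages), the
first toy_add_entire(..., RMAP_LEVEL_PUD, ...) call reports all pages as newly
mapped, a second call reports none, and only the final toy_remove_entire()
call reports them as unmapped again. The PMD-mapped counter is never touched.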