From: Alistair Popple <apopple@nvidia.com>
To: akpm@linux-foundation.org, dan.j.williams@intel.com, linux-mm@kvack.org
Cc: Alison Schofield <alison.schofield@intel.com>,
lina@asahilina.net, zhang.lyra@gmail.com,
gerald.schaefer@linux.ibm.com, vishal.l.verma@intel.com,
dave.jiang@intel.com, logang@deltatee.com, bhelgaas@google.com,
jack@suse.cz, jgg@ziepe.ca, catalin.marinas@arm.com,
will@kernel.org, mpe@ellerman.id.au, npiggin@gmail.com,
dave.hansen@linux.intel.com, ira.weiny@intel.com,
willy@infradead.org, djwong@kernel.org, tytso@mit.edu,
linmiaohe@huawei.com, david@redhat.com, peterx@redhat.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
jhubbard@nvidia.com, hch@lst.de, david@fromorbit.com,
chenhuacai@kernel.org, kernel@xen0n.name,
loongarch@lists.linux.dev
Subject: Re: [PATCH v9 00/20] fs/dax: Fix ZONE_DEVICE page reference counts
Date: Fri, 28 Feb 2025 14:42:40 +1100
Message-ID: <xhbru4aekyfl25552le5tvifwonyuwoyioxrqxy6zkm2xlyhc5@oqxnudb4bope>
In-Reply-To: <cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@nvidia.com>
Andrew,
This is essentially the same as what's currently in mm-unstable, aside from
the two updates listed below. The main thing to note is that it incorporates
Balbir's fixup, which is currently in mm-unstable as c98612955016
("mm-allow-compound-zone-device-pages-fix-fix").
- Alistair
On Fri, Feb 28, 2025 at 02:30:55PM +1100, Alistair Popple wrote:
> Main updates since v8:
>
> - Fixed reading of bad pgmap in migrate_vma_collect_pmd() as reported/fixed
> by Balbir.
>
> - Fixed bad warnings generated in free_zone_device_folio() when pgmap->ops
> isn't defined, even though it isn't required to be, as reported by Gerald.
>
> Main updates since v7:
>
> - Rebased on current akpm/mm-unstable in order to fix conflicts with
> https://lore.kernel.org/linux-mm/20241216155408.8102-1-willy@infradead.org/
> as requested by Andrew.
>
> - Collected Acked-by/Reviewed-by tags.
>
> - Cleaned up an unnecessary and confusing assignment to pgtable.
>
> - Other minor reworks suggested by David Hildenbrand
>
> Main updates since v6:
>
> - Clean ups and fixes based on feedback from David and Dan.
>
> - Rebased from next-20241216 to v6.14-rc1. No conflicts.
>
> - Dropped the PTE bit removals and clean-ups - will post this as a
> separate series to be merged after this one as Dan wanted it split
> up more and this series is already too big.
>
> Main updates since v5:
>
> - Reworked patch 1 based on Dan's feedback.
>
> - Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
> is not defined.
>
> - Minor comment formatting and documentation fixes.
>
> - Remove PTE_DEVMAP definitions from Loongarch which were added since
> this series was initially written.
>
> Main updates since v4:
>
> - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This
> means smaps/pagemap may contain DAX pages.
>
> - Fixed rmap accounting of PUD mapped pages.
>
> - Minor code clean-ups.
>
> Main updates since v3:
>
> - Rebased onto next-20241216. The rebase wasn't too difficult, but in
> the interests of getting this out sooner for Andrew to look at as
> requested by him I have yet to extensively build/run test this
> version of the series.
>
> - Fixed a bunch of build breakages reported by John Hubbard and the
> kernel test robot due to various combinations of CONFIG options.
>
> - Split the rmap changes into a separate patch as suggested by David H.
>
> - Reworded the description for the P2PDMA change.
>
> Main updates since v2:
>
> - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX
> and have them pass the vmf struct.
>
> - Separate out the device DAX changes.
>
> - Restore the page share mapping counting and associated warnings.
>
> - Rework truncate to require file-systems to have previously called
> dax_break_layout() to remove the address space mapping for a
> page. This found several bugs which are fixed by the first half of
> the series. The motivation for this was initially to allow the FS
> DAX page-cache mappings to hold a reference on the page.
>
> However that turned out to be a dead-end (see the comments on patch
> 21), but it found several bugs and I think overall it is an
> improvement so I have left it here.
>
> Device and FS DAX pages have always maintained their own page
> reference counts without following the normal rules for page reference
> counting. In particular pages are considered free when the refcount
> hits one rather than zero and refcounts are not added when mapping the
> page.
>
> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
> mechanism for allowing GUP to hold references on the page (see
> get_dev_pagemap). However there doesn't seem to be any reason why FS
> DAX pages need their own reference counting scheme.
>
> By treating the refcounts on these pages the same way as normal pages
> we can remove a lot of special checks. In particular pXd_trans_huge()
> becomes the same as pXd_leaf(), although I haven't made that change
> here. It also frees up a valuable SW-defined PTE bit on architectures
> that have devmap PTE bits defined.
>
> It also almost certainly allows further clean-up of the devmap managed
> functions, but I have left that as a future improvement. It also
> enables support for compound ZONE_DEVICE pages which is one of my
> primary motivators for doing this work.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Tested-by: Alison Schofield <alison.schofield@intel.com>
>
> ---
>
> Cc: lina@asahilina.net
> Cc: zhang.lyra@gmail.com
> Cc: gerald.schaefer@linux.ibm.com
> Cc: dan.j.williams@intel.com
> Cc: vishal.l.verma@intel.com
> Cc: dave.jiang@intel.com
> Cc: logang@deltatee.com
> Cc: bhelgaas@google.com
> Cc: jack@suse.cz
> Cc: jgg@ziepe.ca
> Cc: catalin.marinas@arm.com
> Cc: will@kernel.org
> Cc: mpe@ellerman.id.au
> Cc: npiggin@gmail.com
> Cc: dave.hansen@linux.intel.com
> Cc: ira.weiny@intel.com
> Cc: willy@infradead.org
> Cc: djwong@kernel.org
> Cc: tytso@mit.edu
> Cc: linmiaohe@huawei.com
> Cc: david@redhat.com
> Cc: peterx@redhat.com
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: nvdimm@lists.linux.dev
> Cc: linux-cxl@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-ext4@vger.kernel.org
> Cc: linux-xfs@vger.kernel.org
> Cc: jhubbard@nvidia.com
> Cc: hch@lst.de
> Cc: david@fromorbit.com
> Cc: chenhuacai@kernel.org
> Cc: kernel@xen0n.name
> Cc: loongarch@lists.linux.dev
>
> Alistair Popple (19):
>   fuse: Fix dax truncate/punch_hole fault path
>   fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
>   fs/dax: Don't skip locked entries when scanning entries
>   fs/dax: Refactor wait for dax idle page
>   fs/dax: Create a common implementation to break DAX layouts
>   fs/dax: Always remove DAX page-cache entries when breaking layouts
>   fs/dax: Ensure all pages are idle prior to filesystem unmount
>   fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
>   mm/gup: Remove redundant check for PCI P2PDMA page
>   mm/mm_init: Move p2pdma page refcount initialisation to p2pdma
>   mm: Allow compound zone device pages
>   mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
>   mm/memory: Add vmf_insert_page_mkwrite()
>   mm/rmap: Add support for PUD sized mappings to rmap
>   mm/huge_memory: Add vmf_insert_folio_pud()
>   mm/huge_memory: Add vmf_insert_folio_pmd()
>   mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
>   fs/dax: Properly refcount fs dax pages
>   device/dax: Properly refcount device dax pages when mapping
>
> Dan Williams (1):
>   dcssblk: Mark DAX broken, remove FS_DAX_LIMITED support
>
> Documentation/filesystems/dax.rst | 1 +-
> drivers/dax/device.c | 15 +-
> drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 +-
> drivers/nvdimm/pmem.c | 4 +-
> drivers/pci/p2pdma.c | 19 +-
> drivers/s390/block/Kconfig | 12 +-
> drivers/s390/block/dcssblk.c | 27 +-
> fs/dax.c | 365 +++++++++++++++++++-------
> fs/ext4/inode.c | 18 +-
> fs/fuse/dax.c | 30 +--
> fs/fuse/dir.c | 2 +-
> fs/fuse/file.c | 4 +-
> fs/fuse/virtio_fs.c | 3 +-
> fs/xfs/xfs_inode.c | 31 +--
> fs/xfs/xfs_inode.h | 2 +-
> fs/xfs/xfs_super.c | 12 +-
> include/linux/dax.h | 28 ++-
> include/linux/huge_mm.h | 4 +-
> include/linux/memremap.h | 17 +-
> include/linux/migrate.h | 4 +-
> include/linux/mm.h | 36 +---
> include/linux/mm_types.h | 16 +-
> include/linux/mmzone.h | 12 +-
> include/linux/page-flags.h | 6 +-
> include/linux/rmap.h | 15 +-
> lib/test_hmm.c | 3 +-
> mm/gup.c | 14 +-
> mm/hmm.c | 2 +-
> mm/huge_memory.c | 170 ++++++++++--
> mm/internal.h | 2 +-
> mm/memory-failure.c | 6 +-
> mm/memory.c | 69 ++++-
> mm/memremap.c | 60 ++--
> mm/migrate_device.c | 18 +-
> mm/mlock.c | 2 +-
> mm/mm_init.c | 23 +-
> mm/rmap.c | 67 ++++-
> mm/swap.c | 2 +-
> mm/truncate.c | 16 +-
> 39 files changed, 810 insertions(+), 330 deletions(-)
>
> base-commit: b2a64caeafad6e37df1c68f878bfdd06ff14f4ec
> --
> git-series 0.9.1
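
For readers skimming the thread, the refcounting difference described in the
quoted cover letter can be sketched as a toy userspace model. This is not
kernel code: struct toy_page and the old_*/new_* helpers are hypothetical
names invented for illustration, and the model only captures the two rules
stated above (idle at a refcount of one with no reference taken on mapping,
versus idle at zero with a reference per mapping).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct toy_page {
	int refcount;
};

/* Old DAX convention: an idle page sits at refcount 1. */
static void old_init(struct toy_page *p) { p->refcount = 1; }
static bool old_is_idle(const struct toy_page *p) { return p->refcount == 1; }
/* Mapping and unmapping do not touch the refcount in the old scheme. */
static void old_map(struct toy_page *p) { (void)p; }
static void old_unmap(struct toy_page *p) { (void)p; }

/* Normal page convention: idle at refcount 0, one reference per mapping. */
static void new_init(struct toy_page *p) { p->refcount = 0; }
static bool new_is_idle(const struct toy_page *p) { return p->refcount == 0; }
static void new_map(struct toy_page *p) { p->refcount++; }
static void new_unmap(struct toy_page *p)
{
	assert(p->refcount > 0);	/* catch an unbalanced unmap */
	p->refcount--;
}
```

In this toy model a page mapped under the old rules still looks idle by
refcount alone, so some other marker has to record that it is in use, whereas
under the normal rules the refcount by itself distinguishes mapped from idle.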