linux-mm.kvack.org archive mirror
From: Alistair Popple <apopple@nvidia.com>
To: akpm@linux-foundation.org, dan.j.williams@intel.com, linux-mm@kvack.org
Cc: Alison Schofield <alison.schofield@intel.com>,
	lina@asahilina.net, zhang.lyra@gmail.com,
	gerald.schaefer@linux.ibm.com, vishal.l.verma@intel.com,
	dave.jiang@intel.com, logang@deltatee.com, bhelgaas@google.com,
	jack@suse.cz, jgg@ziepe.ca, catalin.marinas@arm.com,
	will@kernel.org, mpe@ellerman.id.au, npiggin@gmail.com,
	dave.hansen@linux.intel.com, ira.weiny@intel.com,
	willy@infradead.org, djwong@kernel.org, tytso@mit.edu,
	linmiaohe@huawei.com, david@redhat.com, peterx@redhat.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, nvdimm@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	jhubbard@nvidia.com, hch@lst.de, david@fromorbit.com,
	chenhuacai@kernel.org, kernel@xen0n.name,
	loongarch@lists.linux.dev
Subject: Re: [PATCH v9 00/20] fs/dax: Fix ZONE_DEVICE page reference counts
Date: Fri, 28 Feb 2025 14:42:40 +1100
Message-ID: <xhbru4aekyfl25552le5tvifwonyuwoyioxrqxy6zkm2xlyhc5@oqxnudb4bope>
In-Reply-To: <cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@nvidia.com>

Andrew,

This is essentially the same as what's currently in mm-unstable aside from
the two updates listed below. The main thing to note is that it incorporates
Balbir's fixup, which is currently in mm-unstable as c98612955016
("mm-allow-compound-zone-device-pages-fix-fix").

 - Alistair

On Fri, Feb 28, 2025 at 02:30:55PM +1100, Alistair Popple wrote:
> Main updates since v8:
> 
>  - Fixed reading of bad pgmap in migrate_vma_collect_pmd() as reported/fixed
>    by Balbir.
> 
>  - Fixed spurious warnings generated in free_zone_device_folio() when
>    pgmap->ops isn't defined, even though it isn't required to be, as
>    reported by Gerald.
> 
> Main updates since v7:
> 
>  - Rebased on current akpm/mm-unstable in order to fix conflicts with
>    https://lore.kernel.org/linux-mm/20241216155408.8102-1-willy@infradead.org/
>    as requested by Andrew.
> 
>  - Collected Acked-by/Reviewed-by tags.
> 
>  - Cleaned up an unnecessary and confusing assignment to pgtable.
> 
>  - Other minor reworks suggested by David Hildenbrand
> 
> Main updates since v6:
> 
>  - Clean-ups and fixes based on feedback from David and Dan.
> 
>  - Rebased from next-20241216 to v6.14-rc1. No conflicts.
> 
>  - Dropped the PTE bit removals and clean-ups. I will post these as a
>    separate series to be merged after this one, as Dan wanted it split
>    up more and this series is already too big.
> 
> Main updates since v5:
> 
>  - Reworked patch 1 based on Dan's feedback.
> 
>  - Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
>    is not defined.
> 
>  - Minor comment formatting and documentation fixes.
> 
>  - Removed PTE_DEVMAP definitions from LoongArch, which were added
>    after this series was initially written.
> 
> Main updates since v4:
> 
>  - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This
>    means smaps/pagemap may contain DAX pages.
> 
>  - Fixed rmap accounting of PUD mapped pages.
> 
>  - Minor code clean-ups.
> 
> Main updates since v3:
> 
>  - Rebased onto next-20241216. The rebase wasn't too difficult, but in
>    the interests of getting this out sooner for Andrew to look at as
>    requested by him I have yet to extensively build/run test this
>    version of the series.
> 
>  - Fixed a bunch of build breakages reported by John Hubbard and the
>    kernel test robot due to various combinations of CONFIG options.
> 
>  - Split the rmap changes into a separate patch as suggested by David H.
> 
>  - Reworded the description for the P2PDMA change.
> 
> Main updates since v2:
> 
>  - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX
>    and have them pass the vmf struct.
> 
>  - Separate out the device DAX changes.
> 
>  - Restore the page share mapping counting and associated warnings.
> 
>  - Rework truncate to require file-systems to have previously called
>    dax_break_layout() to remove the address space mapping for a
>    page. This found several bugs which are fixed by the first half of
>    the series. The motivation for this was initially to allow the FS
>    DAX page-cache mappings to hold a reference on the page.
> 
>    However that turned out to be a dead-end (see the comments on patch
>    21), but it found several bugs and I think overall it is an
>    improvement so I have left it here.
> 
> Device and FS DAX pages have always maintained their own page
> reference counts without following the normal rules for page reference
> counting. In particular, pages are considered free when the refcount
> hits one rather than zero, and references are not taken when mapping
> the page.
> 
> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
> mechanism for allowing GUP to hold references on the page (see
> get_dev_pagemap). However, there doesn't seem to be any reason why FS
> DAX pages need their own reference counting scheme.
> 
> By treating the refcounts on these pages the same way as normal pages,
> we can remove a lot of special checks. In particular, pXd_trans_huge()
> becomes the same as pXd_leaf(), although I haven't made that change
> here. It also frees up a valuable software-defined PTE bit on
> architectures that have devmap PTE bits defined.
> 
> It also almost certainly allows further clean-up of the devmap managed
> functions, but I have left that as a future improvement. It also
> enables support for compound ZONE_DEVICE pages, which is one of my
> primary motivations for doing this work.
> 
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Tested-by: Alison Schofield <alison.schofield@intel.com>
> 
> ---
> 
> Cc: lina@asahilina.net
> Cc: zhang.lyra@gmail.com
> Cc: gerald.schaefer@linux.ibm.com
> Cc: dan.j.williams@intel.com
> Cc: vishal.l.verma@intel.com
> Cc: dave.jiang@intel.com
> Cc: logang@deltatee.com
> Cc: bhelgaas@google.com
> Cc: jack@suse.cz
> Cc: jgg@ziepe.ca
> Cc: catalin.marinas@arm.com
> Cc: will@kernel.org
> Cc: mpe@ellerman.id.au
> Cc: npiggin@gmail.com
> Cc: dave.hansen@linux.intel.com
> Cc: ira.weiny@intel.com
> Cc: willy@infradead.org
> Cc: djwong@kernel.org
> Cc: tytso@mit.edu
> Cc: linmiaohe@huawei.com
> Cc: david@redhat.com
> Cc: peterx@redhat.com
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: nvdimm@lists.linux.dev
> Cc: linux-cxl@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-ext4@vger.kernel.org
> Cc: linux-xfs@vger.kernel.org
> Cc: jhubbard@nvidia.com
> Cc: hch@lst.de
> Cc: david@fromorbit.com
> Cc: chenhuacai@kernel.org
> Cc: kernel@xen0n.name
> Cc: loongarch@lists.linux.dev
> 
> Alistair Popple (19):
>   fuse: Fix dax truncate/punch_hole fault path
>   fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
>   fs/dax: Don't skip locked entries when scanning entries
>   fs/dax: Refactor wait for dax idle page
>   fs/dax: Create a common implementation to break DAX layouts
>   fs/dax: Always remove DAX page-cache entries when breaking layouts
>   fs/dax: Ensure all pages are idle prior to filesystem unmount
>   fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
>   mm/gup: Remove redundant check for PCI P2PDMA page
>   mm/mm_init: Move p2pdma page refcount initialisation to p2pdma
>   mm: Allow compound zone device pages
>   mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
>   mm/memory: Add vmf_insert_page_mkwrite()
>   mm/rmap: Add support for PUD sized mappings to rmap
>   mm/huge_memory: Add vmf_insert_folio_pud()
>   mm/huge_memory: Add vmf_insert_folio_pmd()
>   mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
>   fs/dax: Properly refcount fs dax pages
>   device/dax: Properly refcount device dax pages when mapping
> 
> Dan Williams (1):
>   dcssblk: Mark DAX broken, remove FS_DAX_LIMITED support
> 
>  Documentation/filesystems/dax.rst      |   1 +-
>  drivers/dax/device.c                   |  15 +-
>  drivers/gpu/drm/nouveau/nouveau_dmem.c |   3 +-
>  drivers/nvdimm/pmem.c                  |   4 +-
>  drivers/pci/p2pdma.c                   |  19 +-
>  drivers/s390/block/Kconfig             |  12 +-
>  drivers/s390/block/dcssblk.c           |  27 +-
>  fs/dax.c                               | 365 +++++++++++++++++++-------
>  fs/ext4/inode.c                        |  18 +-
>  fs/fuse/dax.c                          |  30 +--
>  fs/fuse/dir.c                          |   2 +-
>  fs/fuse/file.c                         |   4 +-
>  fs/fuse/virtio_fs.c                    |   3 +-
>  fs/xfs/xfs_inode.c                     |  31 +--
>  fs/xfs/xfs_inode.h                     |   2 +-
>  fs/xfs/xfs_super.c                     |  12 +-
>  include/linux/dax.h                    |  28 ++-
>  include/linux/huge_mm.h                |   4 +-
>  include/linux/memremap.h               |  17 +-
>  include/linux/migrate.h                |   4 +-
>  include/linux/mm.h                     |  36 +---
>  include/linux/mm_types.h               |  16 +-
>  include/linux/mmzone.h                 |  12 +-
>  include/linux/page-flags.h             |   6 +-
>  include/linux/rmap.h                   |  15 +-
>  lib/test_hmm.c                         |   3 +-
>  mm/gup.c                               |  14 +-
>  mm/hmm.c                               |   2 +-
>  mm/huge_memory.c                       | 170 ++++++++++--
>  mm/internal.h                          |   2 +-
>  mm/memory-failure.c                    |   6 +-
>  mm/memory.c                            |  69 ++++-
>  mm/memremap.c                          |  60 ++--
>  mm/migrate_device.c                    |  18 +-
>  mm/mlock.c                             |   2 +-
>  mm/mm_init.c                           |  23 +-
>  mm/rmap.c                              |  67 ++++-
>  mm/swap.c                              |   2 +-
>  mm/truncate.c                          |  16 +-
>  39 files changed, 810 insertions(+), 330 deletions(-)
> 
> base-commit: b2a64caeafad6e37df1c68f878bfdd06ff14f4ec
> -- 
> git-series 0.9.1



Thread overview: 25+ messages
2025-02-28  3:30 Alistair Popple
2025-02-28  3:30 ` [PATCH v9 01/20] fuse: Fix dax truncate/punch_hole fault path Alistair Popple
2025-02-28  3:30 ` [PATCH v9 02/20] fs/dax: Return unmapped busy pages from dax_layout_busy_page_range() Alistair Popple
2025-02-28  3:30 ` [PATCH v9 03/20] fs/dax: Don't skip locked entries when scanning entries Alistair Popple
2025-02-28  3:30 ` [PATCH v9 04/20] fs/dax: Refactor wait for dax idle page Alistair Popple
2025-02-28  3:31 ` [PATCH v9 05/20] fs/dax: Create a common implementation to break DAX layouts Alistair Popple
2025-02-28  3:31 ` [PATCH v9 06/20] fs/dax: Always remove DAX page-cache entries when breaking layouts Alistair Popple
2025-02-28  3:31 ` [PATCH v9 07/20] fs/dax: Ensure all pages are idle prior to filesystem unmount Alistair Popple
2025-02-28  3:31 ` [PATCH v9 08/20] fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag Alistair Popple
2025-02-28  3:31 ` [PATCH v9 09/20] mm/gup: Remove redundant check for PCI P2PDMA page Alistair Popple
2025-02-28  3:31 ` [PATCH v9 10/20] mm/mm_init: Move p2pdma page refcount initialisation to p2pdma Alistair Popple
2025-02-28  3:31 ` [PATCH v9 11/20] mm: Allow compound zone device pages Alistair Popple
2025-02-28  3:31 ` [PATCH v9 12/20] mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings Alistair Popple
2025-02-28  3:31 ` [PATCH v9 13/20] mm/memory: Add vmf_insert_page_mkwrite() Alistair Popple
2025-02-28  3:31 ` [PATCH v9 14/20] mm/rmap: Add support for PUD sized mappings to rmap Alistair Popple
2025-02-28  3:31 ` [PATCH v9 15/20] mm/huge_memory: Add vmf_insert_folio_pud() Alistair Popple
2025-02-28  3:31 ` [PATCH v9 16/20] mm/huge_memory: Add vmf_insert_folio_pmd() Alistair Popple
2025-02-28  3:31 ` [PATCH v9 17/20] mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages Alistair Popple
2025-02-28  3:31 ` [PATCH v9 18/20] dcssblk: Mark DAX broken, remove FS_DAX_LIMITED support Alistair Popple
2025-02-28  3:31 ` [PATCH v9 19/20] fs/dax: Properly refcount fs dax pages Alistair Popple
2025-03-03  8:58   ` David Hildenbrand
2025-03-26 21:04     ` Dan Williams
2025-02-28  3:31 ` [PATCH v9 20/20] device/dax: Properly refcount device dax pages when mapping Alistair Popple
2025-02-28  3:42 ` Alistair Popple [this message]
2025-03-04  4:46   ` [PATCH v9 00/20] fs/dax: Fix ZONE_DEVICE page reference counts Andrew Morton
