From: Alistair Popple <apopple@nvidia.com>
To: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org,  akpm@linux-foundation.org,
	 "linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>, Christoph Hellwig <hch@lst.de>,
	Jason Gunthorpe <jgg@nvidia.com>,
	 gerald.schaefer@linux.ibm.com, dan.j.williams@intel.com,
	jgg@ziepe.ca, willy@infradead.org,  linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev, jhubbard@nvidia.com,
	 zhang.lyra@gmail.com, debug@rivosinc.com, bjorn@kernel.org,
	balbirs@nvidia.com,  lorenzo.stoakes@oracle.com, John@groves.net
Subject: Re: [PATCH] mm: Remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST
Date: Wed, 11 Jun 2025 18:58:29 +1000
Message-ID: <tnaqespmxakrudv6qg5d73fbts6kfvixourtab7wsfigcfx4cc@ep6elmkephtd>
In-Reply-To: <4e53d612-534c-46b5-9746-a4a9814d41c3@samsung.com>

On Wed, Jun 11, 2025 at 10:42:16AM +0200, Marek Szyprowski wrote:
> On 11.06.2025 10:23, David Hildenbrand wrote:
> > On 11.06.25 10:03, Marek Szyprowski wrote:
> >> On 11.06.2025 04:38, Alistair Popple wrote:
> >>> On Tue, Jun 10, 2025 at 06:18:09PM +0200, Marek Szyprowski wrote:
> >>>> On 04.06.2025 05:21, Alistair Popple wrote:
> >>>>> The PFN_MAP flag is no longer used for anything, so remove it.
> >>>>> The PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been
> >>>>> used so also remove them. The last user of PFN_SPECIAL was removed
> >>>>> by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED
> >>>>> support").
> >>>>>
> >>>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> >>>>> Acked-by: David Hildenbrand <david@redhat.com>
> >>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
> >>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >>>>> Cc: gerald.schaefer@linux.ibm.com
> >>>>> Cc: dan.j.williams@intel.com
> >>>>> Cc: jgg@ziepe.ca
> >>>>> Cc: willy@infradead.org
> >>>>> Cc: david@redhat.com
> >>>>> Cc: linux-kernel@vger.kernel.org
> >>>>> Cc: nvdimm@lists.linux.dev
> >>>>> Cc: jhubbard@nvidia.com
> >>>>> Cc: hch@lst.de
> >>>>> Cc: zhang.lyra@gmail.com
> >>>>> Cc: debug@rivosinc.com
> >>>>> Cc: bjorn@kernel.org
> >>>>> Cc: balbirs@nvidia.com
> >>>>> Cc: lorenzo.stoakes@oracle.com
> >>>>> Cc: John@Groves.net
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> Splitting this off from the rest of my series[1] as a separate clean-up
> >>>>> for consideration for the v6.16 merge window as suggested by Christoph.
> >>>>>
> >>>>> [1] - https://lore.kernel.org/linux-mm/cover.541c2702181b7461b84f1a6967a3f0e823023fcc.1748500293.git-series.apopple@nvidia.com/
> >>>>> ---
> >>>>>     include/linux/pfn_t.h             | 31 +++----------------------------
> >>>>>     mm/memory.c                       |  2 --
> >>>>>     tools/testing/nvdimm/test/iomap.c |  4 ----
> >>>>>     3 files changed, 3 insertions(+), 34 deletions(-)
> >>>> This patch landed in today's linux-next as commit 28be5676b4a3 ("mm:
> >>>> remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST"). In my tests
> >>>> I've noticed that it breaks operation of all RISC-V 64bit boards on my
> >>>> test farm (VisionFive2, BananaPiF3 as well as QEMU's Virt machine). I've
> >>>> isolated the changes responsible for this issue, see the inline comments
> >>>> in the patch below. Here is an example of the issues observed in the
> >>>> logs from those machines:
> >>> Thanks for the report. I'm really confused by this because this change should
> >>> just be removal of dead code - nothing sets any of the removed PFN_* flags
> >>> AFAICT.
> >>>
> >>> I don't have access to any RISC-V hardware but you say this reproduces under
> >>> qemu - what do you run on the system to cause the error? Is it just a simple
> >>> boot and load a module or are you running selftests or something else?
> >>
> >> It fails a simple boot test. Here are detailed instructions on how to
> >> reproduce this issue with a random Debian rootfs image found on the
> >> internet (tested on Ubuntu 22.04, with next-20250610 kernel source):
> >
> > riscv is one of the archs where pte_mkdevmap() will *not* set the pte as special. (I
> > raised this recently in the original series, it's all a big mess)
> >
> > So, before this change here, pfn_t_devmap() would have returned "false" if only
> > PFN_DEV was set, now it would return "true" if only PFN_DEV is set.

Ugh, what a mess. Thanks for pointing that out (I had seen your earlier response
to the original series but hadn't found the time to look into it more deeply).
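
For anyone following along, here is a minimal userspace sketch of the
behaviour change David describes. It is not the kernel code: the model_*
names and the bit positions are assumptions made up for illustration, and
only the "PFN_DEV and PFN_MAP both required" versus "PFN_DEV alone is enough"
logic is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the exact positions are an assumption of this sketch. */
#define MODEL_PFN_DEV (1ULL << 61)
#define MODEL_PFN_MAP (1ULL << 60)

/* Before the patch, as described: both flags had to be set. */
static inline bool model_pfn_t_devmap_old(uint64_t val)
{
	const uint64_t flags = MODEL_PFN_DEV | MODEL_PFN_MAP;

	return (val & flags) == flags;
}

/* After the patch, as described: PFN_MAP is gone, so PFN_DEV alone is enough. */
static inline bool model_pfn_t_devmap_new(uint64_t val)
{
	return (val & MODEL_PFN_DEV) == MODEL_PFN_DEV;
}

/* Self-check: a pfn carrying only PFN_DEV flips from "not devmap" to "devmap". */
static inline void model_pfn_t_devmap_check(void)
{
	uint64_t dev_only = 0x1234ULL | MODEL_PFN_DEV;

	assert(!model_pfn_t_devmap_old(dev_only));
	assert(model_pfn_t_devmap_new(dev_only));
}
```

So a caller that only ever set PFN_DEV was relying on PFN_MAP being absent,
which is why removing the "unused" flag was not a pure dead-code removal.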

> > Consequently, in insert_pfn() we would have done a pte_mkspecial(), now we do a
> > pte_mkdevmap() -- again, which does not imply "special" on riscv.
> >
> > riscv selects CONFIG_ARCH_HAS_PTE_SPECIAL, so if !pte_special(), it's considered as
> > normal.
> >
> > Would the following fix your issue?
> >
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8eba595056fe3..0e972c3493692 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -589,6 +589,10 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >  {
> >         unsigned long pfn = pte_pfn(pte);
> >
> > +       /* TODO: remove this crap and set pte_special() instead. */
> > +       if (pte_devmap(pte))
> > +               return NULL;
> > +
> >         if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
> >                 if (likely(!pte_special(pte)))
> >                         goto check_pfn;
> > @@ -598,16 +602,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >                         return NULL;
> >                 if (is_zero_pfn(pfn))
> >                         return NULL;
> > -               if (pte_devmap(pte))
> > -               /*
> > -                * NOTE: New users of ZONE_DEVICE will not set pte_devmap()
> > -                * and will have refcounts incremented on their struct pages
> > -                * when they are inserted into PTEs, thus they are safe to
> > -                * return here. Legacy ZONE_DEVICE pages that set pte_devmap()
> > -                * do not have refcounts. Example of legacy ZONE_DEVICE is
> > -                * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers.
> > -                */
> > -                       return NULL;
> >
> >                 print_bad_pte(vma, addr, pte, NULL);
> >                 return NULL;
> >
> >
> > But, I would have thought the later patches in Alistair's series would sort that out
> > (where we remove pte_devmap() ... )
> >

Yes, I think Marek confirmed that it did in his earlier reply.
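
To make the failure mode concrete, below is a rough userspace model of the
path David outlines, assuming the riscv behaviour he describes (devmap does
not imply special). The struct, enum and model_* functions are hypothetical
stand-ins, and the check_pfn path is collapsed into "treated as a normal
page"; the real vm_normal_page() does more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PTE model: only the two software bits that matter here. */
struct model_pte {
	bool special;
	bool devmap;
};

enum model_result { MODEL_NORMAL_PAGE, MODEL_NO_PAGE };

/* insert_pfn() choice as described in the thread. */
static inline struct model_pte model_insert_pfn(bool pfn_is_devmap)
{
	struct model_pte pte = { .special = false, .devmap = false };

	if (pfn_is_devmap)
		pte.devmap = true;	/* pte_mkdevmap(): not special on riscv */
	else
		pte.special = true;	/* pte_mkspecial() */
	return pte;
}

/* vm_normal_page() with CONFIG_ARCH_HAS_PTE_SPECIAL, without the fix. */
static inline enum model_result model_vm_normal_page_unfixed(struct model_pte pte)
{
	if (!pte.special)
		return MODEL_NORMAL_PAGE;	/* goto check_pfn, simplified */
	return MODEL_NO_PAGE;
}

/* Same, with the pte_devmap() check hoisted to the top as in David's diff. */
static inline enum model_result model_vm_normal_page_fixed(struct model_pte pte)
{
	if (pte.devmap)
		return MODEL_NO_PAGE;
	if (!pte.special)
		return MODEL_NORMAL_PAGE;	/* goto check_pfn, simplified */
	return MODEL_NO_PAGE;
}

/* Self-check: a devmap-only PTE is misclassified unless the check is hoisted. */
static inline void model_vm_normal_page_check(void)
{
	struct model_pte pte = model_insert_pfn(true);

	assert(model_vm_normal_page_unfixed(pte) == MODEL_NORMAL_PAGE);
	assert(model_vm_normal_page_fixed(pte) == MODEL_NO_PAGE);
}
```

In this model the devmap-only PTE is handed back as a normal page by the
unfixed variant, which matches the explanation for the riscv breakage.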

> The above change fixes the issues observed on RISCV boards.

Thanks for testing. Andrew has already removed this from the -mm tree so I'll
fold it back into the series and see if I can figure something out when I
respin it.

- Alistair

> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 

