linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>,
	akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
	wangkefeng.wang@huawei.com
Subject: Re: [PATCH] mm: fix COW mapping handing in generic_access_phys
Date: Wed, 28 May 2025 11:25:04 -0400	[thread overview]
Message-ID: <aDcq0F8ibKNIyQvU@x1.local> (raw)
In-Reply-To: <5b9f5952-9979-426f-857a-dffa9b7963af@redhat.com>

On Wed, May 28, 2025 at 05:02:15PM +0200, David Hildenbrand wrote:
> On 28.05.25 16:54, Peter Xu wrote:
> > [Add Jason]
> > 
> > On Wed, May 28, 2025 at 11:59:56AM +0200, David Hildenbrand wrote:
> > > On 28.05.25 10:59, David Hildenbrand wrote:
> > > > On 28.05.25 03:56, Jinjiang Tu wrote:
> > > > > Syzkaller reports a below BUG:
> > > > >     ioremap on RAM at 0x0000000022727000 - 0x0000000022727fff
> > > > >     WARNING: CPU: 3 PID: 3609 at arch/x86/mm/ioremap.c:216 __ioremap_caller+0x644/0x7f0 arch/x86/mm/ioremap.c:216
> > > > >     Modules linked in:
> > > > >     CPU: 3 PID: 3609 Comm: syz.2.577 Not tainted 6.6.0+ #63
> > > > >     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > > > >     RIP: 0010:__ioremap_caller+0x644/0x7f0 arch/x86/mm/ioremap.c:216
> > > > >     Call Trace:
> > > > >      <TASK>
> > > > >      generic_access_phys+0x241/0x480 mm/memory.c:6458
> > > > >      __access_remote_vm+0x6af/0x970 mm/memory.c:6535
> > > > >      access_process_vm+0x53/0x80 mm/memory.c:6600
> > > > >      get_cmdline+0x192/0x380 mm/util.c:1041
> > > > >      audit_log_proctitle kernel/auditsc.c:1620 [inline]
> > > > >      audit_log_exit+0x1424/0x18c0 kernel/auditsc.c:1811
> > > > >      __audit_syscall_exit+0x252/0x2f0 kernel/auditsc.c:2079
> > > > >      audit_syscall_exit include/linux/audit.h:356 [inline]
> > > > >      syscall_exit_work+0x10f/0x130 kernel/entry/common.c:166
> > > > >      __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
> > > > >      syscall_exit_to_user_mode+0x10/0x1e0 kernel/entry/common.c:218
> > > > >      do_syscall_64+0x66/0x110 arch/x86/entry/common.c:87
> > > > >      entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > > > > 
> > > > > The /dev/mem is mapped with COW mapping, and mremap at the mm->args_start.
> > > > > The special pfn mapping is replaced by anon folios due to COW.
> > > > > generic_access_phys() is supposed to handle iomem, instead of RAM pfn,
> > > > > thus trigger a WARN_ON.
> > > > > 
> > > > > Similar to commit 04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in
> > > > > COW mappings"). check if the pte is special to reject Cowed anon folios.
> > > > > 
> > > > > Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> > > > > ---
> > > > >     mm/memory.c | 7 +++++++
> > > > >     1 file changed, 7 insertions(+)
> > > > > 
> > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > index 49199410805c..e1dac84536ee 100644
> > > > > --- a/mm/memory.c
> > > > > +++ b/mm/memory.c
> > > > > @@ -6840,6 +6840,13 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> > > > >     retry:
> > > > >     	if (follow_pfnmap_start(&args))
> > > > >     		return -EINVAL;
> > > > > +
> > > > > +	/* Never return PFNs of anon folios in COW mappings. */
> > > > > +	if (!args.special) {
> > > > > +		follow_pfnmap_end(&args);
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > >     	prot = args.pgprot;
> > > > >     	phys_addr = (resource_size_t)args.pfn << PAGE_SHIFT;
> > > > >     	writable = args.writable;
> > > > 
> > > > I assume we trigger this through vma->vm_ops->access, when the vm_ops have generic_access_phys set.
> > > > 
> > > > I still dislike exposing the "special" bit here, as it is absolutely not what we should care about in the caller.
> > > > 
> > > > In case our arch does not support pte_special, you fix will not catch that case ...
> > > > 
> > > > The following might be better:
> > > > 
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index 37d8738f5e12e..810adb8d1a53b 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -6681,6 +6681,14 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> > > >            prot = args.pgprot;
> > > >            phys_addr = (resource_size_t)args.pfn << PAGE_SHIFT;
> > > >            writable = args.writable;
> > > > +
> > > > +       /* Refuse (refcounted) anonymous pages in CoW mappings. */
> > > > +       if (is_cow_mapping(vma->vm_flags) &&
> > > > +           vm_normal_page(vma, addr, ptep_get(args.ptep))) {
> > > > +               follow_pfnmap_end(&args);
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > 
> > > Thinking again, we might have a PMD/PUD mapping, so maybe
> > > follow_pfnmap_start() should really just refuse any refcounted pages.

[1]

> > 
> > We may want to be careful on this.
> > 
> > I feel like we can still potentially break drivers that
> > follow_pfnmap_start() used to work on debateable things like RAM page
> > injections, unless breaking them is the intention.
> 
> Yes, that all needs a cleanup likely; it's all very confusing and
> inconsistent.
> 
> > 
> > OTOH, I also see at least two in-tree drivers set VM_IO|VM_MIXEDMAP:
> > 
> > *** drivers/gpu/drm/gma500/fbdev.c:
> > psb_fbdev_fb_mmap[110]         vm_flags_set(vma, VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP);
> > 
> > *** drivers/gpu/drm/omapdrm/omap_gem.c:
> > omap_gem_object_mmap[538]      vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_IO | VM_MIXEDMAP);
> > 
> > AFAIU, these MIXEDMAP users will still rely on follow_pfnmap_start() to
> > work on e.g. RAM pages, because GUP will simply fail them..
> 
> Right.
> 
> VM_IO essentially tells us "don't touch this memory, it might have side
> effects", such as MMIO, that's why GUP outright refuses VM_IO VMAs.
> 
> I am not sure why generic_access_phys() should be allowed to ... touch that
> memory instead?

I'm looking at:

commit 28b2ee20c7cba812b6f2ccf6d722cf86d00a84dc
Author: Rik van Riel <riel@redhat.com>
Date:   Wed Jul 23 21:27:05 2008 -0700

    access_process_vm device memory infrastructure

VM_IO is also intentionally mentioned in the doc too:

Documentation/filesystems/locking.rst

        ->access() is called when get_user_pages() fails in
        access_process_vm(), typically used to debug a process through
        /proc/pid/mem or ptrace.  This function is needed only for
        VM_IO | VM_PFNMAP VMAs.

So it definitely looks like intentional, though I know nothing about PPC
Cell SPUs..

> 
> Confusing.
> 
> > 
> > Sligtly off-topic: it's also a bit confusing to me on whether the driver
> > should set VM_IO for VM_MIXEDMAP.  I think it should because VM_IO says:
> 
> Depends on the driver. Some VM_MIXEDMAP users only map ordinary kernel
> allocations, where VM_IO is not required. For these drivers, I really don't
> know.
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Peter Xu



  reply	other threads:[~2025-05-28 15:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-28  1:56 Jinjiang Tu
2025-05-28  8:59 ` David Hildenbrand
2025-05-28  9:59   ` David Hildenbrand
2025-05-28 12:14     ` Jinjiang Tu
2025-05-28 14:54     ` Peter Xu
2025-05-28 15:02       ` David Hildenbrand
2025-05-28 15:25         ` Peter Xu [this message]
2025-05-28 15:29           ` David Hildenbrand
2025-05-28 16:06             ` Peter Xu
2025-05-28 16:29               ` Jason Gunthorpe
2025-05-28 17:14                 ` Peter Xu
2025-05-28 17:34                   ` Jason Gunthorpe
2025-05-28 17:37                   ` David Hildenbrand
2025-05-28 17:32                 ` David Hildenbrand
2025-05-28 17:47                   ` Jason Gunthorpe
2025-05-28 17:59                     ` Jason Gunthorpe
2025-05-28 18:03                       ` David Hildenbrand
2025-05-28 18:00                     ` David Hildenbrand
2025-05-28 18:15                       ` Jason Gunthorpe
2025-05-28 18:22                         ` David Hildenbrand
2025-05-28 18:29                           ` Jason Gunthorpe
2025-05-30 10:04                             ` David Hildenbrand
2025-05-28 12:13   ` Jinjiang Tu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aDcq0F8ibKNIyQvU@x1.local \
    --to=peterx@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=tujinjiang@huawei.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox