From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>,
akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
wangkefeng.wang@huawei.com
Subject: Re: [PATCH] mm: fix COW mapping handing in generic_access_phys
Date: Wed, 28 May 2025 10:54:26 -0400 [thread overview]
Message-ID: <aDcjokNWW755JxVQ@x1.local> (raw)
In-Reply-To: <e88f76b2-f7b7-4cb7-940b-60977d489b3a@redhat.com>
[Add Jason]
On Wed, May 28, 2025 at 11:59:56AM +0200, David Hildenbrand wrote:
> On 28.05.25 10:59, David Hildenbrand wrote:
> > On 28.05.25 03:56, Jinjiang Tu wrote:
> > > Syzkaller reports a below BUG:
> > > ioremap on RAM at 0x0000000022727000 - 0x0000000022727fff
> > > WARNING: CPU: 3 PID: 3609 at arch/x86/mm/ioremap.c:216 __ioremap_caller+0x644/0x7f0 arch/x86/mm/ioremap.c:216
> > > Modules linked in:
> > > CPU: 3 PID: 3609 Comm: syz.2.577 Not tainted 6.6.0+ #63
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > > RIP: 0010:__ioremap_caller+0x644/0x7f0 arch/x86/mm/ioremap.c:216
> > > Call Trace:
> > > <TASK>
> > > generic_access_phys+0x241/0x480 mm/memory.c:6458
> > > __access_remote_vm+0x6af/0x970 mm/memory.c:6535
> > > access_process_vm+0x53/0x80 mm/memory.c:6600
> > > get_cmdline+0x192/0x380 mm/util.c:1041
> > > audit_log_proctitle kernel/auditsc.c:1620 [inline]
> > > audit_log_exit+0x1424/0x18c0 kernel/auditsc.c:1811
> > > __audit_syscall_exit+0x252/0x2f0 kernel/auditsc.c:2079
> > > audit_syscall_exit include/linux/audit.h:356 [inline]
> > > syscall_exit_work+0x10f/0x130 kernel/entry/common.c:166
> > > __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
> > > syscall_exit_to_user_mode+0x10/0x1e0 kernel/entry/common.c:218
> > > do_syscall_64+0x66/0x110 arch/x86/entry/common.c:87
> > > entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > >
> > > The /dev/mem is mapped with COW mapping, and mremap at the mm->args_start.
> > > The special pfn mapping is replaced by anon folios due to COW.
> > > generic_access_phys() is supposed to handle iomem, instead of RAM pfn,
> > > thus trigger a WARN_ON.
> > >
> > > Similar to commit 04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in
> > > COW mappings"). check if the pte is special to reject Cowed anon folios.
> > >
> > > Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> > > ---
> > > mm/memory.c | 7 +++++++
> > > 1 file changed, 7 insertions(+)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 49199410805c..e1dac84536ee 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -6840,6 +6840,13 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> > > retry:
> > > if (follow_pfnmap_start(&args))
> > > return -EINVAL;
> > > +
> > > + /* Never return PFNs of anon folios in COW mappings. */
> > > + if (!args.special) {
> > > + follow_pfnmap_end(&args);
> > > + return -EINVAL;
> > > + }
> > > +
> > > prot = args.pgprot;
> > > phys_addr = (resource_size_t)args.pfn << PAGE_SHIFT;
> > > writable = args.writable;
> >
> > I assume we trigger this through vma->vm_ops->access, when the vm_ops have generic_access_phys set.
> >
> > I still dislike exposing the "special" bit here, as it is absolutely not what we should care about in the caller.
> >
> > In case our arch does not support pte_special, you fix will not catch that case ...
> >
> > The following might be better:
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 37d8738f5e12e..810adb8d1a53b 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -6681,6 +6681,14 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> > prot = args.pgprot;
> > phys_addr = (resource_size_t)args.pfn << PAGE_SHIFT;
> > writable = args.writable;
> > +
> > + /* Refuse (refcounted) anonymous pages in CoW mappings. */
> > + if (is_cow_mapping(vma->vm_flags) &&
> > + vm_normal_page(vma, addr, ptep_get(args.ptep))) {
> > + follow_pfnmap_end(&args);
> > + return -EINVAL;
> > + }
> > +
>
> Thinking again, we might have a PMD/PUD mapping, so maybe
> follow_pfnmap_start() should really just refuse any refcounted pages.
We may want to be careful on this.
I feel like we can still potentially break drivers that
follow_pfnmap_start() used to work on debateable things like RAM page
injections, unless breaking them is the intention.
OTOH, I also see at least two in-tree drivers set VM_IO|VM_MIXEDMAP:
*** drivers/gpu/drm/gma500/fbdev.c:
psb_fbdev_fb_mmap[110] vm_flags_set(vma, VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP);
*** drivers/gpu/drm/omapdrm/omap_gem.c:
omap_gem_object_mmap[538] vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_IO | VM_MIXEDMAP);
AFAIU, these MIXEDMAP users will still rely on follow_pfnmap_start() to
work on e.g. RAM pages, because GUP will simply fail them..
Sligtly off-topic: it's also a bit confusing to me on whether the driver
should set VM_IO for VM_MIXEDMAP. I think it should because VM_IO says:
#define VM_IO 0x00004000 /* Memory mapped I/O or similar */
IIUC it implies it's IO mapping so e.g. the cache behavior can be
different. But I'm not very sure now seeing both kinds of driver exist that
only sets MIXEDMAP while the above two set MIXEDMAP|IO.
--
Peter Xu
next prev parent reply other threads:[~2025-05-28 14:54 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-28 1:56 Jinjiang Tu
2025-05-28 8:59 ` David Hildenbrand
2025-05-28 9:59 ` David Hildenbrand
2025-05-28 12:14 ` Jinjiang Tu
2025-05-28 14:54 ` Peter Xu [this message]
2025-05-28 15:02 ` David Hildenbrand
2025-05-28 15:25 ` Peter Xu
2025-05-28 15:29 ` David Hildenbrand
2025-05-28 16:06 ` Peter Xu
2025-05-28 16:29 ` Jason Gunthorpe
2025-05-28 17:14 ` Peter Xu
2025-05-28 17:34 ` Jason Gunthorpe
2025-05-28 17:37 ` David Hildenbrand
2025-05-28 17:32 ` David Hildenbrand
2025-05-28 17:47 ` Jason Gunthorpe
2025-05-28 17:59 ` Jason Gunthorpe
2025-05-28 18:03 ` David Hildenbrand
2025-05-28 18:00 ` David Hildenbrand
2025-05-28 18:15 ` Jason Gunthorpe
2025-05-28 18:22 ` David Hildenbrand
2025-05-28 18:29 ` Jason Gunthorpe
2025-05-30 10:04 ` David Hildenbrand
2025-05-28 12:13 ` Jinjiang Tu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aDcjokNWW755JxVQ@x1.local \
--to=peterx@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=tujinjiang@huawei.com \
--cc=vbabka@suse.cz \
--cc=wangkefeng.wang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox