Date: Wed, 28 May 2025 12:06:07 -0400
From: Peter Xu
To: David Hildenbrand
Cc: Jinjiang Tu, akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
	wangkefeng.wang@huawei.com, Jason Gunthorpe
Subject: Re: [PATCH] mm: fix COW mapping handing in generic_access_phys
References: <20250528015617.302681-1-tujinjiang@huawei.com>
	<0d4f0180-52e6-47c9-b141-54e7e7c86880@redhat.com>
	<5b9f5952-9979-426f-857a-dffa9b7963af@redhat.com>

On Wed, May 28, 2025 at 05:29:29PM +0200, David Hildenbrand wrote:
> On 28.05.25 17:25, Peter Xu wrote:
> > On Wed, May 28, 2025 at 05:02:15PM +0200, David Hildenbrand wrote:
> > > On 28.05.25 16:54, Peter Xu wrote:
> > > > [Add Jason]
> > > >
> > > > On Wed, May 28, 2025 at 11:59:56AM +0200, David Hildenbrand wrote:
> > > > > On 28.05.25 10:59, David Hildenbrand wrote:
> > > > > > On 28.05.25 03:56, Jinjiang Tu wrote:
> > > > > > > Syzkaller reports a below BUG:
> > > > > > > ioremap on RAM at 0x0000000022727000 - 0x0000000022727fff
> > > > > > > WARNING: CPU: 3 PID: 3609 at arch/x86/mm/ioremap.c:216 __ioremap_caller+0x644/0x7f0 arch/x86/mm/ioremap.c:216
> > > > > > > Modules linked in:
> > > > > > > CPU: 3 PID: 3609 Comm: syz.2.577 Not tainted 6.6.0+ #63
> > > > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > > > > > > RIP: 0010:__ioremap_caller+0x644/0x7f0 arch/x86/mm/ioremap.c:216
> > > > > > > Call Trace:
> > > > > > >
> > > > > > >  generic_access_phys+0x241/0x480 mm/memory.c:6458
> > > > > > >  __access_remote_vm+0x6af/0x970 mm/memory.c:6535
> > > > > > >  access_process_vm+0x53/0x80 mm/memory.c:6600
> > > > > > >  get_cmdline+0x192/0x380 mm/util.c:1041
> > > > > > >  audit_log_proctitle kernel/auditsc.c:1620 [inline]
> > > > > > >  audit_log_exit+0x1424/0x18c0 kernel/auditsc.c:1811
> > > > > > >  __audit_syscall_exit+0x252/0x2f0 kernel/auditsc.c:2079
> > > > > > >  audit_syscall_exit include/linux/audit.h:356 [inline]
> > > > > > >  syscall_exit_work+0x10f/0x130 kernel/entry/common.c:166
> > > > > > >  __syscall_exit_to_user_mode_work kernel/entry/common.c:205 [inline]
> > > > > > >  syscall_exit_to_user_mode+0x10/0x1e0 kernel/entry/common.c:218
> > > > > > >  do_syscall_64+0x66/0x110 arch/x86/entry/common.c:87
> > > > > > >  entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > > > > > >
> > > > > > > The /dev/mem is mapped with COW mapping, and mremap at the mm->args_start.
> > > > > > > The special pfn mapping is replaced by anon folios due to COW.
> > > > > > > generic_access_phys() is supposed to handle iomem, instead of RAM pfn,
> > > > > > > thus trigger a WARN_ON.
> > > > > > >
> > > > > > > Similar to commit 04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in
> > > > > > > COW mappings"). check if the pte is special to reject Cowed anon folios.
> > > > > > >
> > > > > > > Signed-off-by: Jinjiang Tu
> > > > > > > ---
> > > > > > >  mm/memory.c | 7 +++++++
> > > > > > >  1 file changed, 7 insertions(+)
> > > > > > >
> > > > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > > > index 49199410805c..e1dac84536ee 100644
> > > > > > > --- a/mm/memory.c
> > > > > > > +++ b/mm/memory.c
> > > > > > > @@ -6840,6 +6840,13 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> > > > > > >  retry:
> > > > > > >  	if (follow_pfnmap_start(&args))
> > > > > > >  		return -EINVAL;
> > > > > > > +
> > > > > > > +	/* Never return PFNs of anon folios in COW mappings. */
> > > > > > > +	if (!args.special) {
> > > > > > > +		follow_pfnmap_end(&args);
> > > > > > > +		return -EINVAL;
> > > > > > > +	}
> > > > > > > +
> > > > > > >  	prot = args.pgprot;
> > > > > > >  	phys_addr = (resource_size_t)args.pfn << PAGE_SHIFT;
> > > > > > >  	writable = args.writable;
> > > > > >
> > > > > > I assume we trigger this through vma->vm_ops->access, when the vm_ops have generic_access_phys set.
> > > > > >
> > > > > > I still dislike exposing the "special" bit here, as it is absolutely not what we should care about in the caller.
> > > > > >
> > > > > > In case our arch does not support pte_special, you fix will not catch that case ...
> > > > > >
> > > > > > The following might be better:
> > > > > >
> > > > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > > > index 37d8738f5e12e..810adb8d1a53b 100644
> > > > > > --- a/mm/memory.c
> > > > > > +++ b/mm/memory.c
> > > > > > @@ -6681,6 +6681,14 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> > > > > >  	prot = args.pgprot;
> > > > > >  	phys_addr = (resource_size_t)args.pfn << PAGE_SHIFT;
> > > > > >  	writable = args.writable;
> > > > > > +
> > > > > > +	/* Refuse (refcounted) anonymous pages in CoW mappings. */
> > > > > > +	if (is_cow_mapping(vma->vm_flags) &&
> > > > > > +	    vm_normal_page(vma, addr, ptep_get(args.ptep))) {
> > > > > > +		follow_pfnmap_end(&args);
> > > > > > +		return -EINVAL;
> > > > > > +	}
> > > > > > +
> > > > > >
> > > > > Thinking again, we might have a PMD/PUD mapping, so maybe
> > > > > follow_pfnmap_start() should really just refuse any refcounted pages.
> > > > [1]
> > >
> > >
> > > > We may want to be careful on this.
> > > >
> > > > I feel like we can still potentially break drivers that
> > > > follow_pfnmap_start() used to work on debateable things like RAM page
> > > > injections, unless breaking them is the intention.
> > > >
> > > Yes, that all needs a cleanup likely; it's all very confusing and
> > > inconsistent.
> > >
> > >
> > > > OTOH, I also see at least two in-tree drivers set VM_IO|VM_MIXEDMAP:
> > > >
> > > > *** drivers/gpu/drm/gma500/fbdev.c:
> > > > psb_fbdev_fb_mmap[110] vm_flags_set(vma, VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP);
> > > >
> > > > *** drivers/gpu/drm/omapdrm/omap_gem.c:
> > > > omap_gem_object_mmap[538] vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_IO | VM_MIXEDMAP);
> > > >
> > > > AFAIU, these MIXEDMAP users will still rely on follow_pfnmap_start() to
> > > > work on e.g. RAM pages, because GUP will simply fail them..
> > > >
> > > Right.
> > >
> > > VM_IO essentially tells us "don't touch this memory, it might have side
> > > effects", such as MMIO, that's why GUP outright refuses VM_IO VMAs.
> > >
> > > I am not sure why generic_access_phys() should be allowed to ... touch that
> > > memory instead?
> >
> > I'm looking at:
> >
> > commit 28b2ee20c7cba812b6f2ccf6d722cf86d00a84dc
> > Author: Rik van Riel
> > Date:   Wed Jul 23 21:27:05 2008 -0700
> >
> >     access_process_vm device memory infrastructure
> >
> > VM_IO is also intentionally mentioned in the doc too:
> >
> > Documentation/filesystems/locking.rst
> >
> >         ->access() is called when get_user_pages() fails in
> >         access_process_vm(), typically used to debug a process through
> >         /proc/pid/mem or ptrace.  This function is needed only for
> >         VM_IO | VM_PFNMAP VMAs.
> >
> > So it definitely looks like intentional, though I know nothing about PPC
> > Cell SPUs..
>
> VM_IO | VM_PFNMAP, I can understand that. It's weird combined with weird.
>
> But the use case for "VM_IO | VM_MIXEDMAP" ?
>
> To be precise, I am questioning if follow_pfnmap_start() should only work on
> ...
>
> PFNMAPs ?
>
> :)

It goes back to the question of whether things that used to work will
break..

I'm being conservative mostly because I know little about the driver side;
that's also why I tend to prefer making iov_iter work with RAM-mapped
VM_PFNMAP vmas too.  After all, AFAIU Linux should try not to break working
users; sometimes we pay for that.

Meanwhile:

#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */

I'm not confident enough to blame any driver yet for having those special
cases for VM_PFNMAP, because the comment only says "managed without struct
page"; it doesn't say "it must not contain struct page".  Hence it hints to
the core mm: "please do not manage these mappings with struct page at all".
That still sounds like a fair contract, even if not ideal.

But yeah, once again I agree it'd be ideal if what you said could happen
some day.

[I wanted to copy Jason but I failed the job; do it this time]

-- 
Peter Xu
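
For readers following the thread, here is a minimal user-space sketch of the
predicate being discussed. It is not kernel code. The flag values are copied
from include/linux/mm.h as an assumption about the current tree (only
VM_PFNMAP's value appears in this thread), is_cow_mapping() mirrors the
kernel helper of the same name, and model_refuse_access() is a hypothetical
name for the check David proposed, with the vm_normal_page() result passed in
as a plain boolean.

```c
#include <stdbool.h>

/*
 * Flag values assumed to match include/linux/mm.h; they are copied here
 * only so that the sketch builds outside the kernel.
 */
#define VM_SHARED	0x00000008UL
#define VM_MAYWRITE	0x00000020UL
#define VM_PFNMAP	0x00000400UL
#define VM_IO		0x00004000UL
#define VM_MIXEDMAP	0x10000000UL

/* Mirrors the kernel's is_cow_mapping(): not shared, but may be written. */
static inline bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Hypothetical model of the proposed check: refuse the access when the VMA
 * is a COW mapping and the PTE maps a normal (refcounted) page.  In the
 * kernel, maps_normal_page would be derived from vm_normal_page(); here the
 * caller supplies it directly.
 */
static inline bool model_refuse_access(unsigned long flags, bool maps_normal_page)
{
	return is_cow_mapping(flags) && maps_normal_page;
}
```

Under this model, a private writable /dev/mem mapping whose PTE was replaced
by an anon folio is refused, while a shared VM_IO | VM_MIXEDMAP mapping that
contains struct-page-backed RAM is still let through. That second case is the
one the driver-compatibility concern above is about.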