Date: Sat, 24 May 2025 18:45:17 +0200
From: Oleg Nesterov
To: Pu Lehui, David Hildenbrand, Lorenzo Stoakes
Cc: mhiramat@kernel.org, peterz@infradead.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, pfalcato@suse.de, linux-mm@kvack.org, linux-kernel@vger.kernel.org, pulehui@huawei.com, Andrii Nakryiko, Jiri Olsa
Subject: Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
Message-ID: <20250524164516.GA11642@redhat.com>
References: <20250521092503.3116340-1-pulehui@huaweicloud.com>
In-Reply-To: <20250521092503.3116340-1-pulehui@huaweicloud.com>

I am very glad that mm experts are already looking into this problem ;)

I can't help, today I don't understand mm/vma.c even remotely. But let me
ask a couple of stupid questions.

> However, the
> upcoming move_page_tables step, which uses set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.

To be honest, I can't even understand this part due to my ignorance.
What does "the old uprobe anon page to be orphan" actually mean?

How can the unnecessary uprobe_mmap() lead to an "unbalanced"
inc_mm_counter(MM_ANONPAGES)? Or what else can explain the
"BUG: Bad rss-counter state" from check_mm()?

Or are there more problems?

I will appreciate it if someone provides a more detailed explanation
for dummies ;)

--------------------------------------------------------------------------
But let's forget it for the moment. I fail to understand the usage of
uprobe_mmap() in vma_complete(). In particular, this one:

	if (vp->file) {
		i_mmap_unlock_write(vp->mapping);
		uprobe_mmap(vp->vma);

When exactly do we need this "unconditional" uprobe_mmap(vp->vma)?
Why is it called by mremap()?

But let's even forget about mremap(). Unless I am totally confused, it is
also called by munmap() if it implies split_vma().

From the reproducer:

	void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);

In this case uprobe_mmap() is called by __mmap_complete(), it will call
uprobe_write_opcode(opcode_vaddr => addr2). This is clear.

Now, what if we do

	munmap(addr2, 4096);

after that? Afaics, in this case __split_vma() -> vma_complete() will call
uprobe_mmap(vp->vma) again, and uprobe_write_opcode() will try to install
the bp at the same addr2 vaddr we are going to unmap.

Probably this is harmless, but certainly this is sub-optimal... No?

Oleg.

On 05/21, Pu Lehui wrote:
>
> From: Pu Lehui
>
> We encountered a BUG alert triggered by Syzkaller as follows:
>   BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>
> And we can reproduce it with the following steps:
> 1. register uprobe on file at zero offset
> 2. mmap the file at zero offset:
>      addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> 3. mremap part of vma1 to new vma2:
>      addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> 4. mremap back to orig addr1:
>      mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>
> In step 3, the vma1 range [addr1, addr1 + 4096] will be remapped to the
> new vma2 with range [addr2, addr2 + 8192], the uprobe anon page will be
> remapped from vma1 to vma2, and then the vma1 range [addr1, addr1 + 4096]
> will be unmapped.
> In step 4, the vma2 range [addr2, addr2 + 4096] will be remapped back
> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> 4096, addr1 + 8192] still maps the file, it will take
> vma_merge_new_range to merge these two addr ranges, and then do
> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> offset, it will install uprobe anon page to the merged vma. However, the
> upcoming move_page_tables step, which uses set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
>
> Since the uprobe anon page will be remapped to the merged vma, we can
> remove the unnecessary uprobe_mmap at merged vma, that is, do not
> perform uprobe_mmap when there is no vma in the addr range to be
> expanded.
>
> This problem was first found in linux-6.6.y and also exists in the
> community syzkaller:
> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>
> The complete Syzkaller C reproduction program is as follows:
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <linux/perf_event.h>
>
> int main(int argc, char *argv[])
> {
> 	// Find out what type id we need for uprobes
> 	int perf_type_pmu_uprobe;
> 	{
> 		FILE *fp = fopen("/sys/bus/event_source/devices/uprobe/type", "r");
> 		fscanf(fp, "%d", &perf_type_pmu_uprobe);
> 		fclose(fp);
> 	}
>
> 	const char *filename = "./bus";
>
> 	int fd = open(filename, O_RDWR|O_CREAT, 0600);
> 	write(fd, "x", 1);
>
> 	void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
> 			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
> 	// Register a perf uprobe on "./bus"
> 	struct perf_event_attr attr = {};
> 	attr.type = perf_type_pmu_uprobe;
> 	attr.uprobe_path = (unsigned long) filename;
> 	syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>
> 	void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> 	void *addr3 = mremap((void *) addr2, 4096, 2 * 4096, MREMAP_MAYMOVE);
> 	mremap(addr3, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, (void *) addr2);
>
> 	return 0;
> }
>
> Signed-off-by: Pu Lehui
> ---
>  mm/vma.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 3ff6cfbe3338..9a8d84b12918 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -325,7 +325,7 @@ static void vma_prepare(struct vma_prepare *vp)
>   * @mm: The mm_struct
>   */
>  static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> -			 struct mm_struct *mm)
> +			 struct mm_struct *mm, bool handle_vma_uprobe)
>  {
>  	if (vp->file) {
>  		if (vp->adj_next)
> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>
>  	if (vp->file) {
>  		i_mmap_unlock_write(vp->mapping);
> -		uprobe_mmap(vp->vma);
> +		if (handle_vma_uprobe)
> +			uprobe_mmap(vp->vma);
>
>  		if (vp->adj_next)
>  			uprobe_mmap(vp->adj_next);
> @@ -549,7 +550,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	}
>
>  	/* vma_complete stores the new vma */
> -	vma_complete(&vp, vmi, vma->vm_mm);
> +	vma_complete(&vp, vmi, vma->vm_mm, true);
>  	validate_mm(vma->vm_mm);
>
>  	/* Success. */
> @@ -715,6 +716,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
>  {
>  	struct vm_area_struct *vma;
>  	struct vma_prepare vp;
> +	bool handle_vma_uprobe = !!vma_lookup(vmg->mm, vmg->start);
>
>  	if (vmg->__adjust_next_start) {
>  		/* We manipulate middle and adjust next, which is the target. */
> @@ -748,7 +750,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
>  	vmg_adjust_set_range(vmg);
>  	vma_iter_store_overwrite(vmg->vmi, vmg->target);
>
> -	vma_complete(&vp, vmg->vmi, vma->vm_mm);
> +	vma_complete(&vp, vmg->vmi, vma->vm_mm, handle_vma_uprobe);
>
>  	return 0;
>  }
> @@ -1201,7 +1203,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>
>  	vma_iter_clear(vmi);
>  	vma_set_range(vma, start, end, pgoff);
> -	vma_complete(&vp, vmi, vma->vm_mm);
> +	vma_complete(&vp, vmi, vma->vm_mm, true);
>  	validate_mm(vma->vm_mm);
>  	return 0;
>  }
> --
> 2.34.1
>
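
The mmap()/munmap() question and the quoted reproduction steps can both be
tried from userspace without any uprobe. What follows is only a sketch under
stated assumptions, not part of the patch: the names make_backing_file(),
page_is_mapped(), split_by_munmap() and mremap_round_trip() are hypothetical,
the page size comes from sysconf() instead of the hard-coded 4096, and
mincore() is used merely as a way to ask whether a page is still mapped.
No uprobe is registered, so this shows the VMA split and the mremap round
trip only, not the uprobe_mmap() calls or the rss-counter BUG.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: an unlinked one-byte temp file standing in for "./bus". */
static int make_backing_file(void)
{
	char path[] = "/tmp/vma-sketch-XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, "x", 1) != 1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Returns 1 if mincore() succeeds on the page (it is mapped), 0 otherwise. */
static int page_is_mapped(void *addr, long psz)
{
	unsigned char vec;

	return mincore(addr, psz, &vec) == 0;
}

/*
 * Map two pages of the file, then munmap() only the first one. The partial
 * munmap() has to split the VMA; the second page must stay mapped.
 * Returns 0 if that is what was observed, -1 otherwise.
 */
int split_by_munmap(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	int fd = make_backing_file();
	char *addr2;
	int ok;

	if (fd < 0)
		return -1;
	addr2 = mmap(NULL, 2 * psz, PROT_NONE, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (addr2 == MAP_FAILED)
		return -1;

	ok = munmap(addr2, psz) == 0 &&
	     !page_is_mapped(addr2, psz) &&
	     page_is_mapped(addr2 + psz, psz);

	munmap(addr2, 2 * psz);
	return ok ? 0 : -1;
}

/*
 * Steps 2-4 of the quoted reproduction, minus the uprobe: move the first
 * page of vma1 into a new two-page vma2, then move it back to addr1.
 * Returns 0 if the page lands back at addr1 next to the rest of vma1.
 */
int mremap_round_trip(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	int fd = make_backing_file();
	char *addr1, *addr2;
	void *back;
	int ok;

	if (fd < 0)
		return -1;
	addr1 = mmap(NULL, 2 * psz, PROT_NONE, MAP_PRIVATE, fd, 0);
	close(fd);
	if (addr1 == MAP_FAILED)
		return -1;

	/* Step 3: the next page is still mapped, so growing means moving. */
	addr2 = mremap(addr1, psz, 2 * psz, MREMAP_MAYMOVE);
	if (addr2 == MAP_FAILED) {
		munmap(addr1, 2 * psz);
		return -1;
	}

	/* Step 4: move the first page of vma2 back to the original address. */
	back = mremap(addr2, psz, psz, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
	ok = back == (void *)addr1 &&
	     page_is_mapped(addr1, psz) &&
	     page_is_mapped(addr1 + psz, psz);

	munmap(addr1, 2 * psz);
	munmap(addr2, 2 * psz);
	return ok ? 0 : -1;
}
```

Calling both functions from a main() and checking for a 0 return is enough
to try it. With a uprobe registered on the file at offset 0, the munmap() in
split_by_munmap() is the case where, per the reading above, __split_vma() ->
vma_complete() would call uprobe_mmap() again, and mremap_round_trip() is the
sequence that reaches the merge path described in the quoted changelog.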