From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1283BC28B30 for ; Thu, 20 Mar 2025 23:05:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 48D2E280002; Thu, 20 Mar 2025 19:05:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 43CFD280001; Thu, 20 Mar 2025 19:05:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2DDCE280002; Thu, 20 Mar 2025 19:05:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 06761280001 for ; Thu, 20 Mar 2025 19:05:21 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id C3BC981461 for ; Thu, 20 Mar 2025 23:05:22 +0000 (UTC) X-FDA: 83243462484.21.B3060C1 Received: from mail-ed1-f42.google.com (mail-ed1-f42.google.com [209.85.208.42]) by imf08.hostedemail.com (Postfix) with ESMTP id C5C2B160014 for ; Thu, 20 Mar 2025 23:05:20 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=B5aEGAZ3; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of jannh@google.com designates 209.85.208.42 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1742511920; a=rsa-sha256; cv=none; b=RJ3+WT8ZoNFMZRzu0OQXoj7P9HgtZytf3VAtChzFJHFEmRBcIpKMsRW1JwNxdLwexmwrnK 6i+AH77hEP7jzHt8o3EE0gI4SPhPUDiAdhmFdD2Da4mJFZ5hTDIpDETBQruTjNJ3NR6IF6 0+2hyGtWHDLaG80LSYQlLS1Q2Jpx+dQ= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=B5aEGAZ3; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of jannh@google.com designates 209.85.208.42 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1742511920; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Ny+GX9yVAovZ4kewf1rPe2k5PehefNSsdrKKMi2rCUM=; b=YhyWSsf5yPEOCqEu6exCaPdyH67DdlG1fcfCu40R9xrbbHfEQF5rnALlnKfg7PBQVqj5fa IrOAutp+oxb/rux1W4R0OTAjda2hrZDPhGVSqAMo4fZFWZWAJQ+dTKcLf14TTlsKsgKePX DNT9rBW+c6hY2QaAaMfmqbUYBRJdiV4= Received: by mail-ed1-f42.google.com with SMTP id 4fb4d7f45d1cf-5e50bae0f5bso4377a12.0 for ; Thu, 20 Mar 2025 16:05:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1742511919; x=1743116719; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Ny+GX9yVAovZ4kewf1rPe2k5PehefNSsdrKKMi2rCUM=; b=B5aEGAZ3ZM1Hf3zbMyuIiqrYJOZ8/UYxKsjP+4hI0AQNEBQXKeLOsmKzXzjsKKB1mp 1hR9NflvTXTLf2pWeSnnffskEdbXvC2xd+fKxiT1HrrOayspJN70D2rB4JFKN1u/S4hj DjOWyEDuscC+mf49Re6gYYbZ02GKOXqtfUOr0nI9j95pozAf0cqaz4U5oRsr5v52u1i0 Kze7j+3JdiJ+3tqvvAAjxV7XYCgdopjuxgL3RuJAHUBg3vZUJmmVz6YxK1gEKqk+HuQu 6AjgLzYRB6M8opIMzDxeqbaGSHlc5qYzHyMah5BbS/I02Hrt/fprx9X0/CoNviNUVQx2 0zHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742511919; x=1743116719; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Ny+GX9yVAovZ4kewf1rPe2k5PehefNSsdrKKMi2rCUM=; b=sO8d4gtDxwOf9XGfd0k/OGt0tMiu0wiH/3P7HHVwwD8+A7fhc/2IoYWuZrC7OOEwBM t7puN2qS398Hfahv6vAqggLlxo8MkReI80GJLwzbu7v8Tx57we8eakXAOd+07cqW2bRN mmydtPRDknjN0X7Xcx3A3fRu5XYKJnc/wINuwXCsvfUarRmOip9wtUeKXe1x7deLr/PL vFr7ZHo0IEteyac3J3Xb2+PzUMDcd7aTpwTZyphbStS+LNHLzSRHS5x+mJI4DfRLVru/ Z6BMErtTLffWAnsiuvEiFFPEL1bsiFKFB0YRdXRHSH1cO9fpHo2D+kRyzxPdUYk4lz8s LTNw== X-Forwarded-Encrypted: i=1; AJvYcCU8gbPHjYbS3+ujArRU6y1YTTg9BlHVPvsMmIauN3xf3eoGGlVeKQ0YHRqic7HmAu+p2TLZvUEYnw==@kvack.org X-Gm-Message-State: AOJu0YwwPSznWNc97vtM/iy0F/0Ud/L+xop8dmucCke3h8F3xv40/8r7 hQBtb2ksh7tciDwGo59i+jHLMUZG7q/TunrvioCNudHPlYQxdrRY9ihiKqLu3Vqfs1w7yECNOeh 9LRJQpuqpap+qVGZToIcYb5Dem7xZzdOOKUMZ X-Gm-Gg: ASbGncuL3hNcIlBhCxJSNLaLTohcdDOEF3MO8V2zhPceVOxc/1EzAQJ80Penbdq80Jk bUz+zgtrEV3RwuT6GxVamij8WLfGSmTBWyPuhylfc0dv/GpcMXk+7PE9HAgZjx3SuCVbTkSZUAC Lt77BrxwNI0F1Jzdy0cOfFo8Kou2hy1YdQJe7LmBzlDXD1etUnctl9+w== X-Google-Smtp-Source: AGHT+IF5CK6kwBwnNBgPnJcgDWcZ3GUQUpPhOpTKtwb7ZWImMZjh8oACjRLggDzbt4l/0qzanepIPk+n4k3et23fQF4= X-Received: by 2002:aa7:d497:0:b0:5e6:15d3:ffe7 with SMTP id 4fb4d7f45d1cf-5ebcfee002emr19933a12.7.1742511918624; Thu, 20 Mar 2025 16:05:18 -0700 (PDT) MIME-Version: 1.0 References: <67dc67f0.050a0220.25ae54.001e.GAE@google.com> <332b3149-0e84-4bf8-9542-89d68b0a9680@lucifer.local> In-Reply-To: <332b3149-0e84-4bf8-9542-89d68b0a9680@lucifer.local> From: Jann Horn Date: Fri, 21 Mar 2025 00:04:42 +0100 X-Gm-Features: AQ5f1Jp4nawavgqm2yUblZqtBydiZDSZ7_Jv04KZ6mtjQzD0tGgC7wPOtBY_Qkg Message-ID: Subject: Re: [syzbot] [mm?] BUG: unable to handle kernel paging request in vma_merge_existing_range To: Lorenzo Stoakes Cc: Pedro Falcato , syzbot , Liam.Howlett@oracle.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzkaller-bugs@googlegroups.com, vbabka@suse.cz Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: C5C2B160014 X-Stat-Signature: c5fu78qry6nj69ihhchxmpp8safmqyux X-Rspam-User: X-HE-Tag: 1742511920-406193 X-HE-Meta: U2FsdGVkX1/MEq6kKL0lKTgQ5mnKihrBW55jxxXWHq8lXCOaMh8P4zm8w2cQF+8Q+3jVoOPQsQzbRRQTPw5SxZBUn+HY1Iab4sxVzRV6EkG5+2LeSIOTBG2sPHCam8MJY+CNRO68dP5RyO7Dj6vvqObYYGDEZIE0x0SpImAmY9gt/1P8UIJnAXRXi7rPppZpXfljoTn2Q7oxvsI1v1liJBg2kRQacRrJo7yDdLYi5f8q6D6jd3AIXHhCkWP9Wx9wrlcOLR1ytXitCD7zruPjumteZ9iCRm8DDIluYu9UXeQlhTbaFa4123vyoEe7cIDciCpRZQ3XikMufbQvHU1uXvC7JOIkNO0xUVSGkr0poo+r31FBoln2xn7d666E2j07qMyPR8os9uckKxB7ZLKOaY00I6HLMl6LdCL9WSdN94xAESI/E5Esn+F/g49WRYBMAiDdQfBI7MAGpjtGZ9x782K+q1scTH895S123XA0DYSEkS4NIuqEthxxRlXmmdnCgp5xPn7/RVG4SJRAqm+FD+TNFXbhlOswaXXWvZ08hKxTIbcDfK0navuVxzApEmXCq0HapGwmAUdhul5ITQiV2VLJpnY8fnPCMvx0FREK8HQzGqxbpXQbmASQzcyIsIIpz7EZZYEC7onecgv3BgNEg8KfGUYClhmpbrEt4PT68YsCI9eckSuv1qyCVxyV8GzGMJ/ElLwL4S6j7etwnpUMSSR1eUVLTzz84YvJDVdYLK0HV2RMs77OtEzGB7lLxkH+L3AG5+Z8TmvmZoMABj1lgcimd58jTuIkf0x6onE1cRx6J2r3NtK1iSWlKvlFWb3oSrvQQMSwHRefn0uVYVY7d2k9hSGFAPttvs/8FXXZcGLYO+DjAM9XVahavdJ4+vtoXHdSWaf1Mz5yfnmUIu5++vZpQVfDsogusgfhEdhpsQGsG4uGIUDk/DmdKEqaAWooIo4JsNVcj9mnA6epUXi akbKCnbO GlrYqwuIgGHjkxYnvW17NiWmETFD5rasVE2P7qSduatyEB8cjD2TF7O7XsLg7kgj/CN5B2IcAWmSv4txQln07tkoW1Vbp/KgLa7DZiCTIEDAPWc/3UgPYdGozneOxu4pFTSMwr18MMJeDV6F7ZJrdLkPbDEyxdQSBMcGgehpwdGDpAoEQDDUzAG3PIUpzBur6q/SEBQiUkdCrHbazTIUFmT8MBO+6bO9KKqAvGf7SOQeVcO+HRm+Koziv+K+xY/O/YdJ2MnYqMYwhVGu/aus3QWHg6IhaIyCTsoHI/pXMoZooCuGChEzAk7vbxzlA9HzrULPhA/kYfsEQcMBxjLPZXHCDjiTcIO77kAxkCEERz1pfPKCgZRkcnPY5OVyZiRHcp2XjRu9gkMX/mo6arl58htD/jQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000090, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Mar 20, 2025 at 9:53=E2=80=AFPM Lorenzo Stoakes wrote: > On Thu, Mar 20, 2025 at 09:11:33PM +0100, Jann Horn wrote: > > On Thu, Mar 20, 2025 at 9:02=E2=80=AFPM Pedro Falcato wrote: > > > On Thu, Mar 20, 2025 at 12:09:36PM -0700, syzbot wrote: [...] > > > Ahh, fun bug. This *seems* to be the bug: > > > > > > First, in vma_modify: > > > > > > merged =3D vma_merge_existing_range(vmg); > > > if (merged) > > > return merged; > > > if (vmg_nomem(vmg)) > > > return ERR_PTR(-ENOMEM); > > > > > > then, all the way up to userfaultfd_release_all (the return value pro= pagates > > > vma_modify -> vma_modify_flags_uffd -> userfaultfd_clear_vma): [...] > > > diff --git a/mm/vma.c b/mm/vma.c > > > index 71ca012c616c..b2167b7dc27d 100644 > > > --- a/mm/vma.c > > > +++ b/mm/vma.c > > > @@ -1517,8 +1517,16 @@ static struct vm_area_struct *vma_modify(struc= t vma_merge_struct *vmg) > > > merged =3D vma_merge_existing_range(vmg); > > > if (merged) > > > return merged; > > > - if (vmg_nomem(vmg)) > > > + if (vmg_nomem(vmg)) { > > > + /* If we can avoid failing the whole modification > > > + * due to a merge OOM and validly keep going > > > + * (we're modifying the whole VMA), return vma intact= . > > > + * It won't get merged, but such is life - we're avoi= ding > > > + * OOM conditions in other parts of mm/ this way */ > > > + if (start <=3D vma->vm_start && end >=3D vma->vm_end) > > > + return vma; > > I do not like this solution at all, sorry. > > I mean I get what you're doing and it's smart to try to find a means out = of > this in general :) but let me explain my reasoning: > > For one this is uffd's fault, and the fix clearly needs to be there. I mean... this worked fine back in, for example, 5.4 - userfaultfd_release() would loop over the VMAs, change some stuff in some of them, and merge them where possible, and there was no way anything could fail. It's the VMA subsystem that changed its API... > But also, we _can't be sure_ vma is valid any more. The second it goes of= f > to vma_merge_existing_range() it might be removed, which is why it's > critical to only use 'merged'. > > Now you might be able to prove that _right now_ it'd be ok, if you do thi= s > check, because vma_complete() does the delete and only if either > vma_iter_prealloc() or dup_anon_vma() fails would we return -ENOMEM and > these happen _before_ vma_complete(), but that's an _implementation detai= l_ > and now we've made an assumption that this is the case here. > > An implicit effectively precondition on something unexpressed like that i= s > asking for trouble, really don't want to go that way. > > > > > return ERR_PTR(-ENOMEM); > > > + } > > > > > Along the lines of your idea, perhaps we could add a parameter "bool > > never_fail" to vma_modify() that is passed through to > > vma_merge_existing_range(), and guarantee that it never fails when > > that parameter is set? Then we could also check that never_fail is > > only used in cases where no split is necessary. That somewhat avoids > > having this kind of check that only ever runs in error conditions... > > Yeah hmmm, again this is _not where the problem is_. And we're doing it f= or > _one case only_, who _must_ succeed. Right? It seems to me like it is... theoretically very reasonable of userfaultfd to expect to have a "reliably change the flags of an entire VMA" operation, and if the normal VMA code doesn't provide that because of maple tree internals in the merging case, then it would be reasonable for the VMA code to provide an alternative that does provide this? > Buuuut then again, we could add a _feature_ (it'd be something in VMG not= a > bool, hey what are helper structs for right? :P) > > We coould add a special mode that says we __GFP_NOFAIL, we do that in > vms_abort_munmap_vmas() and man that was under similar circumstances (hey > remember that fun Liam? :) > > But at the same time, it feels icky, and we probably don't want to > proliferate this pattern too much. > > So I'd kind of rather prefer a _general_ no-fail unmap that the uffd > release code could invoke. > > Perhaps we could genericise the vms_abort_munmap_vmas(): > > mas_store_gfp(mas, NULL, GFP_KERNEL|__GFP_NOFAIL); > > And make that available or some form of it, to do the 'simplest' thing in > this scenario. The userfaultfd release code doesn't want an "unmap" operation. It just wants to remove the __VM_UFFD_FLAGS flags and set the vma->vm_userfaultfd_ctx pointer to NULL. The VMA code then sometimes sees an opportunity to merge with adjacent VMAs; and this merging is what's failing. So if we're willing to tolerate having adjacent VMAs that are mergeable but aren't merged after an allocation failure, then instead of using __GFP_NOFAIL for the merge, we could also just ignore merge failures, at least when some "never fail" flag is set? [...] > Another possible solution is to add a flag that _explicitly asserts and > documents_ that you require that no VMA be removed before attempting to > allocate. > > Or we could make that an _explicit_ assumption? I don't think I understand this part. What VMA removal and allocation are you talking about? > And then the uffd code itself could cache vma and take Pedro's solution o= n > that basis? > > void userfaultfd_release_all(struct mm_struct *mm, > struct userfaultfd_ctx *ctx) > { > ... > > for_each_vma(vmi, vma) { > struct vm_area_struct *old =3D vma; > > ... > > vma =3D userfaultfd_clear_vma(&vmi, prev, vma, > vma->vm_start, vma->vm_end); > if (IS_ERR(vma)) { > BUG_ON(vma !=3D -ENOMEM); /* Sorry Linus!= */ > > /* > * OK we assert above that vma must remain= intact if we fail to allocate, > * We are in an extreme memory pressure st= ate, we simply cannot clear this VMA. Move on. > */ > prev =3D old; > } > > ... > } > } > > I mean it's going to be dirty whichever way round we do this. > > Thoughts guys? I guess my main thought on this is: I would prefer it if we keep any code that runs only in near-impossible cases as simple as possible, because issues in those codepaths will take longer to find.