From: Jann Horn
Date: Thu, 17 Oct 2024 18:57:24 +0200
Subject: Re: [PATCH fix 6.12] mm: mark mas allocation in vms_abort_munmap_vmas as __GFP_NOFAIL
To: Lorenzo Stoakes
Cc: Andrew Morton, "Liam R. Howlett", Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20241016-fix-munmap-abort-v1-1-601c94b2240d@google.com>

On Thu, Oct 17, 2024 at 11:47 AM Lorenzo Stoakes wrote:
> On Wed, Oct 16, 2024 at 05:07:53PM +0200, Jann Horn wrote:
> > vms_abort_munmap_vmas() is a recovery path where, on entry, some
> > VMAs have already been torn down halfway (in a way we can't undo) but are
> > still present in the maple tree.
> >
> > At this point, we *must* remove the VMAs from the VMA tree, otherwise
> > we get UAF.
> >
> > Because removing VMA tree nodes can require memory allocation, the
> > existing code has an error path which tries to handle this by
> > reattaching the VMAs; but that can't be done safely.
> >
> > A nicer way to fix it would probably be to preallocate enough maple
> > tree nodes for the removal before the point of no return, or something
> > like that; but for now, fix it the easy and kinda ugly way, by marking
> > this allocation __GFP_NOFAIL.
> >
> > Fixes: 4f87153e82c4 ("mm: change failure of MAP_FIXED to restoring the gap on failure")
> > Signed-off-by: Jann Horn
>
> I kind of question whether this is real-world achievable (yes I realise you
> included a repro, but one prodding /sys/kernel/debug bits :>) but to be
> honest at this point I think I feel a lot safer just clearing this here for
> sure. So:

I mean, there is a reason why we have __GFP_NOFAIL, and if you don't
set it, my understanding is that you *can* end up failing allocations
when the page allocator sees no other way to make progress...

I think as a rough sketch, what you'd have to do to hit this issue
without cheating using fault injection might be something like this,
for simplicity assume all of this happens on the same CPU core:

 - make processes A, B, C, D; with A having threads A1 and A2
 - let process A consume most of the available RAM+swap (so that
   process A will be killed first by the OOM killer)
 - let thread A2 enter some syscall that will allocate a lot of order-0
   pages without fatal_signal_pending() checks, then block/preempt it
   somehow
 - let thread A1 enter an mmap() syscall, then block/preempt it somehow
 - let process B consume remaining available RAM, until B blocks and
   the OOM killer decides to reap process A.
   Note that the OOM killer starts by basically just setting a flag on
   the target process and sending it a fatal signal; only if the target
   process doesn't exit for some time after that (OOM_REAPER_DELAY = 2
   seconds), the OOM killer starts actively reaping the target's memory
 - let process C allocate as many maple tree nodes as possible (to
   drain the slab cache's freelists), until C blocks on memory
   allocation
 - maybe let process D free one maple tree node or such, so that the
   first maple node allocation in mmap() for constructing the detached
   tree works?
 - let thread A2 continue - it will have access to ALLOC_OOM memory
   reserves, and AFAIU will be able to completely empty out the memory
   reserves, and will then hit a GFP_KERNEL allocation failure
 - once A2 has hit an allocation failure, let thread A1 continue
   execution - it, too, should hit a GFP_KERNEL allocation failure

But I haven't actually tested that.
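
To make the failure mode a bit more concrete, here is a very rough userspace
toy model of why the abort path wants a non-failing allocation. This is not
kernel code and does not use the maple tree API: every name in it (toy_gfp,
toy_pool, toy_vma, toy_alloc_node, toy_abort_munmap) is made up for
illustration, and the "reclaim" step is just an assumption that waiting
eventually frees a node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GFP_KERNEL and GFP_KERNEL | __GFP_NOFAIL. */
enum toy_gfp {
	TOY_GFP_KERNEL = 0,
	TOY_GFP_NOFAIL = 1,
};

/* Toy allocator state: nodes available right now, and nodes that only
 * become available after waiting for reclaim. */
struct toy_pool {
	int free_nodes;
	int reclaimable;
};

/* Toy VMA: is it still reachable via the tree, and has it already been
 * torn down halfway? */
struct toy_vma {
	bool in_tree;
	bool torn_down;
};

/*
 * Allocate one tree node. Without TOY_GFP_NOFAIL this fails as soon as the
 * free list is empty; with it, we keep "reclaiming" until a node shows up.
 */
static bool toy_alloc_node(struct toy_pool *pool, enum toy_gfp flags)
{
	for (;;) {
		if (pool->free_nodes > 0) {
			pool->free_nodes--;
			return true;
		}
		if (!(flags & TOY_GFP_NOFAIL))
			return false;
		/* Assumption of the toy model: reclaim can make progress. */
		assert(pool->reclaimable > 0);
		pool->reclaimable--;
		pool->free_nodes++;
	}
}

/*
 * Toy abort path: removing the entries needs one node allocation. Returns
 * the number of torn-down entries that are still reachable through the
 * tree afterwards, i.e. the potential UAF candidates.
 */
static size_t toy_abort_munmap(struct toy_pool *pool, struct toy_vma *vmas,
			       size_t n, enum toy_gfp flags)
{
	size_t i, dangling = 0;

	if (toy_alloc_node(pool, flags)) {
		for (i = 0; i < n; i++)
			vmas[i].in_tree = false;
	}
	for (i = 0; i < n; i++) {
		if (vmas[i].in_tree && vmas[i].torn_down)
			dangling++;
	}
	return dangling;
}
```

In this model, starting from an empty free list, calling toy_abort_munmap()
with TOY_GFP_KERNEL leaves every torn-down entry in the tree, while
TOY_GFP_NOFAIL waits for a node and leaves none behind.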