From: Jann Horn
Date: Fri, 7 Jun 2024 17:50:34 +0200
Subject: Re: [PATCH v16 1/5] mm: add VM_DROPPABLE for designating always lazily freeable mappings
To: "Jason A. Donenfeld"
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev, tglx@linutronix.de, linux-crypto@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman, Adhemerval Zanella Netto, "Carlos O'Donell", Florian Weimer, Arnd Bergmann, Christian Brauner, David Hildenbrand, linux-mm@kvack.org

On Fri, Jun 7, 2024 at 5:12 PM Jann Horn wrote:
> On Fri, Jun 7, 2024 at 4:35 PM Jason A. Donenfeld wrote:
> > On Fri, May 31, 2024 at 03:00:26PM +0200, Jann Horn wrote:
> > > On Fri, May 31, 2024 at 2:13 PM Jason A. Donenfeld wrote:
> > > > On Fri, May 31, 2024 at 12:48:58PM +0200, Jann Horn wrote:
> > > > > On Tue, May 28, 2024 at 2:24 PM Jason A. Donenfeld wrote:
> > > > > > c) If there's not enough memory to service a page fault, it's not fatal.
> > > > > [...]
> > > > > > @@ -5689,6 +5689,10 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > > > > >
> > > > > >         lru_gen_exit_fault();
> > > > > >
> > > > > > +       /* If the mapping is droppable, then errors due to OOM aren't fatal. */
> > > > > > +       if (vma->vm_flags & VM_DROPPABLE)
> > > > > > +               ret &= ~VM_FAULT_OOM;
> > > > >
> > > > > Can you remind me how this is supposed to work? If we get an OOM
> > > > > error, and the error is not fatal, does that mean we'll just keep
> > > > > hitting the same fault handler over and over again (until we happen to
> > > > > have memory available again I guess)?
> > > >
> > > > Right, it'll just keep retrying. I agree this isn't great, which is why
> > > > in the 2023 patchset, I had additional code to simply skip the faulting
> > > > instruction, and then the userspace code would notice the inconsistency
> > > > and fall back to the syscall. This worked pretty well. But it meant
> > > > decoding the instruction and in general skipping instructions is weird,
> > > > and that made this patchset very very contentious. Since the skipping
> > > > behavior isn't actually required by the /security goals/ of this, I
> > > > figured I'd just drop that. And maybe we can all revisit it together
> > > > sometime down the line. But for now I'm hoping for something a little
> > > > easier to swallow.
> > >
> > > In that case, since we need to be able to populate this memory to make
> > > forward progress, would it make sense to remove the parts of the patch
> > > that treat the allocation as if it was allowed to silently fail (the
> > > "__GFP_NOWARN | __GFP_NORETRY" and the "ret &= ~VM_FAULT_OOM")? I
> > > think that would also simplify this a bit by making this type of
> > > memory a little less special.
> >
> > The whole point, though, is that it needs to not fail or warn. It's
> > memory that can be dropped/zeroed at any moment, and the code is
> > deliberately robust to that.
>
> Sure - but does it have to be more robust than accessing a newly
> allocated piece of memory [which hasn't been populated with anonymous
> pages yet] or bringing a swapped-out page back from swap?
>
> I'm not an expert on OOM handling, but my understanding is that the
> kernel tries _really_ hard to avoid failing low-order GFP_KERNEL
> allocations, with the help of the OOM killer. My understanding is that
> those allocations basically can't fail with a NULL return unless the
> process has already been killed or it is in a memcg_kmem cgroup that
> contains only processes that have been marked as exempt from OOM
> killing. (Or if you're using error injection to explicitly tell the
> kernel to fail the allocation.)
> My understanding is that normal outcomes of an out-of-memory situation
> are things like the OOM killer killing processes (including
> potentially the calling one) to free up memory, or the OOM killer
> panic()ing the whole system as a last resort; but getting a NULL
> return from alloc_page(GFP_KERNEL) without getting killed is not one
> of those outcomes.

Or, from a different angle: You're trying to allocate memory, and you
can't make forward progress until that memory has been allocated
(unless the process is killed). That's what GFP_KERNEL is for.
Stuff like "__GFP_NOWARN | __GFP_NORETRY" is for when you have a backup plan that lets you make progress (perhaps in a slightly less efficient way, or by dropping some incoming data, or something like that), and it hints to the page allocator that it doesn't have to try hard to reclaim memory if it can't find free memory quickly.