From: Jann Horn
Date: Fri, 7 Jun 2024 17:12:38 +0200
Subject: Re: [PATCH v16 1/5] mm: add VM_DROPPABLE for designating always lazily freeable mappings
To: "Jason A. Donenfeld"
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev, tglx@linutronix.de, linux-crypto@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman, Adhemerval Zanella Netto, "Carlos O'Donell", Florian Weimer, Arnd Bergmann, Christian Brauner, David Hildenbrand, linux-mm@kvack.org
References: <20240528122352.2485958-1-Jason@zx2c4.com> <20240528122352.2485958-2-Jason@zx2c4.com>

On Fri, Jun 7, 2024 at 4:35 PM Jason A. Donenfeld wrote:
> On Fri, May 31, 2024 at 03:00:26PM +0200, Jann Horn wrote:
> > On Fri, May 31, 2024 at 2:13 PM Jason A. Donenfeld wrote:
> > > On Fri, May 31, 2024 at 12:48:58PM +0200, Jann Horn wrote:
> > > > On Tue, May 28, 2024 at 2:24 PM Jason A. Donenfeld wrote:
> > > > > c) If there's not enough memory to service a page fault, it's not fatal.
> > > > [...]
> > > > > @@ -5689,6 +5689,10 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > > > >
> > > > >         lru_gen_exit_fault();
> > > > >
> > > > > +       /* If the mapping is droppable, then errors due to OOM aren't fatal. */
> > > > > +       if (vma->vm_flags & VM_DROPPABLE)
> > > > > +               ret &= ~VM_FAULT_OOM;
> > > >
> > > > Can you remind me how this is supposed to work? If we get an OOM
> > > > error, and the error is not fatal, does that mean we'll just keep
> > > > hitting the same fault handler over and over again (until we happen to
> > > > have memory available again I guess)?
> > >
> > > Right, it'll just keep retrying. I agree this isn't great, which is why
> > > in the 2023 patchset, I had additional code to simply skip the faulting
> > > instruction, and then the userspace code would notice the inconsistency
> > > and fallback to the syscall. This worked pretty well. But it meant
> > > decoding the instruction and in general skipping instructions is weird,
> > > and that made this patchset very very contentious. Since the skipping
> > > behavior isn't actually required by the /security goals/ of this, I
> > > figured I'd just drop that. And maybe we can all revisit it together
> > > sometime down the line. But for now I'm hoping for something a little
> > > easier to swallow.
> >
> > In that case, since we need to be able to populate this memory to make
> > forward progress, would it make sense to remove the parts of the patch
> > that treat the allocation as if it was allowed to silently fail (the
> > "__GFP_NOWARN | __GFP_NORETRY" and the "ret &= ~VM_FAULT_OOM")? I
> > think that would also simplify this a bit by making this type of
> > memory a little less special.
>
> The whole point, though, is that it needs to not fail or warn. It's
> memory that can be dropped/zeroed at any moment, and the code is
> deliberately robust to that.

Sure, but does it have to be more robust than the first access to a
newly allocated piece of memory (which hasn't been populated with
anonymous pages yet), or than bringing a swapped-out page back in from
swap?

I'm not an expert on OOM handling, but my understanding is that the
kernel tries _really_ hard to avoid failing low-order GFP_KERNEL
allocations, with the help of the OOM killer. As far as I know, those
allocations basically can't fail with a NULL return unless the process
has already been killed, or it is in a memcg_kmem cgroup that contains
only processes that have been marked as exempt from OOM killing. (Or
unless you're using error injection to explicitly tell the kernel to
fail the allocation.)

As I understand it, the normal outcomes of an out-of-memory situation
are things like the OOM killer killing processes (potentially including
the calling one) to free up memory, or, as a last resort, the OOM
killer panic()ing the whole system. Getting a NULL return from
alloc_page(GFP_KERNEL) without being killed is not one of those
outcomes.
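The trade-off discussed in this thread can be made concrete with a toy user-space model. Everything in it is hypothetical: the MOCK_* constants merely stand in for VM_FAULT_OOM and __GFP_NORETRY, and mock_alloc_page() is a drastically simplified allocator that, without the NORETRY flag, runs a pretend OOM killer and retries instead of failing, which is the behavior described above for low-order GFP_KERNEL allocations. It is not the kernel's fault path or page allocator; it only sketches the two outcomes being compared: masking the OOM bit turns a failed NORETRY allocation into an endless refault, while the default policy makes progress by freeing memory elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy user-space model, NOT kernel code. The names and values below are
 * hypothetical stand-ins for VM_FAULT_OOM and __GFP_NORETRY.
 */
#define MOCK_VM_FAULT_OOM 0x1u
#define MOCK_GFP_NORETRY  0x1u

struct mock_mm {
	int free_pages;     /* pages the allocator can hand out right now */
	int killable_pages; /* pages the mock OOM killer frees by killing a task */
	int oom_kills;      /* how many times the mock OOM killer ran */
};

/*
 * Simplified low-order allocation: without MOCK_GFP_NORETRY it does not give
 * up on the first failure but runs the mock OOM killer and retries; with
 * MOCK_GFP_NORETRY it fails as soon as nothing is free.
 */
static bool mock_alloc_page(struct mock_mm *mm, unsigned int gfp)
{
	for (;;) {
		if (mm->free_pages > 0) {
			mm->free_pages--;
			return true;
		}
		if (gfp & MOCK_GFP_NORETRY)
			return false;
		/*
		 * Toy simplification: per the reasoning above, the real kernel
		 * would kill tasks (possibly the caller) or panic rather than
		 * return NULL here; the model just reports failure.
		 */
		if (mm->killable_pages == 0)
			return false;
		mm->free_pages += mm->killable_pages;
		mm->killable_pages = 0;
		mm->oom_kills++;
	}
}

/* Fault handler model: droppable mappings use NORETRY and mask the OOM bit. */
static unsigned int mock_handle_fault(struct mock_mm *mm, bool droppable,
				      bool *populated)
{
	unsigned int gfp = droppable ? MOCK_GFP_NORETRY : 0;
	unsigned int ret;

	*populated = mock_alloc_page(mm, gfp);
	ret = *populated ? 0 : MOCK_VM_FAULT_OOM;
	if (droppable)
		ret &= ~MOCK_VM_FAULT_OOM; /* mirrors "ret &= ~VM_FAULT_OOM" */
	return ret;
}

/*
 * Re-run the faulting access until the page is present, as happens when the
 * fault handler returns without an error and the instruction is retried.
 * Returns the number of attempts needed, -1 if the fault was reported as OOM,
 * or 0 if the page was still missing after max_attempts (still refaulting).
 */
static int mock_touch_page(struct mock_mm *mm, bool droppable, int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		bool populated;
		unsigned int ret = mock_handle_fault(mm, droppable, &populated);

		if (ret & MOCK_VM_FAULT_OOM)
			return -1;
		if (populated)
			return attempt;
	}
	return 0;
}
```

Starting from free_pages = 0 and killable_pages = 4, mock_touch_page(&mm, true, 1000) returns 0: the droppable access is still refaulting after 1000 attempts and the mock OOM killer never ran. On a fresh copy of the same state, mock_touch_page(&mm, false, 1000) succeeds on the first attempt after one mock OOM kill. Only when nothing is left to kill does the non-droppable path report the fault as OOM (-1).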