Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
From: James Bottomley
To: Kent Overstreet, Matthew Wilcox
Cc: NeilBrown, Amir Goldstein, paulmck@kernel.org, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-fsdevel, Jan Kara
Date: Fri, 01 Mar 2024 10:33:59 +0700
References: <170925937840.24797.2167230750547152404@noble.neil.brown.name>

On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> On Fri, Mar 01, 2024 at 02:48:52AM +0000, Matthew Wilcox wrote:
> > On Thu, Feb 29, 2024 at 09:39:17PM -0500, Kent Overstreet wrote:
> > > On Fri, Mar 01, 2024 at 01:16:18PM +1100, NeilBrown wrote:
> > > > Insisting that GFP_KERNEL allocations never returned NULL would
> > > > allow us to remove a lot of untested error handling code....
> > >
> > > If memcg ever gets enabled for all kernel side allocations we
> > > might start seeing failures of GFP_KERNEL allocations.
> >
> > Why would we want that behaviour?  A memcg-limited allocation
> > should behave like any other allocation -- block until we've freed
> > some other memory in this cgroup, either by swap or killing or ...
>
> It's not uncommon to have a more efficient way of doing something if
> you can allocate more memory, but still have the ability to run in a
> more bounded amount of space if you need to; I write code like this
> quite often.

The cgroup design is to do what we usually do, but within settable hard
and soft limits.  So if the kernel could make GFP_KERNEL wait without
failing, the cgroup would mirror that.

> Or maybe you just want the syscall to return an error instead of
> blocking for an unbounded amount of time if userspace asks for
> something silly.

Warning on allocations above a certain size without MAY_FAIL would seem
to cover all those cases.  If there is a case for requiring instant
allocation, you always have GFP_ATOMIC, and, I suppose, we could even
do a bounded reclaim allocation where it tries for a certain time and
then fails.

> Honestly, relying on the OOM killer and saying that because that now
> we don't have to write and test your error paths is a lazy cop out.

The OOM killer is the most extreme outcome.  Usually reclaim (hugely
simplified) dumps clean cache first, then tries the shrinkers, then
tries to write out dirty cache.  Only when that hasn't found anything
after a few iterations will the OOM killer get activated.

> The same kind of thinking got us overcommit, where yes we got an
> increase in efficiency, but the cost was that everyone started
> assuming and relying on overcommit, so now it's impossible to run
> without overcommit enabled except in highly controlled environments.

That might be true for your use case, but it certainly isn't true for a
cheap hosting cloud using containers: overcommit is where you make your
money, so it's absolutely standard operating procedure.  I wouldn't
call cheap hosting a "highly controlled environment"; they're just
making a bet they won't get caught out too often.

> And that means allocation failure as an effective signal is just
> completely busted in userspace. If you want to write code in
> userspace that uses as much memory as is available and no more, you
> _can't_, because system behaviour goes to shit if you have overcommit
> enabled or a bunch of memory gets wasted if overcommit is disabled
> because everyone assumes that's just what you do.

OK, this seems to be specific to your use case again, because if you
look at what the major user space processes like web browsers do, they
allocate way over the physical memory available to them for cache and
assume the kernel will take care of it.  Making failure a signal for
being over the working set would cause all these applications to
segfault almost immediately.

I think what you're asking for is an API to try to calculate what the
current available headroom in the working set would be?  That's highly
heuristic, but the mm people might have an idea how to do it.

> Let's _not_ go that route in the kernel. I have pointy sticks to
> brandish at people who don't want to deal with properly handling
> errors.

Error legs are the least exercised and most bug-prone, and therefore
most exploit-prone, pieces of code in C.  If we can get rid of them, we
should.

James