Date: Thu, 29 Feb 2024 23:01:01 -0500
From: Kent Overstreet
To: James Bottomley
Cc: Matthew Wilcox, NeilBrown, Amir Goldstein, paulmck@kernel.org,
    lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-fsdevel,
    Jan Kara
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
In-Reply-To: <3bykct7dzcduugy6kvp7n32sao4yavgbj2oui2rpidinst2zmn@e5qti5lkq25t>

On Thu, Feb 29, 2024 at 10:52:06PM -0500, Kent Overstreet wrote:
> On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> > On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > > Or maybe you just want the syscall to return an error instead of
> > > blocking for an unbounded amount of time if userspace asks for
> > > something silly.
> >
> > Warn on allocation above a certain size without MAY_FAIL would seem to
> > cover all those cases. If there is a case for requiring instant
> > allocation, you always have GFP_ATOMIC, and, I suppose, we could even
> > do a bounded reclaim allocation where it tries for a certain time then
> > fails.
>
> Then you're baking in this weird constant into all your algorithms that
> doesn't scale as machine memory sizes and working set sizes increase.
>
> > > Honestly, relying on the OOM killer and saying that because that now
> > > we don't have to write and test your error paths is a lazy cop out.
> >
> > OOM Killer is the most extreme outcome. Usually reclaim (hugely
> > simplified) dumps clean cache first and tries the shrinkers then tries
> > to write out dirty cache. Only after that hasn't found anything after
> > a few iterations will the oom killer get activated
>
> All your caches dumped and the machine grinds to a halt and then a
> random process gets killed instead of simply _failing the allocation_.
>
> > > The same kind of thinking got us overcommit, where yes we got an
> > > increase in efficiency, but the cost was that everyone started
> > > assuming and relying on overcommit, so now it's impossible to run
> > > without overcommit enabled except in highly controlled environments.
> >
> > That might be true for your use case, but it certainly isn't true for a
> > cheap hosting cloud using containers: overcommit is where you make your
> > money, so it's absolutely standard operating procedure.
> > I wouldn't call cheap hosting a "highly controlled environment" they're
> > just making a bet they won't get caught out too often.
>
> Reading comprehension fail. Reread what I wrote.
>
> > > And that means allocation failure as an effective signal is just
> > > completely busted in userspace. If you want to write code in
> > > userspace that uses as much memory as is available and no more, you
> > > _can't_, because system behaviour goes to shit if you have overcommit
> > > enabled or a bunch of memory gets wasted if overcommit is disabled
> > > because everyone assumes that's just what you do.
> >
> > OK, this seems to be specific to your use case again, because if you
> > look at what the major user space processes like web browsers do, they
> > allocate way over the physical memory available to them for cache and
> > assume the kernel will take care of it. Making failure a signal for
> > being over the working set would cause all these applications to
> > segfault almost immediately.
>
> Again, reread what I wrote. You're restating what I wrote and completely
> missing the point.
>
> > > Let's _not_ go that route in the kernel. I have pointy sticks to
> > > brandish at people who don't want to deal with properly handling
> > > errors.
> >
> > Error legs are the least exercised and most bug, and therefore exploit,
> > prone pieces of code in C. If we can get rid of them, we should.
>
> Fuck no.
>
> Having working error paths is _basic_, and learning how to test your
> code is also basic. If you can't be bothered to do that you shouldn't be
> writing kernel code.
>
> We are giving far too much by going down the route of "oh, just kill
> stuff if we screwed the pooch and overcommitted".
>
> I don't fucking care if it's what the big cloud providers want because
> it's convenient for them, some of us actually do care about reliability.
>
> By just saying "oh, the OOM killer will save us" what you're doing is
> making it nearly impossible to fully utilize a machine without having
> stuff randomly killed.
>
> Fuck. That.

And besides all that, as a practical matter you can't just "not have
error paths" because, like you said, you'd still have to have a max size
where you WARN() - and _fail the allocation_ - and you've still got to
unwind.

The OOM killer can't kill processes while they're stuck blocking on an
allocation that will never return in the kernel.

I think we can safely nip this idea in the bud.

Test your damn error paths...
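
To make the "fail the allocation and unwind" point concrete, here is a
minimal userspace sketch, not kernel code. Everything in it is a
hypothetical stand-in: MAX_ALLOC_SIZE plays the role of the "max size
where you WARN()", alloc_may_fail() is an allocator that is allowed to
return NULL, fail_after is a crude fault-injection knob, and
exercise_error_paths() walks every failure point and checks that the
unwind leaves nothing allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap, standing in for "a max size where you WARN()". */
#define MAX_ALLOC_SIZE ((size_t)1 << 20)

static int fail_after = -1;	/* fail the Nth call from now (0-based); -1 = never */
static int live_allocs;		/* outstanding allocations, used as a leak check */

/* An allocation that may fail: oversize requests and injected faults return NULL. */
static void *alloc_may_fail(size_t size)
{
	void *p;

	if (size > MAX_ALLOC_SIZE)
		return NULL;	/* kernel code would WARN() here, then fail */
	if (fail_after == 0) {
		fail_after = -1;
		return NULL;	/* injected failure */
	}
	if (fail_after > 0)
		fail_after--;
	p = malloc(size);
	if (p)
		live_allocs++;
	return p;
}

static void free_tracked(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct ctx {
	char *a;
	char *b;
};

/* Three allocations; each failure unwinds exactly what was set up so far. */
struct ctx *ctx_create(size_t a_size, size_t b_size)
{
	struct ctx *c = alloc_may_fail(sizeof(*c));

	if (!c)
		return NULL;
	c->a = alloc_may_fail(a_size);
	if (!c->a)
		goto err_free_ctx;
	c->b = alloc_may_fail(b_size);
	if (!c->b)
		goto err_free_a;
	return c;

err_free_a:
	free_tracked(c->a);
err_free_ctx:
	free_tracked(c);
	return NULL;
}

void ctx_destroy(struct ctx *c)
{
	if (!c)
		return;
	free_tracked(c->b);
	free_tracked(c->a);
	free_tracked(c);
}

/* Hit every failure point in turn; returns the number of error paths exercised. */
int exercise_error_paths(void)
{
	struct ctx *c;
	int tested = 0;
	int n;

	for (n = 0; n < 3; n++) {	/* inject a failure at each allocation */
		fail_after = n;
		assert(ctx_create(64, 64) == NULL);
		assert(live_allocs == 0);	/* the unwind leaked nothing */
		tested++;
	}
	fail_after = -1;

	/* An oversize request fails up front instead of blocking. */
	assert(ctx_create(MAX_ALLOC_SIZE + 1, 64) == NULL);
	assert(live_allocs == 0);
	tested++;

	/* The success path still works and tears down cleanly. */
	c = ctx_create(64, 64);
	assert(c != NULL);
	ctx_destroy(c);
	assert(live_allocs == 0);
	return tested;
}
```

Calling exercise_error_paths() from a test harness's main() runs all
four failure cases plus the success case. The design choice being
illustrated is only that the failure points are enumerable and cheap to
inject, so there is little excuse for leaving the unwind untested.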