From: "NeilBrown"
To: "Kent Overstreet"
Cc: "James Bottomley", "Matthew Wilcox", "Amir Goldstein", paulmck@kernel.org, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, "linux-fsdevel", "Jan Kara"
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
Date: Fri, 01 Mar 2024 15:09:09 +1100
Message-id: <170926614942.24797.13632376785557689080@noble.neil.brown.name>

On Fri, 01 Mar 2024, Kent Overstreet wrote:
> On Thu, Feb 29, 2024 at 10:52:06PM -0500, Kent Overstreet wrote:
> > On Fri, Mar 01, 2024 at 10:33:59AM +0700, James Bottomley wrote:
> > > On Thu, 2024-02-29 at 22:09 -0500, Kent Overstreet wrote:
> > > > Or maybe you just want the syscall to return an error instead of
> > > > blocking for an unbounded amount of time if userspace asks for
> > > > something silly.
> > >
> > > Warn on allocation above a certain size without MAY_FAIL would seem to
> > > cover all those cases. If there is a case for requiring instant
> > > allocation, you always have GFP_ATOMIC, and, I suppose, we could even
> > > do a bounded reclaim allocation where it tries for a certain time then
> > > fails.
> >
> > Then you're baking in this weird constant into all your algorithms that
> > doesn't scale as machine memory sizes and working set sizes increase.
> >
> > > > Honestly, relying on the OOM killer and saying that because that now
> > > > we don't have to write and test your error paths is a lazy cop out.
> > >
> > > OOM Killer is the most extreme outcome. Usually reclaim (hugely
> > > simplified) dumps clean cache first and tries the shrinkers then tries
> > > to write out dirty cache. Only after that hasn't found anything after
> > > a few iterations will the oom killer get activated
> >
> > All your caches dumped and the machine grinds to a halt and then a
> > random process gets killed instead of simply _failing the allocation_.
> >
> > > > The same kind of thinking got us overcommit, where yes we got an
> > > > increase in efficiency, but the cost was that everyone started
> > > > assuming and relying on overcommit, so now it's impossible to run
> > > > without overcommit enabled except in highly controlled environments.
> > >
> > > That might be true for your use case, but it certainly isn't true for a
> > > cheap hosting cloud using containers: overcommit is where you make your
> > > money, so it's absolutely standard operating procedure. I wouldn't
> > > call cheap hosting a "highly controlled environment"; they're just
> > > making a bet they won't get caught out too often.
> >
> > Reading comprehension fail. Reread what I wrote.
> >
> > > > And that means allocation failure as an effective signal is just
> > > > completely busted in userspace. If you want to write code in
> > > > userspace that uses as much memory as is available and no more, you
> > > > _can't_, because system behaviour goes to shit if you have overcommit
> > > > enabled or a bunch of memory gets wasted if overcommit is disabled
> > > > because everyone assumes that's just what you do.
> > >
> > > OK, this seems to be specific to your use case again, because if you
> > > look at what the major user space processes like web browsers do, they
> > > allocate way over the physical memory available to them for cache and
> > > assume the kernel will take care of it. Making failure a signal for
> > > being over the working set would cause all these applications to
> > > segfault almost immediately.
> >
> > Again, reread what I wrote. You're restating what I wrote and completely
> > missing the point.
> >
> > > > Let's _not_ go that route in the kernel. I have pointy sticks to
> > > > brandish at people who don't want to deal with properly handling
> > > > errors.
> > >
> > > Error legs are the least exercised and most bug-prone, and therefore
> > > exploit-prone, pieces of code in C. If we can get rid of them, we should.
> >
> > Fuck no.
> >
> > Having working error paths is _basic_, and learning how to test your
> > code is also basic. If you can't be bothered to do that you shouldn't be
> > writing kernel code.
> >
> > We are giving far too much by going down the route of "oh, just kill
> > stuff if we screwed the pooch and overcommitted".
> >
> > I don't fucking care if it's what the big cloud providers want because
> > it's convenient for them, some of us actually do care about reliability.
> >
> > By just saying "oh, the OOM killer will save us" what you're doing is
> > making it nearly impossible to fully utilize a machine without having
> > stuff randomly killed.
> >
>
> And besides all that, as a practical matter you can't just "not have
> error paths" because, like you said, you'd still have to have a max size
> where you WARN() - and _fail the allocation_ - and you've still got to
> unwind.

No. You warn and DON'T fail the allocation. Just like lockdep warns of
possible deadlocks but lets you continue.
These will be found in development (mostly) and changed to use
__GFP_RETRY_MAYFAIL and have appropriate error-handling paths.

>
> The OOM killer can't kill processes while they're stuck blocking on an
> allocation that will never return in the kernel.

But it can depopulate the user address space (I think).

NeilBrown

>
> I think we can safely nip this idea in the bud.
>
> Test your damn error paths...
>
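
For illustration only, here is a rough user-space sketch of the "warn but
DON'T fail" policy described above. It is not kernel code and not from the
thread: MODEL_MAY_FAIL, MODEL_WARN_BYTES, model_alloc() and fill_buffer()
are hypothetical names invented for this sketch, the 1 MiB threshold is
arbitrary, and malloc() merely stands in for the real allocator (it does
not model reclaim or retry behaviour).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_MAY_FAIL   0x1u        /* caller says it has an error path */
#define MODEL_WARN_BYTES (1u << 20)  /* arbitrary threshold for this sketch */

static unsigned int model_warnings;  /* number of warnings issued so far */

/*
 * Allocate size bytes.  A large request made without MODEL_MAY_FAIL is
 * reported but still attempted: the warning flags the call site for
 * conversion, it does not turn the request into a failure.
 */
static void *model_alloc(size_t size, unsigned int flags)
{
	assert(size > 0);
	if (size > MODEL_WARN_BYTES && !(flags & MODEL_MAY_FAIL)) {
		model_warnings++;
		fprintf(stderr,
			"warning: %zu-byte allocation without MAY_FAIL\n",
			size);
	}
	return malloc(size);
}

/* A converted call site: it passes MODEL_MAY_FAIL and handles NULL. */
static int fill_buffer(size_t size)
{
	unsigned char *buf = model_alloc(size, MODEL_MAY_FAIL);

	if (!buf)
		return -1;	/* unwind and report the failure upward */
	buf[0] = 0;
	free(buf);
	return 0;
}
```

The point of the sketch is only the split in behaviour: an unconverted
large request produces a warning and proceeds, while a converted caller
such as fill_buffer() stays silent and owns its error path.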