From: "NeilBrown"
To: "Kent Overstreet"
Cc: "Dave Chinner", "Matthew Wilcox", "Amir Goldstein", paulmck@kernel.org, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, "linux-fsdevel", "Jan Kara"
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Reclamation interactions with RCU
Date: Mon, 04 Mar 2024 09:45:48 +1100
Message-id: <170950594802.24797.17587526251920021411@noble.neil.brown.name>

On Sat, 02 Mar 2024, Kent Overstreet wrote:
> On Sat, Mar 02, 2024 at 10:47:59AM +1100, NeilBrown wrote:
> > On Sat, 02 Mar 2024, Kent Overstreet wrote:
> > > On Fri, Mar 01, 2024 at 04:54:55PM +1100, Dave Chinner wrote:
> > > > On Fri, Mar 01, 2024 at 01:16:18PM +1100, NeilBrown wrote:
> > > > > While we are considering revising mm rules, I would really like to
> > > > > revise the rule that GFP_KERNEL allocations are allowed to fail.
> > > > > I'm not at all sure that they ever do (except for large allocations - so
> > > > > maybe we could leave that exception in - or warn if large allocations
> > > > > are tried without a MAY_FAIL flag).
> > > > >
> > > > > Given that GFP_KERNEL can wait, and that the mm can kill off processes
> > > > > and clear cache to free memory, there should be no case where failure is
> > > > > needed or when simply waiting will eventually result in success.  And if
> > > > > there is, the machine is a goner anyway.
> > > >
> > > > Yes, please!
> > > >
> > > > XFS was designed and implemented on an OS that gave this exact
> > > > guarantee for kernel allocations back in the early 1990s.  Memory
> > > > allocation simply blocked until it succeeded unless the caller
> > > > indicated they could handle failure. That's what __GFP_NOFAIL does
> > > > and XFS is still heavily dependent on this behaviour.
> > >
> > > I'm not saying we should get rid of __GFP_NOFAIL - actually, I'd say
> > > let's remove the underscores and get rid of the silly two page limit.
> > > GFP_NOFAIL|GFP_KERNEL is perfectly safe for larger allocations, as long
> > > as you don't mind possibly waiting a bit.
> > >
> > > But it can't be the default because, like I mentioned to Neil, there are
> > > a _lot_ of different places where we allocate memory in the kernel, and
> > > they have to be able to fail instead of shoving everything else out of
> > > memory.
> > >
> > > > This is the sort of thing I was thinking of in the "remove
> > > > GFP_NOFS" discussion thread when I said this to Kent:
> > > >
> > > > 	"We need to start designing our code in a way that doesn't require
> > > > 	extensive testing to validate it as correct. If the only way to
> > > > 	validate new code is correct is via stochastic coverage via error
> > > > 	injection, then that is a clear sign we've made poor design choices
> > > > 	along the way."
> > > >
> > > > https://lore.kernel.org/linux-fsdevel/ZcqWh3OyMGjEsdPz@dread.disaster.area/
> > > >
> > > > If memory allocation doesn't fail by default, then we can remove the
> > > > vast majority of allocation error handling from the kernel. Make the
> > > > common case just work - remove the need for all that code to handle
> > > > failures that is hard to exercise reliably and so are rarely tested.
> > > >
> > > > A simple change to make long standing behaviour an actual policy we
> > > > can rely on means we can remove both code and test matrix overhead -
> > > > it's a win-win IMO.
> > >
> > > We definitely don't want to make GFP_NOIO/GFP_NOFS allocations nofail by
> > > default - a great many of those allocations have mempools in front of
> > > them to avoid deadlocks, and if you do that you've made the mempools
> > > useless.
> > >
> >
> > Not strictly true.  mempool_alloc() adds __GFP_NORETRY so the allocation
> > will certainly fail if that is appropriate.
>
> *nod*
>
> > I suspect that most places where there is a non-error fallback already
> > use NORETRY or RETRY_MAYFAIL or similar.
>
> NORETRY and RETRY_MAYFAIL actually weren't on my radar, and I don't see
> _tons_ of uses for either of them - more for NORETRY.
>
> My go-to is NOWAIT in this scenario though; my common pattern is "try
> nonblocking with locks held, then drop locks and retry GFP_KERNEL".
>
> > But I agree that changing the meaning of GFP_KERNEL has a potential to
> > cause problems.
> > I support promoting "GFP_NOFAIL" which should work at
> > least up to PAGE_ALLOC_COSTLY_ORDER (8 pages).
>
> I'd support this change.
>
> > I'm unsure how it should behave in PF_MEMALLOC_NOFS and
> > PF_MEMALLOC_NOIO context.  I suspect Dave would tell me it should work in
> > these contexts, in which case I'm sure it should.
> >
> > Maybe we could then deprecate GFP_KERNEL.
>
> What do you have in mind?

I have in mind a more explicit statement of how much waiting is
acceptable.

GFP_NOFAIL - wait indefinitely.
GFP_KILLABLE - wait indefinitely unless a fatal signal is pending.
GFP_RETRY - may retry, but deadlock, though unlikely, is possible.  So
            don't wait indefinitely.  May abort more quickly if a fatal
            signal is pending.
GFP_NO_RETRY - only try things once.  This may sleep, but will give up
            fairly quickly.  Either deadlock is a significant
            possibility, or an alternate strategy is fairly cheap.
GFP_ATOMIC - don't sleep - same as current.

I don't see how "GFP_KERNEL" fits into that spectrum.  The definition of
"this will try really hard, but might fail and we can't really tell you
what circumstances it might fail in" isn't fun to work with.

Thanks,
NeilBrown

>
> Deprecating GFP_NOFS and GFP_NOIO would be wonderful - those should
> really just be PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO, now that we're
> pushing for memalloc_flags_(save|restore) more.
>
> Getting rid of those would be a really nice cleanup because then gfp
> flags would mostly just be:
>  - the type of memory to allocate (highmem, zeroed, etc.)
>  - how hard to try (don't block at all, block some, block forever)
>
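
To make the proposed spectrum concrete, here is a rough userspace sketch of
how the five levels differ in when they give up.  It is a toy model, not
kernel code: the WAIT_* names, struct alloc_env, try_once(), alloc_policy()
and RETRY_LIMIT are all made up for illustration, malloc() stands in for the
page allocator, and "sleeping in reclaim" is reduced to a counter.  Treating
the "only try things once" level as a single reclaim pass followed by one
more attempt is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names mirroring the proposed levels; only GFP_ATOMIC
 * corresponds to a flag that exists today. */
enum wait_policy {
	WAIT_NOFAIL,	/* wait indefinitely */
	WAIT_KILLABLE,	/* wait indefinitely unless a fatal signal is pending */
	WAIT_RETRY,	/* bounded retries; stops early on a fatal signal */
	WAIT_NO_RETRY,	/* one reclaim pass, then give up */
	WAIT_ATOMIC,	/* never sleep: a single attempt */
};

#define RETRY_LIMIT 16	/* arbitrary bound chosen for this model */

/* Toy environment standing in for the allocator and the task state. */
struct alloc_env {
	int failures_left;	/* attempts that fail before memory appears */
	bool fatal_signal;	/* models a pending fatal signal */
	int reclaim_passes;	/* times the caller "slept" waiting for reclaim */
};

/* One allocation attempt: fails while failures_left is positive. */
static void *try_once(struct alloc_env *env, size_t size)
{
	if (env->failures_left > 0) {
		env->failures_left--;
		return NULL;
	}
	return malloc(size);
}

/* Attempt an allocation, giving up according to the chosen policy. */
void *alloc_policy(struct alloc_env *env, size_t size, enum wait_policy policy)
{
	assert(env != NULL);

	for (int attempt = 1; ; attempt++) {
		void *p = try_once(env, size);

		if (p)
			return p;

		switch (policy) {
		case WAIT_ATOMIC:
			return NULL;		/* must not sleep */
		case WAIT_NO_RETRY:
			if (attempt >= 2)
				return NULL;	/* already did one reclaim pass */
			break;
		case WAIT_RETRY:
			if (attempt >= RETRY_LIMIT || env->fatal_signal)
				return NULL;
			break;
		case WAIT_KILLABLE:
			if (env->fatal_signal)
				return NULL;
			break;
		case WAIT_NOFAIL:
			break;			/* keep going until it works */
		}
		env->reclaim_passes++;	/* stand-in for sleeping in reclaim */
	}
}
```

In this model only WAIT_NOFAIL never returns NULL, so it is the only level
whose callers could drop their error path; every other level states up front
what makes it give up, which is the property the current GFP_KERNEL
definition lacks.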