Date: Fri, 5 Jan 2024 11:57:36 +0100
From: Jan Kara
To: Matthew Wilcox
Cc: lsf-pc@lists.linux-foundation.org, linux-scsi@vger.kernel.org,
	linux-ide@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Removing GFP_NOFS
Message-ID: <20240105105736.24jep6q6cd7vsnmz@quack3>

Hello,

On Thu 04-01-24 21:17:16, Matthew Wilcox wrote:
> This is primarily a _FILESYSTEM_ track topic. All the work has already
> been done on the MM side; the FS people need to do their part. It could
> be a joint session, but I'm not sure there's much for the MM people
> to say.
>
> There are situations where we need to allocate memory, but cannot call
> into the filesystem to free memory. Generally this is because we're
> holding a lock or we've started a transaction, and attempting to write
> out dirty folios to reclaim memory would result in a deadlock.
>
> The old way to solve this problem is to specify GFP_NOFS when allocating
> memory. This conveys little information about what is being protected
> against, and so it is hard to know when it might be safe to remove.
> It's also a reflex -- many filesystem authors use GFP_NOFS by default
> even when they could use GFP_KERNEL because there's no risk of deadlock.
>
> The new way is to use the scoped APIs -- memalloc_nofs_save() and
> memalloc_nofs_restore(). These should be called when we start a
> transaction or take a lock that would cause a GFP_KERNEL allocation to
> deadlock. Then just use GFP_KERNEL as normal. The memory allocators
> can see the nofs situation is in effect and will not call back into
> the filesystem.
>
> This results in better code within your filesystem as you don't need to
> pass around gfp flags as much, and can lead to better performance from
> the memory allocators as GFP_NOFS will not be used unnecessarily.
>
> The memalloc_nofs APIs were introduced in May 2017, but we still have
> over 1000 uses of GFP_NOFS in fs/ today (and 200 outside fs/, which is
> really sad). This session is for filesystem developers to talk about
> what they need to do to fix up their own filesystem, or share stories
> about how they made their filesystem better by adopting the new APIs.

I agree this is a worthy goal and the scoped API helped us a lot in the
ext4/jbd2 land. Still we have some legacy to deal with:

~> git grep "NOFS" fs/jbd2/ | wc -l
15
~> git grep "NOFS" fs/ext4/ | wc -l
71

When you are asking about what would help filesystems with the conversion I
actually have one wish. The most common case is that you need to annotate
some lock that can be grabbed in the reclaim path and thus you must avoid
GFP_FS allocations from under it. For example to deal with reclaim
deadlocks in the writeback paths we had to introduce wrappers like:

static inline int ext4_writepages_down_read(struct super_block *sb)
{
	percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
	return memalloc_nofs_save();
}

static inline void ext4_writepages_up_read(struct super_block *sb, int ctx)
{
	memalloc_nofs_restore(ctx);
	percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
}

When you have to do it for 5 locks in your filesystem it gets a bit ugly
and it would be nice to have some generic way to deal with this. We already
have the spin_lock_irqsave() precedent we might follow (and I don't
necessarily mean the calling convention which is a bit weird for today's
standards)?

Even more lovely would be if we could actually avoid passing around the
returned reclaim state because sometimes the locks get acquired / released
in different functions and passing the state around requires quite some
changes and gets ugly. That would mean we'd have to have
fs-reclaim-forbidden counter instead of just a flag in task_struct. OTOH
then we could just mark the lock (mutex / rwsem / whatever) as
fs-reclaim-unsafe during init and the rest would just magically happen.
That would be super-easy to use.

								Honza
-- 
Jan Kara
SUSE Labs, CR
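
To make the counter idea above a bit more concrete, here is a rough
userspace model of it. This is only a sketch: none of these names exist in
the kernel. struct task_model, struct lock_model, nofs_depth and the
lock_model_*() helpers are all hypothetical stand-ins for task_struct, a
mutex / rwsem, and their lock / unlock paths. The point it tries to show is
that a per-task counter lets a lock marked once at init time bump and drop
the "fs reclaim forbidden" state by itself, so no saved context has to be
passed between the acquire and release sites.

```c
/*
 * Userspace model only. Every name below is hypothetical and chosen
 * for illustration; this is not kernel code and not an existing API.
 */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the relevant part of task_struct. */
struct task_model {
	unsigned int nofs_depth;	/* > 0: fs reclaim is forbidden */
};

/* Stand-in for a mutex / rwsem carrying a flag set once at init. */
struct lock_model {
	bool fs_reclaim_unsafe;
	bool held;
};

static void lock_model_init(struct lock_model *l, bool fs_reclaim_unsafe)
{
	l->fs_reclaim_unsafe = fs_reclaim_unsafe;
	l->held = false;
}

/* Acquire: a marked lock raises the per-task counter on its own. */
static void lock_model_acquire(struct task_model *t, struct lock_model *l)
{
	assert(!l->held);
	l->held = true;
	if (l->fs_reclaim_unsafe)
		t->nofs_depth++;
}

/* Release: no saved state is needed, the counter is simply dropped. */
static void lock_model_release(struct task_model *t, struct lock_model *l)
{
	assert(l->held);
	if (l->fs_reclaim_unsafe) {
		assert(t->nofs_depth > 0);	/* catch unbalanced release */
		t->nofs_depth--;
	}
	l->held = false;
}

/* What an allocator in this model would check before entering the fs. */
static bool fs_reclaim_allowed(const struct task_model *t)
{
	return t->nofs_depth == 0;
}
```

In this model, releasing two marked locks in a different order than they
were acquired still leaves fs reclaim forbidden until the last one is
dropped, which is the property a single flag cannot give without passing
the saved state around. Whether a counter in task_struct is acceptable in
the real lock fast paths is of course a separate question.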