Date: Thu, 3 Apr 2025 15:43:12 +0200
From: Jan Kara
To: Luis Chamberlain
Cc: Matthew Wilcox, Jan Kara, brauner@kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, riel@surriel.com,
	hannes@cmpxchg.org, oliver.sang@intel.com, david@redhat.com,
	axboe@kernel.dk, hare@suse.de, david@fromorbit.com, djwong@kernel.org,
	ritesh.list@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-mm@kvack.org, gost.dev@samsung.com,
	p.raghav@samsung.com, da.gomez@samsung.com
Subject: Re: [PATCH 2/3] fs/buffer: avoid races with folio migrations on __find_get_block_slow()
Message-ID: <2jrcw4mtwcophanqmi2y74ffgf247m6ap44u3gedpylsjl3bz6@yueuwkmcwm66>
References: <20250330064732.3781046-1-mcgrof@kernel.org>
	<20250330064732.3781046-3-mcgrof@kernel.org>
	<20250401214951.kikcrmu5k3q6qmcr@offworld>

On Wed 02-04-25 19:04:23, Luis Chamberlain wrote:
> On Wed, Apr 02, 2025 at 02:58:28AM +0100, Matthew Wilcox wrote:
> > On Tue, Apr 01, 2025 at 02:49:51PM -0700, Davidlohr Bueso wrote:
> > > So the below could be tucked in for norefs only (because this is about the addr
> > > space i_private_lock), but this also shortens the hold time; if that matters
> > > at all, of course, vs changing the migration semantics.
> >
> > I like this approach a lot better. One wrinkle is that it doesn't seem
> > that we need to set the BH_Migrate bit on every buffer; we could define
> > that it's only set on the head BH, right?
>
> Yes, we are also only doing this for block devices, and for migration
> purposes. Even though a bit from one buffer may be desirable it makes
> no sense to allow for that in case migration is taking place. So indeed
> we have no need to add the flag for all buffers.
>
> I think the remaining question is what users of __find_get_block_slow()
> can really block, and well I've started trying to determine that with
> coccinelle [0], its gonna take some more time.
>
> Perhaps its easier to ask, why would a block device mapping want to
> allow __find_get_block_slow() to not block?

So I've audited all callers of __find_get_block_slow() (there aren't that
many) and most of them are actually fine with sleeping these days. Analysis:

__find_get_block_slow() is only used from __find_get_block().
__find_get_block() is used from:
  write_boundary_block() - locks the buffer so can sleep
  bdev_getblk() - allocates buffers with 'gfp' mask. We use GFP_NOWAIT mask
    from some places (generally doing readahead). For callers where
    !gfpflags_allow_blocking() we should bail rather than block on
    migration. Callers are currently fine with this; we should probably
    document that bdev_getblk() with a restrictive gfp mask may fail even
    if the bh is present - or perhaps make this even more explicit in the
    API by providing bdev_try_getblk() and making bdev_getblk() assert
    that the gfp mask allows sleeping.
  __getblk_slow() - only called from bdev_getblk(). Probably should be
    folded in there.
  ocfs2_force_read_journal() - allows sleeping as it does IO
  jbd2_journal_revoke() - can sleep (has might_sleep() at the beginning)
  jbd2_journal_cancel_revoke() - only used from do_get_write_access() and
    do_get_create_access() which do sleep. So can sleep.
  jbd2_clear_buffer_revoked_flags() - only called from journal commit code
    which sleeps. So can sleep.

The last user is sb_find_get_block() which is used from:
  hpfs_prefetch_sectors() - prefers to bail rather than block
  fat_dir_readahead() - prefers to bail rather than block
  exfat_dir_readahead() - prefers to bail rather than block
  ext4_free_blocks() - can sleep
  ext4_getblk() - depending on the EXT4_GET_BLOCKS_CACHED_NOWAIT flag
    either can sleep or must bail (and is fine with it) rather than sleep
  fs/ext4/ialloc.c:recently_deleted() - this one is the most problematic
    place. It must bail rather than sleep (called under a spinlock) but
    it depends on the fact that if the bh is not returned, then the data
    has been written out and evicted from memory. Luckily, the usage of
    recently_deleted() is mostly an optimization to reduce damage in case
    of a crash, so a rare false failure should be OK. Ted, what is your
    opinion?

And this is actually all. So it seems that if we give callers the
possibility to tell whether they want to bail or wait for migration, things
should work out fine.

								Honza
-- 
Jan Kara
SUSE Labs, CR
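
The bail-or-wait choice discussed above can be sketched in plain userspace
C. This is only a toy model under stated assumptions, not kernel code:
struct toy_bh, toy_wait_for_migration() and toy_find_get_block() are
hypothetical names invented for the illustration, and the "migrating" field
merely stands in for a BH_Migrate-style bit on the head buffer. The may_sleep
argument plays the role of gfpflags_allow_blocking() on the caller's gfp mask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer head; NOT the kernel's struct buffer_head. */
struct toy_bh {
	unsigned long blocknr;
	bool migrating;		/* models a BH_Migrate-style bit */
};

/*
 * Hypothetical: models sleeping until migration has finished. In this toy
 * the "wait" completes immediately by clearing the flag.
 */
static void toy_wait_for_migration(struct toy_bh *bh)
{
	bh->migrating = false;
}

/*
 * Hypothetical lookup over a small array acting as the cache. If the
 * matching buffer is under migration, a caller that may sleep waits for
 * it; a caller that must not sleep gets NULL even though the buffer
 * exists (a "false failure" in the sense used in the mail).
 */
struct toy_bh *toy_find_get_block(struct toy_bh *cache, size_t n,
				  unsigned long blocknr, bool may_sleep)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_bh *bh = &cache[i];

		if (bh->blocknr != blocknr)
			continue;
		if (bh->migrating) {
			if (!may_sleep)
				return NULL;	/* bail instead of blocking */
			toy_wait_for_migration(bh);
		}
		return bh;
	}
	return NULL;	/* genuinely not cached */
}
```

In this model a readahead-style caller would pass may_sleep = false and
treat NULL as "skip this block", while a caller such as the journal code
would pass true and always get the buffer if it is cached. Note that the
non-sleeping caller cannot tell "not cached" from "cached but migrating",
which is exactly the property that makes recently_deleted() the delicate
case.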