Date: Mon, 20 Jan 2025 23:42:01 +0100
From: Jan Kara
To: Joanne Koong
Cc: Jan Kara, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, "Matthew Wilcox (Oracle)"
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Improving large folio writeback performance

On Fri 17-01-25 14:45:01, Joanne Koong wrote:
> On Fri, Jan 17, 2025 at 3:53 AM Jan Kara wrote:
> > On Thu 16-01-25 15:38:54, Joanne Koong wrote:
> > I think tweaking min_pause is a wrong way to do this. I think that is just a
> > symptom. Can you run something like:
> >
> > while true; do
> >         cat /sys/kernel/debug/bdi//stats
> >         echo "---------"
> >         sleep 1
> > done >bdi-debug.txt
> >
> > while you are writing to the FUSE filesystem and share the output file?
> > That should tell us a bit more about what's happening inside the writeback
> > throttling. Also do you somehow configure min/max_ratio for the FUSE bdi?
> > You can check in /sys/block//bdi/{min,max}_ratio . I suspect the
> > problem is that the BDI dirty limit does not ramp up properly when we
> > increase dirtied pages in large chunks.
>
> This is the debug info I see for FUSE large folio writes where bs=1M
> and size=1G:
>
>
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 896 kB
> DirtyThresh: 359824 kB
> BackgroundThresh: 179692 kB
> BdiDirtied: 1071104 kB
> BdiWritten: 4096 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3596 kB
> DirtyThresh: 359824 kB
> BackgroundThresh: 179692 kB
> BdiDirtied: 1290240 kB
> BdiWritten: 4992 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3596 kB
> DirtyThresh: 359824 kB
> BackgroundThresh: 179692 kB
> BdiDirtied: 1517568 kB
> BdiWritten: 5824 kB
> BdiWriteBandwidth: 25692 kBps
> b_dirty: 0
> b_io: 1
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 7
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3596 kB
> DirtyThresh: 359824 kB
> BackgroundThresh: 179692 kB
> BdiDirtied: 1747968 kB
> BdiWritten: 6720 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 896 kB
> DirtyThresh: 359824 kB
> BackgroundThresh: 179692 kB
> BdiDirtied: 1949696 kB
> BdiWritten: 7552 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3612 kB
> DirtyThresh: 361300 kB
> BackgroundThresh: 180428 kB
> BdiDirtied: 2097152 kB
> BdiWritten: 8128 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
>
>
> I didn't do anything to configure/change the FUSE bdi min/max_ratio.
> This is what I see on my system:
>
> cat /sys/class/bdi/0:52/min_ratio
> 0
> cat /sys/class/bdi/0:52/max_ratio
> 1

OK, we can see that BdiDirtyThresh stabilized more or less at 3.6 MB.
Checking the code, this shows we are hitting the __wb_calc_thresh() logic:

	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
		unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
		u64 wb_scale_thresh = 0;

		if (limit > dtc->dirty)
			wb_scale_thresh = (limit - dtc->dirty) / 100;
		wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh /
	}

so BdiDirtyThresh is set to DirtyThresh/100. This also shows the bdi never
generates enough throughput to ramp up its share from this initial value.

> > Actually, there's a patch queued in mm tree that improves the ramping up of
> > bdi dirty limit for strictlimit bdis [1]. It would be nice if you could
> > test whether it changes something in the behavior you observe. Thanks!
> >
> >                                                                 Honza
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page-writeback-consolidate-wb_thresh-bumping-logic-into-__wb_calc_thresh.patch
>
> I still see the same results (~230 MiB/s throughput using fio) with
> this patch applied, unfortunately.
> Here's the debug info I see with
> this patch (same test scenario as above on FUSE large folio writes
> where bs=1M and size=1G):
>
> BdiWriteback: 0 kB
> BdiReclaimable: 2048 kB
> BdiDirtyThresh: 3588 kB
> DirtyThresh: 359132 kB
> BackgroundThresh: 179348 kB
> BdiDirtied: 51200 kB
> BdiWritten: 128 kB
> BdiWriteBandwidth: 102400 kBps
> b_dirty: 1
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 5
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3588 kB
> DirtyThresh: 359144 kB
> BackgroundThresh: 179352 kB
> BdiDirtied: 331776 kB
> BdiWritten: 1216 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3588 kB
> DirtyThresh: 359144 kB
> BackgroundThresh: 179352 kB
> BdiDirtied: 562176 kB
> BdiWritten: 2176 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
> BdiWriteback: 0 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3588 kB
> DirtyThresh: 359144 kB
> BackgroundThresh: 179352 kB
> BdiDirtied: 792576 kB
> BdiWritten: 3072 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------
> BdiWriteback: 64 kB
> BdiReclaimable: 0 kB
> BdiDirtyThresh: 3588 kB
> DirtyThresh: 359144 kB
> BackgroundThresh: 179352 kB
> BdiDirtied: 1026048 kB
> BdiWritten: 3904 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
> ---------

Yeah, here the situation is really the same. As an experiment, can you try
setting min_ratio for the FUSE bdi to 1, 2, 3, ..., 10 (I don't expect you
should need to go past 10) and figure out when there's enough slack space for
the writeback bandwidth to ramp up to full speed? Thanks!

								Honza
-- 
Jan Kara
SUSE Labs, CR
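
For reference, the "BdiDirtyThresh is DirtyThresh/100" observation can be
checked against the numbers quoted above. This is only a rough sketch and not
kernel code: scale_thresh_kb is a hypothetical helper, and it assumes 4 kB
pages, that the thresholds are kept in pages and only converted to kB for the
stats file, and that the dirty count subtracted from the limit is negligible
at the moment of the sample.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel or of any tool): redo the
# (limit - dirty) / 100 step in whole pages, with dirty assumed to be ~0.
PAGE_KB=4   # assumption: 4 kB pages

scale_thresh_kb() {
    # $1 is a DirtyThresh value in kB, as printed in the stats file
    limit_pages=$(( $1 / PAGE_KB ))
    # integer division by 100, then convert pages back to kB
    echo $(( limit_pages / 100 * PAGE_KB ))
}

# DirtyThresh values taken from the stats quoted in this thread
for dirty_thresh_kb in 359824 361300 359144; do
    printf '%s kB -> %s kB\n' "$dirty_thresh_kb" "$(scale_thresh_kb "$dirty_thresh_kb")"
done
```

Under those assumptions the three results are 3596, 3612 and 3588 kB, which
are the BdiDirtyThresh values reported next to those DirtyThresh values. The
896 kB samples do not fit the zero-dirty assumption and are not covered by
this sketch.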