Subject: Re: [PATCH v2] mm/page-writeback: Raise wb_thresh to prevent write blocking with strictlimit
To: Jim Zhao, jack@suse.cz
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org
References: <20241119114444.3925495-1-jimzhao.ai@gmail.com> <20241119122922.3939538-1-jimzhao.ai@gmail.com>
From: Kemeng Shi
Message-ID: <5584d4d5-73c8-2a12-f11e-6f19c216656b@huaweicloud.com>
Date: Wed, 20 Nov 2024 16:03:16 +0800
In-Reply-To: <20241119122922.3939538-1-jimzhao.ai@gmail.com>

on 11/19/2024 8:29 PM, Jim Zhao wrote:
> Thanks, Jan, I just sent patch v2, could you please review it ?
>
> And I found the debug info in the bdi stats.
> The BdiDirtyThresh value may be greater than DirtyThresh, and after applying this patch, the value of BdiDirtyThresh could become even larger.
>
> without patch:
> ---
> root@ubuntu:/sys/kernel/debug/bdi/8:0# cat stats
> BdiWriteback: 0 kB
> BdiReclaimable: 96 kB
> BdiDirtyThresh: 1346824 kB
> DirtyThresh: 673412 kB
> BackgroundThresh: 336292 kB
> BdiDirtied: 19872 kB
> BdiWritten: 19776 kB
> BdiWriteBandwidth: 0 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
>
> with patch:
> ---
> root@ubuntu:/sys/kernel/debug/bdi/8:0# cat stats
> BdiWriteback: 96 kB
> BdiReclaimable: 192 kB
> BdiDirtyThresh: 3090736 kB
> DirtyThresh: 650716 kB
> BackgroundThresh: 324960 kB
> BdiDirtied: 472512 kB
> BdiWritten: 470592 kB
> BdiWriteBandwidth: 106268 kBps
> b_dirty: 2
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> bdi_list: 1
> state: 1
>
>
> @kemeng, is this a normal behavior or an issue ?

Hello, this is not normal behavior. Could you also send the contents of
wb_stats and the configured bdi_min_ratio? I think improper use of
bdi_min_ratio may be causing the issue.

Thanks,
Kemeng

>
> Thanks,
> Jim Zhao
>
>
>> With the strictlimit flag, wb_thresh acts as a hard limit in
>> balance_dirty_pages() and wb_position_ratio(). When device write
>> operations are inactive, wb_thresh can drop to 0, causing writes to be
>> blocked. The issue occasionally occurs in fuse fs, particularly with
>> network backends, the write thread is blocked frequently during a period.
>> To address it, this patch raises the minimum wb_thresh to a controllable
>> level, similar to the non-strictlimit case.
>>
>> Signed-off-by: Jim Zhao
>> ---
>> Changes in v2:
>> 1. Consolidate all wb_thresh bumping logic in __wb_calc_thresh for consistency;
>> 2. Replace the limit variable with thresh for calculating the bump value,
>> as __wb_calc_thresh is also used to calculate the background threshold;
>> 3. Add domain_dirty_avail in wb_calc_thresh to get dtc->dirty.
>> ---
>>  mm/page-writeback.c | 48 ++++++++++++++++++++++-----------------------
>>  1 file changed, 23 insertions(+), 25 deletions(-)
>>
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index e5a9eb795f99..8b13bcb42de3 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -917,7 +917,9 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
>>  				      unsigned long thresh)
>>  {
>>  	struct wb_domain *dom = dtc_dom(dtc);
>> +	struct bdi_writeback *wb = dtc->wb;
>>  	u64 wb_thresh;
>> +	u64 wb_max_thresh;
>>  	unsigned long numerator, denominator;
>>  	unsigned long wb_min_ratio, wb_max_ratio;
>>
>> @@ -931,11 +933,27 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
>>  	wb_thresh *= numerator;
>>  	wb_thresh = div64_ul(wb_thresh, denominator);
>>
>> -	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
>> +	wb_min_max_ratio(wb, &wb_min_ratio, &wb_max_ratio);
>>
>>  	wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
>> -	if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
>> -		wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
>> +
>> +	/*
>> +	 * It's very possible that wb_thresh is close to 0 not because the
>> +	 * device is slow, but that it has remained inactive for long time.
>> +	 * Honour such devices a reasonable good (hopefully IO efficient)
>> +	 * threshold, so that the occasional writes won't be blocked and active
>> +	 * writes can rampup the threshold quickly.
>> +	 */
>> +	if (thresh > dtc->dirty) {
>> +		if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT))
>> +			wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 100);
>> +		else
>> +			wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 8);
>> +	}
>> +
>> +	wb_max_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
>> +	if (wb_thresh > wb_max_thresh)
>> +		wb_thresh = wb_max_thresh;
>>
>>  	return wb_thresh;
>>  }
>> @@ -944,6 +962,7 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
>>  {
>>  	struct dirty_throttle_control gdtc = { GDTC_INIT(wb) };
>>
>> +	domain_dirty_avail(&gdtc, true);
>>  	return __wb_calc_thresh(&gdtc, thresh);
>>  }
>>
>> @@ -1120,12 +1139,6 @@ static void wb_position_ratio(struct dirty_throttle_control *dtc)
>>  	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>>  		long long wb_pos_ratio;
>>
>> -		if (dtc->wb_dirty < 8) {
>> -			dtc->pos_ratio = min_t(long long, pos_ratio * 2,
>> -					   2 << RATELIMIT_CALC_SHIFT);
>> -			return;
>> -		}
>> -
>>  		if (dtc->wb_dirty >= wb_thresh)
>>  			return;
>>
>> @@ -1196,14 +1209,6 @@ static void wb_position_ratio(struct dirty_throttle_control *dtc)
>>  	 */
>>  	if (unlikely(wb_thresh > dtc->thresh))
>>  		wb_thresh = dtc->thresh;
>> -	/*
>> -	 * It's very possible that wb_thresh is close to 0 not because the
>> -	 * device is slow, but that it has remained inactive for long time.
>> -	 * Honour such devices a reasonable good (hopefully IO efficient)
>> -	 * threshold, so that the occasional writes won't be blocked and active
>> -	 * writes can rampup the threshold quickly.
>> -	 */
>> -	wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
>>  	/*
>>  	 * scale global setpoint to wb's:
>>  	 * wb_setpoint = setpoint * wb_thresh / thresh
>> @@ -1459,17 +1464,10 @@ static void wb_update_dirty_ratelimit(struct dirty_throttle_control *dtc,
>>  	 * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
>>  	 * Hence, to calculate "step" properly, we have to use wb_dirty as
>>  	 * "dirty" and wb_setpoint as "setpoint".
>> -	 *
>> -	 * We rampup dirty_ratelimit forcibly if wb_dirty is low because
>> -	 * it's possible that wb_thresh is close to zero due to inactivity
>> -	 * of backing device.
>>  	 */
>>  	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>>  		dirty = dtc->wb_dirty;
>> -		if (dtc->wb_dirty < 8)
>> -			setpoint = dtc->wb_dirty + 1;
>> -		else
>> -			setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
>> +		setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
>>  	}
>>
>>  	if (dirty < setpoint) {
>> --
>> 2.20.1
>
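
For reference, the bump-and-clamp step that the quoted patch adds to
__wb_calc_thresh() can be sketched in userspace C as below. This is only a
rough illustration under stated assumptions, not the kernel code:
sketch_wb_thresh(), its parameters and SKETCH_RATIO_SCALE are hypothetical
names, the scale value is an assumption, and "share" stands in for the
proportional part that the kernel computes before this step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for the kernel's BDI_RATIO_SCALE; value is illustrative. */
#define SKETCH_RATIO_SCALE 10000ULL

/*
 * Hypothetical userspace model of the tail of __wb_calc_thresh() after the
 * patch. "share" is the already-computed proportional threshold; min_ratio
 * and max_ratio are percentages multiplied by SKETCH_RATIO_SCALE.
 */
static uint64_t sketch_wb_thresh(uint64_t share, uint64_t thresh,
				 uint64_t dirty, uint64_t min_ratio,
				 uint64_t max_ratio, bool strictlimit)
{
	const uint64_t full = 100 * SKETCH_RATIO_SCALE;
	uint64_t wb_thresh, wb_max_thresh, bump;

	assert(min_ratio <= max_ratio && max_ratio <= full);

	/* Add the reserved minimum share on top of the proportional part. */
	wb_thresh = share + thresh * min_ratio / full;

	/* Give an idle device a floor: 1/100 of the headroom with strictlimit, 1/8 otherwise. */
	if (thresh > dirty) {
		bump = (thresh - dirty) / (strictlimit ? 100 : 8);
		if (wb_thresh < bump)
			wb_thresh = bump;
	}

	/* Clamp to the configured maximum share last, so the floor cannot exceed it. */
	wb_max_thresh = thresh * max_ratio / full;
	if (wb_thresh > wb_max_thresh)
		wb_thresh = wb_max_thresh;

	return wb_thresh;
}
```

In this sketch the final clamp keeps the result at or below thresh whenever
max_ratio is at most 100%, which may be a useful sanity check when comparing
against the BdiDirtyThresh and DirtyThresh values in the stats above.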