From: Jan Kara
Date: Wed, 20 Nov 2024 12:57:31 +0100
To: Jim Zhao
Cc: jack@suse.cz, shikemeng@huaweicloud.com, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, willy@infradead.org
Subject: Re: [PATCH v2] mm/page-writeback: Raise wb_thresh to prevent write
	blocking with strictlimit
Message-ID: <20241120115731.gzxozbnb6eazhil7@quack3>
References: <20241119114444.3925495-1-jimzhao.ai@gmail.com>
	<20241119122922.3939538-1-jimzhao.ai@gmail.com>
In-Reply-To: <20241119122922.3939538-1-jimzhao.ai@gmail.com>

Hello!

On Tue 19-11-24 20:29:22, Jim Zhao wrote:
> Thanks, Jan, I just sent patch v2, could you please review it ?

Yes, the patch looks good to me.

>
> And I found the debug info in the bdi stats.
> The BdiDirtyThresh value may be greater than DirtyThresh, and after
> applying this patch, the value of BdiDirtyThresh could become even
> larger.
>
> without patch:
> ---
> root@ubuntu:/sys/kernel/debug/bdi/8:0# cat stats
> BdiWriteback:                0 kB
> BdiReclaimable:             96 kB
> BdiDirtyThresh:        1346824 kB

But this is odd. The machine appears to have around 3GB of memory, doesn't
it? I suspect this is caused by multiple cgroup-writeback contexts
contributing to BdiDirtyThresh - in fact I think the math in
bdi_collect_stats() is wrong as it is adding wb_thresh() calculated based
on global dirty_thresh for each cgwb whereas it should be adding
wb_thresh() calculated based on per-memcg dirty_thresh... You can have a
look at /sys/kernel/debug/bdi/8:0/wb_stats file which should have correct
limits as far as I'm reading the code.
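
To make the suspected double counting concrete, here is a toy userspace model. It is not the kernel's bdi_collect_stats(): struct toy_cgwb, toy_sum_global() and toy_sum_own() are hypothetical names, and the per-context shares and per-memcg thresholds are assumptions picked only for illustration. It merely shows that summing per-context limits which are all scaled from the same global threshold can exceed that threshold, while scaling each from its own domain's threshold need not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one cgroup-writeback context (not a kernel type). */
struct toy_cgwb {
	unsigned long own_thresh_kb;	/* assumed dirty threshold of its own domain */
	unsigned int share_pct;		/* assumed share of a threshold, 0..100 */
};

/* Sum per-context limits, every one scaled from the same global threshold. */
unsigned long toy_sum_global(const struct toy_cgwb *c, size_t n,
			     unsigned long global_thresh_kb)
{
	unsigned long sum = 0;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += global_thresh_kb * c[i].share_pct / 100;
	return sum;
}

/* Sum per-context limits, each scaled from that context's own threshold. */
unsigned long toy_sum_own(const struct toy_cgwb *c, size_t n)
{
	unsigned long sum = 0;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += c[i].own_thresh_kb * c[i].share_pct / 100;
	return sum;
}
```

With two contexts that are each granted a full share, toy_sum_global() against the 673412 kB DirtyThresh from the same "without patch" output yields 1346824 kB, which happens to equal the reported BdiDirtyThresh. That is consistent with the explanation above but does not prove it; the wb_stats file is the place to check the real per-context values.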

								Honza

> DirtyThresh:            673412 kB
> BackgroundThresh:       336292 kB
> BdiDirtied:              19872 kB
> BdiWritten:              19776 kB
> BdiWriteBandwidth:           0 kBps
> b_dirty:                     0
> b_io:                        0
> b_more_io:                   0
> b_dirty_time:                0
> bdi_list:                    1
> state:                       1
>
> with patch:
> ---
> root@ubuntu:/sys/kernel/debug/bdi/8:0# cat stats
> BdiWriteback:               96 kB
> BdiReclaimable:            192 kB
> BdiDirtyThresh:        3090736 kB
> DirtyThresh:            650716 kB
> BackgroundThresh:       324960 kB
> BdiDirtied:             472512 kB
> BdiWritten:             470592 kB
> BdiWriteBandwidth:      106268 kBps
> b_dirty:                     2
> b_io:                        0
> b_more_io:                   0
> b_dirty_time:                0
> bdi_list:                    1
> state:                       1
>
>
> @kemeng, is this a normal behavior or an issue ?
>
> Thanks,
> Jim Zhao
>
>
> > With the strictlimit flag, wb_thresh acts as a hard limit in
> > balance_dirty_pages() and wb_position_ratio().  When device write
> > operations are inactive, wb_thresh can drop to 0, causing writes to be
> > blocked.  The issue occasionally occurs in fuse fs, particularly with
> > network backends, the write thread is blocked frequently during a period.
> > To address it, this patch raises the minimum wb_thresh to a controllable
> > level, similar to the non-strictlimit case.
> >
> > Signed-off-by: Jim Zhao
> > ---
> > Changes in v2:
> > 1. Consolidate all wb_thresh bumping logic in __wb_calc_thresh for consistency;
> > 2. Replace the limit variable with thresh for calculating the bump value,
> > as __wb_calc_thresh is also used to calculate the background threshold;
> > 3. Add domain_dirty_avail in wb_calc_thresh to get dtc->dirty.
> > ---
> >  mm/page-writeback.c | 48 +++++++++++++++++++++++-------------------------
> >  1 file changed, 23 insertions(+), 25 deletions(-)
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index e5a9eb795f99..8b13bcb42de3 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -917,7 +917,9 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
> >  				      unsigned long thresh)
> >  {
> >  	struct wb_domain *dom = dtc_dom(dtc);
> > +	struct bdi_writeback *wb = dtc->wb;
> >  	u64 wb_thresh;
> > +	u64 wb_max_thresh;
> >  	unsigned long numerator, denominator;
> >  	unsigned long wb_min_ratio, wb_max_ratio;
> >
> > @@ -931,11 +933,27 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
> >  	wb_thresh *= numerator;
> >  	wb_thresh = div64_ul(wb_thresh, denominator);
> >
> > -	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
> > +	wb_min_max_ratio(wb, &wb_min_ratio, &wb_max_ratio);
> >
> >  	wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
> > -	if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
> > -		wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
> > +
> > +	/*
> > +	 * It's very possible that wb_thresh is close to 0 not because the
> > +	 * device is slow, but that it has remained inactive for long time.
> > +	 * Honour such devices a reasonable good (hopefully IO efficient)
> > +	 * threshold, so that the occasional writes won't be blocked and active
> > +	 * writes can rampup the threshold quickly.
> > +	 */
> > +	if (thresh > dtc->dirty) {
> > +		if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT))
> > +			wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 100);
> > +		else
> > +			wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 8);
> > +	}
> > +
> > +	wb_max_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
> > +	if (wb_thresh > wb_max_thresh)
> > +		wb_thresh = wb_max_thresh;
> >
> >  	return wb_thresh;
> >  }
> > @@ -944,6 +962,7 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
> >  {
> >  	struct dirty_throttle_control gdtc = { GDTC_INIT(wb) };
> >
> > +	domain_dirty_avail(&gdtc, true);
> >  	return __wb_calc_thresh(&gdtc, thresh);
> >  }
> >
> > @@ -1120,12 +1139,6 @@ static void wb_position_ratio(struct dirty_throttle_control *dtc)
> >  	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> >  		long long wb_pos_ratio;
> >
> > -		if (dtc->wb_dirty < 8) {
> > -			dtc->pos_ratio = min_t(long long, pos_ratio * 2,
> > -					   2 << RATELIMIT_CALC_SHIFT);
> > -			return;
> > -		}
> > -
> >  		if (dtc->wb_dirty >= wb_thresh)
> >  			return;
> >
> > @@ -1196,14 +1209,6 @@ static void wb_position_ratio(struct dirty_throttle_control *dtc)
> >  	 */
> >  	if (unlikely(wb_thresh > dtc->thresh))
> >  		wb_thresh = dtc->thresh;
> > -	/*
> > -	 * It's very possible that wb_thresh is close to 0 not because the
> > -	 * device is slow, but that it has remained inactive for long time.
> > -	 * Honour such devices a reasonable good (hopefully IO efficient)
> > -	 * threshold, so that the occasional writes won't be blocked and active
> > -	 * writes can rampup the threshold quickly.
> > -	 */
> > -	wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
> >  	/*
> >  	 * scale global setpoint to wb's:
> >  	 *	wb_setpoint = setpoint * wb_thresh / thresh
> > @@ -1459,17 +1464,10 @@ static void wb_update_dirty_ratelimit(struct dirty_throttle_control *dtc,
> >  	 * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
> >  	 * Hence, to calculate "step" properly, we have to use wb_dirty as
> >  	 * "dirty" and wb_setpoint as "setpoint".
> > -	 *
> > -	 * We rampup dirty_ratelimit forcibly if wb_dirty is low because
> > -	 * it's possible that wb_thresh is close to zero due to inactivity
> > -	 * of backing device.
> >  	 */
> >  	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> >  		dirty = dtc->wb_dirty;
> > -		if (dtc->wb_dirty < 8)
> > -			setpoint = dtc->wb_dirty + 1;
> > -		else
> > -			setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
> > +		setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
> >  	}
> >
> >  	if (dirty < setpoint) {
> > --
> > 2.20.1

--
Jan Kara
SUSE Labs, CR
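
For readers who want to poke at the numbers, the floor-then-clamp step that the quoted patch adds to __wb_calc_thresh() can be sketched in userspace as below. This is only a sketch: toy_bump_and_clamp() is a hypothetical name, the strictlimit capability test is reduced to a bool, and wb_max_thresh is passed in directly instead of being derived from wb_max_ratio and BDI_RATIO_SCALE. The divisors 100 and 8 and the ordering (raise first, cap afterwards) are taken from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch of the floor-then-clamp step from the quoted diff.
 * Not kernel code: the function name and the flattened parameters are
 * assumptions made for illustration.
 */
unsigned long toy_bump_and_clamp(unsigned long wb_thresh, unsigned long thresh,
				 unsigned long dirty,
				 unsigned long wb_max_thresh, bool strictlimit)
{
	/* Assumption of this sketch: the cap never exceeds the threshold. */
	assert(wb_max_thresh <= thresh);

	/* Raise wb_thresh to a fraction of the remaining dirty headroom. */
	if (thresh > dirty) {
		unsigned long floor = (thresh - dirty) / (strictlimit ? 100 : 8);

		if (wb_thresh < floor)
			wb_thresh = floor;
	}

	/* The cap is applied after the bump, so it still wins. */
	if (wb_thresh > wb_max_thresh)
		wb_thresh = wb_max_thresh;

	return wb_thresh;
}
```

For example, an idle device with wb_thresh of 0, thresh of 1000 and dirty of 200 ends up at 8 with strictlimit and at 100 without it; once dirty reaches thresh there is no headroom left and the value stays at 0.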