Date: Fri, 8 Nov 2024 23:02:15 +0100
From: Jan Kara
To: Jim Zhao
Cc: jack@suse.cz, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org
Subject: Re: [PATCH] mm/page-writeback: Raise wb_thresh to prevent write blocking with strictlimit
Message-ID: <20241108220215.s27rziym6mn5nzv4@quack3>
References: <20241107153217.j6kwfgihzhj33dia@quack3> <20241108031949.2984319-1-jimzhao.ai@gmail.com>
In-Reply-To: <20241108031949.2984319-1-jimzhao.ai@gmail.com>

On Fri 08-11-24 11:19:49, Jim Zhao wrote:
> > On Wed 23-10-24 18:00:32, Jim Zhao wrote:
> > > With the strictlimit flag, wb_thresh acts as a hard limit in
> > > balance_dirty_pages() and wb_position_ratio(). When device write
> > > operations are inactive, wb_thresh can drop to 0, causing writes to
> > > be blocked. The issue occasionally occurs in fuse fs, particularly
> > > with network backends, the write thread is blocked frequently during
> > > a period. To address it, this patch raises the minimum wb_thresh to a
> > > controllable level, similar to the non-strictlimit case.
> > >
> > > Signed-off-by: Jim Zhao
> >
> > ...
> >
> > > +	/*
> > > +	 * With strictlimit flag, the wb_thresh is treated as
> > > +	 * a hard limit in balance_dirty_pages() and wb_position_ratio().
> > > +	 * It's possible that wb_thresh is close to zero, not because
> > > +	 * the device is slow, but because it has been inactive.
> > > +	 * To prevent occasional writes from being blocked, we raise wb_thresh.
> > > +	 */
> > > +	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> > > +		unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
> > > +		u64 wb_scale_thresh = 0;
> > > +
> > > +		if (limit > dtc->dirty)
> > > +			wb_scale_thresh = (limit - dtc->dirty) / 100;
> > > +		wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
> > > +	}
> >
> > What you propose makes sense in principle although I'd say this is mostly a
> > userspace setup issue - with strictlimit enabled, you're kind of expected
> > to set min_ratio exactly if you want to avoid these startup issues. But I
> > tend to agree that we can provide a bit of a slack for a bdi without
> > min_ratio configured to ramp up.
> >
> > But I'd rather pick the logic like:
> >
> > 	/*
> > 	 * If bdi does not have min_ratio configured and it was inactive,
> > 	 * bump its min_ratio to 0.1% to provide it some room to ramp up.
> > 	 */
> > 	if (!wb_min_ratio && !numerator)
> > 		wb_min_ratio = min(BDI_RATIO_SCALE / 10, wb_max_ratio / 2);
> >
> > That would seem like a bit more systematic way than the formula you propose
> > above...
>
> Thanks for the advice.
> Here's the explanation of the formula:
> 1. when writes are small and intermittent,wb_thresh can approach 0, not
> just 0, making the numerator value difficult to verify.

I see, ok.

> 2. The ramp-up margin, whether 0.1% or another value, needs
> consideration.
> I based this on the logic of wb_position_ratio in the non-strictlimit
> scenario: wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8); It seems
> provides more room and ensures ramping up within a controllable range.

I see, thanks for explanation. So I was thinking how to make the code more
consistent instead of adding another special constant and workaround.
What I'd suggest is:

1) There's already code that's supposed to handle ramping up with
strictlimit in wb_update_dirty_ratelimit():

	/*
	 * For strictlimit case, calculations above were based on wb counters
	 * and limits (starting from pos_ratio = wb_position_ratio() and up to
	 * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
	 * Hence, to calculate "step" properly, we have to use wb_dirty as
	 * "dirty" and wb_setpoint as "setpoint".
	 *
	 * We rampup dirty_ratelimit forcibly if wb_dirty is low because
	 * it's possible that wb_thresh is close to zero due to inactivity
	 * of backing device.
	 */
	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
		dirty = dtc->wb_dirty;
		if (dtc->wb_dirty < 8)
			setpoint = dtc->wb_dirty + 1;
		else
			setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
	}

Now I agree that increasing wb_thresh directly is more understandable and
transparent so I'd just drop this special case.

2) I'd just handle all the bumping of wb_thresh in a single place instead
of having it spread over multiple places. So __wb_calc_thresh() could have
code like:

	wb_thresh = (thresh * (100 * BDI_RATIO_SCALE - bdi_min_ratio)) / (100 * BDI_RATIO_SCALE);
	wb_thresh *= numerator;
	wb_thresh = div64_ul(wb_thresh, denominator);

	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);

	wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
	limit = hard_dirty_limit(dtc_dom(dtc), dtc->thresh);
	/*
	 * It's very possible that wb_thresh is close to 0 not because the
	 * device is slow, but that it has remained inactive for long time.
	 * Honour such devices a reasonable good (hopefully IO efficient)
	 * threshold, so that the occasional writes won't be blocked and active
	 * writes can rampup the threshold quickly.
	 */
	if (limit > dtc->dirty)
		wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
	if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
		wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);

and we can drop the bumping from wb_position_ratio().
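
To make the ordering of the steps easier to check, here is a rough
userspace model of the calculation sketched above. It is not kernel code:
struct wb_params and calc_wb_thresh() are hypothetical names used only for
illustration, BDI_RATIO_SCALE is assumed to be 10000 as in current kernels,
and the writeout fraction (numerator / denominator) is passed in directly
instead of being read from the fprop counters.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: matches the kernel's BDI_RATIO_SCALE (ratios in 1/10000 %). */
#define BDI_RATIO_SCALE 10000ULL

/* Hypothetical container for the inputs of the proposed calculation. */
struct wb_params {
	uint64_t thresh;        /* domain dirty threshold, in pages */
	uint64_t limit;         /* hard dirty limit, in pages */
	uint64_t dirty;         /* pages currently dirty in the domain */
	uint64_t bdi_min_ratio; /* sum of min_ratio over all bdis, scaled */
	uint64_t wb_min_ratio;  /* this wb's min_ratio, scaled */
	uint64_t wb_max_ratio;  /* this wb's max_ratio, scaled */
	uint64_t numerator;     /* this wb's share of recent writeout */
	uint64_t denominator;   /* total recent writeout, must be nonzero */
};

/* Userspace model: proportional share, min_ratio, inactivity bump, max clamp. */
static uint64_t calc_wb_thresh(const struct wb_params *p)
{
	const uint64_t full = 100 * BDI_RATIO_SCALE;
	uint64_t wb_thresh, wb_max;

	assert(p->denominator != 0);

	/* Share of the threshold not reserved by min_ratio, split by writeout. */
	wb_thresh = p->thresh * (full - p->bdi_min_ratio) / full;
	wb_thresh = wb_thresh * p->numerator / p->denominator;

	/* Add this wb's guaranteed minimum. */
	wb_thresh += p->thresh * p->wb_min_ratio / full;

	/* Bump an inactive wb to 1/8 of the remaining dirty headroom. */
	if (p->limit > p->dirty) {
		uint64_t bump = (p->limit - p->dirty) / 8;

		if (wb_thresh < bump)
			wb_thresh = bump;
	}

	/* The bump never exceeds what max_ratio allows. */
	wb_max = p->thresh * p->wb_max_ratio / full;
	if (wb_thresh > wb_max)
		wb_thresh = wb_max;

	return wb_thresh;
}
```

With thresh = limit = 100000 pages, dirty = 20000 and a numerator of 0 (an
idle device), this model returns 10000 pages when max_ratio is 100%, and
1000 pages when max_ratio is 1%, since the clamp is applied after the bump.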

This way we have the wb_thresh bumping in a single logical place. Since we
still limit wb_thresh with max_ratio, untrusted bdis for which max_ratio
should be configured (otherwise they can grow the amount of dirty pages up
to the global threshold anyway) are still under control.

If we really wanted, we could introduce a different bumping in case of
strictlimit, but at this point I don't think it is warranted so I'd leave
that as an option if someone comes with a situation where this bumping
proves to be too aggressive.

								Honza
-- 
Jan Kara
SUSE Labs, CR