From: Jim Zhao
To: jack@suse.cz
Cc: akpm@linux-foundation.org, jimzhao.ai@gmail.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org
Subject: Re: [PATCH] mm/page-writeback: Raise wb_thresh to prevent write blocking with strictlimit
Date: Tue, 12 Nov 2024 16:45:39 +0800
Message-Id: <20241112084539.702485-1-jimzhao.ai@gmail.com>
In-Reply-To: <20241108220215.s27rziym6mn5nzv4@quack3>

> On Fri 08-11-24 11:19:49, Jim Zhao wrote:
> > > On Wed 23-10-24 18:00:32, Jim Zhao wrote:
> > > > With the strictlimit
> > > > flag, wb_thresh acts as a hard limit in
> > > > balance_dirty_pages() and wb_position_ratio(). When device write
> > > > operations are inactive, wb_thresh can drop to 0, causing writes to
> > > > be blocked. The issue occasionally occurs in fuse fs, particularly
> > > > with network backends, the write thread is blocked frequently during
> > > > a period. To address it, this patch raises the minimum wb_thresh to a
> > > > controllable level, similar to the non-strictlimit case.
> > > >
> > > > Signed-off-by: Jim Zhao
> > >
> > > ...
> > >
> > > > +	/*
> > > > +	 * With strictlimit flag, the wb_thresh is treated as
> > > > +	 * a hard limit in balance_dirty_pages() and wb_position_ratio().
> > > > +	 * It's possible that wb_thresh is close to zero, not because
> > > > +	 * the device is slow, but because it has been inactive.
> > > > +	 * To prevent occasional writes from being blocked, we raise wb_thresh.
> > > > +	 */
> > > > +	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> > > > +		unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
> > > > +		u64 wb_scale_thresh = 0;
> > > > +
> > > > +		if (limit > dtc->dirty)
> > > > +			wb_scale_thresh = (limit - dtc->dirty) / 100;
> > > > +		wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
> > > > +	}
> > >
> > > What you propose makes sense in principle although I'd say this is mostly a
> > > userspace setup issue - with strictlimit enabled, you're kind of expected
> > > to set min_ratio exactly if you want to avoid these startup issues. But I
> > > tend to agree that we can provide a bit of a slack for a bdi without
> > > min_ratio configured to ramp up.
> > >
> > > But I'd rather pick the logic like:
> > >
> > > 	/*
> > > 	 * If bdi does not have min_ratio configured and it was inactive,
> > > 	 * bump its min_ratio to 0.1% to provide it some room to ramp up.
> > > 	 */
> > > 	if (!wb_min_ratio && !numerator)
> > > 		wb_min_ratio = min(BDI_RATIO_SCALE / 10, wb_max_ratio / 2);
> > >
> > > That would seem like a bit more systematic way than the formula you propose
> > > above...
> >
> > Thanks for the advice.
> > Here's the explanation of the formula:
> > 1. when writes are small and intermittent,wb_thresh can approach 0, not
> > just 0, making the numerator value difficult to verify.
>
> I see, ok.
>
> > 2. The ramp-up margin, whether 0.1% or another value, needs
> > consideration.
> > I based this on the logic of wb_position_ratio in the non-strictlimit
> > scenario: wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8); It seems
> > provides more room and ensures ramping up within a controllable range.
>
> I see, thanks for explanation. So I was thinking how to make the code more
> consistent instead of adding another special constant and workaround. What
> I'd suggest is:
>
> 1) There's already code that's supposed to handle ramping up with
> strictlimit in wb_update_dirty_ratelimit():
>
> 	/*
> 	 * For strictlimit case, calculations above were based on wb counters
> 	 * and limits (starting from pos_ratio = wb_position_ratio() and up to
> 	 * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
> 	 * Hence, to calculate "step" properly, we have to use wb_dirty as
> 	 * "dirty" and wb_setpoint as "setpoint".
> 	 *
> 	 * We rampup dirty_ratelimit forcibly if wb_dirty is low because
> 	 * it's possible that wb_thresh is close to zero due to inactivity
> 	 * of backing device.
> 	 */
> 	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> 		dirty = dtc->wb_dirty;
> 		if (dtc->wb_dirty < 8)
> 			setpoint = dtc->wb_dirty + 1;
> 		else
> 			setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
> 	}
>
> Now I agree that increasing wb_thresh directly is more understandable and
> transparent so I'd just drop this special case.

Yes, I agree.
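For reference, the strictlimit special case quoted above chooses its setpoint roughly as in this userspace sketch. The helper names are hypothetical (the kernel reads these values from the dirty_throttle_control instead of taking arguments), and the "may ramp up" test assumes an upward step is only considered while dirty is below the setpoint. With wb_thresh and wb_bg_thresh near zero, any wb_dirty of 8 or more gets a setpoint of 0, so only a tiny wb_dirty is forced upward:

```c
#include <stdbool.h>

/*
 * Hypothetical model of the strictlimit branch in
 * wb_update_dirty_ratelimit(); not kernel code.
 */
static unsigned long strictlimit_setpoint(unsigned long wb_dirty,
					  unsigned long wb_thresh,
					  unsigned long wb_bg_thresh)
{
	/* Tiny wb_dirty: put the setpoint just above it to force a ramp-up. */
	if (wb_dirty < 8)
		return wb_dirty + 1;
	/* Otherwise aim for the midpoint of the per-wb thresholds. */
	return (wb_thresh + wb_bg_thresh) / 2;
}

/*
 * Assumption: the rate limit may only step up while "dirty"
 * (here wb_dirty) is below the setpoint.
 */
static bool strictlimit_may_ramp_up(unsigned long wb_dirty,
				    unsigned long wb_thresh,
				    unsigned long wb_bg_thresh)
{
	return wb_dirty < strictlimit_setpoint(wb_dirty, wb_thresh,
					       wb_bg_thresh);
}
```

For example, with wb_thresh = wb_bg_thresh = 0, strictlimit_may_ramp_up(3, 0, 0) is true (setpoint 4), while strictlimit_may_ramp_up(8, 0, 0) is false (setpoint 0). That cliff is why raising wb_thresh itself is the more transparent fix.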
> 2) I'd just handle all the bumping of wb_thresh in a single place instead
> of having is spread over multiple places. So __wb_calc_thresh() could have
> a code like:
>
> 	wb_thresh = (thresh * (100 * BDI_RATIO_SCALE - bdi_min_ratio)) / (100 * BDI_RATIO_SCALE);
> 	wb_thresh *= numerator;
> 	wb_thresh = div64_ul(wb_thresh, denominator);
>
> 	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
>
> 	wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
> 	limit = hard_dirty_limit(dtc_dom(dtc), dtc->thresh);
> 	/*
> 	 * It's very possible that wb_thresh is close to 0 not because the
> 	 * device is slow, but that it has remained inactive for long time.
> 	 * Honour such devices a reasonable good (hopefully IO efficient)
> 	 * threshold, so that the occasional writes won't be blocked and active
> 	 * writes can rampup the threshold quickly.
> 	 */
> 	if (limit > dtc->dirty)
> 		wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
> 	if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
> 		wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
>
> and we can drop the bumping from wb_position_ratio(). This way have the
> wb_thresh bumping in a single logical place. Since we still limit wb_thresh
> with max_ratio, untrusted bdis for which max_ratio should be configured
> (otherwise they can grow amount of dirty pages upto global treshold anyway)
> are still under control.
>
> If we really wanted, we could introduce a different bumping in case of
> strictlimit, but at this point I don't think it is warranted so I'd leave
> that as an option if someone comes with a situation where this bumping
> proves to be too aggressive.

Thank you, this is very helpful. I have two concerns:

1. In the current non-strictlimit logic, wb_thresh is bumped only within
wb_position_ratio() when calculating pos_ratio, and this bump is not
restricted by max_ratio. I am unsure whether moving this adjustment into
__wb_calc_thresh() would affect existing behavior.
Would it be possible to keep the current logic for the non-strictlimit case?

2. Regarding the formula:

	wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);

Consider this case: with 100 fuse devices (each with a high max_ratio)
experiencing long writeback delays, the pages under writeback are accounted
in NR_WRITEBACK_TEMP, not in dtc->dirty. As a result, the bumped wb_thresh
may stay high. While each individual device is under control, the total
could exceed expectations. Although lowering max_ratio avoids this issue,
how about reducing the bumped wb_thresh instead? The formula in my patch is:

	wb_scale_thresh = (limit - dtc->dirty) / 100;

The intention is to use the default fuse max_ratio (1%) as the multiplier,
i.e. to give an inactive bdi 1% of the remaining headroom.

Thanks
Jim Zhao
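To put rough numbers on concern 2, here is a userspace sketch of the bump-then-clamp step from the proposed __wb_calc_thresh(). Assumptions: the helper names are hypothetical, BDI_RATIO_SCALE is taken to be 10000, and limit and dirty are plain inputs rather than values from hard_dirty_limit() and the dtc. It compares divisor 8 (the current non-strictlimit bump) with divisor 100 (the patch) for many idle bdis that share the same headroom:

```c
#include <stdint.h>

/* Assumption: mirrors the kernel's BDI_RATIO_SCALE (100% == 100 * 10000). */
#define SKETCH_BDI_RATIO_SCALE 10000ULL

/*
 * Hypothetical model of the proposed bump: raise wb_thresh to
 * (limit - dirty) / bump_div, then clamp it to thresh * max_ratio.
 */
static uint64_t sketch_bumped_wb_thresh(uint64_t wb_thresh, uint64_t thresh,
					uint64_t limit, uint64_t dirty,
					uint64_t max_ratio, uint64_t bump_div)
{
	uint64_t cap = thresh * max_ratio / (100 * SKETCH_BDI_RATIO_SCALE);

	/* Give an inactive bdi a share of the remaining global headroom. */
	if (limit > dirty && (limit - dirty) / bump_div > wb_thresh)
		wb_thresh = (limit - dirty) / bump_div;
	/* max_ratio still bounds each bdi individually. */
	if (wb_thresh > cap)
		wb_thresh = cap;
	return wb_thresh;
}

/* Sum of the bumped thresholds of n idle bdis (proportional share 0). */
static uint64_t sketch_total_idle_thresh(unsigned int n, uint64_t thresh,
					 uint64_t limit, uint64_t dirty,
					 uint64_t max_ratio, uint64_t bump_div)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < n; i++)
		total += sketch_bumped_wb_thresh(0, thresh, limit, dirty,
						 max_ratio, bump_div);
	return total;
}
```

With thresh = limit = 100000 pages, dirty = 20000 and max_ratio = 100%, each idle bdi gets 10000 pages with divisor 8, so 100 of them total 1000000 pages, ten times thresh. With divisor 100 the total is 80000, exactly the headroom. With max_ratio = 1%, the clamp alone limits each bdi to 1000 pages.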