From: Bernd Schubert
Date: Wed, 30 Oct 2024 10:32:38 +0100
Subject: Re: [PATCH v2 2/2] fuse: remove tmp folio for writebacks and internal rb tree
To: Joanne Koong, Jingbo Xu
Cc: Miklos Szeredi, Shakeel Butt, linux-fsdevel@vger.kernel.org, josef@toxicpanda.com, hannes@cmpxchg.org, linux-mm@kvack.org, kernel-team@meta.com
Message-ID: <0c3e6a4c-b04e-4af7-ae85-a69180d25744@fastmail.fm>
References: <20241014182228.1941246-1-joannelkoong@gmail.com> <3e4ff496-f2ed-42ef-9f1a-405f32aa1c8c@linux.alibaba.com>

On 10/28/24 22:58, Joanne Koong wrote:
> On Fri, Oct 25, 2024 at 3:40 PM Joanne Koong wrote:
>>
>>> Same here, I need to look some more into the compaction / page
>>> migration paths. I'm planning to do this early next week and will
>>> report back with what I find.
>>>
>>
>> These are my notes so far:
>>
>> * We hit the folio_wait_writeback() path when callers call
>> migrate_pages() with mode MIGRATE_SYNC
>> ... -> migrate_pages() -> migrate_pages_sync() ->
>> migrate_pages_batch() -> migrate_folio_unmap() ->
>> folio_wait_writeback()
>>
>> * These are the places where we call migrate_pages():
>> 1) demote_folio_list()
>> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.
>>
>> 2) __damon_pa_migrate_folio_list()
>> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.
>>
>> 3) migrate_misplaced_folio()
>> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.
>>
>> 4) do_move_pages_to_node()
>> Can ignore this. This calls migrate_pages() in MIGRATE_SYNC mode but
>> this path is only invoked by the move_pages() syscall. It's fine to
>> wait on writeback for the move_pages() syscall since the user would
>> have to deliberately invoke this on the fuse server for this to apply
>> to the server's fuse folios.
>>
>> 5) migrate_to_node()
>> Can ignore this for the same reason as in 4.
>> This path is only invoked by the migrate_pages() syscall.
>>
>> 6) do_mbind()
>> Can ignore this for the same reason as 4 and 5. This path is only
>> invoked by the mbind() syscall.
>>
>> 7) soft_offline_in_use_page()
>> Can skip soft offlining fuse folios (e.g. folios with the
>> AS_NO_WRITEBACK_WAIT mapping flag set).
>> The path for this is soft_offline_page() -> soft_offline_in_use_page()
>> -> migrate_pages(). soft_offline_page() only invokes this for in-use
>> pages in a well-defined state (see ret value of get_hwpoison_page()).
>> My understanding of soft offlining pages is that it's a mitigation
>> strategy for handling pages that are experiencing errors but are not
>> yet completely unusable, and its main purpose is to prevent future
>> issues. It seems fine to skip this for fuse folios.
>>
>> 8) do_migrate_range()
>> 9) compact_zone()
>> 10) migrate_longterm_unpinnable_folios()
>> 11) __alloc_contig_migrate_range()
>>
>> 8 to 11 need more investigation / thinking about. I don't see a good
>> way around these tbh. I think we have to operate under the assumption
>> that the fuse server running is malicious or benevolently but
>> incorrectly written and could possibly never complete writeback. So we
>> definitely can't wait on these but it also doesn't seem like we can
>> skip waiting on these, especially for the case where the server uses
>> spliced pages, nor does it seem like we can just fail these with
>> -EBUSY or something.

I see some code paths in migration that return -EAGAIN. Could you
explain why we can't just fail migration for fuse writeback pages?

>>
>
> I'm still not seeing a good way around this.
>
> What about this then?
> We add a new fuse sysctl called something like
> "/proc/sys/fs/fuse/writeback_optimization_timeout" where if the sys
> admin sets this, then it opts into optimizing writeback to be as fast
> as possible (e.g. skipping the page copies) and if the server doesn't
> fulfill the writeback by the set timeout value, then the connection is
> aborted.
>
> Alternatively, we could also repurpose
> /proc/sys/fs/fuse/max_request_timeout from the request timeout
> patchset [1] but I like the additional flexibility and explicitness
> having the "writeback_optimization_timeout" sysctl gives.
>
> Any thoughts on this?

I'm a bit worried that we might lock up the system until the timeout is
reached, which is not ideal, especially as timeouts are in minutes now.
But even a slightly stuttering video system would not be great. I think
we should then give users/admins the choice between slow page copies
and fast writeback with a possibly briefly unresponsive system.

Thanks,
Bernd
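A minimal user-space sketch of the writeback check that the migration notes and the -EAGAIN question revolve around. It is loosely modeled on the writeback check in migrate_folio_unmap() (mm/migrate.c): only a full MIGRATE_SYNC migration waits on writeback, while lighter modes give up on the folio with -EBUSY. The enum is a simplified stand-in for the kernel's migrate modes, and struct folio_model, writeback_check() and the no_writeback_wait bit (standing in for the AS_NO_WRITEBACK_WAIT mapping flag proposed in this thread) are hypothetical; this is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's migrate modes (model only). */
enum migrate_mode {
	MIGRATE_ASYNC,
	MIGRATE_SYNC_LIGHT,
	MIGRATE_SYNC,
};

/* Hypothetical folio model: only the two bits this decision needs. */
struct folio_model {
	bool under_writeback;   /* stands in for folio_test_writeback() */
	bool no_writeback_wait; /* stands in for the proposed AS_NO_WRITEBACK_WAIT flag */
};

enum unmap_action {
	UNMAP_PROCEED, /* not under writeback: keep migrating */
	UNMAP_WAIT,    /* full sync mode: would block in folio_wait_writeback() */
	UNMAP_FAIL,    /* give up on this folio; *rc holds the error */
};

/*
 * Only MIGRATE_SYNC waits on writeback; other modes fail with -EBUSY.
 * The no_writeback_wait branch models failing fast for fuse writeback
 * folios even under MIGRATE_SYNC, instead of waiting on the server.
 */
static enum unmap_action writeback_check(const struct folio_model *f,
					 enum migrate_mode mode, int *rc)
{
	assert(f != NULL && rc != NULL);
	*rc = 0;
	if (!f->under_writeback)
		return UNMAP_PROCEED;
	if (mode != MIGRATE_SYNC || f->no_writeback_wait) {
		*rc = -EBUSY;
		return UNMAP_FAIL;
	}
	return UNMAP_WAIT;
}
```

Under this model a fuse folio in writeback never blocks the migrating thread; whether callers 8 to 11 can tolerate that failure is the part still open in the thread.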
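A second sketch models the proposed writeback_optimization_timeout knob and the trade-off raised in the reply: when the admin sets a timeout, the temporary page copies are skipped, and a server that does not finish writeback before the deadline has its connection aborted. The bounded wait uses POSIX pthread_cond_timedwait(). struct fuse_conn_model, fc_init(), fc_complete_writeback(), skip_tmp_copies() and wait_writeback_or_abort() are hypothetical names for illustration, and the sysctl itself is only a proposal in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical connection model; timeout_sec == 0 means the knob is unset. */
struct fuse_conn_model {
	pthread_mutex_t lock;
	pthread_cond_t wb_done;   /* signaled when the server finishes writeback */
	bool writeback_done;      /* completion flag, protected by lock */
	bool aborted;             /* set when the server misses the deadline */
	unsigned int timeout_sec;
};

static void fc_init(struct fuse_conn_model *fc, unsigned int timeout_sec)
{
	assert(fc != NULL);
	pthread_mutex_init(&fc->lock, NULL);
	pthread_cond_init(&fc->wb_done, NULL);
	fc->writeback_done = false;
	fc->aborted = false;
	fc->timeout_sec = timeout_sec;
}

/* Server side: mark writeback complete and wake any waiter. */
static void fc_complete_writeback(struct fuse_conn_model *fc)
{
	pthread_mutex_lock(&fc->lock);
	fc->writeback_done = true;
	pthread_cond_broadcast(&fc->wb_done);
	pthread_mutex_unlock(&fc->lock);
}

/* Copies are skipped only if the admin opted in by setting a timeout. */
static bool skip_tmp_copies(const struct fuse_conn_model *fc)
{
	return fc->timeout_sec != 0;
}

/*
 * Wait up to timeout_sec for writeback completion. Returns 0 on
 * completion; otherwise marks the connection aborted and returns
 * -ETIMEDOUT. Only meaningful when skip_tmp_copies() is true.
 */
static int wait_writeback_or_abort(struct fuse_conn_model *fc)
{
	struct timespec deadline;
	int err = 0;
	int ret;

	/* pthread_cond_timedwait() measures against CLOCK_REALTIME by default. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += fc->timeout_sec;

	pthread_mutex_lock(&fc->lock);
	/* Loop to absorb spurious wakeups; any error (e.g. ETIMEDOUT) ends it. */
	while (!fc->writeback_done && err == 0)
		err = pthread_cond_timedwait(&fc->wb_done, &fc->lock, &deadline);
	if (fc->writeback_done) {
		ret = 0;
	} else {
		fc->aborted = true;
		ret = -ETIMEDOUT;
	}
	pthread_mutex_unlock(&fc->lock);
	return ret;
}
```

In this model the waiter can block for the full timeout_sec before the abort fires; that window is where the system may look stuck, which is why the choice between copies and no copies is left to the admin.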