Date: Mon, 23 Dec 2024 14:14:34 -0800
From: Shakeel Butt
To: David Hildenbrand
Cc: Bernd Schubert, Joanne Koong, Zi Yan, miklos@szeredi.hu,
 linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com,
 josef@toxicpanda.com, linux-mm@kvack.org, kernel-team@meta.com,
 Matthew Wilcox, Oscar Salvador, Michal Hocko
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under
 writeback with AS_WRITEBACK_INDETERMINATE mappings
References: <7b6b8143-d7a4-439f-ae35-a91055f9d62a@redhat.com>
 <2e13a67a-0bad-4795-9ac8-ee800b704cb6@fastmail.fm>
 <2bph7jx4hvhxpgp77shq2j7mo4xssobhqndw5v7hdvbn43jo2w@scqly5zby7bm>
 <71d7ac34-a5e5-4e59-802b-33d8a4256040@redhat.com>
 <9404aaa2-4fc2-4b8b-8f95-5604c54c162a@redhat.com>
 <3f3c7254-7171-4987-bb1b-24c323e22a0f@redhat.com>
In-Reply-To: <3f3c7254-7171-4987-bb1b-24c323e22a0f@redhat.com>

On Sat, Dec 21, 2024 at 05:18:20PM +0100, David Hildenbrand wrote:
[...]
>
> Yes, so I can see fuse
>
> (1) Breaking memory reclaim (memory cannot get freed up)
>
> (2) Breaking page migration (memory cannot be migrated)
>
> Due to (1) we might experience bigger memory pressure in the system I guess.
> A handful of these pages don't really hurt, I have no idea how bad having
> many of these pages can be. But yes, inherently we cannot throw away the
> data as long as it is dirty without causing harm.
> (maybe we could move it to
> some other cache, like swap/zswap; but that smells like a big and
> complicated project)
>
> Due to (2) we turn pages that are supposed to be movable possibly for a long
> time unmovable. Even a *single* such page will mean that CMA allocations /
> memory unplug can start failing.
>
> We have similar situations with page pinning. With things like O_DIRECT, our
> assumption/experience so far is that it will only take a couple of seconds
> max, and retry loops are sufficient to handle it. That's why only long-term
> pinning ("indeterminate", e.g., vfio) migrate these pages out of
> ZONE_MOVABLE/MIGRATE_CMA areas in order to long-term pin them.
>
>
> The biggest concern I have is that timeouts, while likely reasonable it many
> scenarios, might not be desirable even for some sane workloads, and the
> default in all system will be "no timeout", letting the clueless admin of
> each and every system out there that might support fuse to make a decision.
>
> I might have misunderstood something, in which case I am very sorry, but we
> also don't want CMA allocations to start failing simply because a network
> connection is down for a couple of minutes such that a fuse daemon cannot
> make progress.
>

I think you have valid concerns, but they are neither new nor unique to
fuse. Any filesystem that can stall for an arbitrary time can have similar
issues. Such a stall can be caused by network issues or by faulty local
storage.

Regarding reclaim, I wouldn't say fuse or similar filesystems are breaking
memory reclaim, as the kernel has a mechanism to throttle the threads
dirtying file memory, which reduces the chance of most of memory becoming
unreclaimable due to being dirty.

Please note that such filesystems are mostly used in environments like
data centers or hyperscalers, which usually have more advanced mechanisms
to handle and avoid situations like long delays. For such environments,
network unavailability is a larger issue than some CMA allocation failure.
My point is: let's not assume the disastrous situation is normal and
overcomplicate the solution.
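
For anyone following along, here is a rough userspace toy model (Python,
not kernel code and not the code from the patch) of the migration decision
being discussed. The names Folio, Outcome, migration_decision() and
movable_range_can_be_emptied() are made up for illustration, and
mapping_indeterminate is only a stand-in for the AS_WRITEBACK_INDETERMINATE
flag named in the subject. It assumes a caller that is willing to wait for
ordinary writeback.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    MIGRATE = "migrate"  # folio is not under writeback, it can be moved now
    WAIT = "wait"        # writeback is expected to finish soon: wait, then move
    SKIP = "skip"        # writeback may never finish: leave the folio in place


@dataclass
class Folio:
    under_writeback: bool
    # Hypothetical stand-in for a mapping flagged AS_WRITEBACK_INDETERMINATE.
    mapping_indeterminate: bool


def migration_decision(folio: Folio) -> Outcome:
    """Toy version of 'skip folios under writeback on indeterminate mappings'."""
    if not folio.under_writeback:
        return Outcome.MIGRATE
    if folio.mapping_indeterminate:
        # Waiting could block for as long as the filesystem daemon is stalled.
        return Outcome.SKIP
    return Outcome.WAIT


def movable_range_can_be_emptied(folios) -> bool:
    """A range can only be vacated if no folio in it has to be skipped."""
    return all(migration_decision(f) is not Outcome.SKIP for f in folios)


if __name__ == "__main__":
    clean = Folio(under_writeback=False, mapping_indeterminate=False)
    stuck = Folio(under_writeback=True, mapping_indeterminate=True)
    print(movable_range_can_be_emptied([clean, clean]))  # no skipped folio
    print(movable_range_can_be_emptied([clean, stuck]))  # one skipped folio
```

The second helper is only there to show the "even a single such page" case
from the quoted text: one skipped folio is enough to keep the whole range
from being vacated for as long as its writeback is outstanding.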