From: Joanne Koong
Date: Mon, 28 Oct 2024 14:58:14 -0700
Subject: Re: [PATCH v2 2/2] fuse: remove tmp folio for writebacks and internal rb tree
To: Jingbo Xu
Cc: Miklos Szeredi, Shakeel Butt, linux-fsdevel@vger.kernel.org, josef@toxicpanda.com, bernd.schubert@fastmail.fm, hannes@cmpxchg.org, linux-mm@kvack.org, kernel-team@meta.com
References: <20241014182228.1941246-1-joannelkoong@gmail.com> <20241014182228.1941246-3-joannelkoong@gmail.com> <3e4ff496-f2ed-42ef-9f1a-405f32aa1c8c@linux.alibaba.com>

On Fri, Oct 25, 2024 at 3:40 PM Joanne Koong wrote:
>
> On Fri, Oct 25, 2024 at 10:36 AM Joanne Koong wrote:
> >
> > On Thu, Oct 24, 2024 at 6:38 PM Jingbo Xu wrote:
> > >
> > > On 10/25/24 12:54 AM, Joanne Koong wrote:
> > > > On Mon, Oct 21, 2024 at 2:05 PM Joanne Koong wrote:
> > > >>
> > > >> On Mon, Oct 21, 2024 at 3:15 AM Miklos Szeredi wrote:
> > > >>>
> > > >>> On Fri, 18 Oct 2024 at 07:31, Shakeel Butt wrote:
> > > >>>
> > > >>>> I feel like this is too restrictive, and I am still not sure why
> > > >>>> blocking on fuse folios served by a non-privileged fuse server is worse
> > > >>>> than blocking on folios served from the network.
> > > >>>
> > > >>> Might be. But historically fuse had this behavior and I'd be very
> > > >>> reluctant to change that unconditionally.
> > > >>>
> > > >>> With a systemwide maximal timeout for fuse requests it might make
> > > >>> sense to allow sync(2), etc. to wait for fuse writeback.
> > > >>>
> > > >>> Without a timeout, allowing fuse servers to block sync(2) indefinitely
> > > >>> seems rather risky.
> > > >>
> > > >> Could we skip waiting on writeback in sync(2) if it's a fuse folio?
> > > >> That seems in line with the sync(2) documentation Jingbo referenced
> > > >> earlier, where it states "The writing, although scheduled, is not
> > > >> necessarily complete upon return from sync()."
> > > >> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sync.html
> > > >>
> > > >
> > > > So I think the answer to this is "no" for Linux. This is what the Linux
> > > > man page for sync(2) says:
> > > >
> > > > "According to the standard specification (e.g., POSIX.1-2001), sync()
> > > > schedules the writes, but may return before the actual writing is
> > > > done. However Linux waits for I/O completions, and thus sync() or
> > > > syncfs() provide the same guarantees as fsync() called on every file
> > > > in the system or filesystem respectively." [1]
> > >
> > > Actually, for FUSE, IIUC the writeback is not guaranteed to be
> > > complete when sync(2) returns, because of the temp page mechanism. When
> > > sync(2) returns, PG_writeback is indeed cleared for all original pages
> > > (in the address_space), while the real writeback work (initiated from
> > > the temp page) may still be in progress.
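Jingbo's point about the temp page mechanism can be made concrete with a small user-space C model. It is only a sketch under stated assumptions: struct model_folio, struct model_request and all helper functions are hypothetical names, not FUSE or mm code, and a single bool stands in for PG_writeback. It contrasts the temp-page path, where the original folio's flag is cleared as soon as the data has been copied, with a path that keeps the original folio under writeback until the server replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16

/* Hypothetical stand-in for a page-cache folio: only its data and a
 * PG_writeback-like flag matter in this model. */
struct model_folio {
	char data[MODEL_PAGE_SIZE];
	bool writeback;
};

/* Hypothetical stand-in for an in-flight FUSE WRITE request. */
struct model_request {
	char tmp[MODEL_PAGE_SIZE];  /* temp copy, filled only in tmp mode */
	struct model_folio *src;    /* folio to clear on reply (direct mode) */
	bool done;                  /* has the server replied yet? */
};

/* Temp-page mode: copy the data, then end writeback on the original
 * folio right away, before the server has handled the request. */
static void start_writeback_tmp(struct model_folio *f, struct model_request *r)
{
	memcpy(r->tmp, f->data, MODEL_PAGE_SIZE);
	r->src = NULL;
	r->done = false;
	f->writeback = false;
}

/* No-temp-page mode: the original folio stays under writeback until
 * the server replies. */
static void start_writeback_direct(struct model_folio *f, struct model_request *r)
{
	r->src = f;
	r->done = false;
	f->writeback = true;
}

/* Server reply: only direct mode still has a folio flag to clear. */
static void server_complete(struct model_request *r)
{
	assert(!r->done);  /* each request completes exactly once */
	r->done = true;
	if (r->src)
		r->src->writeback = false;
}

/* A sync(2)-style waiter that only checks the original folio's flag. */
static bool waiter_would_block(const struct model_folio *f)
{
	return f->writeback;
}
```

After start_writeback_tmp(), waiter_would_block() already returns false while the request's done flag is still false, which is the gap Jingbo describes; after start_writeback_direct(), it stays true until server_complete() runs, which is why dropping the temp page lets an unresponsive server stall sync(2). The model leaves out the internal rb tree mentioned in the patch subject, and all locking.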
> > >
> >
> > That's a great point. It seems like we can just skip waiting on
> > writeback to finish for fuse folios in sync(2) altogether then. I'll
> > look into the best way to do this.
> >
> > > I think this is also what Miklos means in:
> > > https://lore.kernel.org/all/CAJfpegsJKD4YT5R5qfXXE=hyqKvhpTRbD4m1wsYNbGB6k4rC2A@mail.gmail.com/
> > >
> > > Though we need special handling for AS_NO_WRITEBACK_RECLAIM-marked pages
> > > in the sync(2) codepath, similar to what we have done for direct reclaim
> > > in patch 1.
> > >
> > > >
> > > > Regardless of the compaction / page migration issue then, this
> > > > blocking sync(2) is a dealbreaker.
> > >
> > > I really should have figured out the compaction / page migration
> > > mechanism and the potential impact on FUSE when we drop the temp
> > > page. Just too busy to take some time on this though...
> >
> > Same here, I need to look some more into the compaction / page
> > migration paths. I'm planning to do this early next week and will
> > report back with what I find.
> >
>
> These are my notes so far:
>
> * We hit the folio_wait_writeback() path when callers call
> migrate_pages() with mode MIGRATE_SYNC:
> ... -> migrate_pages() -> migrate_pages_sync() ->
> migrate_pages_batch() -> migrate_folio_unmap() ->
> folio_wait_writeback()
>
> * These are the places where we call migrate_pages():
> 1) demote_folio_list()
> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.
>
> 2) __damon_pa_migrate_folio_list()
> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.
>
> 3) migrate_misplaced_folio()
> Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.
>
> 4) do_move_pages_to_node()
> Can ignore this. This calls migrate_pages() in MIGRATE_SYNC mode, but
> this path is only invoked by the move_pages() syscall.
> It's fine to wait on writeback for the move_pages() syscall, since the
> user would have to deliberately invoke it on the fuse server for it to
> apply to the server's fuse folios.
>
> 5) migrate_to_node()
> Can ignore this for the same reason as in 4. This path is only invoked
> by the migrate_pages() syscall.
>
> 6) do_mbind()
> Can ignore this for the same reason as 4 and 5. This path is only
> invoked by the mbind() syscall.
>
> 7) soft_offline_in_use_page()
> Can skip soft offlining fuse folios (e.g. folios with the
> AS_NO_WRITEBACK_WAIT mapping flag set).
> The path for this is soft_offline_page() -> soft_offline_in_use_page()
> -> migrate_pages(). soft_offline_page() only invokes this for in-use
> pages in a well-defined state (see the return value of get_hwpoison_page()).
> My understanding of soft offlining pages is that it's a mitigation
> strategy for handling pages that are experiencing errors but are not
> yet completely unusable, and its main purpose is to prevent future
> issues. It seems fine to skip this for fuse folios.
>
> 8) do_migrate_range()
> 9) compact_zone()
> 10) migrate_longterm_unpinnable_folios()
> 11) __alloc_contig_migrate_range()
>
> 8 to 11 need more investigation / thinking. I don't see a good
> way around these, tbh. I think we have to operate under the assumption
> that the running fuse server is malicious, or benevolent but
> incorrectly written, and could possibly never complete writeback. So we
> definitely can't wait on these, but it also doesn't seem like we can
> skip waiting on these, especially for the case where the server uses
> spliced pages, nor does it seem like we can just fail these with
> -EBUSY or something.
>

I'm still not seeing a good way around this. What about this then?
We could add a new fuse sysctl, called something like
"/proc/sys/fs/fuse/writeback_optimization_timeout". If the sysadmin sets
it, writeback opts into being as fast as possible (e.g. skipping the page
copies), and if the server doesn't complete the writeback within the set
timeout, the connection is aborted.

Alternatively, we could repurpose /proc/sys/fs/fuse/max_request_timeout
from the request timeout patchset [1], but I like the additional
flexibility and explicitness that a dedicated
"writeback_optimization_timeout" sysctl gives.

Any thoughts on this?

[1] https://lore.kernel.org/linux-fsdevel/20241011191320.91592-4-joannelkoong@gmail.com/

Thanks,
Joanne

> Will continue looking more into this early next week.
>
> Thanks,
> Joanne
>
> >
> > Thanks,
> > Joanne
> >
> > >
> > > --
> > > Thanks,
> > > Jingbo
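The writeback_optimization_timeout proposal above can be sketched as a small user-space C model. Everything in it is an assumption for illustration: struct model_conn, struct model_wb_req, use_tmp_copy() and check_writeback_timeouts() are hypothetical names, not existing FUSE code, and the strict "longer than the timeout" comparison is just one reading of "by the set timeout value". The model shows the two halves of the idea: the copy is skipped only when the admin has opted in, and once opted in, an outstanding writeback older than the timeout aborts the connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state for the proposed
 * writeback_optimization_timeout sysctl (0 = admin has not opted in). */
struct model_conn {
	unsigned int wb_opt_timeout_secs;
	bool aborted;
};

/* Hypothetical in-flight writeback request. */
struct model_wb_req {
	long start_secs;  /* time the writeback was sent to the server */
	bool done;        /* has the server completed it? */
};

/* Without the opt-in, keep the existing copy-to-temp-page behaviour. */
static bool use_tmp_copy(const struct model_conn *fc)
{
	return fc->wb_opt_timeout_secs == 0;
}

/* Periodic check: once opted in, abort the connection if any writeback
 * has been outstanding for longer than the timeout. Returns whether the
 * connection is (now) aborted. */
static bool check_writeback_timeouts(struct model_conn *fc,
				     const struct model_wb_req *reqs,
				     size_t n, long now_secs)
{
	assert(reqs != NULL || n == 0);

	if (fc->wb_opt_timeout_secs == 0 || fc->aborted)
		return fc->aborted;

	for (size_t i = 0; i < n; i++) {
		if (!reqs[i].done &&
		    now_secs - reqs[i].start_secs > (long)fc->wb_opt_timeout_secs) {
			/* A real abort would also have to end the pending
			 * writeback so that waiters such as compaction can
			 * make progress; that part is not modelled here. */
			fc->aborted = true;
			break;
		}
	}
	return fc->aborted;
}
```

With wb_opt_timeout_secs set to 30 and one request sent at t=105 that the server never completes, check_writeback_timeouts() returns false at t=130 and aborts the connection at t=136; with the sysctl left at 0, the temp-page copy stays in use and nothing is aborted. Tying the skipped copy to a timeout appears to be what bounds the wait for migration callers such as 8 to 11 in the notes above.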