From: Joanne Koong
Date: Fri, 25 Oct 2024 15:40:25 -0700
Subject: Re: [PATCH v2 2/2] fuse: remove tmp folio for writebacks and internal rb tree
To: Jingbo Xu
Cc: Miklos Szeredi, Shakeel Butt, linux-fsdevel@vger.kernel.org, josef@toxicpanda.com, bernd.schubert@fastmail.fm, hannes@cmpxchg.org, linux-mm@kvack.org, kernel-team@meta.com

On Fri, Oct 25, 2024 at 10:36 AM Joanne Koong wrote:
>
> On Thu, Oct 24, 2024 at 6:38 PM Jingbo Xu wrote:
> >
> > On 10/25/24 12:54 AM, Joanne Koong wrote:
> > > On Mon, Oct 21, 2024 at 2:05 PM Joanne Koong wrote:
> > >>
> > >> On Mon, Oct 21, 2024 at 3:15 AM Miklos Szeredi wrote:
> > >>>
> > >>> On Fri, 18 Oct 2024 at 07:31, Shakeel Butt wrote:
> > >>>
> > >>>> I feel like this is too restrictive and I am still not sure why
> > >>>> blocking on fuse folios served by an unprivileged fuse server is worse
> > >>>> than blocking on folios served from the network.
> > >>>
> > >>> Might be. But historically fuse had this behavior and I'd be very
> > >>> reluctant to change that unconditionally.
> > >>>
> > >>> With a systemwide maximal timeout for fuse requests it might make
> > >>> sense to allow sync(2), etc. to wait for fuse writeback.
> > >>>
> > >>> Without a timeout, allowing fuse servers to block sync(2) indefinitely
> > >>> seems rather risky.
> > >>
> > >> Could we skip waiting on writeback in sync(2) if it's a fuse folio?
> > >> That seems in line with the sync(2) documentation Jingbo referenced
> > >> earlier where it states "The writing, although scheduled, is not
> > >> necessarily complete upon return from sync()."
> > >> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sync.html
> > >>
> > >
> > > So I think the answer to this is "no" for Linux. What the Linux man
> > > page for sync(2) says:
> > >
> > > "According to the standard specification (e.g., POSIX.1-2001), sync()
> > > schedules the writes, but may return before the actual writing is
> > > done. However Linux waits for I/O completions, and thus sync() or
> > > syncfs() provide the same guarantees as fsync() called on every file
> > > in the system or filesystem respectively." [1]
> >
> > Actually, for FUSE, IIUC the writeback is not guaranteed to be
> > complete when sync(2) returns, because of the temp page mechanism. When
> > sync(2) returns, PG_writeback is indeed cleared for all original pages
> > (in the address_space), while the real writeback work (initiated from
> > the temp page) may still be in progress.
> >
>
> That's a great point. It seems like we can just skip waiting on
> writeback to finish for fuse folios in sync(2) altogether then. I'll
> look into what's the best way to do this.
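Jingbo's point about the temp page, and the idea of skipping the sync(2) wait for fuse folios, can be illustrated with a minimal userspace model. Everything below (model_folio, model_mapping, skip_writeback_wait and the helpers) is hypothetical and is not a kernel API; the real sync(2) path is not modeled, only the ordering of "writeback flag cleared" versus "server replied".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only: these types and helpers are hypothetical and do
 * not exist in the kernel. They track just enough state to show when a
 * sync(2)-style wait returns relative to the FUSE server's reply.
 */
struct model_folio {
	bool writeback;      /* models PG_writeback on the page-cache folio */
	bool server_pending; /* write request not yet answered by the server */
};

struct model_mapping {
	bool skip_writeback_wait; /* hypothetical per-mapping opt-out flag */
	struct model_folio *folios;
	size_t nr;
};

/* Temp-page scheme: the data is copied to a temp page, so writeback on
 * the original folio ends immediately while the request is in flight. */
static void writeback_with_temp_copy(struct model_folio *f)
{
	f->server_pending = true;
	f->writeback = false;
}

/* Scheme without the temp page: writeback stays set on the original
 * folio until the server replies. */
static void writeback_in_place(struct model_folio *f)
{
	f->server_pending = true;
	f->writeback = true;
}

/* The server answers the write request and writeback ends (for the
 * temp-page scheme the flag was already clear). */
static void server_reply(struct model_folio *f)
{
	assert(f->server_pending); /* only a pending request can be answered */
	f->server_pending = false;
	f->writeback = false;
}

/* Counts the folios a sync(2)-style wait would block on in this mapping.
 * A mapping with skip_writeback_wait set is never waited on. */
static size_t sync_would_block_on(const struct model_mapping *m)
{
	size_t blocked = 0;

	if (m->skip_writeback_wait)
		return 0;
	for (size_t i = 0; i < m->nr; i++)
		if (m->folios[i].writeback)
			blocked++;
	return blocked;
}
```

With writeback_with_temp_copy(), sync_would_block_on() already returns 0 while server_pending is still true, which is the current behavior Jingbo describes. With writeback_in_place(), it returns 1 until server_reply() runs, unless the mapping opts out via the hypothetical skip flag, so dropping the temp page without such an opt-out would let the server stall sync(2).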
>
> > I think this is also what Miklos means in:
> > https://lore.kernel.org/all/CAJfpegsJKD4YT5R5qfXXE=hyqKvhpTRbD4m1wsYNbGB6k4rC2A@mail.gmail.com/
> >
> > Though we need special handling for AS_NO_WRITEBACK_RECLAIM-marked pages
> > in the sync(2) codepath, similar to what we have done for direct reclaim
> > in patch 1.
> >
> > >
> > > Regardless of the compaction / page migration issue then, this
> > > blocking sync(2) is a dealbreaker.
> >
> > I really should have figured out the compaction / page migration
> > mechanism and the potential impact on FUSE when we drop the temp
> > page. I've just been too busy to spend time on this, though.
>
> Same here, I need to look some more into the compaction / page
> migration paths. I'm planning to do this early next week and will
> report back with what I find.
>

These are my notes so far:

* We hit the folio_wait_writeback() path when callers call
  migrate_pages() with mode MIGRATE_SYNC:

    ... -> migrate_pages() -> migrate_pages_sync() -> migrate_pages_batch()
        -> migrate_folio_unmap() -> folio_wait_writeback()

* These are the places where we call migrate_pages():

1) demote_folio_list()
   Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.

2) __damon_pa_migrate_folio_list()
   Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.

3) migrate_misplaced_folio()
   Can ignore this. It calls migrate_pages() in MIGRATE_ASYNC mode.

4) do_move_pages_to_node()
   Can ignore this. It calls migrate_pages() in MIGRATE_SYNC mode, but this
   path is only invoked by the move_pages() syscall. It's fine to wait on
   writeback for move_pages(), since the user would have to deliberately
   invoke it on the fuse server for this to apply to the server's fuse
   folios.

5) migrate_to_node()
   Can ignore this for the same reason as in 4. This path is only invoked
   by the migrate_pages() syscall.

6) do_mbind()
   Can ignore this for the same reason as in 4 and 5. This path is only
   invoked by the mbind() syscall.

7) soft_offline_in_use_page()
   Can skip soft offlining fuse folios (e.g. folios whose mapping has the
   AS_NO_WRITEBACK_WAIT flag set). The path for this is
   soft_offline_page() -> soft_offline_in_use_page() -> migrate_pages().
   soft_offline_page() only invokes this for in-use pages in a
   well-defined state (see the return value of get_hwpoison_page()). My
   understanding is that soft offlining is a mitigation strategy for pages
   that are experiencing errors but are not yet completely unusable, and
   that its main purpose is to prevent future issues. It seems fine to
   skip this for fuse folios.

8) do_migrate_range()
9) compact_zone()
10) migrate_longterm_unpinnable_folios()
11) __alloc_contig_migrate_range()

8 to 11 need more investigation / thinking. I don't see a good way
around these, tbh. I think we have to operate under the assumption that
the running fuse server may be malicious, or benevolent but incorrectly
written, and could possibly never complete writeback. So we definitely
can't wait on these, but it also doesn't seem like we can skip waiting
on them (especially when the server uses spliced pages), nor does it
seem like we can just fail them with -EBUSY or something.

Will continue looking more into this early next week.

Thanks,
Joanne
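The notes above hinge on one check in migrate_folio_unmap(): as I understand the current code, only a full MIGRATE_SYNC migration waits for writeback, while other modes give up on the folio with -EBUSY. A minimal userspace model of that check, plus the proposed skip for item 7, is sketched below. The enumerators reuse names from the kernel's enum migrate_mode, but the types and helpers are hypothetical stand-ins, and the sketch assumes, as the notes imply, that soft offlining migrates in MIGRATE_SYNC mode.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. The enumerators reuse names from the kernel's
 * enum migrate_mode (a subset; the real enum may differ by version), but
 * the types and helpers below are hypothetical stand-ins.
 */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

struct model_folio {
	bool writeback;         /* models PG_writeback */
	bool no_writeback_wait; /* models a mapping flag like AS_NO_WRITEBACK_WAIT */
};

/* Outcome of the writeback check for one folio. */
enum wb_check {
	WB_CHECK_PROCEED, /* not under writeback: migration continues */
	WB_CHECK_WAIT,    /* MIGRATE_SYNC: caller blocks in folio_wait_writeback() */
	WB_CHECK_BUSY,    /* any other mode: folio is skipped with -EBUSY */
};

/* Only a full synchronous migration waits for writeback to finish. */
static enum wb_check model_writeback_check(const struct model_folio *f,
					   enum migrate_mode mode)
{
	assert(mode <= MIGRATE_SYNC);
	if (!f->writeback)
		return WB_CHECK_PROCEED;
	return mode == MIGRATE_SYNC ? WB_CHECK_WAIT : WB_CHECK_BUSY;
}

/* Item 7 as proposed: soft offlining leaves flagged folios alone, so
 * they never reach migrate_pages(). */
static bool model_may_soft_offline(const struct model_folio *f)
{
	return !f->no_writeback_wait;
}

/* Could a MIGRATE_SYNC soft offline of this folio block on writeback? */
static bool model_soft_offline_may_block(const struct model_folio *f)
{
	return model_may_soft_offline(f) &&
	       model_writeback_check(f, MIGRATE_SYNC) == WB_CHECK_WAIT;
}
```

In this model a flagged folio under writeback never reaches the WB_CHECK_WAIT branch through soft offlining, while an unflagged one does. Items 8 to 11 are the callers where neither WB_CHECK_WAIT (possibly unbounded) nor WB_CHECK_BUSY or a skip is obviously acceptable, which is why they remain open.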