From: Joanne Koong
Date: Tue, 14 Jan 2025 10:07:57 -0800
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
To: Miklos Szeredi
Cc: Bernd Schubert, Jeff Layton, David Hildenbrand, Shakeel Butt, Zi Yan, linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com, josef@toxicpanda.com, linux-mm@kvack.org, kernel-team@meta.com, Matthew Wilcox, Oscar Salvador, Michal Hocko
On Tue, Jan 14, 2025 at 2:07 AM Miklos Szeredi wrote:
>
> On Tue, 14 Jan 2025 at 10:55, Bernd Schubert wrote:
> >
> > On 1/14/25 10:40, Miklos Szeredi wrote:
> > > On Tue, 14 Jan 2025 at 09:38, Miklos Szeredi wrote:
> > >
> > >> Maybe an explicit callback from the migration code to the filesystem
> > >> would work. I.e. move the complexity of dealing with migration for
> > >> problematic filesystems (netfs/fuse) to the filesystem itself. I'm
> > >> not sure how this would actually look, as I'm unfamiliar with the
> > >> details of page migration, but I guess it shouldn't be too difficult
> > >> to implement for fuse at least.
> > >
> > > Thinking a bit...
> > >
> > > 1) reading pages
> > >
> > > Pages are allocated (PG_locked set, PG_uptodate cleared) and passed to
> > > ->readpages(), which may make the pages uptodate asynchronously. If a
> > > page is unlocked but not set uptodate, then caller is supposed to
> > > retry the reading, at least that's how I interpret
> > > filemap_get_pages(). This means that it's fine to migrate the page
> > > before it's actually filled with data, since the caller will retry.
> > >
> > > It also means that it would be sufficient to allocate the page itself
> > > just before filling it in, if there was a mechanism to keep track of
> > > these "not yet filled" pages. But that probably off topic.
> >
> > With /dev/fuse buffer copies should be easy - just allocate the page
> > on buffer copy, control is in libfuse.
>
> I think the issue is with generic page cache code, which currently
> relies on the PG_locked flag on the allocated but not yet filled page.
> If the generic code would be able to keep track of "under
> construction" ranges without relying on an allocated page, then the
> filesystem could allocate the page just before copying the data,
> insert the page into the cache mark the relevant portion of the file
> uptodate.
>
> > With splice you really need
> > a page state.
>
> It's not possible to splice a not-uptodate page.
>
> > I wrote this before already - what is the advantage of a tmp page copy
> > over /dev/fuse buffer copy? I.e. I wonder if we need splice at all here.
>
> Splice seems a dead end, but we probably need to continue supporting
> it for a while for backward compatibility.
>

There was a previous discussion about splice and tmp pages here [1]. I
see the following issues with having splice default to using tmp pages
as a workaround:

- my understanding is that the majority of use cases do use splice (e.g.
  iirc, libfuse does as well), in which case there'd be no point to this
  patchset
- codewise, imo this gets messy (e.g. we would still need the rb tree and
  would now need to check writeback against both the folio writeback
  state and the rb tree)
- for the large folios work in [2], the implementation imo is pretty
  clean because it's rebased on top of this patchset, which removes the
  tmp pages and rb tree. If we still have tmp pages, this gets very
  gnarly. I don't see a good way to handle large folios in the rb tree
  given this scenario:
    a) writeback on a large folio is issued
    b) since it's being spliced, we copy it to a tmp folio, clear
       writeback on the original folio, and add this writeback request
       to the rb tree
    c) the folio in the pagecache is evicted
    d) another write occurs on a larger range that encompasses the range
       in the writeback in a), or on a subset of it

Maybe this is doable with some other data structure instead of the rb
tree (e.g. an xarray with refcounts?), but it'd be ideal if we could find
a solution (my guess is this would have to come from the mm layer?) that
obviates tmp pages altogether.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1YwNw7C=EMfKQzN88Zq_2Qih5Te_bfkeaOf=tG+L3u9eA@mail.gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/20241213221818.322371-1-joannelkoong@gmail.com/

> Thanks,
> Miklos
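To make steps b) through d) of the large-folio scenario concrete, here is a minimal user-space sketch. It is not fuse's actual rb tree code. The names wb_range, wb_tracker, wb_track(), wb_complete() and wb_overlaps() are hypothetical, and a flat array stands in for the rb tree (or a possible xarray). It assumes each in-flight tmp-folio writeback is recorded as an inclusive page-index range. Once the page-cache folio has been evicted, that record is the only trace of the writeback, so a new write has to be checked against it with a range-overlap query: a write covering a larger range and a write covering only a subset must both be caught.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not fuse code: each entry is the inclusive page-index
 * range [start, end] of a writeback whose data was copied to a tmp folio and
 * whose page-cache folio already had its writeback flag cleared. If that
 * folio is evicted, this table is the only record of the in-flight range.
 */
#define WB_MAX_INFLIGHT 16

struct wb_range {
	unsigned long start;	/* first page index covered */
	unsigned long end;	/* last page index covered (inclusive) */
};

struct wb_tracker {
	struct wb_range ranges[WB_MAX_INFLIGHT];
	size_t count;
};

/* Record an in-flight tmp-folio writeback; false if full or range invalid. */
static bool wb_track(struct wb_tracker *t, unsigned long start,
		     unsigned long end)
{
	if (t->count == WB_MAX_INFLIGHT || start > end)
		return false;
	t->ranges[t->count].start = start;
	t->ranges[t->count].end = end;
	t->count++;
	return true;
}

/* Drop the entry for a completed writeback, matched by its exact range. */
static bool wb_complete(struct wb_tracker *t, unsigned long start,
			unsigned long end)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->ranges[i].start == start && t->ranges[i].end == end) {
			/* Move the last entry into the freed slot. */
			t->ranges[i] = t->ranges[--t->count];
			return true;
		}
	}
	return false;
}

/*
 * A new write to [start, end] must wait if it overlaps any in-flight range,
 * whether it is larger than the original folio or only part of it. An
 * exact-match lookup on the write's first index would miss partial overlaps.
 */
static bool wb_overlaps(const struct wb_tracker *t, unsigned long start,
			unsigned long end)
{
	for (size_t i = 0; i < t->count; i++) {
		if (start <= t->ranges[i].end && t->ranges[i].start <= end)
			return true;
	}
	return false;
}
```

For example, after wb_track(&t, 0, 3) records a 4-page large folio, both wb_overlaps(&t, 0, 15) and wb_overlaps(&t, 2, 2) report a conflict, while a disjoint write at indices 4 to 7 does not. The sketch deliberately ignores what fuse would then have to do about the conflict (waiting, merging, or splitting requests), which is where the real complexity lies.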