Date: Fri, 10 Jan 2025 14:00:38 -0800
From: Shakeel Butt
To: David Hildenbrand
Cc: Jeff Layton, Miklos Szeredi, Joanne Koong, Bernd Schubert, Zi Yan,
    linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com,
    josef@toxicpanda.com, linux-mm@kvack.org, kernel-team@meta.com,
    Matthew Wilcox, Oscar Salvador, Michal Hocko
Subject: Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings
References: <791d4056-cac1-4477-a8e3-3a2392ed34db@redhat.com> <1fdc9d50-584c-45f4-9acd-3041d0b4b804@redhat.com> <54ebdef4205781d3351e4a38e5551046482dbba0.camel@kernel.org>

On Fri, Jan 10, 2025 at 10:13:17PM +0100, David Hildenbrand wrote:
> On 10.01.25 21:28, Jeff Layton wrote:
> > On Thu, 2025-01-09 at 12:22 +0100, David Hildenbrand wrote:
> > > On 07.01.25 19:07, Shakeel Butt wrote:
> > > > On Tue, Jan 07, 2025 at 09:34:49AM +0100, David Hildenbrand wrote:
> > > > > On 06.01.25 19:17, Shakeel Butt wrote:
> > > > > > On Mon, Jan 06, 2025 at 11:19:42AM +0100, Miklos Szeredi wrote:
> > > > > > > On Fri, 3 Jan 2025 at 21:31, David Hildenbrand wrote:
> > > > > > > > In any case, having movable pages be turned unmovable due to persistent
> > > > > > > > writeback is something that must be fixed, not worked around. Likely a
> > > > > > > > good topic for LSF/MM.
> > > > > > >
> > > > > > > Yes, this seems a good cross fs-mm topic.
> > > > > > >
> > > > > > > So the issue discussed here is that movable pages used for fuse
> > > > > > > page-cache cause problems when memory needs to be compacted. The
> > > > > > > problem is either that
> > > > > > >
> > > > > > > - the page is skipped, leaving the physical memory block unmovable
> > > > > > >
> > > > > > > - the compaction is blocked for an unbounded time
> > > > > > >
> > > > > > > While the new AS_WRITEBACK_INDETERMINATE could potentially make things
> > > > > > > worse, the same thing happens on readahead, since the new page can be
> > > > > > > locked for an indeterminate amount of time, which can also block
> > > > > > > compaction, right?
> > > > >
> > > > > Yes, as memory hotplug + virtio-mem maintainer my bigger concern is these
> > > > > pages residing in ZONE_MOVABLE / MIGRATE_CMA areas where there *must not be
> > > > > unmovable pages ever*. Not triggered by an untrusted source, not triggered
> > > > > by a trusted source.
> > > > >
> > > > > It's a violation of core-mm principles.
> > > >
> > > > The "must not be unmovable pages ever" is a very strong statement and we
> > > > are violating it today and will keep violating it in future. Any
> > > > page/folio under lock or writeback or have reference taken or have been
> > > > isolated from their LRU is unmovable (most of the time for small period
> > > > of time).
> > >
> > > ^ this: "small period of time" is what I meant.
> > >
> > > Most of these things are known to not be problematic: retrying a couple
> > > of times makes it work, that's why migration keeps retrying.
> > >
> > > Again, as an example, we allow short-term O_DIRECT but disallow
> > > long-term page pinning. I think there were concerns at some point if
> > > O_DIRECT might also be problematic (I/O might take a while), but so far
> > > it was not a problem in practice that would make CMA allocations easily
> > > fail.
> > >
> > > vmsplice() is a known problem, because it behaves like O_DIRECT but
> > > actually triggers long-term pinning; IIRC David Howells has this on his
> > > todo list to fix. [I recall that seccomp disallows vmsplice by default
> > > right now]
> > >
> > > > These operations are being done all over the place in kernel.
> > > > Miklos gave an example of readahead.
> > >
> > > I assume you mean "unmovable for a short time", correct, or can you
> > > point me at that specific example; I think I missed that.

Please see https://lore.kernel.org/all/CAJfpegthP2enc9o1hV-izyAG9nHcD_tT8dKFxxzhdQws6pcyhQ@mail.gmail.com/

> > >
> > > > The per-CPU LRU caches are another
> > > > case where folios can get stuck for long period of time.
> > >
> > > Which is why memory offlining disables the lru cache. See
> > > lru_cache_disable(). Other users that care about that drain the LRU on
> > > all cpus.
> > >
> > > > Reclaim and
> > > > compaction can isolate a lot of folios that they need to have
> > > > too_many_isolated() checks. So, "must not be unmovable pages ever" is
> > > > impractical.
> > >
> > > "must only be short-term unmovable", better?

Yes, and you have clarified the actual amount of time further below.

> > >
> >
> > Still a little ambiguous.
> >
> > How short is "short-term"? Are we talking milliseconds or minutes?
>
> Usually a couple of seconds, max. For memory offlining, slightly longer
> times are acceptable; other things (in particular compaction or CMA
> allocations) will give up much faster.
>
> > Imposing a hard timeout on writeback requests to unprivileged FUSE
> > servers might give us a better guarantee of forward-progress, but it
> > would probably have to be on the order of at least a minute or so to be
> > workable.
>
> Yes, and that might already be a bit too much, especially if stuck on
> waiting for folio writeback ... so ideally we could find a way to migrate
> these folios that are under writeback and it's not your ordinary disk driver
> that responds rather quickly.
>
> Right now we do it via these temp pages, and I can see how that's
> undesirable.
>
> For NFS etc. we probably never ran into this, because it's all used in
> fairly well managed environments and, well, I assume NFS easily outdates CMA
> and ZONE_MOVABLE :)
>
> > > > The point is that, yes we should aim to improve things but in iterations
> > > > and "must not be unmovable pages ever" is not something we can achieve
> > > > in one step.
> > >
> > > I agree with the "improve things in iterations", but as
> > > AS_WRITEBACK_INDETERMINATE has the FOLL_LONGTERM smell to it, I think we
> > > are making things worse.

AS_WRITEBACK_INDETERMINATE is really a bad name we picked, as it is
still causing confusion. It is a simple flag to avoid deadlock in the
reclaim code path and does not say anything about movability.

> > >
> > > And as this discussion has been going on for too long, to summarize my
> > > point: there exist conditions where pages are short-term unmovable, and
> > > possibly some to be fixed that turn pages long-term unmovable (e.g.,
> > > vmsplice); that does not mean that we can freely add new conditions that
> > > turn movable pages unmovable long-term or even forever.
> > >
> > > Again, this might be a good LSF/MM topic. If I would have the capacity I
> > > would suggest a topic around which things are known to cause pages to be
> > > short-term or long-term unmovable/unsplittable, and which can be
> > > handled, which not. Maybe I'll find the time to propose that as a topic.
> > >
> >
> >
> > This does sound like great LSF/MM fodder! I predict that this session
> > will run long! ;)
>
> Heh, fully agreed! :)

I would like a more targeted topic, and for that I want us to at least
agree on where we are disagreeing. Let me write down two statements, and
please tell me where you disagree:

1. For a normally running FUSE server (without tmp pages), the lifetime
   of the writeback state of fuse folios falls under the "short-term
   unmovable" bucket, as it does not differ in any way from any other
   filesystem handling writeback folios.

2. For a buggy or untrusted FUSE server (without tmp pages), the lifetime
   of the writeback state of fuse folios can be arbitrarily long, and we
   need some mechanism to limit it.