Date: Tue, 23 Jan 2024 11:19:37 +1100
From: Dave Chinner <david@fromorbit.com>
To: Andi Kleen
Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Using Folios for XFS metadata
References: <20240118222216.4131379-1-david@fromorbit.com>
 <87zfwxk75o.fsf@linux.intel.com>
On Mon, Jan 22, 2024 at 05:34:12AM -0800, Andi Kleen wrote:
> [fixed the subject, not sure what happened there]
> 
> FWIW I'm not sure fail-fast is always the right strategy here; in
> many cases compaction may win even when some reclaim is needed. Just
> not if you're on a tight budget for the latencies.
> 
> > I stress test and measure XFS metadata performance under sustained
> > memory pressure all the time. This change has not caused any
> > obvious regressions in the short time I've been testing it.
> 
> Did you test for tail latencies?

No, it's an RFC, and I mostly don't care about memory allocation tail
latencies because they are highly unlikely to be a *new issue* we need
to care about.

The fact is that we already do so much memory allocation and
high-order memory allocation (e.g. through slub, xlog_kvmalloc(), user
data IO through the page cache, etc.) that if there were a long tail
latency problem with high-order memory allocation, it would already be
noticeably affecting XFS data and metadata IO latencies. Nobody is
reporting problems with excessive long tail latencies when using XFS,
so my care factor about long tail latencies in this specific memory
allocation case is close to zero.

> There are some relatively simple ways to trigger memory
> fragmentation; the standard way is to allocate a very large
> THP-backed file and then punch a lot of holes.

Or just run a filesystem with lots of variable-sized high-order
allocations and varying cached object lifetimes under sustained memory
pressure for a significant period of time....

> > > I would in any case add a tunable for it in case people run into
> > > this.
> > 
> > No tunables. It either works or it doesn't. If we can't make
> > it work reliably by default, we throw it in the dumpster, light it
> > on fire and walk away.
> 
> I'm not sure there is a single definition of "reliably" here -- for
> many workloads tail latencies don't matter, so it's always reliable,
> as long as you have good aggregate throughput.
> 
> Others have very high expectations for them.
> 
> Forcing the high expectations on everyone is probably not a good
> general strategy though, as there are general trade-offs.

Yup, and we have to make those trade-offs because filesystems need to
be good at "general purpose" workloads as their primary focus.
Minimising long tail latencies is really just "best effort only",
because we are intimately aware of the fact that there are global
resource limitations that cause long tail latencies in filesystem
implementations and that simply cannot be worked around.

> I could see that having lots of small tunables for every use case
> might not be a good idea. Perhaps there would be a case for a single
> general tunable that controls higher-order folios for everyone.

You're not listening: no tunables. Code that has tunables because you
think it *may not work* is code that is not ready to be merged.
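For reference, xlog_kvmalloc() open codes a kvmalloc()-style fallback
so that the physically contiguous allocation attempt fails fast
instead of looping in reclaim, and then falls back to vmalloc(). From
memory, the helper looks roughly like this (an approximation of the
XFS source, not a verbatim copy):

#include <linux/slab.h>
#include <linux/vmalloc.h>

static inline void *
xlog_kvmalloc(size_t buf_size)
{
	gfp_t	flags = GFP_KERNEL;
	void	*p;

	/*
	 * Fail the contiguous allocation fast: no direct reclaim
	 * loops, no retries, and no warning when it fails.
	 */
	flags &= ~__GFP_DIRECT_RECLAIM;
	flags |= __GFP_NOWARN | __GFP_NORETRY;

	/* The caller requires success, so loop until one path works. */
	do {
		p = kmalloc(buf_size, flags);
		if (!p)
			p = vmalloc(buf_size);
	} while (!p);

	return p;
}

The buffer folio allocation in the RFC follows the same
try-contiguous-fast-then-fall-back shape, which is what bounds the
time spent attempting any individual high-order allocation.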
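That sort of fragmentation generator is trivial to write, too.
Something like the untested sketch below: the path, sizes, and lack of
error handling are arbitrary, whether shmem actually gets THP backing
depends on transparent_hugepage/shmem_enabled, and it assumes the
kernel splits a huge page on a partial hole punch (recent kernels do).
It faults in a large THP-backed file, then punches out all but one
base page of every 2MiB extent, so most of the memory goes back to the
allocator but no free, aligned 2MiB region remains; shrinking the
stride extends the same trick to lower allocation orders.

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t sz = 1UL << 30;		/* 1GiB file */
	size_t huge = 2UL << 20;	/* PMD size on x86-64 */
	int fd = open("/dev/shm/frag", O_CREAT | O_RDWR | O_TRUNC, 0600);
	char *p;

	if (fd < 0 || ftruncate(fd, sz) < 0)
		return 1;
	p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	madvise(p, sz, MADV_HUGEPAGE);	/* request THP backing */
	memset(p, 1, sz);		/* fault the whole file in */

	/*
	 * Keep the first 4kB of each 2MiB extent and punch out the
	 * rest. Each huge page is split and mostly freed, leaving
	 * physical memory fragmented into sub-2MiB chunks.
	 */
	for (size_t off = 0; off < sz; off += huge)
		fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			  off + 4096, huge - 4096);

	pause();	/* hold the remaining pages until interrupted */
	return 0;
}

Run a few instances of that alongside the metadata workload and order-9
allocations have to compact or fail, which is exactly the environment
the fail-fast-with-fallback allocation strategy is meant to survive.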
> > > Tail latencies are a common concern on many IO workloads.
> > 
> > Yes, for user data operations it's a common concern. For metadata,
> > not so much - there are so many far worse long tail latencies in
> > metadata operations (like waiting for journal space) that memory
> > allocation latencies in the metadata IO path are largely noise....
> 
> I've seen pretty long stalls in the past.
> 
> The difference to the journal is also that it is local to the
> filesystem, while the memory is normally shared with everyone on the
> node or system. So the scope of noisy neighbour impact can be quite
> different, especially on a large machine.

Most systems run everything on a single filesystem, which makes the
journal just as global a resource as memory. If the journal
bottlenecks, everyone on the system suffers the performance
degradation, not just the user who caused it.

-Dave.
-- 
Dave Chinner
david@fromorbit.com