Date: Sun, 3 Dec 2023 21:57:38 -0800 (PST)
From: Hugh Dickins
To: Matthew Wilcox
cc: Viacheslav Dubeyko, Linux FS Devel, linux-mm@kvack.org, Hugh Dickins, "Kirill A. Shutemov"
Subject: Re: Issue with 8K folio size in __filemap_get_folio()
Message-ID: <90344ea7-4eec-47ee-5996-0c22f42d6a6a@google.com>

On Sun, 3 Dec 2023, Matthew Wilcox wrote:

> On Sun, Dec 03, 2023 at 09:27:57PM +0000, Matthew Wilcox wrote:
> > I was talking with Darrick on Friday and he convinced me that this is
> > something we're going to need to fix sooner rather than later for the
> > benefit of devices with block size 8kB.  So it's definitely on my todo
> > list, but I haven't investigated in any detail yet.
>
> OK, here's my initial analysis of just not putting order-1 folios
> on the deferred split list.  folio->_deferred_list is only used in
> mm/huge_memory.c, which makes this a nice simple analysis.
>
>  - folio_prep_large_rmappable() initialises the list_head.  No problem,
>    just don't do that for order-1 folios.
>  - split_huge_page_to_list() will remove the folio from the split queue.
>    No problem, just don't do that.
>  - folio_undo_large_rmappable() removes it from the list if it's
>    on the list.  Again, no problem, don't do that for order-1 folios.
>  - deferred_split_scan() walks the list, it won't find any order-1
>    folios.
>
>  - deferred_split_folio() will add the folio to the list.  Returning
>    here will avoid adding the folio to the list.  But what consequences
>    will that have?  Ah.  There's only one caller of
>    deferred_split_folio() and it's in page_remove_rmap() ... and it's
>    only called for anon folios anyway.

Yes, the deferred split business is quite nicely localized,
which makes it easy to avoid.

>
> So it looks like we can support order-1 folios in the page cache without
> any change in behaviour since file-backed folios were never added to
> the deferred split list.

Yes, you just have to code in a few "don't do that"s as above.

And I think it would be reasonable to allow even anon order-1 folios,
but they'd just be prevented from participating in the deferred split
business.
Allowing a PMD to be split at any time is necessary for correctness;
then allowing the underlying folio to be split is a nice-to-have when
trying to meet competition for memory, but it's not a correctness issue,
and the smaller the folio to be split, the less the saving - a lot of
unneeded order-1s could still waste a lot of memory, but it's not so
serious as with the order-9s.

>
> Now, is this the right direction?  Is it a bug that we never called
> deferred_split_folio() for pagecache folios?  I would defer to Hugh
> or Kirill on this.  Ccs added.

No, not a bug.  The thing with anon folios is that they only have value
while mapped (let's ignore swap and GUP for simplicity), so when
page_remove_rmap() is called in the unmapping, that's a good hint that
the hugeness of the folio is no longer worthwhile; but it's a bad moment
for splitting because of locks held, and quite possibly a stupid time
for splitting too (because just a part of unmapping a large range, when
splitting would merely slow it all down).  Hence anon's deferred split
queue.

But for file pages, the file contents must retain their value whether
mapped or not.  The moment to consider splitting the folio itself is
when truncating or punching a hole; and it happens that there is not a
locking problem then, nor any overhead added to eviction.  Hence no
deferred split queue for file.

Shmem does have its per-superblock shmem_unused_huge_shrink(), for
splitting huge folios at EOF when under memory pressure (and it would be
better if it were not restricted to EOF, but could retry folios which
had been EBUSY or EAGAIN at the time of hole-punch).  Maybe other large
folio filesystems should also have such a shrinker.

Hugh