Date: Thu, 22 Feb 2024 22:21:21 +0000
From: Matthew Wilcox
To: Frank van der Linden
Cc: Peter Xu, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, James Houghton, Muchun Song
Subject: Re: [LSF/MM/BPF TOPIC] Hugetlb Unifications

On Thu, Feb 22, 2024 at 12:36:40PM -0800, Frank van der Linden wrote:
> To me, it makes sense to have hugetlb pages themselves to just be
> large folios as much as possible.

Well, they are.
That isn't the concern:

$ git grep -c 'struct page' mm/hugetlb.c fs/hugetlbfs
fs/hugetlbfs/inode.c:5
mm/hugetlb.c:30
$ git grep -c 'struct folio' mm/hugetlb.c fs/hugetlbfs
fs/hugetlbfs/inode.c:10
mm/hugetlb.c:97

(further patches to convert pages to folios are welcome)

> On top of that, there could be a
> notion of physical memory pools with certain properties. The
> properties can be things like: size, evictability, migratability,
> possibly persistence across reboots, maybe "should not be in the
> direct map", like memfd_secret. hugetlbfs then could be expressed as a
> filesystem on top of a pool of, for example, 1G non-evictable pages.
> The pools themselves could have a memfd-like interface (or use memfd
> itself), and could also be used to hook in to things like KVM
> guestmemfd.
>
> So yes, that would be a hugetlb v2, but mainly as a backward
> compatible layer on top of something more generic.

"Those who do not understand what hugetlbfs provides are condemned to
reinvent it badly". We're really going to miss Mike this year.

The most important thing (to my mind) that it provides is shared page
tables. It does it badly, hence the mshare proposal. I don't think
we can make progress on a hugetlbfs2 until we have some mechanism in
the MM to share page tables (as using hugetlbfs2 will regress Certain
Important Workloads that pay my salary).

Another piece of the puzzle is reserved pages. I have not investigated
this area at all, and so I don't know if the current mechanism in
hugetlbfs is a good one, how it could be hoisted to the MM layer, or
reimplemented in a hugetlbfs2. Maybe this is where your memory pool
idea comes in.

Peter's patches have been focused on removing some of the special casing
of hugetlbfs in the generic MM. I think this is a great idea! While I
haven't been actively reviewing those patches as they often touch areas
I'm not an expert in, I'm in favour of them going in.