Date: Tue, 13 Jun 2023 11:59:15 -0300
From: Jason Gunthorpe
To: Matthew Wilcox
Cc: David Hildenbrand, David Rientjes, Mike Kravetz, Yosry Ahmed,
	James Houghton, Naoya Horiguchi, Miaohe Lin,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Peter Xu,
	Michal Hocko, Axel Rasmussen, Jiaqi Yan
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
References: <20230602172723.GA3941@monkey>
	<7e0ce268-f374-8e83-2b32-7c53f025fec5@google.com>
	<7c42a738-d082-3338-dfb5-fd28f75edc58@redhat.com>
	<75d5662a-a901-1e02-4706-66545ad53c5c@redhat.com>
	<20230607220651.GC4122@monkey>
	<686e3e61-704e-1258-8a8b-f18399b41668@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-linux-mm@kvack.org

On Thu, Jun 08, 2023 at 09:10:15PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 08, 2023 at 08:34:10AM +0200, David Hildenbrand wrote:
> > On 08.06.23 02:02, David Rientjes wrote:
> > > While people have proposed 1GB THP support in the past, it was nacked, in
> > > part, because of the suggestion to just use existing 1GB support in
> > > hugetlb instead :)
> >
> > Yes, because I still think that the use for "transparent" (for the user)
> > nowadays is very limited and not worth the complexity.
> >
> > IMHO, what you really want is a pool of large pages that (guarantees about
> > availability and nodes) and fine control about who gets these pages. That's
> > what hugetlb provides.
> >
> > In contrast to THP, you don't want to allow for
> > * Partially mmap, mremap, munmap, mprotect them
> > * Partially sharing then / COW'ing them
> > * Partially mixing them with other anon pages (MADV_DONTNEED + refault)
> > * Exclude them from some features KSM/swap
> > * (swap them out and eventually split them for that)
> >
> > Because you don't want to get these pages PTE-mapped by the system *unless*
> > there is a real reason (HGM, hwpoison) -- you want guarantees. Once such a
> > page is PTE-mapped, you only want to collapse in place.
> >
> > But you don't want special-HGM, you simply want the core to PTE-map them
> > like a (file) THP.
> >
> > IMHO, getting that realized much easier would be if we wouldn't have to care
> > about some of the hugetlb complexity I raised (MAP_PRIVATE, PMD sharing),
> > but maybe there is a way ...
>
> I favour a more evolutionary than revolutionary approach. That is,
> I think it's acceptable to add new features to hugetlbfs _if_ they're
> combined with cleanup work that gets hugetlbfs closer to the main mm.
> This is why I harp on things like pagewalk that currently need special
> handling for hugetlb -- that's pointless; they should just be treated as
> large folios. GUP handles hugetlb separately too, and I'm not sure why.

Yes, this echoes my feelings too.

Making all the special core-mm cases around hugetlb even more
complicated with HGM seems like a non-starter.

We need to get to a point where the core-mm handles all the PTE
programming and supports arbitrary order folios in the page tables
uniformly for everyone. hugetlb is just a special high order folio
provider.

Get rid of all the special PTE formats, unique arch code, and special
code in gup.c/pagewalkers/etc that supports hugetlbfs.

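To make the "hugetlb is just a folio provider" idea a bit more concrete,
here is a toy userspace model in Python. It is only a sketch under loud
assumptions: Folio, ReservedPool and PageTable are made-up names for
illustration, not kernel interfaces, 4 KiB base pages are assumed, and a
flat dictionary stands in for real multi-level page tables. The point it
tries to show is that mapping and walking take one code path for every
order and never ask where a folio came from.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 KiB base pages


@dataclass(frozen=True)
class Folio:
    """Toy folio: 2**order physically contiguous base pages."""
    pfn: int    # first page frame number, aligned to 2**order
    order: int  # 0 = base page, 9 = 2 MiB with 4 KiB base pages

    @property
    def nr_pages(self) -> int:
        return 1 << self.order


class ReservedPool:
    """Hypothetical hugetlb-like provider: a fixed, pre-reserved set of folios."""

    def __init__(self, order: int, count: int, base_pfn: int) -> None:
        size = 1 << order
        if base_pfn % size:
            raise ValueError("base_pfn must be aligned to the folio size")
        self.free = [Folio(base_pfn + i * size, order) for i in range(count)]

    def alloc(self) -> Folio:
        if not self.free:
            raise MemoryError("pool exhausted")
        return self.free.pop(0)


class PageTable:
    """Toy page table holding one leaf entry per mapped folio."""

    def __init__(self) -> None:
        self.leaves = {}  # first virtual page number of a mapping -> Folio

    def map_folio(self, vaddr: int, folio: Folio) -> None:
        """Single mapping path for every order and every provider."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn % folio.nr_pages:
            raise ValueError("vaddr must be aligned to the folio size")
        self.leaves[vpn] = folio

    def lookup(self, vpn: int):
        """Return the folio covering vpn, or None if nothing is mapped there."""
        for order in sorted({f.order for f in self.leaves.values()}):
            folio = self.leaves.get(vpn & ~((1 << order) - 1))
            if folio is not None and folio.order == order:
                return folio
        return None

    def walk(self, vaddr: int, length: int):
        """Yield each folio mapped in [vaddr, vaddr + length) exactly once."""
        vpn = vaddr >> PAGE_SHIFT
        end = (vaddr + length + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT
        while vpn < end:
            folio = self.lookup(vpn)
            if folio is None:
                vpn += 1
            else:
                yield folio
                # Skip to the first page after this folio, whatever its order.
                vpn = (vpn & ~(folio.nr_pages - 1)) + folio.nr_pages


pool = ReservedPool(order=9, count=2, base_pfn=1024)
huge = pool.alloc()               # comes from the reserved pool
small = Folio(pfn=7, order=0)     # an ordinary base page from anywhere else

pt = PageTable()
pt.map_folio(0x200000, huge)
pt.map_folio(0x400000, small)
mapped = list(pt.walk(0x200000, 0x201000))
```

In this model the walker has no hugetlb branch at all; the pool only
decides where the folio comes from and how many are guaranteed.
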
I think the general path to do that is to make the core-mm and all the
hugetlb supporting arches support a core-code path for working with
high order folios in page tables. Maybe this is demo'd & tested with a
temporary/simplified hugetlbfs uAPI.

When the core MM and all the arches are ready you switch hugetlbfs to
use the new core API and delete all the page walk special cases.

From there you can then teach the core code to do all the splitting
and whatever that you want.

Jason