Date: Thu, 8 Jun 2023 21:10:15 +0100
From: Matthew Wilcox
To: David Hildenbrand
Cc: David Rientjes, Mike Kravetz, Yosry Ahmed, James Houghton,
 Naoya Horiguchi, Miaohe Lin, lsf-pc@lists.linux-foundation.org,
 linux-mm@kvack.org, Peter Xu, Michal Hocko, Axel Rasmussen, Jiaqi Yan
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs

On Thu, Jun 08, 2023 at 08:34:10AM +0200, David Hildenbrand wrote:
> On 08.06.23 02:02, David Rientjes wrote:
> > While people have proposed 1GB THP support in the past, it was nacked, in
> > part, because of the suggestion to just use existing 1GB support in
> > hugetlb instead :)
>
> Yes, because I still think that the use for "transparent" (for the user)
> nowadays is very limited and not worth the complexity.
>
> IMHO, what you really want is a pool of large pages (with guarantees about
> availability and nodes) and fine control over who gets these pages. That's
> what hugetlb provides.
>
> In contrast to THP, you don't want to allow for
> * Partially mmap, mremap, munmap, mprotect them
> * Partially sharing them / COW'ing them
> * Partially mixing them with other anon pages (MADV_DONTNEED + refault)
> * Exclude them from some features KSM/swap
> * (swap them out and eventually split them for that)
>
> Because you don't want to get these pages PTE-mapped by the system *unless*
> there is a real reason (HGM, hwpoison) -- you want guarantees. Once such a
> page is PTE-mapped, you only want to collapse in place.
>
> But you don't want special-HGM, you simply want the core to PTE-map them
> like a (file) THP.
>
> IMHO, getting that realized would be much easier if we didn't have to care
> about some of the hugetlb complexity I raised (MAP_PRIVATE, PMD sharing),
> but maybe there is a way ...

I favour a more evolutionary than revolutionary approach. That is, I
think it's acceptable to add new features to hugetlbfs _if_ they're
combined with cleanup work that gets hugetlbfs closer to the main mm.
This is why I harp on things like pagewalk that currently need special
handling for hugetlb -- that's pointless; they should just be treated as
large folios. GUP handles hugetlb separately too, and I'm not sure why.

That's not to be confused with "hugetlb must change to be more like
the regular mm". Sometimes both are bad, stupid and wrong, and need to
be changed. The MM has never had to handle 1GB pages before and, e.g.,
handling mapcount by iterating over each struct page is not sensible
because that's 16MB of data just to answer folio_mapcount().
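
For reference, the 16MB figure can be checked with a few lines of
arithmetic. This is only a sketch: it assumes 4KiB base pages and a
64-byte struct page (typical for x86-64, but neither number is stated
in the mail), and the function and constant names are hypothetical.

```python
# Assumed values, typical for x86-64 but not stated in the mail.
BASE_PAGE_SIZE = 4 * 1024      # assumed 4 KiB base page
STRUCT_PAGE_SIZE = 64          # assumed sizeof(struct page) in bytes
HUGE_PAGE_SIZE = 1 << 30       # 1 GiB hugetlb page


def struct_page_footprint(folio_bytes,
                          page_bytes=BASE_PAGE_SIZE,
                          struct_bytes=STRUCT_PAGE_SIZE):
    """Return (number of struct pages, bytes of struct page data)
    that a per-page walk over one folio would have to read."""
    nr_pages = folio_bytes // page_bytes
    return nr_pages, nr_pages * struct_bytes


nr_pages, metadata_bytes = struct_page_footprint(HUGE_PAGE_SIZE)
print(f"{nr_pages} struct pages, "
      f"{metadata_bytes // (1 << 20)} MiB read per mapcount query")
```

Under those assumptions a 1GB page covers 262144 base pages, so a
per-page walk reads 16MiB of struct page data for a single query.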