From: Ryan Roberts
Date: Tue, 31 Oct 2023 13:13:10 +0000
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
To: David Hildenbrand, Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao, Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes, Vlastimil Babka, Hugh Dickins
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Message-ID: <5001e231-795f-4d8c-bd9d-16096e428aef@arm.com>
References: <20230929114421.3761121-1-ryan.roberts@arm.com> <6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com> <7a3a2d49-528d-4297-ae19-56aa9e6c59c6@arm.com>

On 31/10/2023 12:03, David Hildenbrand wrote:
> On 31.10.23 12:55, Ryan Roberts wrote:
>> On 31/10/2023 11:50, Ryan Roberts wrote:
>>> On 06/10/2023 21:06, David Hildenbrand wrote:
>>> [...]
>>>>
>>>> Change 2: sysfs interface.
>>>>
>>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
>>>> agree.
>>>>
>>>> What we expose there and how, is TBD. Again, not a friend of "orders" and
>>>> bitmaps at all. We can do better if we want to go down that path.
>>>>
>>>> Maybe we should take a look at hugetlb, and how they added support for multiple
>>>> sizes. What *might* make sense could be (depending on which values we actually
>>>> support!)
>>>>
>>>>
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
>>>>
>>>> Each one would contain an "enabled" and "defrag" file. We want something
>>>> minimal first? Start with the "enabled" option.
>>>>
>>>>
>>>> enabled: always [global] madvise never
>>>>
>>>> Initially, we would set it for PMD-sized THP to "global" and for everything
>>>> else to "never".
>>>
>>> Hi David,
>>>
>>> I've just started coding this, and it occurs to me that I might need a small
>>> clarification here; the existing global "enabled" control is used to drive
>>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
>>> proposed new per-size "enabled" is implicitly only controlling anon memory (for
>>> now).
>>>
>>> 1) Is this potentially confusing for the user? Should we rename the per-size
>>> controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
>>> we can reuse the same control for file-backed memory in future?
>>>
>>> 2) The global control will continue to drive the file-backed memory decision
>>> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
>>>
>>> Thanks,
>>> Ryan
>>>
>>
>> Also, an implementation question:
>>
>> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
>> (although it does honour MADV_NOHUGEPAGE and the prctl); it will return true
>> regardless. Is that by design? I couldn't fathom any reasoning from the
>> commit log:
>
> The whole DAX "hugepage" and THP mixup is just plain confusing. We're simply
> using PUD/PMD mappings of DAX memory, and PMD/PTE-remap when required (VMA
> split I assume, COW).
>
> It doesn't result in any memory waste, so who really cares how it's mapped?
> Apparently we want individual processes to just disable PMD/PUD mappings of DAX
> using the prctl and madvise. Maybe there are good reasons.
>
> Looks like a design decision, probably some legacy leftovers.

OK, I'll ensure I keep this behaviour.

Thanks!
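
For illustration only, here is a minimal user-space sketch of how the proposed per-size "enabled" control might be interpreted for anonymous memory. It assumes the file uses the bracketed-selection format quoted above ("always [global] madvise never") and that "global" means "defer to the top-level enabled setting"; the thread does not spell out that second point, so treat it as an assumption. The function names are hypothetical and are not part of any kernel or library API.

```python
import re


def selected_option(text):
    """Return the bracketed (currently selected) option of a sysfs-style line."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no selected option in {text!r}")
    return match.group(1)


def effective_anon_policy(per_size_text, global_text):
    """Resolve the policy for one THP size (hypothetical helper).

    Assumption: a per-size value of "global" defers to the top-level
    "enabled" control; any other value is used directly.
    """
    per_size = selected_option(per_size_text)
    if per_size == "global":
        return selected_option(global_text)
    return per_size


if __name__ == "__main__":
    # Proposed initial state: PMD size follows the global control.
    print(effective_anon_policy("always [global] madvise never",
                                "always [madvise] never"))
    # Proposed initial state for every other size: "never".
    print(effective_anon_policy("always global madvise [never]",
                                "[always] madvise never"))
```

The sketch covers only the anonymous-memory decision. Per question 2 above, file-backed memory would keep following the global control regardless of the per-size value.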