Date: Tue, 31 Oct 2023 11:55:51 +0000
From: Ryan Roberts
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
To: David Hildenbrand, Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao, Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes, Vlastimil Babka, Hugh Dickins
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
References: <20230929114421.3761121-1-ryan.roberts@arm.com> <6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com> <7a3a2d49-528d-4297-ae19-56aa9e6c59c6@arm.com>
In-Reply-To: <7a3a2d49-528d-4297-ae19-56aa9e6c59c6@arm.com>

On 31/10/2023 11:50, Ryan Roberts wrote:
> On 06/10/2023 21:06, David Hildenbrand wrote:
> [...]
>>
>> Change 2: sysfs interface.
>>
>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
>> agree.
>>
>> What we expose there and how, is TBD. Again, not a friend of "orders" and
>> bitmaps at all. We can do better if we want to go down that path.
>>
>> Maybe we should take a look at hugetlb, and how they added support for multiple
>> sizes. What *might* make sense could be (depending on which values we actually
>> support!)
>>
>>
>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
>>
>> Each one would contain an "enabled" and "defrag" file. We want something minimal
>> first? Start with the "enabled" option.
>>
>>
>> enabled: always [global] madvise never
>>
>> Initially, we would set it for PMD-sized THP to "global" and for everything else
>> to "never".
>
> Hi David,
>
> I've just started coding this, and it occurs to me that I might need a small
> clarification here; the existing global "enabled" control is used to drive
> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> proposed new per-size "enabled" is implicitly only controlling anon memory (for
> now).
>
> 1) Is this potentially confusing for the user? Should we rename the per-size
> controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
> we can reuse the same control for file-backed memory in future?
>
> 2) The global control will continue to drive the file-backed memory decision
> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
>
> Thanks,
> Ryan
>

Also, an implementation question: hugepage_vma_check() doesn't currently care
whether enabled="never" for DAX VMAs (although it does honour MADV_NOHUGEPAGE and
the prctl); it returns in_pf regardless, i.e. true on the page fault path. Is that
by design? I couldn't fathom any reasoning from the commit log:

bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
			bool smaps, bool in_pf, bool enforce_sysfs)
{
	if (!vma->vm_mm)		/* vdso */
		return false;

	/*
	 * Explicitly disabled through madvise or prctl, or some
	 * architectures may disable THP for some mappings, for
	 * example, s390 kvm.
	 * */
	if ((vm_flags & VM_NOHUGEPAGE) ||
	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
		return false;
	/*
	 * If the hardware/firmware marked hugepage support disabled.
	 */
	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
		return false;

	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
	if (vma_is_dax(vma))
		return in_pf;		<<<<<<<<

	...
}
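To make the quoted per-size "enabled" proposal concrete, here is a minimal C sketch of how a per-size value of "global" could defer to the top-level transparent_hugepage/enabled setting for anonymous faults. Everything in it (enum thp_mode, parse_enabled(), default_mode(), effective_mode(), size_allowed()) is hypothetical and is not kernel code; it only models the semantics described in the thread: the PMD size defaults to "global", every other size to "never", and it deliberately says nothing about file-backed memory (question 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the values a per-size "enabled" file could take. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS, THP_GLOBAL };

/*
 * Parse a sysfs-style value such as "always [global] madvise never",
 * where the bracketed token marks the selected mode. Returns -1 if no
 * token is bracketed.
 */
static int parse_enabled(const char *buf)
{
	static const char *const names[] = {
		[THP_NEVER] = "[never]",
		[THP_MADVISE] = "[madvise]",
		[THP_ALWAYS] = "[always]",
		[THP_GLOBAL] = "[global]",
	};

	for (int i = 0; i < (int)(sizeof(names) / sizeof(names[0])); i++)
		if (strstr(buf, names[i]))
			return i;
	return -1;
}

/* Initial defaults from the proposal: PMD-sized "global", others "never". */
static enum thp_mode default_mode(unsigned int size_kb, unsigned int pmd_kb)
{
	return size_kb == pmd_kb ? THP_GLOBAL : THP_NEVER;
}

/*
 * A per-size "global" defers to the top-level value, which itself can
 * only be always, madvise or never.
 */
static enum thp_mode effective_mode(enum thp_mode per_size, enum thp_mode top)
{
	assert(top != THP_GLOBAL);
	return per_size == THP_GLOBAL ? top : per_size;
}

/* May an anonymous fault in this VMA use a folio of this size? */
static bool size_allowed(enum thp_mode per_size, enum thp_mode top,
			 bool vma_madvised)
{
	switch (effective_mode(per_size, top)) {
	case THP_ALWAYS:
		return true;
	case THP_MADVISE:
		return vma_madvised;
	default:
		return false;
	}
}
```

With these defaults, setting only the top-level control still governs PMD-sized anonymous THP exactly as today, while smaller sizes stay off until explicitly enabled.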
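The DAX behaviour queried above follows from the order of the early returns in the excerpt. The sketch below models that order in plain C; struct vma_model, vma_check_model() and the sysfs_enabled flag are hypothetical stand-ins, not kernel types, and it assumes (as the question implies) that the sysfs "enabled" check is only reached after the DAX test. Under that assumption a DAX VMA on the fault path is accepted even with enabled="never", while MADV_NOHUGEPAGE and the prctl still reject it.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs; not the kernel's vm_area_struct. */
struct vma_model {
	bool has_mm;		/* false for the vdso */
	bool nohugepage;	/* VM_NOHUGEPAGE set via madvise */
	bool mm_disable_thp;	/* MMF_DISABLE_THP set via prctl */
	bool is_dax;		/* vma_is_dax() */
};

/*
 * Model of the excerpt's early returns. sysfs_enabled stands in for the
 * "enabled" control and is assumed to be consulted only after the DAX test.
 */
static bool vma_check_model(const struct vma_model *v, bool thp_unsupported,
			    bool sysfs_enabled, bool in_pf, bool enforce_sysfs)
{
	if (!v->has_mm)
		return false;
	if (v->nohugepage || v->mm_disable_thp)
		return false;	/* madvise/prctl are honoured for DAX too */
	if (thp_unsupported)
		return false;
	if (v->is_dax)
		return in_pf;	/* returns before sysfs is consulted */
	if (enforce_sysfs && !sysfs_enabled)
		return false;
	return true;		/* remaining checks not modelled */
}
```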