From: Ryan Roberts
To: David Hildenbrand, Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao, Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes, Vlastimil Babka, Hugh Dickins, Kefeng Wang, Alistair Popple, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 03/10] mm: thp: Introduce multi-size THP sysfs interface
Date: Tue, 5 Dec 2023 10:50:15 +0000
Message-ID: <075826b4-2df8-4460-a8f2-c0581d098cff@arm.com>
References: <20231204102027.57185-1-ryan.roberts@arm.com> <20231204102027.57185-4-ryan.roberts@arm.com> <8adbde1c-970b-4a26-81b0-91b913c4850b@arm.com>

On 05/12/2023 09:57, David Hildenbrand wrote:
> On 05.12.23 10:50, Ryan Roberts wrote:
>> On 05/12/2023 04:21, Barry Song wrote:
>>> On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts wrote:
>>>>
>>>> In preparation for adding support for anonymous multi-size THP,
>>>> introduce new sysfs structure that will be used to control the new
>>>> behaviours. A new directory is added under transparent_hugepage for each
>>>> supported THP size, and contains an `enabled` file, which can be set to
>>>> "inherit" (to inherit the global setting), "always", "madvise" or
>>>> "never". For now, the kernel still only supports PMD-sized anonymous
>>>> THP, so only 1 directory is populated.
>>>>
>>>> The first half of the change converts transhuge_vma_suitable() and
>>>> hugepage_vma_check() so that they take a bitfield of orders for which
>>>> the user wants to determine support, and the functions filter out all
>>>> the orders that can't be supported, given the current sysfs
>>>> configuration and the VMA dimensions. If there is only 1 order set in
>>>> the input then the output can continue to be treated like a boolean;
>>>> this is the case for most call sites. The resulting functions are
>>>> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
>>>> respectively.
>>>>
>>>> The second half of the change implements the new sysfs interface. It has
>>>> been done so that each supported THP size has a `struct thpsize`, which
>>>> describes the relevant metadata and is itself a kobject. This is pretty
>>>> minimal for now, but should make it easy to add new per-thpsize files to
>>>> the interface if needed in future (e.g. per-size defrag).
>>>> Rather than keep the `enabled` state directly in the struct thpsize, I've
>>>> elected to directly encode it into huge_anon_orders_[always|madvise|inherit]
>>>> bitfields since this reduces the amount of work required in
>>>> thp_vma_allowable_orders() which is called for every page fault.
>>>>
>>>> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
>>>> commit, for details of how the new sysfs interface works.
>>>>
>>>> Signed-off-by: Ryan Roberts
>>>
>>> Reviewed-by: Barry Song
>>
>> Thanks!
>>
>>>
>>>> -khugepaged will be automatically started when
>>>> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
>>>> -be automatically shutdown if it's set to "never".
>>>> +khugepaged will be automatically started when one or more hugepage
>>>> +sizes are enabled (either by directly setting "always" or "madvise",
>>>> +or by setting "inherit" while the top-level enabled is set to "always"
>>>> +or "madvise"), and it'll be automatically shutdown when the last
>>>> +hugepage size is disabled (either by directly setting "never", or by
>>>> +setting "inherit" while the top-level enabled is set to "never").
>>>>
>>>>   Khugepaged controls
>>>>   -------------------
>>>>
>>>> +.. note::
>>>> +   khugepaged currently only searches for opportunities to collapse to
>>>> +   PMD-sized THP and no attempt is made to collapse to other THP
>>>> +   sizes.
>>>
>>> For small-size THP, collapse is probably a bad idea. we like a one-shot
>>> try in Android especially we are using a 64KB and less large folio size. if
>>> PF succeeds in getting large folios, we map large folios, otherwise we
>>> give up as those memories can be quite unstably swapped-out, swapped-in
>>> and madvised to be DONTNEED.
>>>
>>> too many compactions will increase power consumption and decrease UI
>>> response.
>>
>> Understood; that's very useful information for the Android context.
>> Multiple people have made comments about eventually needing khugepaged (or
>> something similar) support in the server context though to async collapse to
>> contpte size. Actually one suggestion was a user space daemon that scans and
>> collapses with MADV_COLLAPSE. I suspect the key will be to ensure whatever
>> solution we go for is flexible and can be enabled/disabled/configured for the
>> different environments.
>
> There certainly is interest for 2 MiB THP on arm64 64k where the THP size would
> normally be 512 MiB. In that scenario, khugepaged makes perfect sense.

Indeed
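
For anyone following the thread, the bitfield scheme described in the commit message can be sketched in plain userspace C as below. This is only an illustration under stated assumptions, not the code from the patch: `struct thp_config`, `enum thp_global`, `allowable_orders()`, `khugepaged_should_run()` and `thp_size_bytes()` are hypothetical names, the three `orders_*` fields stand in for huge_anon_orders_[always|madvise|inherit], and the filtering by VMA dimensions is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the top-level transparent_hugepage/enabled value. */
enum thp_global { THP_GLOBAL_NEVER, THP_GLOBAL_MADVISE, THP_GLOBAL_ALWAYS };

/*
 * Hypothetical container for the sysfs state. Bit N of each field is set
 * when the size of order N has that value in its `enabled` file. A size
 * set to "never" simply appears in none of the three bitfields.
 */
struct thp_config {
    enum thp_global global;
    unsigned long orders_always;
    unsigned long orders_madvise;
    unsigned long orders_inherit;
};

/* Return the subset of `orders` that the sysfs configuration allows. */
static unsigned long allowable_orders(const struct thp_config *cfg,
                                      bool vma_madvised, unsigned long orders)
{
    unsigned long mask = cfg->orders_always;

    /* "madvise" sizes only apply to regions that asked for huge pages. */
    if (vma_madvised)
        mask |= cfg->orders_madvise;

    /* "inherit" sizes follow the top-level setting. */
    if (cfg->global == THP_GLOBAL_ALWAYS ||
        (vma_madvised && cfg->global == THP_GLOBAL_MADVISE))
        mask |= cfg->orders_inherit;

    return orders & mask;
}

/* khugepaged runs while at least one size is effectively enabled. */
static bool khugepaged_should_run(const struct thp_config *cfg)
{
    if (cfg->orders_always | cfg->orders_madvise)
        return true;
    return cfg->orders_inherit != 0 && cfg->global != THP_GLOBAL_NEVER;
}

/* Size in bytes of a THP of the given order for a given base page size. */
static unsigned long thp_size_bytes(unsigned int order, unsigned long page_size)
{
    return page_size << order;
}
```

With 64 KiB base pages, thp_size_bytes(5, 65536) is 2 MiB and thp_size_bytes(13, 65536) is 512 MiB, which are the two sizes David mentions for arm64 64k. Passing a single bit in `orders` lets the result of allowable_orders() be treated as a boolean, as the commit message notes for most call sites.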