From: Ryan Roberts
To: Barry Song <21cnbao@gmail.com>
Cc: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes, Vlastimil Babka, Hugh Dickins, Kefeng Wang, Alistair Popple, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 03/10] mm: thp: Introduce multi-size THP sysfs interface
Date: Tue, 5 Dec 2023 09:50:14 +0000
Message-ID: <8adbde1c-970b-4a26-81b0-91b913c4850b@arm.com>
References: <20231204102027.57185-1-ryan.roberts@arm.com> <20231204102027.57185-4-ryan.roberts@arm.com>

On 05/12/2023 04:21, Barry Song wrote:
> On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts wrote:
>>
>> In preparation for adding support for anonymous multi-size THP,
>> introduce new sysfs structure that will be used to control the new
>> behaviours. A new directory is added under transparent_hugepage for each
>> supported THP size, and contains an `enabled` file, which can be set to
>> "inherit" (to inherit the global setting), "always", "madvise" or
>> "never". For now, the kernel still only supports PMD-sized anonymous
>> THP, so only 1 directory is populated.
>>
>> The first half of the change converts transhuge_vma_suitable() and
>> hugepage_vma_check() so that they take a bitfield of orders for which
>> the user wants to determine support, and the functions filter out all
>> the orders that can't be supported, given the current sysfs
>> configuration and the VMA dimensions. If there is only 1 order set in
>> the input then the output can continue to be treated like a boolean;
>> this is the case for most call sites. The resulting functions are
>> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
>> respectively.
>>
>> The second half of the change implements the new sysfs interface. It has
>> been done so that each supported THP size has a `struct thpsize`, which
>> describes the relevant metadata and is itself a kobject. This is pretty
>> minimal for now, but should make it easy to add new per-thpsize files to
>> the interface if needed in future (e.g. per-size defrag). Rather than
>> keep the `enabled` state directly in the struct thpsize, I've elected to
>> directly encode it into huge_anon_orders_[always|madvise|inherit]
>> bitfields since this reduces the amount of work required in
>> thp_vma_allowable_orders() which is called for every page fault.
>>
>> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
>> commit, for details of how the new sysfs interface works.
>>
>> Signed-off-by: Ryan Roberts
>
> Reviewed-by: Barry Song

Thanks!

>
>> -khugepaged will be automatically started when
>> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
>> -be automatically shutdown if it's set to "never".
>> +khugepaged will be automatically started when one or more hugepage
>> +sizes are enabled (either by directly setting "always" or "madvise",
>> +or by setting "inherit" while the top-level enabled is set to "always"
>> +or "madvise"), and it'll be automatically shutdown when the last
>> +hugepage size is disabled (either by directly setting "never", or by
>> +setting "inherit" while the top-level enabled is set to "never").
>>
>>  Khugepaged controls
>>  -------------------
>>
>> +.. note::
>> +   khugepaged currently only searches for opportunities to collapse to
>> +   PMD-sized THP and no attempt is made to collapse to other THP
>> +   sizes.
>
> For small-size THP, collapse is probably a bad idea. We prefer a one-shot
> try in Android, especially since we are using a large folio size of 64KB
> or less. If the page fault succeeds in getting large folios, we map large
> folios; otherwise we give up, as that memory can be quite unstable:
> swapped out, swapped in, and madvised as DONTNEED.
>
> Too many compactions will increase power consumption and degrade UI
> responsiveness.

Understood; that's very useful information for the Android context. Multiple
people have commented, though, that the server context will eventually need
khugepaged (or something similar) to asynchronously collapse to contpte size.
One suggestion was a user-space daemon that scans and collapses with
MADV_COLLAPSE. I suspect the key will be to ensure that whatever solution we
go for is flexible and can be enabled, disabled, and configured for the
different environments.

>
> Thanks
> Barry
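
For readers following the thread, the enable semantics described in the commit
message and the documentation hunk can be sketched as a small user-space model.
This is only an illustration under stated assumptions, not the kernel code: the
class and method names are hypothetical, PMD_ORDER = 9 assumes 4K base pages,
and the VMA-dimension filtering done by thp_vma_suitable_orders() is left out
entirely. The three bitfields stand in for
huge_anon_orders_[always|madvise|inherit].

```python
# Simplified user-space model of the per-size `enabled` semantics.
# Not kernel code: the class, its methods and PMD_ORDER are hypothetical.

PMD_ORDER = 9  # assumption: 4K base pages with a 2M PMD

GLOBAL_VALUES = ("always", "madvise", "never")
SIZE_VALUES = ("inherit", "always", "madvise", "never")


class AnonThpConfig:
    def __init__(self, global_enabled="always"):
        if global_enabled not in GLOBAL_VALUES:
            raise ValueError(global_enabled)
        self.global_enabled = global_enabled
        # One bit per order, standing in for
        # huge_anon_orders_[always|madvise|inherit].
        self.orders_always = 0
        self.orders_madvise = 0
        self.orders_inherit = 0

    def set_size_enabled(self, order, value):
        """Model a write to the per-size `enabled` file."""
        if value not in SIZE_VALUES:
            raise ValueError(value)
        bit = 1 << order
        # Clear the order everywhere, then set it in at most one bitfield.
        self.orders_always &= ~bit
        self.orders_madvise &= ~bit
        self.orders_inherit &= ~bit
        if value == "always":
            self.orders_always |= bit
        elif value == "madvise":
            self.orders_madvise |= bit
        elif value == "inherit":
            self.orders_inherit |= bit
        # "never" leaves the bit clear in all three bitfields.

    def allowable_orders(self, requested, vma_madvised):
        """Filter a bitfield of requested orders by the sysfs settings."""
        mask = self.orders_always
        if vma_madvised:
            mask |= self.orders_madvise
        if self.global_enabled == "always" or (
            self.global_enabled == "madvise" and vma_madvised
        ):
            mask |= self.orders_inherit
        return requested & mask

    def khugepaged_should_run(self):
        """True while at least one size is effectively enabled."""
        mask = self.orders_always | self.orders_madvise
        if self.global_enabled != "never":
            mask |= self.orders_inherit
        return mask != 0
```

When only one order bit is passed in `requested`, the result is either zero or
that same bit, which is why most call sites can keep treating it like a
boolean.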