From: Ryan Roberts <ryan.roberts@arm.com>
Date: Sat, 13 Jul 2024 11:45:04 +0100
Subject: Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
To: David Hildenbrand, Lance Yang, Baolin Wang
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)",
 Barry Song, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Message-ID: <8c32a2fc-252d-406b-9fec-ce5bab0829df@arm.com>
In-Reply-To: <756c359e-bb8f-481e-a33f-163c729afa31@redhat.com>
References: <20240711072929.3590000-1-ryan.roberts@arm.com>
 <20240711072929.3590000-3-ryan.roberts@arm.com>
 <9e0d84e5-2319-4425-9760-2c6bb23fc390@linux.alibaba.com>
 <756c359e-bb8f-481e-a33f-163c729afa31@redhat.com>
On 13/07/2024 02:08, David Hildenbrand wrote:
> On 12.07.24 14:22, Lance Yang wrote:
>> On Fri, Jul 12, 2024 at 11:00 AM Baolin Wang wrote:
>>>
>>> On 2024/7/11 15:29, Ryan Roberts wrote:
>>>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>>>>
>>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>>>>
>>>> This will provide some insight on the sizes of large folios being
>>>> allocated for file-backed memory, and how often allocation is failing.
>>>>
>>>> All non-order-0 (and most order-0) folio allocations are currently done
>>>> through filemap_alloc_folio(), and folios are charged in a subsequent
>>>> call to filemap_add_folio(). So count file_fallback when allocation
>>>> fails in filemap_alloc_folio() and count file_alloc or
>>>> file_fallback_charge in filemap_add_folio(), based on whether charging
>>>> succeeded or not. There are some users of filemap_add_folio() that
>>>> allocate their own order-0 folio by other means, so we would not count
>>>> an allocation failure in this case, but we also don't care about order-0
>>>> allocations. This approach feels like it should be good enough and
>>>> doesn't require any (impractically large) refactoring.
>>>>
>>>> The existing mTHP stats interface is reused to provide consistency to
>>>> users. And because we are reusing the same interface, we can reuse the
>>>> same infrastructure on the kernel side. The one small wrinkle is that
>>>> the set of folio sizes supported by the pagecache are not identical to
>>>> those supported by anon and shmem; pagecache supports order-1, unlike
>>>> anon and shmem, and the max pagecache order may be less than PMD-size
>>>> (see arm64 with 64K base pages), again unlike anon and shmem. So we now
>>>> create a hugepages-*kB directory for the union of the sizes supported by
>>>> all 3 memory types and populate it with the relevant stats and controls.
>>>
>>> Personally, I like the idea that can help analyze the allocation of
>>> large folios for the page cache.
>>>
>>> However, I have a slight concern about the consistency of the interface.
>>>
>>> For 64K, the fields layout:
>>> ├── hugepages-64kB
>>> │   ├── enabled
>>> │   ├── shmem_enabled
>>> │   └── stats
>>> │       ├── anon_fault_alloc
>>> │       ├── anon_fault_fallback
>>> │       ├── anon_fault_fallback_charge
>>> │       ├── file_alloc
>>> │       ├── file_fallback
>>> │       ├── file_fallback_charge
>>> │       ├── shmem_alloc
>>> │       ├── shmem_fallback
>>> │       ├── shmem_fallback_charge
>>> │       ├── split
>>> │       ├── split_deferred
>>> │       ├── split_failed
>>> │       ├── swpout
>>> │       └── swpout_fallback
>>>
>>> But for 8K (for pagecache), you removed some fields (of course, I
>>> understand why they are not supported).
>>>
>>> ├── hugepages-8kB
>>> │   └── stats
>>> │       ├── file_alloc
>>> │       ├── file_fallback
>>> │       └── file_fallback_charge
>>>
>>> This might not be user-friendly for some user-space parsing tools, as
>>> they lack certain fields for the same pattern interfaces. Of course,
>>> this might not be an issue if we have clear documentation describing the
>>> differences here :)
>>>
>>> Another possible approach is to maintain the same field layout to keep
>>> consistent, but prohibit writing to the fields that are not supported by
>>> the pagecache, and any stats read from them would be 0.
>>
>> I agree that maintaining a uniform field layout, especially at the stats
>> level, might be necessary ;)
>>
>> Keeping a consistent interface could future-proof the design. It allows
>> for the possibility that features not currently supported for 8kB pages
>> might be enabled in the future.
>
> I'll just note that, with shmem/file effectively being disabled for order > 11,
> we'll also have entries there that are effectively unused.

Indeed, I mentioned that in the commit log :)

> Good question how we want to deal with that (stats are easy, but what about
> when we enable something? Maybe we should document that "enabled" is only
> effective when supported).
The documentation already says "If enabling multiple hugepage sizes, the kernel
will select the most appropriate enabled size for a given allocation." for anon
THP (and I've added similar wording for my as-yet-unposted patch to add
controls for page cache folio sizes). So I think we could easily add dummy
*enabled controls for all sizes, which can be written to and read back
consistently, but which the kernel just ignores when deciding what size to use.
It would also simplify the code that populates the controls.

Personally though, I'm not convinced of the value of trying to make the
controls for every size look identical. What's the real value to the user of
pretending they can select a size that they cannot? What happens when we
inevitably want to add some new control in future which only applies to select
sizes and there is no good way to fake it for the other sizes? Why can't user
space just be expected to rely on the existence of the files rather than on the
existence of the directories?

As always, I'll go with the majority, but just wanted to register my opinion.

Thanks,
Ryan

>
> Hmmmmm
>
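[Editorial note: the "rely on the existence of the files" approach Ryan argues
for is cheap to handle in user space. Below is a minimal illustrative sketch,
not taken from any real tool: only the sysfs root path comes from the patch
discussion above, and the function name and structure are hypothetical. It
simply collects whichever per-size stats files actually exist, so a size
directory that lacks (say) anon_fault_alloc is handled without special-casing.]

```python
import os

# Root of the mTHP sysfs tree, as discussed in the patch above.
THP_ROOT = "/sys/kernel/mm/transparent_hugepage"


def read_mthp_stats(root=THP_ROOT):
    """Return {size_kb: {stat_name: value}} for whatever stats exist.

    Missing directories or files are simply skipped, so the parser works
    whether or not every hugepages-*kB directory carries the full set of
    counters.
    """
    results = {}
    if not os.path.isdir(root):
        return results
    for entry in os.listdir(root):
        # Per-size directories are named hugepages-<size>kB.
        if not entry.startswith("hugepages-") or not entry.endswith("kB"):
            continue
        size_kb = int(entry[len("hugepages-"):-len("kB")])
        stats_dir = os.path.join(root, entry, "stats")
        if not os.path.isdir(stats_dir):
            continue
        stats = {}
        for name in os.listdir(stats_dir):
            try:
                with open(os.path.join(stats_dir, name)) as f:
                    stats[name] = int(f.read().strip())
            except (OSError, ValueError):
                continue  # tolerate unreadable or non-numeric entries
        results[size_kb] = stats
    return results
```

Keying off file existence this way means the tool needs no table of which
stats each size supports, which is exactly why per-size differences in the
interface need not break user-space parsers.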