From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jul 2024 09:31:08 +0100
Subject: Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
From: Ryan Roberts
To: David Hildenbrand, Lance Yang, Baolin Wang
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)", Barry Song, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240711072929.3590000-1-ryan.roberts@arm.com> <20240711072929.3590000-3-ryan.roberts@arm.com> <9e0d84e5-2319-4425-9760-2c6bb23fc390@linux.alibaba.com> <756c359e-bb8f-481e-a33f-163c729afa31@redhat.com> <8c32a2fc-252d-406b-9fec-ce5bab0829df@arm.com>
In-Reply-To: <8c32a2fc-252d-406b-9fec-ce5bab0829df@arm.com>

On 13/07/2024 11:45, Ryan Roberts wrote:
> On 13/07/2024 02:08, David Hildenbrand wrote:
>> On 12.07.24 14:22, Lance Yang wrote:
>>> On Fri, Jul 12, 2024 at 11:00 AM Baolin Wang wrote:
>>>>
>>>> On 2024/7/11 15:29, Ryan Roberts wrote:
>>>>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>>>>>
>>>>>     /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>>>>>     /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>>>>>     /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>>>>>
>>>>> This will provide some insight on the sizes of large folios being
>>>>> allocated for file-backed memory, and how often allocation is failing.
>>>>>
>>>>> All non-order-0 (and most order-0) folio allocations are currently done
>>>>> through filemap_alloc_folio(), and folios are charged in a subsequent
>>>>> call to filemap_add_folio(). So count file_fallback when allocation
>>>>> fails in filemap_alloc_folio() and count file_alloc or
>>>>> file_fallback_charge in filemap_add_folio(), based on whether charging
>>>>> succeeded or not. There are some users of filemap_add_folio() that
>>>>> allocate their own order-0 folio by other means, so we would not count
>>>>> an allocation failure in this case, but we also don't care about order-0
>>>>> allocations. This approach feels like it should be good enough and
>>>>> doesn't require any (impractically large) refactoring.
>>>>>
>>>>> The existing mTHP stats interface is reused to provide consistency to
>>>>> users. And because we are reusing the same interface, we can reuse the
>>>>> same infrastructure on the kernel side. The one small wrinkle is that
>>>>> the set of folio sizes supported by the pagecache are not identical to
>>>>> those supported by anon and shmem; pagecache supports order-1, unlike
>>>>> anon and shmem, and the max pagecache order may be less than PMD-size
>>>>> (see arm64 with 64K base pages), again unlike anon and shmem. So we now
>>>>> create a hugepages-*kB directory for the union of the sizes supported by
>>>>> all 3 memory types and populate it with the relevant stats and controls.
>>>>
>>>> Personally, I like the idea that can help analyze the allocation of
>>>> large folios for the page cache.
>>>>
>>>> However, I have a slight concern about the consistency of the interface.
>>>>
>>>> For 64K, the fields layout:
>>>> ├── hugepages-64kB
>>>> │   ├── enabled
>>>> │   ├── shmem_enabled
>>>> │   └── stats
>>>> │       ├── anon_fault_alloc
>>>> │       ├── anon_fault_fallback
>>>> │       ├── anon_fault_fallback_charge
>>>> │       ├── file_alloc
>>>> │       ├── file_fallback
>>>> │       ├── file_fallback_charge
>>>> │       ├── shmem_alloc
>>>> │       ├── shmem_fallback
>>>> │       ├── shmem_fallback_charge
>>>> │       ├── split
>>>> │       ├── split_deferred
>>>> │       ├── split_failed
>>>> │       ├── swpout
>>>> │       └── swpout_fallback
>>>>
>>>> But for 8K (for pagecache), you removed some fields (of course, I
>>>> understand why they are not supported).
>>>>
>>>> ├── hugepages-8kB
>>>> │   └── stats
>>>> │       ├── file_alloc
>>>> │       ├── file_fallback
>>>> │       └── file_fallback_charge
>>>>
>>>> This might not be user-friendly for some user-space parsing tools, as
>>>> they lack certain fields for the same pattern interfaces. Of course,
>>>> this might not be an issue if we have clear documentation describing the
>>>> differences here:)
>>>>
>>>> Another possible approach is to maintain the same field layout to keep
>>>> consistent, but prohibit writing to the fields that are not supported by
>>>> the pagecache, and any stats read from them would be 0.
>>>
>>> I agree that maintaining a uniform field layout, especially at the stats
>>> level, might be necessary ;)
>>>
>>> Keeping a consistent interface could future-proof the design. It allows
>>> for the possibility that features not currently supported for 8kB pages
>>> might be enabled in the future.
>>
>> I'll just note that, with shmem/file effectively being disabled for order > 11,
>> we'll also have entries there that are effectively unused.
>
> Indeed, I mentioned that in the commit log :)
>
>>
>> Good question how we want to deal with that (stats are easy, but what about when
>> we enable something? Maybe we should document that "enabled" is only effective
>> when supported).
>
> The documentation already says "If enabling multiple hugepage sizes, the kernel
> will select the most appropriate enabled size for a given allocation." for anon
> THP (and I've added similar wording for my as-yet-unposted patch to add controls
> for page cache folio sizes). So I think we could easily add dummy *enabled
> controls for all sizes, that can be written to and read back consistently, but
> the kernel just ignores them when deciding what size to use. It would also
> simplify the code that populates the controls.
>
> Personally though, I'm not convinced of the value of trying to make the controls
> for every size look identical. What's the real value to the user to pretend that
> they can select a size that they cannot? What happens when we inevitably want to
> add some new control in future which only applies to select sizes and there is
> no good way to fake it for the other sizes? Why can't user space just be
> expected to rely on the existence of the files rather than on the existence of
> the directories?
>
> As always, I'll go with the majority, but just wanted to register my opinion.

Should I assume from the lack of reply on this that everyone else is in favour
of adding dummy controls so that all sizes have the same set of controls? If I
don't hear anything further, I'll post v2 with dummy controls today or tomorrow.

>
> Thanks,
> Ryan
>
>>
>> Hmmmmm
>
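
For illustration only, and not something taken from the patch: a minimal Python
sketch of how a user-space tool could key off the existence of individual stats
files rather than assuming every hugepages-*kB directory has the same layout.
The sysfs path is the one quoted in this thread; the helper names
read_mthp_stats() and get_stat(), and the choice to return None for a counter
that a given size does not expose, are assumptions made for this example.

```python
import re
from pathlib import Path

# sysfs root as quoted in the thread; overridable so the sketch can be
# exercised against a scratch directory.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")


def read_mthp_stats(root=THP_ROOT):
    """Return {size_in_kB: {stat_name: value}} for the stats files that exist.

    Hypothetical helper: nothing here assumes that every size directory
    carries the same set of files.
    """
    result = {}
    for size_dir in Path(root).glob("hugepages-*kB"):
        match = re.fullmatch(r"hugepages-(\d+)kB", size_dir.name)
        if match is None:
            continue
        stats = {}
        stats_dir = size_dir / "stats"
        if stats_dir.is_dir():
            for entry in stats_dir.iterdir():
                try:
                    stats[entry.name] = int(entry.read_text().strip())
                except (OSError, ValueError):
                    # Unreadable or non-numeric entry: skip it.
                    continue
        result[int(match.group(1))] = stats
    return result


def get_stat(all_stats, size_kb, name):
    """Return the counter, or None when that size does not expose it."""
    return all_stats.get(size_kb, {}).get(name)
```

With the 8kB layout quoted above, get_stat(stats, 8, "file_alloc") would return
the counter while get_stat(stats, 8, "anon_fault_alloc") would return None, so
the caller can tell "not supported for this size" apart from "zero events".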