Date: Wed, 28 May 2025 22:04:24 +0800
Subject: Re: [PATCH v7 06/12] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
To: David Hildenbrand, Nico Pache, David Rientjes, zokeefe@google.com
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, ziy@nvidia.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, hannes@cmpxchg.org, mhocko@suse.com, rdunlap@infradead.org
References: <20250515032226.128900-1-npache@redhat.com> <20250515032226.128900-7-npache@redhat.com> <9c54397f-3cbf-4fa2-bf69-ba89613d355f@linux.alibaba.com> <1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com>
From: Baolin Wang

On 2025/5/28 17:26, David Hildenbrand wrote:
> On 22.05.25 11:39, Baolin Wang wrote:
>>
>>
>> On 2025/5/21 18:23, Nico Pache wrote:
>>> On Tue, May 20, 2025 at 4:09 AM Baolin Wang wrote:
>>>>
>>>> Sorry for late reply.
>>>>
>>>> On 2025/5/17 14:47, Nico Pache wrote:
>>>>> On Thu, May 15, 2025 at 9:20 PM Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2025/5/15 11:22, Nico Pache wrote:
>>>>>>> khugepaged scans anons PMD ranges for potential collapse to a hugepage.
>>>>>>> To add mTHP support we use this scan to instead record chunks of utilized
>>>>>>> sections of the PMD.
>>>>>>>
>>>>>>> khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
>>>>>>> that represents chunks of utilized regions. We can then determine what
>>>>>>> mTHP size fits best and in the following patch, we set this bitmap while
>>>>>>> scanning the anon PMD. A minimum collapse order of 2 is used as this is
>>>>>>> the lowest order supported by anon memory.
>>>>>>>
>>>>>>> max_ptes_none is used as a scale to determine how "full" an order must
>>>>>>> be before being considered for collapse.
>>>>>>>
>>>>>>> When attempting to collapse an order that has its order set to "always"
>>>>>>> lets always collapse to that order in a greedy manner without
>>>>>>> considering the number of bits set.
>>>>>>>
>>>>>>> Signed-off-by: Nico Pache
>>>>>>
>>>>>> Sigh. You still haven't addressed or explained the issues I previously
>>>>>> raised [1], so I don't know how to review this patch again...
>>>>> Can you still reproduce this issue?
>>>>
>>>> Yes, I can still reproduce this issue with today's (5/20) mm-new branch.
>>>>
>>>> I've disabled PMD-sized THP in my system:
>>>> [root]# cat /sys/kernel/mm/transparent_hugepage/enabled
>>>> always madvise [never]
>>>> [root]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>>>> always inherit madvise [never]
>>>>
>>>> And I tried calling madvise() with MADV_COLLAPSE for anonymous memory,
>>>> and I can still see it collapsing to a PMD-sized THP.
>>> Hi Baolin ! Thank you for your reply and willingness to test again :)
>>>
>>> I didn't realize we were talking about madvise collapse-- this makes
>>> sense now. I also figured out why I could "reproduce" it before. My
>>> script was always enabling the THP settings in two places, and I only
>>> commented out one to test this. But this time I was doing more manual
>>> testing.
>>>
>>> The original design of madvise_collapse ignores the sysfs and
>>> collapses even if you have an order disabled. I believe this behavior
>>> is wrong, but by design. I spent some time playing around with madvise
>>> collapses with and w/o my changes. This is not a new thing, I
>>> reproduced the issue in 6.11 (Fedora 41), and I think its been
>>> possible since the inception of madvise collapse 3 years ago. I
>>> noticed a similar behavior on one of my RFC since it was "breaking"
>>> selftests, and the fix was to reincorporate this broken sysfs
>>> behavior.
>>
>> OK. Thanks for the explanation.
>>
>>> 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage
>>> collapse")
>>> "This call is independent of the system-wide THP sysfs settings, but
>>> will fail for memory marked VM_NOHUGEPAGE."
>>>
>>> The second condition holds true (and fails for VM_NOHUGEPAGE), but I
>>> dont know if we actually want madvise_collapse to be independent of
>>> the system-wide.
>>
>> This design principle surprised me a bit, and I failed to find the
>> reason in the commit log. I agree that "never should mean never," and we
>> should respect the THP/mTHP sysfs setting. Additionally, for the
>> 'shmem_enabled' sysfs interface controlled for shmem/tmpfs, THP collapse
>> can still be prohibited through the 'deny' configuration. The rules here
>> are somewhat confusing.
>
> I recall that we decided to overwrite "VM_NOHUGEPAGE", because the
> assumption is that the same app that triggered MADV_NOHUGEPAGE triggers
> the collapse. So the app decides on its own behavior.
>
> Similarly, allowing for collapsing in a VM without VM_HUGEPAGE in the
> "madvise" mode would be fine.
>
> But in the "never" case, we should just "never" collapse.

OK. Let's fix the "never" case first. Thanks.
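
For illustration only, the expectation discussed above ("never" should
mean never, while "madvise" mode still permits an explicit MADV_COLLAPSE)
can be sketched as a small user-space check. This is not kernel code and
not the fix itself; the helper names selected_mode() and
pmd_collapse_allowed() are hypothetical, and the input strings are the
sysfs outputs quoted earlier in this thread.

```python
import re


def selected_mode(sysfs_line: str) -> str:
    """Return the active (bracketed) value of a THP 'enabled' sysfs line."""
    match = re.search(r"\[(\w+)\]", sysfs_line)
    if match is None:
        raise ValueError(f"no bracketed value in {sysfs_line!r}")
    return match.group(1)


def pmd_collapse_allowed(global_line: str, pmd_line: str) -> bool:
    """Hypothetical policy sketch: MADV_COLLAPSE may collapse to a
    PMD-sized THP unless the effective PMD-order setting is 'never'."""
    mode = selected_mode(pmd_line)
    if mode == "inherit":
        # Per-size "inherit" defers to the top-level 'enabled' value.
        mode = selected_mode(global_line)
    return mode != "never"


# Settings from the reproduction in this thread: PMD-sized THP disabled.
print(pmd_collapse_allowed("always madvise [never]",
                           "always inherit madvise [never]"))  # False
```

With the settings from my test system the sketch returns False, which is
the behavior I would expect from MADV_COLLAPSE once the "never" case is
fixed.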