From: Nico Pache
Date: Wed, 28 May 2025 22:02:25 -0600
Subject: Re: [PATCH v7 06/12] khugepaged: introduce khugepaged_scan_bitmap
 for mTHP support
To: Baolin Wang
Cc: David Hildenbrand, David Rientjes, zokeefe@google.com,
 linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 ziy@nvidia.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
 ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net,
 rostedt@goodmis.org, mhiramat@kernel.org,
 mathieu.desnoyers@efficios.com, akpm@linux-foundation.org,
 baohua@kernel.org, willy@infradead.org, peterx@redhat.com,
 wangkefeng.wang@huawei.com, usamaarif642@gmail.com,
 sunnanyong@huawei.com, vishal.moola@gmail.com,
 thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
 kirill.shutemov@linux.intel.com, aarcange@redhat.com,
 raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
 tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com,
 jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com,
 hannes@cmpxchg.org, mhocko@suse.com, rdunlap@infradead.org
References: <20250515032226.128900-1-npache@redhat.com>
 <20250515032226.128900-7-npache@redhat.com>
 <9c54397f-3cbf-4fa2-bf69-ba89613d355f@linux.alibaba.com>
 <1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com>

On Wed, May 28, 2025 at 8:04 AM Baolin Wang wrote:
>
> On 2025/5/28 17:26, David Hildenbrand wrote:
> > On 22.05.25 11:39, Baolin Wang wrote:
> >>
> >> On 2025/5/21 18:23, Nico Pache wrote:
> >>> On Tue, May 20, 2025 at 4:09 AM Baolin Wang wrote:
> >>>>
> >>>> Sorry for late reply.
> >>>>
> >>>> On 2025/5/17 14:47, Nico Pache wrote:
> >>>>> On Thu, May 15, 2025 at 9:20 PM Baolin Wang wrote:
> >>>>>>
> >>>>>> On 2025/5/15 11:22, Nico Pache wrote:
> >>>>>>> khugepaged scans anons PMD ranges for potential collapse to a
> >>>>>>> hugepage. To add mTHP support we use this scan to instead
> >>>>>>> record chunks of utilized sections of the PMD.
> >>>>>>>
> >>>>>>> khugepaged_scan_bitmap uses a stack struct to recursively scan
> >>>>>>> a bitmap that represents chunks of utilized regions. We can
> >>>>>>> then determine what mTHP size fits best and in the following
> >>>>>>> patch, we set this bitmap while scanning the anon PMD. A
> >>>>>>> minimum collapse order of 2 is used as this is the lowest
> >>>>>>> order supported by anon memory.
> >>>>>>>
> >>>>>>> max_ptes_none is used as a scale to determine how "full" an
> >>>>>>> order must be before being considered for collapse.
> >>>>>>>
> >>>>>>> When attempting to collapse an order that has its order set to
> >>>>>>> "always" lets always collapse to that order in a greedy manner
> >>>>>>> without considering the number of bits set.
> >>>>>>>
> >>>>>>> Signed-off-by: Nico Pache
> >>>>>>
> >>>>>> Sigh. You still haven't addressed or explained the issues I
> >>>>>> previously raised [1], so I don't know how to review this patch
> >>>>>> again...
> >>>>> Can you still reproduce this issue?
> >>>>
> >>>> Yes, I can still reproduce this issue with today's (5/20) mm-new
> >>>> branch.
> >>>>
> >>>> I've disabled PMD-sized THP in my system:
> >>>> [root]# cat /sys/kernel/mm/transparent_hugepage/enabled
> >>>> always madvise [never]
> >>>> [root]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> >>>> always inherit madvise [never]
> >>>>
> >>>> And I tried calling madvise() with MADV_COLLAPSE for anonymous
> >>>> memory, and I can still see it collapsing to a PMD-sized THP.
> >>> Hi Baolin ! Thank you for your reply and willingness to test
> >>> again :)
> >>>
> >>> I didn't realize we were talking about madvise collapse-- this
> >>> makes sense now. I also figured out why I could "reproduce" it
> >>> before. My script was always enabling the THP settings in two
> >>> places, and I only commented out one to test this. But this time I
> >>> was doing more manual testing.
> >>>
> >>> The original design of madvise_collapse ignores the sysfs and
> >>> collapses even if you have an order disabled. I believe this
> >>> behavior is wrong, but by design. I spent some time playing around
> >>> with madvise collapses with and w/o my changes. This is not a new
> >>> thing, I reproduced the issue in 6.11 (Fedora 41), and I think its
> >>> been possible since the inception of madvise collapse 3 years ago.
> >>> I noticed a similar behavior on one of my RFC since it was
> >>> "breaking" selftests, and the fix was to reincorporate this broken
> >>> sysfs behavior.
> >>
> >> OK. Thanks for the explanation.
> >>
> >>> 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage
> >>> collapse")
> >>> "This call is independent of the system-wide THP sysfs settings,
> >>> but will fail for memory marked VM_NOHUGEPAGE."
> >>>
> >>> The second condition holds true (and fails for VM_NOHUGEPAGE), but
> >>> I dont know if we actually want madvise_collapse to be independent
> >>> of the system-wide.
> >>
> >> This design principle surprised me a bit, and I failed to find the
> >> reason in the commit log. I agree that "never should mean never,"
> >> and we should respect the THP/mTHP sysfs setting. Additionally, for
> >> the 'shmem_enabled' sysfs interface controlled for shmem/tmpfs, THP
> >> collapse can still be prohibited through the 'deny' configuration.
> >> The rules here are somewhat confusing.
> >
> > I recall that we decided to overwrite "VM_NOHUGEPAGE", because the
> > assumption is that the same app that triggered MADV_NOHUGEPAGE
> > triggers the collapse. So the app decides on its own behavior.
> >
> > Similarly, allowing for collapsing in a VM without VM_HUGEPAGE in
> > the "madvise" mode would be fine.
> >
> > But in the "never" case, we should just "never" collapse.
>
> OK. Let's fix the "never" case first. Thanks.

Great, I will update that in the next version!
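
For reference, the reproduction discussed in this thread (PMD-sized THP
set to "never" in sysfs, then madvise() with MADV_COLLAPSE on anonymous
memory) could be sketched roughly as below. This is only a minimal
illustration, not the test anyone in the thread actually ran: the
helper names align_up() and try_collapse() are hypothetical, a 2 MiB
PMD size is assumed (matching the hugepages-2048kB directory above),
and the fallback value for MADV_COLLAPSE is taken from the Linux uapi
headers for libcs that do not define it yet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* Linux uapi value; older libcs lack it */
#endif

/* Assumption: 2 MiB PMD size, matching hugepages-2048kB. */
#define PMD_SIZE (2UL * 1024 * 1024)

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: returns 0 if MADV_COLLAPSE succeeded on one
 * PMD-sized anonymous range, or a negative errno value otherwise.
 */
static int try_collapse(void)
{
	size_t len = 2 * PMD_SIZE;	/* over-allocate so an aligned window fits */
	char *raw, *aligned;
	int err = 0;

	raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -errno;

	/* MADV_COLLAPSE works on PMD-aligned ranges, so align the window. */
	aligned = (char *)align_up((uintptr_t)raw, PMD_SIZE);
	memset(aligned, 1, PMD_SIZE);	/* fault in every base page */

	if (madvise(aligned, PMD_SIZE, MADV_COLLAPSE) != 0)
		err = errno;

	munmap(raw, len);
	return -err;
}
```

Calling try_collapse() from a small main() with both sysfs files set to
"never" shows whether the collapse is still allowed: a return value of
0 means the range was collapsed despite the setting, which is the
behavior Baolin reported.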