From mboxrd@z Thu Jan 1 00:00:00 1970
From: Itaru Kitayama <itaru.kitayama@gmail.com>
Date: Wed, 26 Jul 2023 16:36:05 +0900
Subject: Re: [PATCH v3 0/4] variable-order, large folios for anonymous memory
To: Ryan Roberts
Cc: Andrew Morton, Anshuman Khandual, Catalin Marinas, David Hildenbrand,
 "Huang, Ying", "Kirill A. Shutemov", Luis Chamberlain, Matthew Wilcox,
 Will Deacon, Yang Shi, Yin Fengwei, Yu Zhao, Zi Yan,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
In-Reply-To: <34979a4c-0bab-fbb9-f8dd-ab3da816de52@arm.com>
References: <20230714160407.4142030-1-ryan.roberts@arm.com>
 <83bb1b99-81d3-0f32-4bf2-032cb512a1a1@arm.com>
 <2FCD9E8A-D38A-40C4-9825-AE7ECEEFC715@nvidia.com>
 <34979a4c-0bab-fbb9-f8dd-ab3da816de52@arm.com>

Ryan,
Do you have kselftest code for this new feature?
I'd like to test it out on FVP when I have the chance.

On Tue, Jul 25, 2023 at 0:42 Ryan Roberts <ryan.roberts@arm.com> wrote:
On 24/07/2023 15:58, Zi Yan wrote:
> On 24 Jul 2023, at 7:59, Ryan Roberts wrote:
>
>> On 14/07/2023 17:04, Ryan Roberts wrote:
>>> Hi All,
>>>
>>> This is v3 of a series to implement variable order, large folios for anonymous
>>> memory (currently called "FLEXIBLE_THP"). The objective of this is to improve
>>> performance by allocating larger chunks of memory during anonymous page faults.
>>> See [1] and [2] for background.
>>
>> A question for anyone that can help; I'm preparing v4 and as part of that am
>> running the mm selftests, now that I've fixed them up to run reliably for
>> arm64. This is showing 2 regressions vs the v6.5-rc3 baseline:
>>
>> 1) khugepaged test fails here:
>> # Run test: collapse_max_ptes_none (khugepaged:anon)
>> # Maybe collapse with max_ptes_none exceeded.... Fail
>> # Unexpected huge page
>>
>> 2) split_huge_page_test fails with:
>> # Still AnonHugePages not split
>>
>> I *think* (but haven't yet verified) that (1) is due to khugepaged ignoring
>> non-order-0 folios when looking for candidates to collapse. Now that we have
>> large anon folios, the memory allocated by the test is in large folios and
>> therefore does not get collapsed. We understand this issue, and I believe
>> DavidH's new scheme for determining exclusive vs shared should give us the tools
>> to solve this.
>>
>> But (2) is weird. If I run this test on its own immediately after booting, it
>> passes. If I then run the khugepaged test, then re-run this test, it fails.
>>
>> The test is allocating 4 hugepages, then requesting they are split using the
>> debugfs interface. Then the test looks at /proc/self/smaps to check that
>> AnonHugePages is back to 0.
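
For anyone wanting to poke at that sequence by hand, here is a minimal sketch (not
the actual selftest; it assumes the "<pid>,0x<start>,0x<end>" input format of
/sys/kernel/debug/split_huge_pages, needs root with debugfs mounted, and elides
alignment handling):

/*
 * Illustrative sketch only: fault in an anonymous range that THP can back
 * with huge pages, ask the debugfs interface to split it, then read the
 * AnonHugePages lines back out of smaps.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define LEN (4UL * 2 * 1024 * 1024)	/* roughly 4 x 2M huge pages */

int main(void)
{
	char *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char line[256];
	FILE *f;

	if (buf == MAP_FAILED)
		return 1;
	madvise(buf, LEN, MADV_HUGEPAGE);
	memset(buf, 1, LEN);		/* fault the range in */

	/* Request the split of this pid/range via debugfs. */
	f = fopen("/sys/kernel/debug/split_huge_pages", "w");
	if (!f)
		return 1;
	fprintf(f, "%d,0x%lx,0x%lx", (int)getpid(),
		(unsigned long)buf, (unsigned long)(buf + LEN));
	fclose(f);

	/* Dump the AnonHugePages counters so the split can be verified. */
	f = fopen("/proc/self/smaps", "r");
	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f))
		if (strncmp(line, "AnonHugePages:", 14) == 0)
			fputs(line, stdout);
	fclose(f);
	return 0;
}
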
>>
>> In both the passing and failing cases, the kernel thinks that it has
>> successfully split the pages; the debug logs in split_huge_pages_pid() confirm
>> this. In the failing case, I wonder if somehow khugepaged could be immediately
>> re-collapsing the pages before user space can observe the split? Perhaps the
>> failed khugepaged test has left khugepaged in an "awake" state and it
>> immediately pounces?
>
> This is more likely to be a stats issue. Have you checked smaps to see if
> AnonHugePages is 0 KB by placing a getchar() before the exit(EXIT_FAILURE)?
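
(In case it helps, a tiny self-contained sketch of that getchar()-before-exit idea;
the failure condition and names below are made up for illustration, not taken from
the selftest:)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* On failure, pause so /proc/<pid>/smaps can be read from another shell
 * while the mappings still exist, then exit with the usual failure code. */
static void fail_and_pause(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	fprintf(stderr, "inspect /proc/%d/smaps now, then press Enter\n",
		(int)getpid());
	getchar();
	exit(EXIT_FAILURE);
}

int main(void)
{
	long anon_huge_kb = 8192;	/* pretend smaps still reports 8192 kB */

	if (anon_huge_kb != 0)
		fail_and_pause("Still AnonHugePages not split");
	return 0;
}
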

Yes - it's still 8192K. But looking at the code, that value is determined from the
fact that there is a PMD block mapping present. And the split definitely succeeded, so something must have re-collapsed it.

Looking into the khugepaged test suite, it saves the thp and khugepaged settings
out of sysfs, modifies them for the tests, then restores them when finished. But
it doesn't restore them if it exits early (due to failure). It changes the settings
for alloc_sleep_millisecs and scan_sleep_millisecs from a large number of seconds to 10 ms, for example. So I'm pretty sure this is the culprit.
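
FWIW, a minimal sketch of that save/restore pattern with the restore hooked into
exit(), assuming the usual /sys/kernel/mm/transparent_hugepage/khugepaged/ paths
(the real test suite's settings code covers many more knobs; needs root):

#include <stdio.h>
#include <stdlib.h>

#define KDIR "/sys/kernel/mm/transparent_hugepage/khugepaged/"

static unsigned long saved_scan, saved_alloc;

/* Read one unsigned value from a sysfs file; 0 if it cannot be read. */
static unsigned long read_ul(const char *path)
{
	unsigned long val = 0;
	FILE *f = fopen(path, "r");

	if (f && fscanf(f, "%lu", &val) != 1)
		val = 0;
	if (f)
		fclose(f);
	return val;
}

/* Write one unsigned value to a sysfs file, ignoring failures. */
static void write_ul(const char *path, unsigned long val)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fprintf(f, "%lu", val);
		fclose(f);
	}
}

static void restore_settings(void)
{
	write_ul(KDIR "scan_sleep_millisecs", saved_scan);
	write_ul(KDIR "alloc_sleep_millisecs", saved_alloc);
}

int main(void)
{
	saved_scan = read_ul(KDIR "scan_sleep_millisecs");
	saved_alloc = read_ul(KDIR "alloc_sleep_millisecs");
	atexit(restore_settings);	/* restore runs on exit(EXIT_FAILURE) too */

	/* Speed khugepaged up for the duration of the tests. */
	write_ul(KDIR "scan_sleep_millisecs", 10);
	write_ul(KDIR "alloc_sleep_millisecs", 10);

	/* ... run the tests; any exit() path now restores the settings ... */
	return 0;
}
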

> Since split_huge_page_test checks the stats to make sure the split indeed
> happened.
>
> --
> Best Regards,
> Yan, Zi


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel