From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12593C433DB for ; Tue, 16 Feb 2021 09:01:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 717A264DF0 for ; Tue, 16 Feb 2021 09:01:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 717A264DF0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F0EB58D015F; Tue, 16 Feb 2021 04:01:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EBEB48D0140; Tue, 16 Feb 2021 04:01:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DADAF8D015F; Tue, 16 Feb 2021 04:01:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0115.hostedemail.com [216.40.44.115]) by kanga.kvack.org (Postfix) with ESMTP id C48678D0140 for ; Tue, 16 Feb 2021 04:01:47 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 863EA1837E1A6 for ; Tue, 16 Feb 2021 09:01:47 +0000 (UTC) X-FDA: 77823538254.30.unit71_5f0004e27642 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id 6083D18157669 for ; Tue, 16 Feb 2021 09:01:47 +0000 (UTC) X-HE-Tag: unit71_5f0004e27642 X-Filterd-Recvd-Size: 8994 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf47.hostedemail.com (Postfix) with ESMTP for ; Tue, 16 Feb 2021 09:01:46 +0000 (UTC) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id ECD891FB; Tue, 16 Feb 2021 01:01:45 -0800 (PST) Received: from [192.168.0.130] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9738A3F694; Tue, 16 Feb 2021 01:01:41 -0800 (PST) From: Anshuman Khandual Subject: Re: [PATCH 0/3] mm/page_alloc: Fix pageblock_order with HUGETLB_PAGE_SIZE_VARIABLE To: David Hildenbrand , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, akpm@linux-foundation.org, will@kernel.org Cc: Robin Murphy , Marek Szyprowski , Christoph Hellwig , Mark Rutland , iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org References: <1613024531-19040-1-git-send-email-anshuman.khandual@arm.com> <683c812a-ce3d-ef74-10d1-eaf8a3ae93d4@redhat.com> Message-ID: <0609df35-dd69-54a7-e839-2d97d77e0016@arm.com> Date: Tue, 16 Feb 2021 14:32:04 +0530 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2/12/21 3:09 PM, David Hildenbrand wrote: > On 12.02.21 08:02, Anshuman Khandual wrote: >> >> On 2/11/21 2:07 PM, David Hildenbrand wrote: >>> On 11.02.21 07:22, Anshuman Khandual wrote: >>>> The following warning gets triggered while trying to boot a 64K page= size >>>> without THP config kernel on arm64 platform. >>>> >>>> WARNING: CPU: 5 PID: 124 at mm/vmstat.c:1080 __fragmentation_index+0= xa4/0xc0 >>>> Modules linked in: >>>> CPU: 5 PID: 124 Comm: kswapd0 Not tainted 5.11.0-rc6-00004-ga0ea7d62= 002 #159 >>>> Hardware name: linux,dummy-virt (DT) >>>> [=C2=A0=C2=A0=C2=A0 8.810673] pstate: 20400005 (nzCv daif +PAN -UAO = -TCO BTYPE=3D--) >>>> [=C2=A0=C2=A0=C2=A0 8.811732] pc : __fragmentation_index+0xa4/0xc0 >>>> [=C2=A0=C2=A0=C2=A0 8.812555] lr : fragmentation_index+0xf8/0x138 >>>> [=C2=A0=C2=A0=C2=A0 8.813360] sp : ffff0000864079b0 >>>> [=C2=A0=C2=A0=C2=A0 8.813958] x29: ffff0000864079b0 x28: 00000000000= 00372 >>>> [=C2=A0=C2=A0=C2=A0 8.814901] x27: 0000000000007682 x26: ffff8000135= b3948 >>>> [=C2=A0=C2=A0=C2=A0 8.815847] x25: 1fffe00010c80f48 x24: 00000000000= 00000 >>>> [=C2=A0=C2=A0=C2=A0 8.816805] x23: 0000000000000000 x22: 00000000000= 0000d >>>> [=C2=A0=C2=A0=C2=A0 8.817764] x21: 0000000000000030 x20: ffff0005ffc= b4d58 >>>> [=C2=A0=C2=A0=C2=A0 8.818712] x19: 000000000000000b x18: 00000000000= 00000 >>>> [=C2=A0=C2=A0=C2=A0 8.819656] x17: 0000000000000000 x16: 00000000000= 00000 >>>> [=C2=A0=C2=A0=C2=A0 8.820613] x15: 0000000000000000 x14: ffff8000114= c6258 >>>> [=C2=A0=C2=A0=C2=A0 8.821560] x13: ffff6000bff969ba x12: 1fffe000bff= 969b9 >>>> [=C2=A0=C2=A0=C2=A0 8.822514] x11: 1fffe000bff969b9 x10: ffff6000bff= 969b9 >>>> [=C2=A0=C2=A0=C2=A0 8.823461] x9 : dfff800000000000 x8 : ffff0005ffc= b4dcf >>>> [=C2=A0=C2=A0=C2=A0 8.824415] x7 : 0000000000000001 x6 : 0000000041b= 58ab3 >>>> [=C2=A0=C2=A0=C2=A0 8.825359] x5 : ffff600010c80f48 x4 : dfff8000000= 00000 >>>> [=C2=A0=C2=A0=C2=A0 8.826313] x3 : ffff8000102be670 x2 : 00000000000= 00007 >>>> [=C2=A0=C2=A0=C2=A0 8.827259] x1 : ffff000086407a60 x0 : 00000000000= 0000d >>>> [=C2=A0=C2=A0=C2=A0 8.828218] Call trace: >>>> [=C2=A0=C2=A0=C2=A0 8.828667]=C2=A0 __fragmentation_index+0xa4/0xc0 >>>> [=C2=A0=C2=A0=C2=A0 8.829436]=C2=A0 fragmentation_index+0xf8/0x138 >>>> [=C2=A0=C2=A0=C2=A0 8.830194]=C2=A0 compaction_suitable+0x98/0xb8 >>>> [=C2=A0=C2=A0=C2=A0 8.830934]=C2=A0 wakeup_kcompactd+0xdc/0x128 >>>> [=C2=A0=C2=A0=C2=A0 8.831640]=C2=A0 balance_pgdat+0x71c/0x7a0 >>>> [=C2=A0=C2=A0=C2=A0 8.832327]=C2=A0 kswapd+0x31c/0x520 >>>> [=C2=A0=C2=A0=C2=A0 8.832902]=C2=A0 kthread+0x224/0x230 >>>> [=C2=A0=C2=A0=C2=A0 8.833491]=C2=A0 ret_from_fork+0x10/0x30 >>>> [=C2=A0=C2=A0=C2=A0 8.834150] ---[ end trace 472836f79c15516b ]--- >>>> >>>> This warning comes from __fragmentation_index() when the requested o= rder >>>> is greater than MAX_ORDER. >>>> >>>> static int __fragmentation_index(unsigned int order, >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct contig_page_info *info) >>>> { >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 unsigned long= requested =3D 1UL << order; >>>> >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (WARN_ON_O= NCE(order >=3D MAX_ORDER)) <=3D=3D=3D=3D=3D Triggered here >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return 0; >>>> >>>> Digging it further reveals that pageblock_order has been assigned a = value >>>> which is greater than MAX_ORDER failing the above check. But why thi= s >>>> happened ? Because HUGETLB_PAGE_ORDER for the given config on arm64 = is >>>> greater than MAX_ORDER. >>>> >>>> The solution involves enabling HUGETLB_PAGE_SIZE_VARIABLE which woul= d make >>>> pageblock_order a variable instead of constant HUGETLB_PAGE_ORDER. B= ut that >>>> change alone also did not really work as pageblock_order still got a= ssigned >>>> as HUGETLB_PAGE_ORDER in set_pageblock_order(). HUGETLB_PAGE_ORDER n= eeds to >>>> be less than MAX_ORDER for its appropriateness as pageblock_order ot= herwise >>>> just fallback to MAX_ORDER - 1 as before. While here it also fixes a= build >>>> problem via type casting MAX_ORDER in rmem_cma_setup(). >>> >>> I'm wondering, is there any real value in allowing FORCE_MAX_ZONEORDE= R to be "11" with ARM64_64K_PAGES/ARM64_16K_PAGES? >> >> MAX_ORDER should be as high as would be required for the current confi= g. >> Unless THP is enabled, there is no need for it to be any higher than 1= 1. >> But I might be missing historical reasons around this as well. Probabl= y >> others from arm64 could help here. >=20 > Theoretically yes, practically no. If nobody cares about a configuratio= n, no need to make the code more complicated for that configuration. >=20 >> >>> >>> Meaning: are there any real use cases that actually build a kernel wi= thout TRANSPARENT_HUGEPAGE and with ARM64_64K_PAGES/ARM64_16K_PAGES? >> >> THP is always optional. Besides kernel builds without THP should alway= s >> be supported. Assuming that all builds will have THP enabled, might no= t >> be accurate. >> >>> >>> As builds are essentially broken, I assume this is not that relevant?= Or how long has it been broken? >> >> Git blame shows that it's been there for some time now. But how does >> that make this irrelevant ? A problem should be fixed nonetheless. >=20 > When exactly did I say not to fix it? I'm saying if nobody uses it, we = might be able to simplify. I might have misunderstood the relevance of a problematic configuration wrt the need for it to be fixed. My bad, never mind :) >=20 >> >>> >>> It might be easier to just drop the "TRANSPARENT_HUGEPAGE" part from = the FORCE_MAX_ZONEORDER config. >>> >> >> Not sure if it would be a good idea to unnecessarily have larger MAX_O= RDER >> value for a given config. But I might be missing other contexts here. >=20 > My point is: keep it simple if there is no need to make it complicated.= If these arm64 variants are the only cases where we run into that issue = and nobody uses them ("hat it's been there for some time now"), why make = stuff complicated? >=20 > The current code seems to assume that HUGETLB_PAGE_ORDER <=3D MAX_ORDER= . Instead of changing that for optimizing an unused use case (it is broke= n), just simplify the arm64 conditions. I'd even say add a Will try and change MAX_ORDER on arm64. >=20 > /* > =C2=A0* Some code assumes that HUGETLB_PAGE_ORDER <=3D MAX_ORDER. For n= ow, only > =C2=A0* !TRANSPARENT_HUGEPAGE could lead in archs creating such setups. > =C2=A0* Forbid it for now; anybody that has a valid use case has to san= itize > =C2=A0* the code. > =C2=A0*/ > BUILD_BUG_ON(HUGETLB_PAGE_ORDER <=3D MAX_ORDER); Also add another, might have to find a proper place to put these. BUILD_BUG_ON(HPAGE_PMD_ORDER <=3D MAX_ORDER) >=20 >=20 > But again, if there are valid use cases then sure, let's make the code = fully compatible with HUGETLB_PAGE_ORDER > MAX_ORDER. Given that gigantic HugeTLB allocation can fallback on alloc_contig_pages= () or CMA if/when available, is there a real need for HUGETLB_PAGE_ORDER to = be upto MAX_ORDER, used as pageblock_order or as COMPACTION_HPAGE_ORDER ? Wi= th gigantic HugeTLB pages being available, HUGETLB_PAGE_ORDER seems to be ju= st detached from the buddy allocator. But I am not sure, will keep looking. >=20 >=20 > BTW: can we be sure that you caught all issues? For example: >=20 > #define COMPACTION_HPAGE_ORDER=C2=A0 HUGETLB_PAGE_ORDER >=20 > I'm not sure if it will work as expected if HUGETLB_PAGE_ORDER > MAX_OR= DER ... fill_contig_page_info() will never find any suitable free blocks = ... >=20 Right, there is an assumption that COMPACTION_HPAGE_ORDER < MAX_ORDER in fill_contig_page_info() and will never find any suitable free blocks. Thi= s along with pageblock_order needs to change when HUGETLB_PAGE_ORDER can be greater than equal to MAX_ORDER. - Anshuman