Date: Thu, 27 Feb 2025 17:41:08 +0800
From: Baoquan He
To: Michal Hocko
Cc: Gabriel Krisman Bertazi, akpm@linux-foundation.org, linux-mm@kvack.org, Mel Gorman, Vlastimil Babka
Subject: Re: [PATCH] Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
References: <20250226032258.234099-1-krisman@suse.de>

On 02/26/25 at 06:46pm, Michal Hocko wrote:
> On Wed 26-02-25 23:57:48, Baoquan He wrote:
> > On 02/26/25 at 01:01pm, Michal Hocko wrote:
> > > On Wed 26-02-25 19:51:12, Baoquan He wrote:
> > > > On 02/26/25 at 12:00pm, Michal Hocko wrote:
> > > > > On Wed 26-02-25 11:52:41, Michal Hocko wrote:
> > > > > > On Wed 26-02-25 18:00:26, Baoquan He wrote:
> > > > > > > On 02/26/25 at 07:54am, Michal Hocko wrote:
> > > > > > [...]
> > > > > > > > In any case 96a5c186efff seems incorrect because it assumes that the
> > > > > > > > protection has anything to do with how higher zone is populated while
> > > > > > > > the protection fundamentally protects lower zone from higher zones
> > > > > > > > allocation. Those allocations are independent on the actual memory in
> > > > > > > > that zone.
> > > > > > >
> > > > > > > The protection value was introduced in non-NUMA time, and later adapted
> > > > > > > to NUMA system. While it still only reflects each zone with other zones
> > > > > > > within one specific node. We may need take this opportunity to
> > > > > > > reconsider it, e.g. in the FALLBACK zonelists case it needs take crossing
> > > > > > > nodes into account.
> > > > > >
> > > > > > Are you suggesting zone fallback list to interleave nodes? I.e.
> > > > > > numa_zonelist_order we used to have in the past and that has been
> > > > > > removed by c9bff3eebc09 ("mm, page_alloc: rip out ZONELIST_ORDER_ZONE").
> > > >
> > > > Hmm, if Gabriel can provide detailed node/zone information of the
> > > > system, we can check if there's anything we can do to adjust
> > > > zone->lowmem_reserve[] to reflect its real usage and semantics.
> > >
> > > Why do you think anything needs to be adjusted?
> >
> > No, I don't think like that. But I am wondering what makes you get
> > the conclusion.
> >
> > >
> > > > I haven't thought of the whole zone fallback list to interleave nodes
> > > > which involves a lot of change.
> > > >
> > > > >
> > > > > Btw. has 96a5c186efff tried to fix any actual runtime problem? The
> > > > > changelog doesn't say much about that.
> > > >
> > > > No, no actual problem was observed on that.
> > >
> > > OK
> > >
> > > > I was just trying to make
> > > > clear the semantics because I was confused by its obscure value printing
> > > > of zone->lowmem_reserve[] in /proc/zoneinfo.
> > > >
> > > > I think we can merge this reverting firstly, then to investigate how to
> > > > better clarify it.
> > >
> > > What do you think needs to be clarified? How exactly is the original
> > > output confusing?
> >
> > When I did the change, I wrote the reason in commit log. I don't think
> > you care to read it from your talking. Let me explain again, in
> > setup_per_zone_lowmem_reserve(), each zone's protection value is
> > calculated based on its own node's zones. E.g. below on node 0, its
> > Movable zone and Device zone are empty but still show influence on
> > Normal/DMA32/DMA zone, this is unreasonable from the protection value
> > calculating code and its showing.
>
> You said that in the commit message without explanation why. Also I
> claim this is just wrong because the zone's protection is independent on
> the size of the zone that it is protected from. I have explained why I
> believe but let me reiterate. ZONE_DMA32 should be protected from
> GFP_MOVABLE even if the zone movable is empty (same as if it had a
> single or many pages). Why? Because, the lowmem reserve protects low
> memory allocation requests.
>
> See my point? Is that reasoning clear?

Very clear. But now the protection is calculated node by node. Please
think about one case: Node 0 only has ZONE_DMA and ZONE_DMA32, while
Node 1, 2, 3 ... N have ZONE_NORMAL and ZONE_MOVABLE. How could
ZONE_DMA32 be protected from GFP_MOVABLE? Does the Linux kernel have any
restriction on node layout that prevents Node 0 from looking like this?
Especially on arm64, there's only ZONE_DMA and its boundary is sometimes
not fixed. What if a system vendor arranges Node 0 to have only
ZONE_DMA?

Secondly, the existing protection ratio was created based on the old x86 system.
It may not fit current architectures, e.g. arm64, which only has
ZONE_DMA, under 4G by default, so the default ratio is obviously not
suitable any more. And we can clearly see that the current protection
value is meant for __GFP_THISNODE allocation.

======
/*
 * results with 256, 32 in the lowmem_reserve sysctl:
 *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
 *	1G machine -> (16M dma, 784M normal, 224M high)
 *	NORMAL allocation will leave 784M/256 of ram reserved in the ZONE_DMA
 *	HIGHMEM allocation will leave 224M/32 of ram reserved in ZONE_NORMAL
 *	HIGHMEM allocation will leave (224M+784M)/256 of ram reserved in ZONE_DMA
 *
 * TBD: should special case ZONE_DMA32 machines here - in those we normally
 * don't need any ZONE_NORMAL reservation
 */
static int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
======

So my thought is that we either have a two-dimensional protection value,
where one dimension is for __GFP_THISNODE allocation and the second
dimension is for FALLBACK allocation:

struct zone {
	......
	long lowmem_reserve[MAX_ZONELISTS][MAX_NR_ZONES];
}

Or we count a higher zone's pages across all nodes when calculating the
protection value for a lower zone, though the formula needs to be
adjusted because one zone's total page count could be huge and the
protection value could be bigger than the lower zone's page count.

Or we leave it until a real problem occurs, and then consider fixing it
accordingly.

Any of these is fine with me.

>
> P.S.
> I think we can have a more productive discussion without accusations.

Yes, we can, and I have no doubt about it always, no matter when and
with whom.
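
To make the per-node point above concrete, here is a minimal userspace sketch of the calculation as described in this thread. It is not the kernel code: the reduced zone enum, the `node_model` struct, the `calc_reserve()` helper and the ratio table are hypothetical names and illustrative values, and the `skip_empty_upper` flag is only an assumed stand-in for the behaviour of 96a5c186efff (zeroing the entry when the upper zone is empty).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced zone set for illustration (not the kernel's enum). */
enum { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, NR_ZONES };

/* Illustrative ratios; 0 means no reserve is configured for that zone. */
static const unsigned long ratio[NR_ZONES] = { 256, 256, 32, 0 };

/* One node: managed pages per zone and the resulting reserve[lower][upper]. */
struct node_model {
	unsigned long managed[NR_ZONES];
	unsigned long reserve[NR_ZONES][NR_ZONES];
};

/*
 * Per-node calculation: the reserve of lower zone i against upper zone j is
 * the sum of managed pages in zones i+1..j of the SAME node, divided by
 * ratio[i]. Memory in other nodes never enters the sum.
 * With skip_empty_upper set, an empty upper zone gets a zero entry.
 */
static void calc_reserve(struct node_model *n, bool skip_empty_upper)
{
	assert(n != NULL);

	for (int i = 0; i < NR_ZONES - 1; i++) {
		/* No reserve for a zone without a ratio or without pages. */
		bool clear = !ratio[i] || !n->managed[i];
		unsigned long upper_pages = 0;

		for (int j = i + 1; j < NR_ZONES; j++) {
			bool empty = skip_empty_upper && !n->managed[j];

			upper_pages += n->managed[j];
			if (clear || empty)
				n->reserve[i][j] = 0;
			else
				n->reserve[i][j] = upper_pages / ratio[i];
		}
	}
}
```

For a node whose managed pages are {4096, 524288, 0, 0} (only DMA and DMA32 populated), `calc_reserve(&n, false)` gives DMA a reserve of 2048 pages against DMA32, NORMAL and MOVABLE alike, while DMA32 gets 0 against NORMAL and MOVABLE no matter how much memory other nodes hold in those zones. Passing `true` additionally zeroes DMA's entries for the empty NORMAL and MOVABLE zones.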