From: "Huang, Ying"
To: Raghavendra K T, Nikhil Dhama
Cc: akpm@linux-foundation.org, bharata@amd.com,
 raghavendra.kodsarathimmappa@amd.com, oe-lkp@lists.linux.dev,
 lkp@intel.com, Huang Ying, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Mel Gorman
Subject: Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
In-Reply-To: (Raghavendra K. T.'s message of "Fri, 11 Apr 2025 11:32:08 +0530")
References: <20250407105219.55351-1-nikhil.dhama@amd.com>
 <87mscn8msp.fsf@DESKTOP-5N7EMDA>
Date: Fri, 11 Apr 2025 14:15:42 +0800
Message-ID: <87mscn5ilt.fsf@DESKTOP-5N7EMDA>

Raghavendra K T writes:

> On 4/11/2025 7:46 AM, Huang, Ying wrote:
>> Hi, Nikhil,
>> Sorry for late reply.
>> Nikhil Dhama writes:
>>
>>> In old pcp design, pcp->free_factor gets incremented in nr_pcp_free()
>>> which is invoked by free_pcppages_bulk(). So, it used to increase
>>> free_factor by 1 only when we try to reduce the size of pcp list or
>>> flush for high order, and free_high used to trigger only
>>> for order > 0 and order < costly_order and pcp->free_factor > 0.
>>>
>>> For iperf3 I noticed that with older design in kernel v6.6, pcp list was
>>> drained mostly when pcp->count > high (more often when count goes above
>>> 530). and most of the time pcp->free_factor was 0, triggering very few
>>> high order flushes.
>>>
>>> But this is changed in the current design, introduced in commit 6ccdcb6d3a74
>>> ("mm, pcp: reduce detecting time of consecutive high order page freeing"),
>>> where pcp->free_factor is changed to pcp->free_count to keep track of the
>>> number of pages freed contiguously. In this design, pcp->free_count is
>>> incremented on every deallocation, irrespective of whether pcp list was
>>> reduced or not. And logic to trigger free_high is if pcp->free_count goes
>>> above batch (which is 63) and there are two contiguous page free without
>>> any allocation.
>> The design changes because pcp->high can become much higher than that
>> before it. This makes it much harder to trigger free_high, which causes
>> some performance regressions too.
>>
>>> With this design, for iperf3, pcp list is getting flushed more frequently
>>> because free_high heuristics is triggered more often now. I observed that
>>> high order pcp list is drained as soon as both count and free_count goes
>>> above 63.
>>>
>>> Due to this more aggressive high order flushing, applications
>>> doing contiguous high order allocation will require to go to global list
>>> more frequently.
>>>
>>> On a 2-node AMD machine with 384 vCPUs on each node,
>>> connected via Mellanox ConnectX-7, I am seeing a ~30% performance
>>> reduction if we scale number of iperf3 client/server pairs from 32 to 64.
>>>
>>> Though this new design reduced the time to detect high order flushes,
>>> for applications which are allocating high order pages more
>>> frequently it may be flushing the high order list prematurely.
>>> This motivates tuning how late or early we should flush
>>> high order lists.
>>>
>>> So, in this patch, we increased the pcp->free_count threshold to
>>> trigger free_high from "batch" to "batch + pcp->high_min / 2".
>>> This new threshold keeps high order pages in pcp list for a
>>> longer duration which can help the application doing high order
>>> allocations frequently.
>> IIUC, we restore the original behavior with "batch + pcp->high / 2" as
>> in my analysis in
>> https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/
>> If you think my analysis is correct, can you add that in patch
>> description too? This makes it easier for people to know why the code
>> looks this way.
>>
>
> Yes. This makes sense. Andrew has already included the patch in mm tree.
>
> Nikhil,
>
> Could you please help with the updated write up based on Ying's
> suggestion assuming it works for Andrew?

Thanks! Just send an updated version; Andrew will update the patch in mm
tree unless it has been merged into mm-stable.

---
Best Regards,
Huang, Ying
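
For readers following the thread, the threshold change described in the
quoted patch description can be sketched as below. This is a simplified
userspace model, not the kernel code: struct pcp_model and the helper
names are hypothetical, only the free_count comparison is modelled (the
"two contiguous frees without any allocation" condition and the order
checks are left out), and the use of ">=" for the boundary is an
assumption. The threshold uses pcp->high_min as written in the patch
description.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two per-CPU list fields discussed here. */
struct pcp_model {
	int free_count;	/* pages freed without an intervening allocation */
	int high_min;	/* stand-in for pcp->high_min */
};

/* Threshold before the patch: "batch". */
static int free_high_threshold_old(int batch)
{
	return batch;
}

/* Threshold after the patch: "batch + pcp->high_min / 2". */
static int free_high_threshold_new(const struct pcp_model *pcp, int batch)
{
	return batch + pcp->high_min / 2;
}

/* True once free_count has reached the pre-patch threshold. */
static bool free_count_hit_old(const struct pcp_model *pcp, int batch)
{
	return pcp->free_count >= free_high_threshold_old(batch);
}

/* True once free_count has reached the post-patch threshold. */
static bool free_count_hit_new(const struct pcp_model *pcp, int batch)
{
	return pcp->free_count >= free_high_threshold_new(pcp, batch);
}
```

With batch = 63 (the value quoted in the thread) and an illustrative,
made-up high_min of 200, the pre-patch check passes at free_count = 63,
while the post-patch check needs free_count to reach 163, so high order
pages stay on the pcp list longer before free_high can trigger.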