From: "Huang, Ying" <ying.huang@intel.com>
To: MengEn Sun
Cc: akpm@linux-foundation.org, alexjlzheng@tencent.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, mengensun@tencent.com
Subject: Re: [PATCH linux-mm v2] mm: make pcp_decay_high working better with NOHZ full
In-Reply-To: <1729574046-3392-1-git-send-email-mengensun@tencent.com> (MengEn Sun's message of "Tue, 22 Oct 2024 13:14:06 +0800")
References: <87msix6e8c.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <1729574046-3392-1-git-send-email-mengensun@tencent.com>
Date: Tue, 22 Oct 2024 14:36:56 +0800
Message-ID: <87v7xk4p9z.fsf@yhuang6-desk2.ccr.corp.intel.com>

MengEn Sun writes:

> Thank you for your suggestion. I understand and am ready to make
> some changes.
>
>> Have you verified the issue with some test? If not, I suggest you do
>> that.
>
> I have conducted tests:
> Applying this patch or not does not have a significant impact on the
> test results.

I don't expect a measurable performance difference with the patch. If
we can observe that the PCP size isn't tuned down to high_min before
the patch but is tuned down after it, that should be a valid test
result to show the value of the patch. Can you try that? The PCP size
can be observed via /proc/zoneinfo.
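If it helps, below is a rough userspace sketch (untested; the file name
and the filtering are only for illustration) that prints the per-CPU
pageset fields from /proc/zoneinfo, so the "high:" values can be
compared with high_min before and after the test:

/* zoneinfo-high.c: untested sketch, dump per-CPU pageset fields. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/zoneinfo", "r");
	char line[256];

	if (!f) {
		perror("/proc/zoneinfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* Keep the zone banners and the per-CPU pageset fields. */
		if (strncmp(line, "Node ", 5) == 0 ||
		    strstr(line, "cpu:") ||
		    strstr(line, "count:") ||
		    strstr(line, "high:") ||
		    strstr(line, "batch:"))
			fputs(line, stdout);
	}

	fclose(f);
	return 0;
}

Running it once right after the allocate/free phase and again some
minutes later should show whether "high:" drifts back toward high_min.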
> Perhaps my testing was not thorough enough. #^_^
>
> But the logic of the code is like the following:
>
>       CPU0                                   CPUx
>       ----                                   ----
> T0:   vmstat_work is pending
> T1:                                          vmstat_shepherd
>                                              checks vmstat_work
>                                              and does nothing
> T2:   vmstat_work is in unpending state
>
> T3:   alloc many pages
> T4:   free all the pages allocated at T3
> T5:   enter NOHZ, flushing all zonestats
>       and nodestats
> T6:                                          next vmstat_shepherd fired
>
> In my opinion, there are indeed some issues. I'm not sure if there's
> something I haven't understood.
>
> By the way, there are two other questions from me:
>
> Q1:
> vmstat_work is a **deferrable** work, so it may be delayed for a long
> time by NOHZ. As a result, vmstat_update() may not be executed once
> every second in the above scenario. Therefore, I'm not sure whether
> using a deferrable work to reduce pcp->high is appropriate. In my
> tests, if I don't use deferrable work, it takes about a minute to
> reduce high to high_min, but using deferrable work may take several
> minutes to reduce high to high_min.

It's not a big issue to take a longer time to decay pcp->high.

> Q2:
> On a big machine, for example one with 1TB of memory, the default
> maximum amount of memory on the PCP can be 1TB * 0.125.
> This portion of memory is not accounted for in MemFree in
> /proc/meminfo. Users can see this portion of memory in /proc/zoneinfo,
> but the memory reported by the `free` command is reduced.
> Can we include the PCP memory in the MemFree statistic in /proc/meminfo?

This has been discussed before.

https://lore.kernel.org/linux-mm/20220816084426.135528-1-wangkefeng.wang@huawei.com/
https://lore.kernel.org/linux-mm/20240830014453.3070909-1-mawupeng1@huawei.com/

>> > While this seems to be fine:
>> >   - if freeing and allocating memory occur later, the high_max may
>> >     be adjusted automatically
>> >   - if memory is tight, the memory reclamation process will
>> >     release the pcp
>>
>> This could be a real issue for me.
>
> Thanks, I will test more carefully for those issues.
>
>> > Whatever, we make vmstat_shepherd check whether we need to decay
>> > pcp high_max, and fire pcp_decay_high early if we need to.
>> >
>> > Fixes: 51a755c56dc0 ("mm: tune PCP high automatically")
>> > Reviewed-by: Jinliang Zheng
>> > Signed-off-by: MengEn Sun
>> > ---
>> > changelog:
>> > v1: https://lore.kernel.org/lkml/20241012154328.015f57635566485ad60712f3@linux-foundation.org/T/#t
>> > v2: Make the commit message clearer by adding some comments.
>> > ---
>> >  mm/vmstat.c | 9 +++++++++
>> >  1 file changed, 9 insertions(+)
>> >
>> > diff --git a/mm/vmstat.c b/mm/vmstat.c
>> > index 1917c034c045..07b494b06872 100644
>> > --- a/mm/vmstat.c
>> > +++ b/mm/vmstat.c
>> > @@ -2024,8 +2024,17 @@ static bool need_update(int cpu)
>> >
>> >  	for_each_populated_zone(zone) {
>> >  		struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
>> > +		struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>> >  		struct per_cpu_nodestat *n;
>> >
>> > +		/* per_cpu_nodestats and per_cpu_zonestats maybe flush when cpu
>> > +		 * entering NOHZ full, see quiet_vmstat. so, we check pcp
>> > +		 * high_{min,max} to determine whether it is necessary to run
>> > +		 * decay_pcp_high on the corresponding CPU
>> > +		 */
>>
>> Please follow the comments coding style.
>>
>> /*
>>  * comments line 1
>>  * comments line 2
>>  */
>
> Thank you for your suggestion. I understand and am ready to make
> some changes.
>
>> > +		if (pcp->high_max > pcp->high_min)
>> > +			return true;
>> > +
>>
>> We don't tune pcp->high_max/min in fact. Instead, we tune pcp->high.
>> Your code may make need_update() return true in most cases.
>
> You are right, using high_max is incorrect. May I use
> pcp->high > pcp->high_min?
>
>>
>> >  	/*
>> >  	 * The fast way of checking if there are any vmstat diffs.
>> >  	 */
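Just to make that idea concrete, an untested sketch of the hunk with
pcp->high compared against pcp->high_min (and with the comment
reformatted as requested above) could look like this; whether this is
exactly the right condition still needs to be verified by testing:

		/*
		 * per_cpu_nodestats and per_cpu_zonestats may have been
		 * flushed when the CPU entered NOHZ full (see quiet_vmstat()),
		 * so also check whether pcp->high still needs to be decayed
		 * towards pcp->high_min on this CPU.
		 */
		if (pcp->high > pcp->high_min)
			return true;

With a check along those lines, vmstat_shepherd would keep queueing
vmstat_update() on that CPU until pcp->high has been decayed down to
high_min, which appears to be the behavior the patch is after.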
--
Best Regards,
Huang, Ying