From: "Huang, Ying" <ying.huang@intel.com>
To: Yafang Shao
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, Matthew Wilcox, David Rientjes, Mel Gorman
Subject: Re: [PATCH] mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the minimum pagelist
In-Reply-To: <20240701142046.6050-1-laoar.shao@gmail.com> (Yafang Shao's message of "Mon, 1 Jul 2024 22:20:46 +0800")
References: <20240701142046.6050-1-laoar.shao@gmail.com>
Date: Tue, 02 Jul 2024 15:23:32 +0800
Message-ID: <878qykntor.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Yafang,

Yafang Shao writes:

> Currently, we're encountering latency spikes in our container environment
> when a specific container with multiple Python-based tasks exits.

Can you show some data?  What kind of machine is it, and how long is
the latency?

> These tasks may hold the zone->lock for an extended period, significantly
> impacting latency for other containers attempting to allocate memory.

So, it is the allocation latency that is affected, not the application
exit latency?

Could you measure the run time of free_pcppages_bulk()?  This can be
done via the ftrace function_graph tracer.  We want to check whether
this is a common issue.

In commit 52166607ecc9 ("mm: restrict the pcp batch scale factor to
avoid too long latency"), we measured the allocation/free latency for
different values of CONFIG_PCP_BATCH_SCALE_MAX.  The target in that
commit is to keep the latency <= 100us.

> As a workaround, we've found that minimizing the pagelist size, such as
> setting it to 4 times the batch size, can help mitigate these spikes.
> However, managing vm.percpu_pagelist_high_fraction across a large fleet of
> servers poses challenges due to variations in CPU counts, NUMA nodes, and
> physical memory capacities.
>
> To enhance practicality, we propose allowing the setting of -1 for
> vm.percpu_pagelist_high_fraction to designate a minimum pagelist size.

If it is really necessary, can we just use a large enough number for
vm.percpu_pagelist_high_fraction?  For example, (1 << 30)?

> Furthermore, considering the challenges associated with utilizing
> vm.percpu_pagelist_high_fraction, it would be beneficial to introduce a
> more intuitive parameter, vm.percpu_pagelist_high_size, that would permit
> direct specification of the pagelist size as a multiple of the batch size.
> This methodology would mirror the functionality of vm.dirty_ratio and
> vm.dirty_bytes, providing users with greater flexibility and control.
>
> We have discussed the possibility of introducing multiple small zones to
> mitigate the contention on the zone->lock[0], but this approach is likely
> to require a longer-term implementation effort.
>
> Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
> Signed-off-by: Yafang Shao
> Cc: Matthew Wilcox
> Cc: David Rientjes
> Cc: "Huang, Ying"
> Cc: Mel Gorman

[snip]

--
Best Regards,
Huang, Ying
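[Editor's sketch] The function_graph measurement suggested in the reply could look roughly like the following. This is a hedged sketch, not part of the original message: the tracefs mount point, the 10-second capture window, and the helper name `trace_free_pcppages_bulk` are assumptions to adapt; it requires root and degrades to a notice when tracefs is not writable.

```shell
# Sketch: measure per-call run time of free_pcppages_bulk() with the
# ftrace function_graph tracer.  Requires root; the tracefs path and
# the 10s capture window are assumptions to adapt to your system.
trace_free_pcppages_bulk() {
    t=/sys/kernel/tracing                    # newer kernels
    [ -w "$t/current_tracer" ] || t=/sys/kernel/debug/tracing
    if [ ! -w "$t/current_tracer" ]; then
        echo "tracefs not writable; re-run as root"
        return 0
    fi
    echo free_pcppages_bulk > "$t/set_graph_function"
    echo function_graph > "$t/current_tracer"
    echo 1 > "$t/tracing_on"
    sleep 10                                 # reproduce the container exit here
    echo 0 > "$t/tracing_on"
    grep free_pcppages_bulk "$t/trace"       # durations reported in us
    echo nop > "$t/current_tracer"           # restore the default tracer
}
trace_free_pcppages_bulk
```

The duration column of the resulting trace can then be compared against the <= 100us target mentioned for commit 52166607ecc9.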