From: Vlastimil Babka
To: Mel Gorman, Dave Hansen
Cc: Andrew Morton, Hillf Danton, Dave Hansen, Michal Hocko, LKML, Linux-MM, "Tang, Feng"
Subject: Re: [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs
Date: Fri, 28 May 2021 14:12:09 +0200
Message-ID: <416f39e7-704a-86d0-8261-dc27366336ab@suse.cz>
In-Reply-To: <20210528085545.GJ30378@techsingularity.net>
References: <20210525080119.5455-1-mgorman@techsingularity.net> <7177f59b-dc05-daff-7dc6-5815b539a790@intel.com> <20210528085545.GJ30378@techsingularity.net>

On 5/28/21 10:55 AM, Mel Gorman wrote:
> On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
>> Hi Mel,
>>
>> Feng Tang tossed these on a "Cascade Lake" system with 96 threads and
>> ~512G of persistent memory and 128G of DRAM. The PMEM is in "volatile
>> use" mode and being managed via the buddy just like the normal RAM.
>>
>> The PMEM zones are big ones:
>>
>>         present  65011712 = 248 G
>>         high       134595 = 525 M
>>
>> The PMEM nodes, of course, don't have any CPUs in them.
>>
>> With your series, the pcp->high value per-cpu is 69584 pages or about
>> 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of
>> worst-case memory in the pcps per zone, or roughly 10% of the size of
>> the zone.
>>
>> I did see quite a few pcp->counts above 60,000, so it's definitely
>> possible in practice to see the pcps filled up. This was not observed
>> to cause any actual problems in practice. But, it's still a bit worrisome.
>>
>
> Ok, it does have the potential to trigger early reclaim as pages are
> stored on remote PCP lists. The problem would be transient because
> vmstat would drain those pages over time but still, how about this patch
> on top of the series?
>
> --8<--
> mm/page_alloc: Split pcp->high across all online CPUs for cpuless nodes
>
> Dave Hansen reported the following about Feng Tang's tests on a machine
> with persistent memory onlined as a DRAM-like device.
>
>   Feng Tang tossed these on a "Cascade Lake" system with 96 threads and
>   ~512G of persistent memory and 128G of DRAM. The PMEM is in "volatile
>   use" mode and being managed via the buddy just like the normal RAM.
>
>   The PMEM zones are big ones:
>
>         present  65011712 = 248 G
>         high       134595 = 525 M
>
>   The PMEM nodes, of course, don't have any CPUs in them.
>
>   With your series, the pcp->high value per-cpu is 69584 pages or about
>   270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of
>   worst-case memory in the pcps per zone, or roughly 10% of the size of
>   the zone.
>
> This should not cause a problem as such although it could trigger reclaim
> due to pages being stored on per-cpu lists for CPUs remote to a node. It
> is not possible to treat cpuless nodes exactly the same as normal nodes
> but the worst-case scenario can be mitigated by splitting pcp->high across
> all online CPUs for cpuless memory nodes.
>
> Suggested-by: Dave Hansen
> Signed-off-by: Mel Gorman

Acked-by: Vlastimil Babka

Maybe we should even consider distinguishing high limits for local-to-cpu
zones vs remote ones. For example, for the local-to-cpu zones we would
divide by the number of local cpus, and for remote-to-cpu zones we would
divide by all cpus. We can expect cpus to allocate mostly from local zones,
so leaving more pages on the percpu lists for those zones can be beneficial.

But as the motivation here was to reduce lock contention on freeing, that's
less clear. We probably can't expect the cpu to be freeing mostly local
pages (in case of e.g. a large process exiting), because no mechanism works
towards that, or does it? In case of a cpu freeing to a remote zone, the
lower high limit could hurt.

So whether that works in practice would have to be evaluated. Out of scope
here, just an idea to discuss.
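
For anyone following the numbers, the arithmetic in this thread can be
sketched roughly as below. This is a userspace back-of-envelope sketch in
Python, not kernel code. The function names are hypothetical, the page
size is assumed to be 4 KiB (consistent with 65011712 pages = 248 G), the
reported 69584 pages is treated as the per-zone budget that gets split,
the 48 local cpus in the last case is a made-up figure, and any floor or
clamping the kernel applies to pcp->high is ignored.

```python
# Back-of-envelope model of the pcp->high numbers discussed above.
# All names here are hypothetical; this does not mirror mm/page_alloc.c.

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_to_mib(pages):
    """Convert a page count to MiB under the assumed page size."""
    return pages * PAGE_SIZE / 2**20


def worst_case_pcp_pages(pcp_high, nr_cpus):
    """Pages that can sit on per-cpu lists of one zone if every cpu fills up."""
    return pcp_high * nr_cpus


def split_high(zone_budget, nr_local_cpus, nr_online_cpus):
    """Split a zone's budget across cpus.

    A cpuless node (nr_local_cpus == 0) is split across all online cpus,
    which is the mitigation proposed in the quoted patch description.
    """
    divisor = nr_local_cpus if nr_local_cpus > 0 else nr_online_cpus
    return zone_budget // divisor


def high_local_vs_remote(zone_budget, cpu_is_local, nr_local_cpus, nr_all_cpus):
    """The idea floated in the reply: a larger limit for local zones,
    a smaller one for remote zones."""
    divisor = max(1, nr_local_cpus) if cpu_is_local else nr_all_cpus
    return zone_budget // divisor


if __name__ == "__main__":
    zone_pages = 65011712   # "present" for the PMEM zone
    pcp_high = 69584        # reported per-cpu pcp->high with the series
    cpus = 96

    worst = worst_case_pcp_pages(pcp_high, cpus)
    print("per-cpu MiB:", pages_to_mib(pcp_high))
    print("worst-case GiB:", pages_to_mib(worst) / 1024)
    print("share of zone (%):", 100 * worst / zone_pages)
    print("split across online cpus:", split_high(pcp_high, 0, cpus))
    print("local limit:", high_local_vs_remote(pcp_high, True, 48, cpus))
    print("remote limit:", high_local_vs_remote(pcp_high, False, 48, cpus))
```

Under these assumptions the cpuless split gives 724 pages per cpu, so the
worst case over 96 cpus drops from about 10% of the zone to roughly 271 MiB.
The local-vs-remote variant is only a model of the idea, it says nothing
about whether the lower remote limit hurts freeing in practice.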