Subject: Re: [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: Vlastimil Babka, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org,
    tglx@linutronix.de, mtosatti@redhat.com, mgorman@suse.de,
    linux-rt-users@vger.kernel.org, cl@linux.com, paulmck@kernel.org,
    willy@infradead.org
Date: Thu, 03 Mar 2022 15:10:35 +0100
References: <20220208100750.1189808-1-nsaenzju@redhat.com>

On Thu, 2022-03-03 at 14:27 +0100, Vlastimil Babka wrote:
> On 2/8/22 11:07, Nicolas Saenz Julienne wrote:
> > This series replaces mm/page_alloc's per-cpu page lists drain mechanism
> > with one that allows accessing the lists remotely. Currently, only the
> > local CPU is permitted to change its per-cpu lists, and it's expected to
> > do so, on demand, whenever a process requests it by queueing a drain
> > task on that CPU. This causes problems for NOHZ_FULL CPUs and real-time
> > systems that can't take any sort of interruption, and to a lesser extent
> > inconveniences idle and virtualised systems.
> >
> > The new algorithm atomically switches the pointer to the per-cpu page
> > lists and uses RCU to make sure the old lists are no longer being used
> > before draining them. Its main benefit is that it fixes the issue for
> > good, avoiding the need for configuration-based heuristics or having to
> > modify applications (i.e. using the isolation prctl being worked on by
> > Marcelo Tosatti ATM).
> >
> > All this with minimal performance implications: a page allocation
> > microbenchmark was run on multiple systems and architectures, generally
> > showing no performance differences; only the more extreme cases showed a
> > 1-3% degradation. See data below. Needless to say, I'd appreciate it if
> > someone could validate my numbers independently.
> >
> > The approach has been stress-tested: I forced 100 drains/s while running
> > mmtests' pft in a loop for a full day on multiple machines and archs
> > (arm64, x86_64, ppc64le).
> >
> > Note that this is not the first attempt at fixing the per-cpu page lists
> > issue:
> >  - The first attempt[1] tried to conditionally change the pagesets
> >    locking scheme based on the NOHZ_FULL config. It was deemed hard to
> >    maintain, as the NOHZ_FULL code path would be rarely tested. Also,
> >    this only solves the issue for NOHZ_FULL setups, which isn't ideal.
> >  - The second[2] unconditionally switched the local_locks to per-cpu
> >    spinlocks. The performance degradation was too big.
>
> For completeness, what was the fate of the approach to have pcp->high = 0
> for NOHZ cpus? [1] It would be nice to have documented why it wasn't
> feasible. Too much overhead for when these CPUs eventually do allocate, or
> some other unforeseen issue? Thanks.

Yes, sorry, I should've been more explicit about why I haven't gone that way
yet. Some points:

 - As I mention above, not only CPU isolation users care about this; RT and
   HPC do too. This is my main motivation for focusing on this solution, or
   potentially Mel's.

 - Fully disabling pcplists on nohz_full CPUs is too drastic, as isolated
   CPUs might want to retain the performance edge while not running their
   sensitive workloads. (I remember Christoph Lameter commenting about this
   on the previous RFC.)

 - So the idea would be to selectively disable pcplists upon entering the
   really 'isolated' area. This could be achieved with Marcelo Tosatti's new
   WIP prctl[1]. And if we decide the current solutions are unacceptable,
   I'll have a go at it.

Thanks!

[1] https://lore.kernel.org/lkml/20220204173554.534186379@fedora.localdomain/T/

-- 
Nicolás Sáenz
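
As a rough illustration of the mechanism the cover letter describes (flip an
RCU-protected pointer to the per-cpu lists, call synchronize_rcu(), then
drain the old set, which is by then private to the drainer), a minimal sketch
could look like the code below. All names here (demo_*, NR_DEMO_LISTS) are
hypothetical stand-ins rather than the API used in the actual patches, and
protection against local concurrency (local_lock/IRQ disabling) is left out
for brevity:

/* Illustrative sketch only, not code from this series. */
#define NR_DEMO_LISTS	4	/* stand-in for the real pcp list count */

struct demo_pcplists {
	int			count;
	struct list_head	lists[NR_DEMO_LISTS];
};

struct demo_per_cpu_pages {
	struct demo_pcplists __rcu	*lp;		/* set currently in use */
	struct demo_pcplists		pcplists[2];	/* sets we flip between */
};

/*
 * Local fast path: the owning CPU pins the active set with RCU before
 * touching it, so a remote drainer can tell when it is no longer visible.
 */
static struct page *demo_pcp_alloc(struct demo_per_cpu_pages *pcp)
{
	struct demo_pcplists *lp;
	struct page *page = NULL;

	rcu_read_lock();
	lp = rcu_dereference(pcp->lp);
	if (lp->count) {
		page = list_first_entry(&lp->lists[0], struct page, lru);
		list_del(&page->lru);
		lp->count--;
	}
	rcu_read_unlock();
	return page;
}

/*
 * Remote drain: atomically point the CPU at the spare (empty) set, wait
 * for any in-flight local users, then drain the old set. No IPI or drain
 * work queued on the remote CPU is needed.
 */
static void demo_pcp_drain_remote(struct demo_per_cpu_pages *pcp)
{
	struct demo_pcplists *old = rcu_dereference_protected(pcp->lp, 1);
	struct demo_pcplists *new = (old == &pcp->pcplists[0]) ?
				    &pcp->pcplists[1] : &pcp->pcplists[0];

	rcu_assign_pointer(pcp->lp, new);
	synchronize_rcu();
	/* 'old' can now be freed back to the buddy allocator. */
}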