Date: Wed, 2 Nov 2022 09:53:49 +0100
From: Michal Hocko
To: Leonardo Bras
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Frederic Weisbecker, Phil Auld, Marcelo Tosatti, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus
In-Reply-To: <20221102020243.522358-1-leobras@redhat.com>

On Tue 01-11-22 23:02:40, Leonardo Bras wrote:
> Patch #1 expands housekeeping_any_cpu() so we can find housekeeping cpus
> closer (NUMA) to any desired CPU, instead of only the current CPU.
>
> ### Performance argument that motivated the change:
> There could be an argument of why would that be needed, since the current
> CPU is probably accessing the current cacheline, and so having a CPU closer
> to the current one is always the best choice since the cache invalidation
> will take less time.
> OTOH, there could be cases like this which uses
> perCPU variables, and we can have up to 3 different CPUs touching the
> cacheline:
>
> C1 - Isolated CPU: The perCPU data 'belongs' to this one
> C2 - Scheduling CPU: Schedule some work to be done elsewhere, current cpu
> C3 - Housekeeping CPU: This one will do the work
>
> Most of the times the cacheline is touched, it should be by C1. Sometimes
> a C2 will schedule work to run on C3, since C1 is isolated.
>
> If C1 and C2 are in different NUMA nodes, we could have C3 either in
> C2 NUMA node (housekeeping_any_cpu()) or in C1 NUMA node
> (housekeeping_any_cpu_from(C1)).
>
> If C3 is in C2 NUMA node, there will be a faster invalidation when C3
> tries to get cacheline exclusivity, and then a slower invalidation when
> this happens in C1, when it's working in its data.
>
> If C3 is in C1 NUMA node, there will be a slower invalidation when C3
> tries to get cacheline exclusivity, and then a faster invalidation when
> this happens in C1.
>
> The thing is: it should be better to wait less when doing kernel work
> on an isolated CPU, even at the cost of some housekeeping CPU waiting
> a few more cycles.
> ###
>
> Patch #2 changes the locking strategy of memcg_stock_pcp->stock_lock from
> local_lock to spinlocks, so it can be later used to do remote percpu
> cache draining on patch #3. Most performance concerns should be pointed
> in the commit log.
>
> Patch #3 implements the remote per-CPU cache drain, making use of both
> patches #1 and #2. Performance-wise, in non-isolated scenarios, it should
> introduce an extra function call and a single test to check if the CPU is
> isolated.
>
> On scenarios with isolation enabled on boot, it will also introduce an
> extra test to check in the cpumask if the CPU is isolated. If it is,
> there will also be an extra read of the cpumask to look for a
> housekeeping CPU.

This is a rather deep dive in the cache line usage but the most
important thing is really missing.
Why do we want this change? From the context it seems that this is an
actual fix for isolcpus= setups where remote (i.e. non-isolated) activity
interferes with isolated cpus by scheduling pcp charge cache draining on
those cpus. Is this understanding correct? If yes, how big of a problem
is that?

If you want remote draining then you need some sort of locking
(currently we rely on a local lock). How come this locking is not going
to cause a different form of disturbance?
-- 
Michal Hocko
SUSE Labs