Message-ID: <3c4ae3bb70d92340d9aaaa1856928476641a8533.camel@redhat.com>
Subject: Re: [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus
From: Leonardo Brás
To: Michal Hocko
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, Johannes Weiner,
    Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
    Frederic Weisbecker, Phil Auld, Marcelo Tosatti,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Date: Fri, 04 Nov 2022 22:45:58 -0300
References: <20221102020243.522358-1-leobras@redhat.com>
    <07810c49ef326b26c971008fb03adf9dc533a178.camel@redhat.com>
    <0183b60e79cda3a0f992d14b4db5a818cd096e33.camel@redhat.com>
User-Agent: Evolution 3.46.1

On Fri, 2022-11-04 at 09:41 +0100, Michal Hocko wrote:
> On Thu 03-11-22 13:53:41, Leonardo Brás wrote:
> > On Thu, 2022-11-03 at 16:31 +0100, Michal Hocko wrote:
> > > On Thu 03-11-22 11:59:20, Leonardo Brás wrote:
> [...]
> > > > I understand there will be a locking cost being paid in the isolated CPUs when:
> > > > a) The isolated CPU is requesting the stock drain,
> > > > b) When the isolated CPUs do a syscall and end up using the protected structure
> > > > the first time after a remote drain.
> > >
> > > And anytime the charging path (consume_stock resp. refill_stock)
> > > contends with the remote draining which is out of control of the RT
> > > task. It is true that the RT kernel will turn that spin lock into a
> > > sleeping RT lock and that could help with potential priority inversions
> > > but still quite costly thing I would expect.
> > >
> > > > Both (a) and (b) should happen during a syscall, and IIUC an RT workload
> > > > should not expect the syscalls to have a predictable time, so it should be
> > > > fine.
> > >
> > > Now I am not sure I understand. If you do not consider charging path to
> > > be RT sensitive then why is this needed in the first place? What else
> > > would be populating the pcp cache on the isolated cpu? IRQs?
> >
> > I am mostly trying to deal with drain_all_stock() calling schedule_work_on() on
> > isolated CPUs. Since the scheduled drain_local_stock() will be competing for cpu
> > time with the RT workload, we can have preemption of the RT workload, which is a
> > problem for meeting the deadlines.
>
> Yes, this is understood. But it is not really clear to me why would any
> draining be necessary for such an isolated CPU if no workload other than
> the RT (which presumably doesn't charge any memory?) is running on that
> CPU? Is that the RT task during the initialization phase that leaves
> that cache behind or something else?

(I am new to this part of the code, so please correct me if I miss something.)

IIUC, if a process belongs to a control group with memory control, the 'charge'
will happen when a memory page starts getting used by it.

So, if we assume an RT load on an isolated CPU will not charge any memory, we are
assuming it will never be part of a memory-controlled cgroup.

I mean, can we just assume this?

If I got that right, would that not be considered a limitation? Something like
"If you don't want your workload to be interrupted by perCPU cache draining,
don't put it in a cgroup with memory control".

> Sorry for being so focused on this
> but I would like to understand on whether this is avoidable by a
> different startup scheme or it really needs to be addressed in some way.

No worries, I am in fact happy you are giving it this much attention :)

I also understand this is a considerable change in the locking strategy, and
avoiding that is the first thing that should be tried.

>
> > One way I thought to solve that was introducing a remote drain, which would
> > require a different strategy for locking, since not all accesses to the pcp
> > caches would happen on a local CPU.
>
> Yeah, I am not super happy about additional spin lock TBH. One
> potential way to go would be to completely avoid pcp cache for isolated
> CPUs. That would have some performance impact of course but on the other
> hand it would give a more predictable behavior for those CPUs which
> sounds like a reasonable compromise to me. What do you think?

You mean not having a perCPU stock, then?
So consume_stock() for isolated CPUs would always return false, causing
try_charge_memcg() to always take the slow path?

IIUC, both my proposal and yours would degrade performance only when we use
isolated CPUs + memcg. Is that correct?

If so, it looks like the impact would be even bigger without a perCPU stock,
compared to introducing a spinlock.

Unless we are counting the case where a remote CPU is draining an isolated
CPU, and the isolated CPU faults a page and has to wait for the spinlock to be
released on the remote CPU. Well, this seems possible, but I would have to
analyze how often it would happen, and how much it would impact the
deadlines. I *guess* most of the RT workload's memory pages are pre-faulted
before it starts, so it can avoid the faulting latency, but I need to confirm
that.

On the other hand, compared to how it works now, this should be a more
controllable way of introducing latency than a scheduled cache drain.

Your suggestion of no stocks/caches on isolated CPUs would be great for
predictability, but I am almost sure the cost in overall performance would not
be fine.
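
To make the "no perCPU stock on isolated CPUs" option concrete, here is a rough userspace model of how I read it. This is only a sketch under my own assumptions, not the kernel code: stock_model, cpu_isolated[], CHARGE_BATCH_MODEL, consume_stock_model(), try_charge_model() and slow_path_charges are all hypothetical names, and the real consume_stock()/try_charge_memcg() deal with locking and page counters that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model parameters, not kernel constants. */
#define NR_CPUS_MODEL      4
#define CHARGE_BATCH_MODEL 32u

/* Simplified per-CPU stock: pages pre-charged to one group. */
struct stock_model {
	int cached_group;      /* group id the stock belongs to, 0 = none */
	unsigned int nr_pages; /* pre-charged pages still available */
};

static struct stock_model stocks[NR_CPUS_MODEL];
static bool cpu_isolated[NR_CPUS_MODEL];
static unsigned long slow_path_charges; /* counts charges that missed the stock */

/* Fast path: succeeds only if this CPU holds enough stock for the group. */
static bool consume_stock_model(int cpu, int group, unsigned int nr_pages)
{
	struct stock_model *stock;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	/* Assumption being modelled: isolated CPUs never use the stock. */
	if (cpu_isolated[cpu] || nr_pages > CHARGE_BATCH_MODEL)
		return false;

	stock = &stocks[cpu];
	if (stock->cached_group == group && stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		return true;
	}
	return false;
}

/* Charge path: try the stock first, otherwise count a slow-path charge. */
static void try_charge_model(int cpu, int group, unsigned int nr_pages)
{
	if (consume_stock_model(cpu, group, nr_pages))
		return;

	slow_path_charges++;

	/* Only non-isolated CPUs keep the rest of the batch as stock. */
	if (cpu_isolated[cpu] || nr_pages > CHARGE_BATCH_MODEL)
		return;

	stocks[cpu].cached_group = group;
	stocks[cpu].nr_pages = CHARGE_BATCH_MODEL - nr_pages;
}
```

In this model a non-isolated CPU pays the slow path once per batch, while an isolated CPU pays it on every charge and never leaves a stock behind, so there is nothing for a drain to be scheduled for. That is the trade-off I am worried about above.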

With the possibility of prefaulting pages, do you see any scenario that would
introduce some undesirable latency in the workload?

Thanks a lot for the discussion!
Leo
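
For reference, by prefaulting I mean something like the userspace sketch below, done by the RT task during initialization and before it enters its time-critical loop. It is a minimal example under my own assumptions: rt_prepare_memory() and prefault_buffer() are hypothetical helper names, and whether real RT workloads actually do this is exactly what I still need to confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Ask the kernel to keep current and future mappings resident.
 * Returns 0 on success and -1 on failure (for example when
 * RLIMIT_MEMLOCK is too low); the caller decides how to handle that.
 */
static int rt_prepare_memory(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/*
 * Write one byte per page so every page of the buffer is faulted in
 * during initialization instead of in the time-critical loop.
 * Note: this overwrites the first byte of each page with zero, so it is
 * meant for freshly allocated buffers. Returns the number of pages touched.
 */
static size_t prefault_buffer(volatile unsigned char *buf, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t touched = 0;

	assert(page > 0);

	for (size_t off = 0; off < len; off += (size_t)page) {
		buf[off] = 0;
		touched++;
	}
	return touched;
}
```

If the workload does this, the page faults (and, for a task in a memory-controlled cgroup, the charges that come with them) would mostly happen before the deadline-sensitive part starts.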