Date: Tue, 11 Feb 2025 12:42:51 +0100
From: Michal Hocko <mhocko@suse.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: LKML, Peter Zijlstra, Ingo Molnar, Valentin Schneider, Marcelo Tosatti, Vlastimil Babka, Andrew Morton, Thomas Gleixner, Oleg Nesterov, linux-mm@kvack.org
Subject: Re: [PATCH 6/6 v2] mm: Drain LRUs upon resume to userspace on nohz_full CPUs
References: <20250209223005.11519-1-frederic@kernel.org> <20250209223005.11519-7-frederic@kernel.org>
In-Reply-To: <20250209223005.11519-7-frederic@kernel.org>
On Sun 09-02-25 23:30:04, Frederic Weisbecker wrote:
> LRUs can be drained through several ways. One of them may add disturbances
> to isolated workloads while queuing a work at any time to any target,
> whether running in nohz_full mode or not.
> 
> Prevent from that on isolated tasks with defering LRUs drains upon
> resuming to userspace using the isolated task work framework.

I have to say this is a rather cryptic description of the underlying
problem. What do you think about the following:

LRU batching can be a source of disturbances for isolated workloads
running in userspace, because it requires a kernel worker to handle the
draining, and that worker would preempt the said task. The primary
source of such disruption is __lru_add_drain_all, which can be triggered
from non-isolated CPUs.

Why would an isolated CPU have anything on the pcp cache? Many syscalls
allocate pages that might end up there. A typical and unavoidable one
would be fork/exec, which leaves pages on the cache behind, just waiting
for somebody to drain them.

This patch addresses the problem by noting that a page has been added to
the cache and scheduling the draining on the return path to userspace,
so the work is done while the syscall is still executing and there are
no surprises while the task runs in userspace, where it doesn't want to
be preempted.
> 
> Signed-off-by: Frederic Weisbecker
> ---
>  include/linux/swap.h     | 1 +
>  kernel/sched/isolation.c | 3 +++
>  mm/swap.c                | 8 +++++++-
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index b13b72645db3..a6fdcc04403e 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -406,6 +406,7 @@ extern void lru_add_drain(void);
>  extern void lru_add_drain_cpu(int cpu);
>  extern void lru_add_drain_cpu_zone(struct zone *zone);
>  extern void lru_add_drain_all(void);
> +extern void lru_add_and_bh_lrus_drain(void);
>  void folio_deactivate(struct folio *folio);
>  void folio_mark_lazyfree(struct folio *folio);
>  extern void swap_setup(void);
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index f25a5cb33c0d..1f9ec201864c 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -8,6 +8,8 @@
>   *
>   */
> 
> +#include <linux/swap.h>
> +
>  enum hk_flags {
>  	HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN),
>  	HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ),
> @@ -253,6 +255,7 @@ __setup("isolcpus=", housekeeping_isolcpus_setup);
>  #if defined(CONFIG_NO_HZ_FULL)
>  static void isolated_task_work(struct callback_head *head)
>  {
> +	lru_add_and_bh_lrus_drain();
>  }
> 
>  int __isolated_task_work_queue(void)
> diff --git a/mm/swap.c b/mm/swap.c
> index fc8281ef4241..da1e569ee3ce 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -37,6 +37,7 @@
>  #include <linux/page_idle.h>
>  #include <linux/local_lock.h>
>  #include <linux/buffer_head.h>
> +#include <linux/sched/isolation.h>
> 
>  #include "internal.h"
> 
> @@ -376,6 +377,8 @@ static void __lru_cache_activate_folio(struct folio *folio)
>  	}
> 
>  	local_unlock(&cpu_fbatches.lock);
> +
> +	isolated_task_work_queue();
>  }

This placement doesn't make much sense to me. I would call
isolated_task_work_queue() where we actually queue something up, i.e. in
folio_batch_add() when folio_batch_space(fbatch) > 0.

> 
> #ifdef CONFIG_LRU_GEN
> @@ -738,7 +741,7 @@ void lru_add_drain(void)
>  * the same cpu.
>  * It shouldn't be a problem in !SMP case since
>  * the core is only one and the locks will disable preemption.
>  */
> -static void lru_add_and_bh_lrus_drain(void)
> +void lru_add_and_bh_lrus_drain(void)
>  {
>  	local_lock(&cpu_fbatches.lock);
>  	lru_add_drain_cpu(smp_processor_id());
> @@ -769,6 +772,9 @@ static bool cpu_needs_drain(unsigned int cpu)
>  {
>  	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
> 
> +	if (!housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
> +		return false;
> +

Would it make more sense to use cpu_is_isolated() and do the check
explicitly in __lru_add_drain_all, so that it is clearly visible, with a
comment that isolated workloads deal with their caches on the return to
userspace?

> 	/* Check these in order of likelihood that they're not zero */
> 	return folio_batch_count(&fbatches->lru_add) ||
> 	       folio_batch_count(&fbatches->lru_move_tail) ||
> -- 
> 2.46.0

-- 
Michal Hocko
SUSE Labs