Subject: Re: [PATCH v7 1/2] sched: Move task_mm_cid_work to mm work_struct
From: Gabriele Monaco
To: Mathieu Desnoyers, linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Shuah Khan, Andrew Morton, Ingo Molnar, Peter Zijlstra,
 "Paul E. McKenney", linux-mm@kvack.org
Date: Thu, 20 Feb 2025 09:00:25 +0100
In-Reply-To: <0493b3c4-c37f-4ddd-93ee-6d7946e42846@efficios.com>
References: <20250219113108.325545-1-gmonaco@redhat.com>
 <20250219113108.325545-2-gmonaco@redhat.com>
 <8fc793e3-cdfc-4603-afe6-d2ed6785ffbb@efficios.com>
 <86fad2bd-643d-4d3a-bd41-8ffd9389428b@redhat.com>
 <0493b3c4-c37f-4ddd-93ee-6d7946e42846@efficios.com>

On Wed, 2025-02-19 at 12:08 -0500, Mathieu Desnoyers wrote:
> On 2025-02-19 11:32, Gabriele Monaco wrote:
> >
> >
> > On Wed, 2025-02-19 at 10:13 -0500, Mathieu Desnoyers wrote:
> > > > On 2025-02-19 06:31, Gabriele Monaco wrote:
> > > > > > Currently, the task_mm_cid_work function is called in a task
> > > > > > work triggered by a scheduler tick to frequently compact the
> > > > > > mm_cids of each process. This can delay the execution of the
> > > > > > corresponding thread for the entire duration of the function,
> > > > > > negatively affecting the response in case of real time tasks.
> > > > > > In practice, we observe task_mm_cid_work increasing the
> > > > > > latency by 30-35us on a 128-core system; this order of
> > > > > > magnitude is meaningful under PREEMPT_RT.
> > > > > >
> > > > > > Run the task_mm_cid_work in a new work_struct connected to the
> > > > > > mm_struct rather than in the task context before returning to
> > > > > > userspace.
> > > > > >
> > > > > > This work_struct is initialised with the mm and disabled
> > > > > > before freeing it. Its execution is no longer triggered by
> > > > > > scheduler ticks: the queuing of the work happens while
> > > > > > returning to userspace in __rseq_handle_notify_resume,
> > > > > > maintaining the checks to avoid running more frequently than
> > > > > > MM_CID_SCAN_DELAY.
> > > > > >
> > > > > > The main advantage of this change is that the function can be
> > > > > > offloaded to a different CPU and even preempted by RT tasks.
> > > > > >
> > > > > > Moreover, this new behaviour is more predictable with periodic
> > > > > > tasks with short runtime, which may rarely run during a
> > > > > > scheduler tick. Now, the work is always scheduled when the
> > > > > > task returns to userspace.
> > > > > >
> > > > > > The work is disabled during mmdrop. Since the function cannot
> > > > > > sleep in all kernel configurations, we cannot wait for
> > > > > > possibly running work items to terminate. We make sure the mm
> > > > > > is valid in case the task is terminating by reserving it with
> > > > > > mmgrab/mmdrop, returning prematurely if we are really the last
> > > > > > user before mmgrab.
> > > > > > This situation is unlikely since we don't schedule the work
> > > > > > for exiting tasks, but we cannot rule it out.
> > > > > >
> > > > > > Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
> > > > > > Signed-off-by: Gabriele Monaco
> > > > > > ---
> > > > > > diff --git a/kernel/rseq.c b/kernel/rseq.c
> > > > > > index 442aba29bc4cf..f8394ebbb6f4d 100644
> > > > > > --- a/kernel/rseq.c
> > > > > > +++ b/kernel/rseq.c
> > > > > > @@ -419,6 +419,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> > > > > >     }
> > > > > >     if (unlikely(rseq_update_cpu_node_id(t)))
> > > > > >         goto error;
> > > > > > +   task_queue_mm_cid(t);
> > > > > >     return;
> > > > > >
> > > > > >   error:
> > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > > > index 9aecd914ac691..ee35f9962444b 100644
> > > > > > --- a/kernel/sched/core.c
> > > > > > +++ b/kernel/sched/core.c
> > > > > > @@ -5663,7 +5663,6 @@ void sched_tick(void)
> > > > > >     resched_latency = cpu_resched_latency(rq);
> > > > > >     calc_global_load_tick(rq);
> > > > > >     sched_core_tick(rq);
> > > > > > -   task_tick_mm_cid(rq, donor);
> > > >
> > > > I agree that this approach is promising, however I am concerned
> > > > about the fact that a task running alone on its runqueue (thus
> > > > without preemption) for a long time will never recompact mm_cid,
> > > > and also will never update its mm_cid field.
> > > >
> > > > So I am tempted to insert this in the sched_tick to cover that
> > > > scenario:
> > > >
> > > >       rseq_preempt(current);
> > > >
> > > > This would ensure that the task runs
> > > > __rseq_handle_notify_resume() at least each tick.
> > > >
> >
> > Right, I thought about this scenario but forgot to add it in the
> > final patch.
> > We could have a test doing that: instead of sleeping, the task busy
> > waits.
> >
> > Does __rseq_handle_notify_resume need to run in this case, besides
> > for the cid compaction, I mean? Otherwise we could again just
> > enqueue the work from there.
>
> Yes we need to do both:
>
> - compact cid,
> - run __rseq_handle_notify_resume to update the mm_cid.
>
> We don't care much if compacting the cid is done at some point
> and __rseq_handle_notify_resume only updates the mm_cid field on
> the following tick.
>
> So enqueuing the work is not sufficient there, I would really
> issue rseq_preempt(current) which makes sure a busy thread both
> triggers cid compaction *and* gets its mm_cid updated.
>

Sure, will do.

I've been trying to test this scenario but it's quite hard to achieve.
I set all threads to FIFO and highest priority, removed all system
calls from the leader thread (even the ones to wait for other threads)
and replaced the usleep with a busy wait. I'm running on a VM, so I'm
not sure whether interrupts interfere. The test still passes.

Anyway, it seems a reasonable situation to happen, and I guess it won't
hurt to just run an rseq_preempt in the tick.

Testing and sending V8 without touching the selftest.

Thanks,
Gabriele
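
For reference, the busy-wait substitution described above (replacing usleep with a loop that never blocks) could look roughly like the sketch below. This is not the selftest code: busy_wait_us and now_ns are hypothetical names used only for illustration, and the sketch assumes clock_gettime(CLOCK_MONOTONIC) is served without entering the kernel (typically via the vDSO on Linux), which may not hold for every clocksource, for instance on some VMs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical stand-in for usleep(): spin until at least `usec`
 * microseconds have elapsed, without ever blocking or yielding.
 * Returns the elapsed time in nanoseconds.
 */
uint64_t busy_wait_us(uint64_t usec)
{
	const uint64_t start = now_ns();
	const uint64_t target_ns = usec * 1000ull;
	uint64_t elapsed;

	do {
		elapsed = now_ns() - start;
	} while (elapsed < target_ns);

	return elapsed;
}
```

A SCHED_FIFO thread calling busy_wait_us() in place of usleep() stays runnable for the whole interval, which is the situation the rseq_preempt(current) call in the tick is meant to cover.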