Date: Wed, 19 Feb 2025 16:32:34 +0000 (UTC)
From: Gabriele Monaco
To: Mathieu Desnoyers,
 linux-kernel@vger.kernel.org, Andrew Morton, Ingo Molnar,
 Peter Zijlstra, "Paul E. McKenney", linux-mm@kvack.org
Cc: Ingo Molnar, Shuah Khan
Message-ID: <86fad2bd-643d-4d3a-bd41-8ffd9389428b@redhat.com>
In-Reply-To: <8fc793e3-cdfc-4603-afe6-d2ed6785ffbb@efficios.com>
References: <20250219113108.325545-1-gmonaco@redhat.com>
 <20250219113108.325545-2-gmonaco@redhat.com>
 <8fc793e3-cdfc-4603-afe6-d2ed6785ffbb@efficios.com>
Subject: Re: [PATCH v7 1/2] sched: Move task_mm_cid_work to mm work_struct

On Wed, 2025-02-19 at 10:13 -0500, Mathieu Desnoyers wrote:
>> On 2025-02-19 06:31, Gabriele Monaco wrote:
>>> > Currently, the task_mm_cid_work function is called in a task work
>>> > triggered by a scheduler tick to frequently compact the mm_cids of each
>>> > process. This can delay the execution of the corresponding thread for
>>> > the entire duration of the function, negatively affecting the response
>>> > in case of real time tasks. In practice, we observe task_mm_cid_work
>>> > increasing the latency of 30-35us on a 128 cores system, this order of
>>> > magnitude is meaningful under PREEMPT_RT.
>>> >
>>> > Run the task_mm_cid_work in a new work_struct connected to the
>>> > mm_struct rather than in the task context before returning to
>>> > userspace.
>>> >
>>> > This work_struct is initialised with the mm and disabled before freeing
>>> > it. Its execution is no longer triggered by scheduler ticks: the queuing
>>> > of the work happens while returning to userspace in
>>> > __rseq_handle_notify_resume, maintaining the checks to avoid running
>>> > more frequently than MM_CID_SCAN_DELAY.
>>> >
>>> > The main advantage of this change is that the function can be offloaded
>>> > to a different CPU and even preempted by RT tasks.
>>> >
>>> > Moreover, this new behaviour is more predictable with periodic tasks
>>> > with short runtime, which may rarely run during a scheduler tick.
>>> > Now, the work is always scheduled when the task returns to userspace.
>>> >
>>> > The work is disabled during mmdrop, since the function cannot sleep in
>>> > all kernel configurations, we cannot wait for possibly running work
>>> > items to terminate. We make sure the mm is valid in case the task is
>>> > terminating by reserving it with mmgrab/mmdrop, returning prematurely if
>>> > we are really the last user before mmgrab.
>>> > This situation is unlikely since we don't schedule the work for exiting
>>> > tasks, but we cannot rule it out.
>>> >
>>> > Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
>>> > Signed-off-by: Gabriele Monaco
>>> > ---
>>> > diff --git a/kernel/rseq.c b/kernel/rseq.c
>>> > index 442aba29bc4cf..f8394ebbb6f4d 100644
>>> > --- a/kernel/rseq.c
>>> > +++ b/kernel/rseq.c
>>> > @@ -419,6 +419,7 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>>> >    }
>>> >    if (unlikely(rseq_update_cpu_node_id(t)))
>>> >    goto error;
>>> > + task_queue_mm_cid(t);
>>> >    return;
>>> >
>>> >  error:
>>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> > index 9aecd914ac691..ee35f9962444b 100644
>>> > --- a/kernel/sched/core.c
>>> > +++ b/kernel/sched/core.c
>>> > @@ -5663,7 +5663,6 @@ void sched_tick(void)
>>> >    resched_latency = cpu_resched_latency(rq);
>>> >    calc_global_load_tick(rq);
>>> >    sched_core_tick(rq);
>>> > - task_tick_mm_cid(rq, donor);
>>
>> I agree that this approach is promising, however I am concerned about
>> the fact that a task running alone on its runqueue (thus without
>> preemption) for a long time will never recompact mm_cid, and also
>> will never update its mm_cid field.
>>
>> So I am tempted to insert this in the sched_tick to cover that
>> scenario:
>>
>>       rseq_preempt(current);
>>
>> This would ensure that the task runs __rseq_handle_notify_resume() at
>> least each tick.
>>

Right, I thought about this scenario but forgot to add it to the final
patch.
We could have a test doing that: instead of sleeping, the task busy-waits.

Does __rseq_handle_notify_resume() need to run in this case, other than
for the cid compaction? Otherwise we could again just enqueue the work
from there.

I'll give both a shot.

>>> >    scx_tick(rq);
>>> >
>>> >    rq_unlock(rq, &rf);
>>> > @@ -10530,22 +10529,16 @@ static void sched_mm_cid_remote_clear_weight(struct mm_struct *mm, int cpu,
>>> >    sched_mm_cid_remote_clear(mm, pcpu_cid, cpu);
>>> >  }
>>> >
>>> > -static void task_mm_cid_work(struct callback_head *work)
>>> > +void task_mm_cid_work(struct work_struct *work)
>>> >  {
>>> >    unsigned long now = jiffies, old_scan, next_scan;
>>> > - struct task_struct *t = current;
>>> >    struct cpumask *cidmask;
>>> > - struct mm_struct *mm;
>>> > + struct mm_struct *mm = container_of(work, struct mm_struct, cid_work);
>>> >    int weight, cpu;
>>> >
>>> > - SCHED_WARN_ON(t != container_of(work, struct task_struct, cid_work));
>>> > -
>>> > - work->next = work; /* Prevent double-add */
>>> > - if (t->flags & PF_EXITING)
>>> > - return;
>>> > - mm = t->mm;
>>> > - if (!mm)
>>> > + if (!atomic_read(&mm->mm_count))
>>> >    return;
>>> > + mmgrab(mm);
>>
>> AFAIU this is racy with respect to re-use of mm struct.
>>
>> I recommend that you move mmgrab() to task_queue_mm_cid() just before
>> invoking schedule_work. That way you ensure that the mm count never
>> reaches 0 while there is work in flight (and therefore guarantee that
>> the mm is not re-used).
>>

Mmh, good point. In that case I think we can still keep testing the
mm_count and return early if it's 1 (we are the only user and the task
exited before the work got scheduled).
That would be a safe assumption if we don't get to 0, wouldn't it?

Thanks,
Gabriele
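
For illustration only, the ordering discussed above (take the reference
when queuing, test for a count of 1 in the work, drop the reference at
the end) can be modelled in plain userspace C. This is a rough sketch
and not the kernel code: struct mm_model and all model_* helpers are
hypothetical names, the model is single-threaded, and it assumes that a
work item which is already pending is simply not queued again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant mm_struct state. */
struct mm_model {
	atomic_int mm_count;	/* models mm->mm_count */
	bool work_pending;	/* models a queued cid_work item */
	bool freed;		/* set once the last reference is dropped */
	int compactions;	/* number of times the "compaction" ran */
};

static void model_init(struct mm_model *mm)
{
	atomic_init(&mm->mm_count, 1);	/* the owning task's reference */
	mm->work_pending = false;
	mm->freed = false;
	mm->compactions = 0;
}

/* Models mmgrab(): take one reference. */
static void model_mmgrab(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

/* Models mmdrop(): release one reference, "free" on the last one. */
static void model_mmdrop(struct mm_model *mm)
{
	int old = atomic_fetch_sub(&mm->mm_count, 1);

	assert(old > 0);	/* the count must never go negative */
	if (old == 1)
		mm->freed = true;
}

/* Models task_queue_mm_cid(): grab the reference before queuing. */
static void model_queue_work(struct mm_model *mm)
{
	if (mm->work_pending)
		return;		/* assumption: no second reference is taken */
	model_mmgrab(mm);
	mm->work_pending = true;
}

/*
 * Models task_mm_cid_work(): the count cannot be 0 here because the
 * queuing side holds a reference. A count of 1 means the work is the
 * only remaining user, so it returns early without compacting.
 * Returns true if the compaction step ran.
 */
static bool model_run_work(struct mm_model *mm)
{
	bool compacted = false;

	if (!mm->work_pending)
		return false;
	mm->work_pending = false;

	if (atomic_load(&mm->mm_count) > 1) {
		mm->compactions++;
		compacted = true;
	}
	model_mmdrop(mm);	/* release the reference taken at queue time */
	return compacted;
}
```

In this model, if the owner drops its reference while the work is
pending, the count goes from 2 to 1 rather than to 0, so the work still
sees a live object, skips the compaction and performs the final drop
itself.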