From: Valentin Schneider
To: Nadav Amit
Cc: Linux Kernel Mailing List, "linux-trace-kernel@vger.kernel.org", "linux-doc@vger.kernel.org", "kvm@vger.kernel.org", linux-mm, bpf, the arch/x86 maintainers, Steven Rostedt, Masami Hiramatsu, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Andy Lutomirski, Peter Zijlstra, Frederic Weisbecker, "Paul E. McKenney", Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf, Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin, Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov", Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek, "Jason A. Donenfeld", Song Liu, Julian Pidancet, Tom Lendacky, Dionna Glaze, Thomas Weißschuh, Juri Lelli, Daniel Bristot de Oliveira, Marcelo Tosatti, Yair Podemsky
Subject: Re: [RFC PATCH 00/14] context_tracking,x86: Defer some IPIs until a user->kernel transition
In-Reply-To: <57D81DB6-2D96-4A12-9FD5-6F0702AC49F6@vmware.com>
References: <20230705181256.3539027-1-vschneid@redhat.com> <57D81DB6-2D96-4A12-9FD5-6F0702AC49F6@vmware.com>
Date: Thu, 06 Jul 2023 12:29:58 +0100

On 05/07/23 18:48, Nadav Amit wrote:
>> On Jul 5, 2023, at 11:12 AM, Valentin Schneider wrote:
>>
>> Deferral approach
>> =================
>>
>> Storing each and every callback, like a secondary call_single_queue, turned out
>> to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in
>> userspace for as long as possible - no signal of any form would be sent when
>> deferring an IPI. This means that any form of queuing for deferred callbacks
>> would end up as a convoluted memory leak.
>>
>> Deferred IPIs must thus be coalesced, which this series achieves by assigning
>> IPIs a "type" and having a mapping of IPI type to callback, leveraged upon
>> kernel entry.
>
> I have some experience with a similar optimization. Overall, it can make
> sense and, as you show, it can reduce the number of interrupts.
>
> The main problem of such an approach might be in cases where a process
> frequently enters and exits the kernel between deferred IPIs, or even worse -
> the IPI is sent while the remote CPU is inside the kernel. In such cases, you
> pay the extra cost of synchronization and cache traffic, and might not even
> get the benefit of reducing the number of IPIs.
>
> In a sense, it's a more extreme case of the overhead that x86's lazy-TLB
> mechanism introduces while tracking whether a process is running or not. But
> lazy-TLB would change is_lazy much less frequently than context tracking,
> which means that deferring the IPIs as done in this patch-set has a
> greater potential to hurt performance than lazy-TLB.
>
> tl;dr - it would be beneficial to show some performance numbers for both a
> "good" case where a process spends most of the time in userspace, and a "bad"
> one where a process enters and exits the kernel very frequently. Reducing
> the number of IPIs is good but I don't think it is a goal on its own.
>

There already is a significant overhead incurred on kernel entry for
nohz_full CPUs due to all of the context_tracking faff; now I *am* making it
worse with that extra atomic, but I get the feeling it's not going to stay :D

nohz_full CPUs that do context transitions very frequently are unfortunately
in the realm of "you shouldn't do that". Due to what's out there I have to
care about *occasional* transitions, but some folks consider even that to be
broken usage, so I don't believe getting numbers for that would be very
relevant.

> [ BTW: I did not go over the patches in detail. Obviously, there are
> various delicate points that need to be checked, such as avoiding the
> deferring of IPIs if page-tables are freed. ]
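
The coalescing scheme from the quoted cover letter (one pending flag per IPI "type", mapped to a callback that runs on kernel entry) can be sketched in userspace with C11 atomics. This is a minimal sketch under stated assumptions, not the series' code: it assumes a single per-CPU word that combines a "running in userspace" bit with the pending-work bits, it models only one CPU, and every name in it (`ct_word`, `defer_work()`, `kernel_entry()`, `user_entry()`, `WORK_SYNC_CORE`, `WORK_TLB_FLUSH_KERNEL`) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-CPU word: bit 0 says "this CPU is in userspace",
 * the bits above it are one "pending" flag per deferrable IPI type.
 */
#define CT_STATE_USER (1u << 0)
#define WORK_SHIFT    1

enum deferred_work {            /* hypothetical IPI "types" */
	WORK_SYNC_CORE,
	WORK_TLB_FLUSH_KERNEL,
	NR_DEFERRED_WORK,
};

#define WORK_BIT(t) (1u << (WORK_SHIFT + (t)))
#define WORK_MASK   (((1u << NR_DEFERRED_WORK) - 1) << WORK_SHIFT)

/* One CPU's word (per-CPU in a real kernel); zero means "in kernel". */
static _Atomic unsigned int ct_word;

/* Counts how often each deferred callback actually ran. */
static int handled[NR_DEFERRED_WORK];

static void do_sync_core(void) { handled[WORK_SYNC_CORE]++; }
static void do_tlb_flush(void) { handled[WORK_TLB_FLUSH_KERNEL]++; }

/* Mapping of IPI type to the callback run on kernel entry. */
static void (*const work_fn[NR_DEFERRED_WORK])(void) = {
	[WORK_SYNC_CORE]        = do_sync_core,
	[WORK_TLB_FLUSH_KERNEL] = do_tlb_flush,
};

/*
 * Sender side. Returns true if the work was deferred (no IPI needed),
 * false if the target is in the kernel and a real IPI must be sent.
 * The CAS couples the "still in userspace?" check with setting the
 * pending bit, so a racing kernel_entry() cannot lose the work.
 */
static bool defer_work(enum deferred_work type)
{
	unsigned int old = atomic_load(&ct_word);

	assert(type < NR_DEFERRED_WORK);
	do {
		if (!(old & CT_STATE_USER))
			return false;
	} while (!atomic_compare_exchange_weak(&ct_word, &old,
					       old | WORK_BIT(type)));
	return true;
}

/* User->kernel transition: leave user state and take all pending work. */
static void kernel_entry(void)
{
	unsigned int work =
		(atomic_exchange(&ct_word, 0) & WORK_MASK) >> WORK_SHIFT;

	for (int t = 0; t < NR_DEFERRED_WORK; t++)
		if (work & (1u << t))
			work_fn[t]();
}

/* Kernel->user transition: nothing pending, CPU now in userspace. */
static void user_entry(void)
{
	atomic_store(&ct_word, CT_STATE_USER);
}
```

Deferring `WORK_SYNC_CORE` twice while the CPU is in userspace sets the same bit twice, so `kernel_entry()` runs its callback only once; this is the coalescing that keeps deferred state bounded where a queue would grow without limit. Once the CPU is in the kernel, `defer_work()` returns false and the sender has to fall back to a real IPI, which is the "IPI sent while the remote CPU is inside the kernel" case. The sender's compare-and-swap and the exchange on entry are where the extra atomic and cache-line traffic discussed above would show up.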