Date: Tue, 10 Dec 2024 15:42:49 +0100
From: Petr Tesarik
To: Valentin Schneider
Cc: Peter Zijlstra, Dave Hansen, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
 bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Steven Rostedt, Masami Hiramatsu,
 Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, "H. Peter Anvin", Paolo Bonzini, Wanpeng Li,
 Vitaly Kuznetsov, Andy Lutomirski, Frederic Weisbecker,
 "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
 Boqun Feng, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Andrew Morton,
 Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf,
 Jason Baron, Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin,
 Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov",
 Nadav Amit, Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek,
 "Jason A. Donenfeld", Song Liu, Julian Pidancet, Tom Lendacky,
 Dionna Glaze, Thomas Weißschuh, Juri Lelli, Marcelo Tosatti,
 Yair Podemsky, Daniel Wagner
Subject: Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI
Message-ID: <20241210154249.1260046a@mordecai.tesarici.cz>
References: <20241119153502.41361-1-vschneid@redhat.com>
 <20241119153502.41361-14-vschneid@redhat.com>
 <20241120152216.GM19989@noisy.programming.kicks-ass.net>
 <20241120153221.GM38972@noisy.programming.kicks-ass.net>
 <20241121111221.GE24774@noisy.programming.kicks-ass.net>
 <4b562cd0-7500-4b3a-8f5c-e6acfea2896e@intel.com>
 <20241121153016.GL39245@noisy.programming.kicks-ass.net>
 <20241205183111.12dc16b3@mordecai.tesarici.cz>
 <20241209121249.GN35539@noisy.programming.kicks-ass.net>
 <20241209154252.4f8fa5a8@mordecai.tesarici.cz>

On Tue, 10 Dec 2024 14:53:36 +0100
Valentin Schneider wrote:

> On 09/12/24 15:42, Petr Tesarik wrote:
> > On Mon, 9 Dec 2024 13:12:49 +0100
> > Peter Zijlstra wrote:
> >
> >> On Mon, Dec 09, 2024 at 01:04:43PM +0100, Valentin Schneider wrote:
> >>
> >> > > But I wonder what exactly was the original scenario encountered by
> >> > > Valentin.
> >> > > I mean, if TLB entry invalidations were necessary to sync
> >> > > changes to kernel text after flipping a static branch, then it might be
> >> > > less overhead to make a list of affected pages and call INVLPG on them.
> >>
> >> No; TLB is not involved with text patching (on x86).
> >>
> >> > > Valentin, do you happen to know?
> >> >
> >> > So from my experimentation (hackbench + kernel compilation on housekeeping
> >> > CPUs, dummy while(1) userspace loop on isolated CPUs), the TLB flushes only
> >> > occurred from vunmap() - mainly from all the hackbench threads coming and
> >> > going.
> >>
> >> Right, we have virtually mapped stacks.
> >
> > Wait... Are you talking about the kernel stack? But that's only 4 pages
> > (or 8 pages with KASAN), so that should be easily handled with INVLPG.
> > No CR4 dances are needed for that.
> >
> > What am I missing?
> >
>
> So the gist of the IPI deferral thing is to coalesce IPI callbacks into a
> single flag value that is read & acted on upon kernel entry. Freeing a
> task's kernel stack is not the only thing that can issue a vunmap(), so

Thank you for confirming it's not the kernel stack. Peter's remark left
me a little confused.

> instead of tracking all the pages affected by the unmap (which is
> potentially an ever-growing memory leak as long as no kernel entry happens
> on the isolated CPUs), we just flush everything.

Yes, this makes some sense. Of course, there is no way to avoid the
cost; we can only defer it to a "more suitable" point in time, and
current low-latency requirements make kernel entry better than IPI. It
is at least more predictable (as long as device interrupts are routed
to other CPUs).

I have looked into ways to reduce the number of page faults _after_
flushing the TLB. FWIW if we decide to track to-be-flushed pages, we
only need an array of tlb_single_page_flush_ceiling pages. If there are
more, flushing the entire TLB is believed to be cheaper. That is, I
merely suggest using the same logic that is already implemented by
flush_tlb_kernel_range().

Anyway, since there is no easy trick, let's leave the discussion for a
later optimization. I definitely do not want to block progress on this
patch series.

Thanks for all your input!

Petr T
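
For illustration only, here is a minimal user-space sketch of the bounded
tracking idea discussed above: deferred pages go into a fixed-size array,
and once the array is full the state degrades to a single "flush
everything" flag that is acted on at kernel entry. All names
(deferred_flush, defer_page_flush, process_deferred, FLUSH_CEILING,
flush_one, flush_everything) are hypothetical and do not come from the
patch series. The value 33 is only an illustrative stand-in for
tlb_single_page_flush_ceiling, the flush helpers merely count calls
instead of touching the TLB, and no synchronization between CPUs is
attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for tlb_single_page_flush_ceiling (illustrative value). */
#define FLUSH_CEILING 33

/* Hypothetical per-CPU deferral state; bounded, so it cannot grow forever. */
struct deferred_flush {
	unsigned long pages[FLUSH_CEILING];	/* pages awaiting a single-page flush */
	size_t count;				/* valid entries in pages[] */
	bool flush_all;				/* set once pages[] has overflowed */
};

/* Stand-ins for INVLPG and a full TLB flush; they only count calls. */
static unsigned long nr_single, nr_full;

void flush_one(unsigned long addr)
{
	(void)addr;
	nr_single++;
}

void flush_everything(void)
{
	nr_full++;
}

/* Record a page instead of interrupting the isolated CPU. */
void defer_page_flush(struct deferred_flush *df, unsigned long addr)
{
	if (df->flush_all)
		return;			/* already degraded to a full flush */

	if (df->count == FLUSH_CEILING) {
		/* Too many pages: a full flush is assumed to be cheaper. */
		df->flush_all = true;
		df->count = 0;
		return;
	}

	df->pages[df->count++] = addr;
}

/* Act on the deferred state, e.g. on the next kernel entry. */
void process_deferred(struct deferred_flush *df)
{
	assert(df->count <= FLUSH_CEILING);

	if (df->flush_all) {
		flush_everything();
	} else {
		for (size_t i = 0; i < df->count; i++)
			flush_one(df->pages[i]);
	}

	df->count = 0;
	df->flush_all = false;
}
```

With this layout the memory cost per CPU is fixed at FLUSH_CEILING
entries plus one flag, which is the point of reusing the
flush_tlb_kernel_range() threshold: beyond it, nothing more needs to be
remembered.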