From: Linus Torvalds
Date: Wed, 2 Oct 2024 10:39:15 -0700
Subject: Re: [RFC PATCH 0/4] sched+mm: Track lazy active mm existence with hazard pointers
To: Mathieu Desnoyers
Cc: Andrew Morton, Peter Zijlstra, linux-kernel@vger.kernel.org, Nicholas Piggin, Michael Ellerman, Greg Kroah-Hartman, Sebastian Andrzej Siewior, "Paul E. McKenney", Will Deacon, Boqun Feng, Alan Stern, John Stultz, Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael@gmail.com, Mateusz Guzik, Jonas Oberhauser, rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev
In-Reply-To: <20241002010205.1341915-1-mathieu.desnoyers@efficios.com>

On Tue, 1 Oct 2024 at 18:04, Mathieu Desnoyers wrote:
>
> Hazard pointers appear to be a good fit for replacing refcount based lazy
> active mm tracking.

If the mm refcount is this expensive, I suspect we really shouldn't use it at all.

The thing is, we don't _need_ to use the mm refcount - the reason the lazy-tlb handling uses it is because we already had that refcount and it was easy to extend the existing logic, not because it's really required any more.

The lazy-tlb activation is basically "I'm switching to a kernel thread, so I'll re-use the TLB state of the previous thread".

(And yes, it also has a secondary case of "I'm exiting, so I will turn the mm I already have into a lazy one").

But in the actual task switch case, the previous thread hasn't _lost_ that mm, so we don't actually need to take the refcount at all. We really just need to make sure to invalidate it before it's torn down, but we do that *anyway* as part of TLB flushing.

(The exit case is actually different: we are setting it up to be lost, although delayed - and the lazy count is the delay).
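The argument above, that a lazy switch needs no reference as long as every CPU still using an mm is moved off it before teardown, can be sketched as a small user-space model. This is a hypothetical simplification, not kernel code: `struct mm`, `cpus_seen`, the `loaded_mm[]` array, `boot_cpus()`, `switch_to_user()`, `switch_to_kernel_thread()` and `mm_teardown()` are illustrative names. The `cpus_seen` bitmask plays roughly the role of the kernel's `mm_cpumask()`, and the loop in `mm_teardown()` stands in for the TLB shootdown. There is no locking or concurrency here.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical, heavily simplified model; not the kernel's mm_struct. */
struct mm {
    const char *name;
    unsigned long cpus_seen; /* bit n set: CPU n may still use this mm */
};

/* The kernel's own address space, always safe to fall back to. */
struct mm init_mm = { "init_mm", 0 };

/* Per-CPU record of which mm is really loaded on each CPU. */
struct mm *loaded_mm[NR_CPUS];

void boot_cpus(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        loaded_mm[cpu] = &init_mm;
}

/* Switching to a user task loads its mm and marks this CPU as a user. */
void switch_to_user(int cpu, struct mm *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    loaded_mm[cpu] = next;
    next->cpus_seen |= 1UL << cpu;
}

/*
 * Lazy switch to a kernel thread: keep the previous mm loaded and take
 * no reference. The previous task still owns that mm.
 */
void switch_to_kernel_thread(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(loaded_mm[cpu] != NULL); /* loaded_mm[cpu] deliberately unchanged */
}

/*
 * Before an mm is freed, move every CPU that may still be using it to
 * init_mm. This stands in for the TLB shootdown that happens anyway.
 */
void mm_teardown(struct mm *mm)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!(mm->cpus_seen & (1UL << cpu)))
            continue;
        if (loaded_mm[cpu] == mm)
            loaded_mm[cpu] = &init_mm;
        mm->cpus_seen &= ~(1UL << cpu);
    }
}
```

For example, after `boot_cpus()`, running a task of mm `a` on CPU 1 and then a kernel thread leaves `loaded_mm[1] == &a` with no refcount taken; a later `mm_teardown(&a)` moves CPU 1 back to `init_mm` before `a` goes away.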
The only thing the refcount means is that we don't actually have to be as careful when we actually *really* get rid of the MM. We can be a bit laissez-faire about things because even if we weren't to invalidate the lazy mm, it does have its own refcount, so we don't much care.

But in reality, we're actually very careful about the active_mm _anyway_, because of a fairly fundamental issue: the TLB shootdown and PCID handling that we need to do even when mms aren't lazy. So we actually keep track of things like "which CPUs have seen this MM state" in all the TLB code.

And even the exit case doesn't actually need the special thing - it *does* need the "this CPU is still using this MM", but we have that too as part of the TLB code - entirely independently of 'active_mm'.

So in many ways, I'm pretty sure not just the refcount, but all of 'active_mm', is largely pointless to begin with.

And if the refcount really is this big of a deal:

>   nr threads (-t)    speedup
>   192                +28%

then we should probably just strive to get rid of 'active_mm' altogether.

Look, at least on x86 we ALREADY have a better replacement: it's the percpu 'cpu_tlbstate'. It basically duplicates all we do with active_mm and the whole "keep track of old mm state" (the 'loaded_mm' member is basically the true 'active' mm), except it has some additional fixes:

 - it has some extra housekeeping data that the architecture wants (for PCID updates etc)
 - it's actually atomic wrt the low-level code in ways that 'current->active_mm' isn't

So I think the real issue is that "active_mm" is an old hack from a bygone era when we didn't have the (much more involved) full TLB tracking.

             Linus
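The "keep track of old mm state" role described for 'cpu_tlbstate' can be sketched in the same style. This is a hypothetical model, not the x86 implementation: `struct tlb_state`, `seen_gen`, `tlbstate[]`, `mm_change_page_tables()` and `model_switch()` are illustrative names, the generation counter is only loosely modeled on the TLB generation tracking x86 does, and PCID/ASID handling is left out entirely (so switching to a different mm always counts as a flush here). The per-CPU record holds the mm that is really loaded plus a generation number, so returning from a kernel thread to the same mm can skip a flush unless the page tables changed in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical simplified mm: only a TLB generation counter. */
struct mm {
    uint64_t tlb_gen; /* bumped whenever this mm's page tables change */
};

/*
 * Hypothetical per-CPU record in the spirit of x86's cpu_tlbstate:
 * the mm that is really loaded, and how current this CPU's TLB is for it.
 */
struct tlb_state {
    struct mm *loaded_mm;
    uint64_t seen_gen;
};

struct tlb_state tlbstate[NR_CPUS]; /* zero-initialized: nothing loaded */

/* A page-table change only bumps the generation; CPUs catch up later. */
void mm_change_page_tables(struct mm *mm)
{
    mm->tlb_gen++;
}

/*
 * Switch 'cpu' to 'next' and report whether a TLB flush would be needed.
 * next == NULL models a lazy switch to a kernel thread: loaded_mm stays.
 */
bool model_switch(int cpu, struct mm *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    struct tlb_state *ts = &tlbstate[cpu];

    if (next == NULL)
        return false;

    /* Flush if this is a different mm, or the same mm changed meanwhile. */
    bool need_flush = ts->loaded_mm != next || ts->seen_gen != next->tlb_gen;

    ts->loaded_mm = next;
    ts->seen_gen = next->tlb_gen;
    return need_flush;
}
```

In this model, a kernel thread that borrows mm `a` and then hands the CPU back to a task of `a` gets `false` (no flush) unless `mm_change_page_tables(&a)` ran in between; no reference count is involved at any point.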