Date: Thu, 27 Oct 2022 09:27:43 +0200
From: Peter Zijlstra
To: Nadav Amit
Cc: Jann Horn, John Hubbard, X86 ML, Matthew Wilcox, Linus Torvalds, Andrew Morton, kernel list, Linux-MM, Andrea Arcangeli, "Kirill A . Shutemov", jroedel@suse.de, ubizjak@gmail.com
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
References: <20221022111403.531902164@infradead.org> <20221022114424.515572025@infradead.org> <2c800ed1-d17a-def4-39e1-09281ee78d05@nvidia.com>

On Wed, Oct 26, 2022 at 10:43:21PM +0300, Nadav Amit wrote:
> On Oct 25, 2022, at 6:06 PM, Peter Zijlstra wrote:
>
> >  if (!force_flush && !tlb->fullmm && details &&
> > +    details->zap_flags & ZAP_FLAG_FORCE_FLUSH)
> > +        force_flush = 1;
>
> Isn’t it too big of a hammer?

It is the obvious hammer :-) TLB invalidate under pte_lock when clearing.

> At the same time, the whole reasoning about TLB flushes is not getting any
> simpler. We had cases in which MADV_DONTNEED and another concurrent
> operation that effectively zapped PTEs (e.g., another MADV_DONTNEED) caused
> the zap_pte_range() to skip entries since pte_none() was true. To resolve
> these cases we relied on tlb_finish_mmu() to flush the range when needed
> (i.e., flush the whole range when mm_tlb_flush_nested()).

Yeah, whoever thought that allowing concurrency there was a great idea :/

And I must admit to hating the pending thing with a passion.
And that mm_tlb_flush_nested() thing in tlb_finish_mmu() is a giant hack at
the best of times.

Also; I feel it's part of the problem here; it violates the basic rules
we've had for a very long time.

> Now, I do not have a specific broken scenario in mind following this change,
> but it is all sounds to me a bit dangerous and at same time can potentially
> introduce new overheads.

I'll take correctness over being fast. As you say, this whole TLB thing
is getting out of hand.

> One alternative may be using mm_tlb_flush_pending() when setting a new PTE
> to check for pending flushes and flushing the TLB if that is the case. This
> is somewhat similar to what ptep_clear_flush() does. Anyhow, I guess this
> might induce some overheads. As noted before, it is possible to track
> pending TLB flushes in VMA/page-table granularity, with different tradeoffs
> of overheads.

Right; I just don't believe in VMAs for this, they're *waaay* too big.
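
For anyone following the quoted hunk, here is a minimal userspace sketch of the force-flush condition. It is not kernel code: the struct names, the want_force_flush() helper, and the bit value given to ZAP_FLAG_FORCE_FLUSH are assumptions made for illustration; only the shape of the condition is taken from the quoted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit value, for illustration only; the real flag's value is not
 * given in this thread. */
#define ZAP_FLAG_FORCE_FLUSH 0x1u

/* Hypothetical stand-ins for the kernel's zap_details and mmu_gather. */
struct zap_details_model {
	unsigned int zap_flags;
};

struct mmu_gather_model {
	bool fullmm;	/* whole address space is being torn down */
};

/*
 * Models the quoted condition. '&' binds tighter than '&&' in C, so the
 * flag test is a single operand of the '&&' chain; the parentheses below
 * only make that explicit.
 */
static bool want_force_flush(bool force_flush,
			     const struct mmu_gather_model *tlb,
			     const struct zap_details_model *details)
{
	if (!force_flush && !tlb->fullmm && details &&
	    (details->zap_flags & ZAP_FLAG_FORCE_FLUSH))
		force_flush = true;

	return force_flush;
}
```

In this model the flush is forced only for a partial teardown (fullmm unset) where the caller passed details with the flag set; a NULL details pointer or a full-mm teardown leaves force_flush unchanged.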