From: Thomas Gleixner
To: Uladzislau Rezki
Cc: Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Lorenzo Stoakes, Peter Zijlstra, Baoquan He, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org, Nadav Amit
Subject: Re: Excessive TLB flush ranges
References: <87o7mk733x.ffs@tglx> <87leho6wd9.ffs@tglx> <87o7mj5fuz.ffs@tglx> <87edne6hra.ffs@tglx> <87lehk4bey.ffs@tglx>
Date: Fri, 19 May 2023 18:32:42 +0200
Message-ID: <87fs7s46z9.ffs@tglx>

On Fri, May 19 2023 at 17:14, Uladzislau Rezki wrote:
> On Fri, May 19, 2023 at 04:56:53PM +0200, Thomas Gleixner wrote:
>> > +       /* Flush per-VA. */
>> > +       list_for_each_entry(va, &local_purge_list, list)
>> > +               flush_tlb_kernel_range(va->va_start, va->va_end);
>> >
>> > -       flush_tlb_kernel_range(start, end);
>> >         resched_threshold = lazy_max_pages() << 1;
>>
>> That's completely wrong, really.
>>
> Absolutely. That is why we do not flush a range per-VA ;-) I provided the
> data just to show what happens if we do it!

Seriously, you think you need to demonstrate that to me? Did you
actually read what I wrote?

   "I understand why you want to batch and coalesce and rather do a rare
    full tlb flush than sending gazillions of IPIs."

> A per-VA flushing works when a system is not capable of doing a full
> flush, so it has to do it page by page. In this scenario we should
> bypass ranges(not mapped) which are between VAs in a purge-list.

ARM32 has a full flush as does x86. Just ARM32 does not have a cutoff
for a full flush in flush_tlb_kernel_range(). That's easily fixable, but
the underlying problem remains.

The point is that coalescing the VA ranges blindly is also fundamentally
wrong:

        start1 = 0x95c8d000 end1 = 0x95c8e000
        start2 = 0xf08a1000 end2 = 0xf08a5000

-->     start  = 0x95c8d000 end  = 0xf08a5000

So this ends up with:

        if (end - start > flush_all_threshold)
                ipi_flush_all();
        else
                ipi_flush_range();

So with the above example this ends up with flush_all(), but with a
flush_vas() as I demonstrated with the list approach (ignore the storage
problem, which is fixable) this results in

        if (total_nr_pages > flush_all_threshold)
                ipi_flush_all();
        else
                ipi_flush_vas();

and that ipi flushes 3 pages instead of taking out the whole TLB, which
results in a 1% gain on that machine. Not massive, but still.

The blind coalescing is also wrong if the resulting range is not gigantic
but below the flush_all_threshold. Let's assume a threshold of 32 pages.

        start1 = 0xf0800000 end1 = 0xf0802000           2 pages
        start2 = 0xf081e000 end2 = 0xf0820000           2 pages

-->     start  = 0xf0800000 end  = 0xf0820000

So because this does not qualify for a full flush and it should not,
this ends up flushing 32 pages one by one instead of flushing exactly
four.

IOW, the existing code is fully biased towards full flushes which is
wrong.

Just because this does not show up in your performance numbers on some
enterprise workload does not make it more correct.

Thanks,

        tglx