From: Uladzislau Rezki
Date: Wed, 17 May 2023 13:26:21 +0200
To: Thomas Gleixner
Cc: Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Lorenzo Stoakes, Peter Zijlstra, Baoquan He, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: Re: Excessive TLB flush ranges
In-Reply-To: <87leho6wd9.ffs@tglx>

On Tue, May 16, 2023 at 07:04:34PM +0200, Thomas Gleixner wrote:
> On Tue, May 16 2023 at 17:01, Uladzislau Rezki wrote:
> > On Tue, May 16, 2023 at 04:38:58PM +0200, Thomas Gleixner wrote:
> >> There is a world outside of x86, but even on x86 it's borderline silly
> >> to take the whole TLB out when you can flush 3 TLB entries one by one
> >> with exactly the same number of IPIs, i.e. _one_. No?
> >>
> > I meant if we invoke flush_tlb_kernel_range() on each VA's individual
> > range:
> >
> > void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > {
> > 	if (tlb_ops_need_broadcast()) {
> > 		struct tlb_args ta;
> > 		ta.ta_start = start;
> > 		ta.ta_end = end;
> > 		on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
> > 	} else
> > 		local_flush_tlb_kernel_range(start, end);
> > 	broadcast_tlb_a15_erratum();
> > }
> >
> > we should IPI and wait, no?
>
> The else clause does not do an IPI, but that's irrelevant.
>
> The proposed flush_tlb_kernel_vas(list, num_pages) mechanism
> achieves:
>
>   1) It batches multiple ranges to _one_ invocation
>
>   2) It lets the architecture decide based on the number of pages
>      whether it does a tlb_flush_all() or a flush of individual ranges.
>
> Whether the architecture uses IPIs or flushes only locally and the
> hardware propagates that is completely irrelevant.
>
> Right now any coalesced range, which is huge due to massive holes, takes
> decision #2 away.
>
> If you want to flush individual VAs from the core vmalloc code then you
> lose #1, as the aggregated number of pages might justify a tlb_flush_all().
>
> That's a pure architecture decision and all the core code needs to do is
> to provide appropriate information and not some completely bogus request
> to flush 17312759359 pages, i.e. a ~64.5 TB range, while in reality
> there are exactly _three_ distinct pages to flush.
>

1. I think both cases (the logic) should be moved into the arch code, so
that the decision of how to flush is made _not_ by the vmalloc code:
either a full flush, if that is supported, or a page-by-page flush, which
requires walking the list. As for the vmalloc interface, we can provide
the list (it stays short because of the merging property) plus the number
of pages to flush.

2. It looks like your problem is caused by this part of vfree():

void vfree(const void *addr)
{
	...
	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
		vm_reset_perms(vm); <----
	...
}

So all purged areas are drained in the caller's context, and the caller
is blocked until the drain, including the flush, is done. I am not sure
why it is done from the caller's context. IMHO, it should be deferred the
same way as we do in:

static void free_vmap_area_noflush(struct vmap_area *va)

unless I am missing the point of why vfree() has to do it directly.

--
Uladzislau Rezki
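A minimal user-space sketch of the split discussed in this thread, assuming 4 KiB pages: the core code hands over the list of purged VAs together with their real page count, and an arch-side hook alone chooses between per-range flushes and a full flush. It also shows why one coalesced range defeats that choice. Every name here (struct va_range, count_pages(), coalesced_pages(), decide_flush(), SKETCH_FLUSH_ALL_CEILING) is hypothetical, and the 33-page ceiling is an arbitrary illustrative value; this is not the kernel's code or the exact interface proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical arch ceiling: above this many pages a full TLB flush is
 * assumed to be cheaper than flushing the ranges one by one. */
#define SKETCH_FLUSH_ALL_CEILING 33UL

/* One purged vmap area, [start, end), page aligned (hypothetical type). */
struct va_range {
	unsigned long start;
	unsigned long end;
};

enum flush_kind { FLUSH_NONE, FLUSH_RANGES, FLUSH_ALL };

/* Core side: pages actually covered by the list; holes are not counted. */
static unsigned long count_pages(const struct va_range *vas, size_t n)
{
	unsigned long pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += (vas[i].end - vas[i].start) >> SKETCH_PAGE_SHIFT;
	return pages;
}

/* Pages covered by a single coalesced [lowest start, highest end) range,
 * holes included: the number that hides how little work there really is. */
static unsigned long coalesced_pages(const struct va_range *vas, size_t n)
{
	unsigned long lo = ~0UL, hi = 0;

	if (n == 0)
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (vas[i].start < lo)
			lo = vas[i].start;
		if (vas[i].end > hi)
			hi = vas[i].end;
	}
	return (hi - lo) >> SKETCH_PAGE_SHIFT;
}

/* Arch side: only the page count is needed to pick the strategy. */
static enum flush_kind decide_flush(unsigned long nr_pages)
{
	if (nr_pages == 0)
		return FLUSH_NONE;
	return nr_pages > SKETCH_FLUSH_ALL_CEILING ? FLUSH_ALL : FLUSH_RANGES;
}
```

With three isolated pages spread over 2 GiB, count_pages() reports 3 and decide_flush(3) selects per-range flushes, while the coalesced span covers 524288 pages and would force a full flush. For scale, the 17312759359 pages in the quoted mail are 17312759359 × 4096 ≈ 7.09 × 10^13 bytes, i.e. about 64.5 TiB.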