From: Uladzislau Rezki
Date: Wed, 17 May 2023 14:15:11 +0200
To: Thomas Gleixner
Cc: Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Lorenzo Stoakes, Peter Zijlstra, Baoquan He, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: Re: Excessive TLB flush ranges
In-Reply-To: <87o7mj5fuz.ffs@tglx>

On Wed, May 17, 2023 at 01:58:44PM +0200, Thomas Gleixner wrote:
> On Wed, May 17 2023 at 13:26, Uladzislau Rezki wrote:
> > On Tue, May 16, 2023 at 07:04:34PM +0200, Thomas Gleixner wrote:
> >> The proposed flush_tlb_kernel_vas(list, num_pages) mechanism
> >> achieves:
> >>
> >>   1) It batches multiple ranges into _one_ invocation
> >>
> >>   2) It lets the architecture decide, based on the number of pages,
> >>      whether it does a tlb_flush_all() or a flush of individual ranges.
> >>
> >> Whether the architecture uses IPIs or flushes only locally and the
> >> hardware propagates that is completely irrelevant.
> >>
> >> Right now any coalesced range, which is huge due to massive holes, takes
> >> decision #2 away.
> >>
> >> If you want to flush individual VAs from the core vmalloc code then you
> >> lose #1, as the aggregated number of pages might justify a tlb_flush_all().
> >>
> >> That's a pure architecture decision and all the core code needs to do is
> >> to provide appropriate information and not some completely bogus request
> >> to flush 17312759359 pages, i.e. a ~64.5 TB range, while in reality
> >> there are exactly _three_ distinct pages to flush.
> >>
> > 1.
> >
> > I think the logic for both cases should be moved into arch code, so that
> > the decision how to flush, either fully if that is supported, or page by
> > page, which requires walking the list, is made _not_ by the vmalloc code.
>
> Which is exactly what my patch does, no?
>
> > 2.
> > void vfree(const void *addr)
> > {
> > ...
> > 	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
> > 		vm_reset_perms(vm);   <----
> > ...
> > }
> >
> > So all purged areas are drained in the caller's context, and the caller
> > is blocked until the drain, including the flush, is done. I am not sure
> > why it is done from the caller's context.
> >
> > IMHO, it should be deferred the same way as we do in:
> >
> > static void free_vmap_area_noflush(struct vmap_area *va)
>
> How is that avoiding the problem? It just defers it to some point in
> the future. There is no guarantee that batching will be large enough to
> justify a full flush.
>
> > unless I am missing the point of why vfree() has to do it directly.
>
> Keeping executable mappings around until some other flush happens is
> obviously neither a brilliant idea nor correct.
>
It avoids blocking the caller in vfree() by deferring the freeing to a
workqueue context.

At least I got the feeling that "your task", the one doing vfree(), blocks
for an unacceptable time. That can only happen if it performs a
VM_FLUSH_RESET_PERMS free (other frees are deferred):

	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
		vm_reset_perms(vm);

In this case vfree() can take some time instead of returning to the user
as soon as possible. Is that your issue? I am not only talking about the
TLB flush taking time; in this case, holding the mutex can also take time.

--
Uladzislau Rezki