From: Uladzislau Rezki
Date: Fri, 19 May 2023 12:01:56 +0200
To: Thomas Gleixner
Cc: Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton,
	linux-mm@kvack.org, Christoph Hellwig, Lorenzo Stoakes,
	Peter Zijlstra, Baoquan He, John Ogness,
	linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier,
	x86@kernel.org
Subject: Re: Excessive TLB flush ranges
References: <87y1lo7a0z.ffs@tglx> <87o7mk733x.ffs@tglx> <87leho6wd9.ffs@tglx>
	<87o7mj5fuz.ffs@tglx> <87edne6hra.ffs@tglx>
In-Reply-To: <87edne6hra.ffs@tglx>

On Wed, May 17, 2023 at 06:32:25PM +0200, Thomas Gleixner wrote:
> On Wed, May 17 2023 at 14:15, Uladzislau Rezki wrote:
> > On Wed, May 17, 2023 at 01:58:44PM +0200, Thomas Gleixner wrote:
> >> Keeping executable mappings around until some other flush happens is
> >> obviously neither a brilliant idea nor correct.
> >>
> > It avoids blocking a caller on vfree() by deferring the freeing into
> > a workqueue context. At least I got the feeling that "your task" that
> > does vfree() blocks for an unacceptable time. It can happen only if it
> > performs VM_FLUSH_RESET_PERMS freeing (other freeing is deferred):
> >
> > 	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
> > 		vm_reset_perms(vm);
> >
> > in this case the vfree() can take some time instead of returning back
> > to a user asap. Is that your issue? I am not only talking about the TLB
> > flushing taking time; in this case holding the mutex can also take time.
>
> This is absolutely not the problem at all. This comes via do_exit() and
> I explained already here:
>
>   https://lore.kernel.org/all/871qjg8wqe.ffs@tglx
>
> what made us look into this, and I'm happy to quote myself for your
> convenience:
>
>   "The scenario which made us look is that CPU0 is housekeeping and
>    CPU1 is isolated for RT.
>
>    Now CPU0 does that flush nonsense and the RT workload on CPU1 suffers
>    because the compute time is suddenly factor 2-3 larger, IOW, it misses
>    the deadline. That means a one off event is already a problem."
>
> So it does not matter at all how long the operations on CPU0 take. The
> only thing which matters is how much these operations affect the
> workload on CPU1.
>
Thanks. I focused on your first email, which did not yet mention the
second part: that you have one housekeeping CPU and another CPU dedicated
to RT activity.

> That made me look into this coalescing code. I understand why you want
> to batch and coalesce and rather do a rare full tlb flush than sending
> gazillions of IPIs.
>
Your issue has no connection with the merging itself, but the place you
looked at was the correct one :)

> But that creates a policy at the core code which does not leave any
> decision to make for the architecture, whether it's worth to do full or
> single flushes. That's what I worried about and not about the question
> whether that free takes 1ms or 10us. That's a completely different
> debate.
>
> Whether that list based flush turns out to be the better solution or
> not, has still to be decided by deeper analysis.
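
Right. For context, below is a simplified sketch of how I read the x86
side of flush_tlb_kernel_range() in arch/x86/mm/tlb.c (helper names and
exact arguments are from memory and may differ in your tree):

	void flush_tlb_kernel_range(unsigned long start, unsigned long end)
	{
		/*
		 * Once the range exceeds a small page ceiling, x86 gives
		 * up on ranged invalidation and broadcasts a full TLB
		 * flush to every CPU via IPI.
		 */
		if (end == TLB_FLUSH_ALL ||
		    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
			on_each_cpu(do_flush_tlb_all, NULL, 1);
		} else {
			/*
			 * Still an IPI to every CPU, but the handler only
			 * invalidates the [start, end) range.
			 */
			struct flush_tlb_info *info;

			preempt_disable();
			info = get_flush_tlb_info(NULL, start, end, 0, false, 0);
			on_each_cpu(do_kernel_range_flush, info, 1);
			put_flush_tlb_info();
			preempt_enable();
		}
	}

So a widely coalesced range quickly degenerates into a global flush on
every CPU, including the isolated one, whereas per-VA ranges usually stay
below the ceiling but each trigger their own IPI round.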

I had a look at how per-VA TLB flushing behaves on x86_64 under heavy
load:

commit 776a33ed63f0f15b5b3f6254bcb927a45e37298d (HEAD -> master)
Author: Uladzislau Rezki (Sony)
Date:   Fri May 19 11:35:35 2023 +0200

    mm: vmalloc: Flush TLB per-va

    Signed-off-by: Uladzislau Rezki (Sony)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9683573f1225..6ff95f3d1fa1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1739,15 +1739,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	if (unlikely(list_empty(&local_purge_list)))
 		goto out;
 
-	start = min(start,
-		list_first_entry(&local_purge_list,
-			struct vmap_area, list)->va_start);
+	/* OK. A per-cpu wants to flush an exact range. */
+	if (start != ULONG_MAX)
+		flush_tlb_kernel_range(start, end);
 
-	end = max(end,
-		list_last_entry(&local_purge_list,
-			struct vmap_area, list)->va_end);
+	/* Flush per-VA. */
+	list_for_each_entry(va, &local_purge_list, list)
+		flush_tlb_kernel_range(va->va_start, va->va_end);
 
-	flush_tlb_kernel_range(start, end);
 	resched_threshold = lazy_max_pages() << 1;
 
 	spin_lock(&free_vmap_area_lock);

There are at least two observations:

1. asm_sysvec_call_function adds an extra 12% in terms of cycles, since
every per-VA flush_tlb_kernel_range() call now triggers its own round
of IPIs:

# per-VA TLB flush
-   12.00%  native_queued_spin_lock_slowpath
   - 11.90% asm_sysvec_call_function
      - sysvec_call_function
           __sysvec_call_function
         - __flush_smp_call_function_queue
            - 1.64% __flush_tlb_all
                 native_flush_tlb_global
                 native_write_cr4

# default
     0.18%  0.16%  [kernel]  [k] asm_sysvec_call_function

2. Memory footprint grows (under heavy load), because the per-VA TLB
flushing plus the extra scan over the lazy-purge list take more time,
so lazily freed areas accumulate for longer.

Hope this is somehow useful for you.

--
Uladzislau Rezki