From: Thomas Gleixner
To: "Russell King (Oracle)"
Cc: Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki,
    Lorenzo Stoakes, Peter Zijlstra, Baoquan He, John Ogness,
    linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier,
    x86@kernel.org
Subject: Re: Excessive TLB flush ranges
References: <87a5y5a6kj.ffs@tglx>
Date: Mon, 15 May 2023 21:46:38 +0200
Message-ID: <87353x9y3l.ffs@tglx>

On Mon, May 15 2023 at 17:59, Russell King wrote:
> On Mon, May 15, 2023 at 06:43:40PM +0200, Thomas Gleixner wrote:
>> bpf_prog_free_deferred()
>>   vfree()
>>     _vm_unmap_aliases()
>>       collect_per_cpu_vmap_blocks: start:0x95c8d000 end:0x95c8e000 size:0x1000
>>       __purge_vmap_area_lazy(start:0x95c8d000, end:0x95c8e000)
>>
>>         va_start:0xf08a1000 va_end:0xf08a5000 size:0x00004000 gap:0x5ac13000 (371731 pages)
>>         va_start:0xf08a5000 va_end:0xf08a9000 size:0x00004000 gap:0x00000000 (     0 pages)
>>         va_start:0xf08a9000 va_end:0xf08ad000 size:0x00004000 gap:0x00000000 (     0 pages)
>>         va_start:0xf08ad000 va_end:0xf08b1000 size:0x00004000 gap:0x00000000 (     0 pages)
>>         va_start:0xf08b3000 va_end:0xf08b7000 size:0x00004000 gap:0x00002000 (     2 pages)
>>         va_start:0xf08b7000 va_end:0xf08bb000 size:0x00004000 gap:0x00000000 (     0 pages)
>>         va_start:0xf08bb000 va_end:0xf08bf000 size:0x00004000 gap:0x00000000 (     0 pages)
>>         va_start:0xf0a15000 va_end:0xf0a17000 size:0x00002000 gap:0x00156000 (   342 pages)
>>
>>       flush_tlb_kernel_range(start:0x95c8d000, end:0xf0a17000)
>>
>> Does 372106 flush operations where only 31 are useful
>
> So, you asked the architecture to flush a large range, and are then
> surprised if it takes a long time. There is no way to know how many
> of those are useful.

I did not ask for that. That's the merge ranges logic in
__purge_vmap_area_lazy() which decides that the one page at 0x95c8d000
should build a flush range with the rest. I'm just the messenger :)

> Now, while using the sledge hammer of flushing all TLB entries may
> sound like a good answer, if we're only evicting 31 entries, the
> other entries are probably useful to have, no?

That's what I was asking already in the part you removed from the reply,
no?

> I think that you'd only run into this if you had a huge BPF
> program and you tore it down, no?

There was no huge BPF program. Some default seccomp muck.

I have another trace which shows that seccomp creates 10 BPF programs
for one process where each allocates 8K vmalloc memory in the
0xf0a.... address range.

On teardown this is even more horrible than the above. Every allocation
is deleted separately, i.e.
8k at a time and the pattern is always the same. One extra page in the
0xca6..... address range is handed in via _vm_unmap_aliases(), which
expands the range insanely.

So this means ~1.5M flush operations to flush a total of 30 TLB entries.

That reproduces in a VM easily and has exactly the same behaviour:

       Extra page[s] via         The actual allocation
       _vm_unmap_aliases() Pages                       Pages Flush start       Pages
alloc:                           ffffc9000058e000          2
free : ffff888144751000        1 ffffc9000058e000          2 ffff888144751000  17312759359

alloc:                           ffffc90000595000          2
free : ffff8881424f0000        1 ffffc90000595000          2 ffff8881424f0000  17312768167

.....

seccomp seems to install 29 BPF programs for that process. So on exit()
this results in 29 full TLB flushes on x86, where each of them is used
to flush exactly three TLB entries.

The actual two page allocation (ffffc9...) is in the vmalloc space, the
extra page (ffff88...) is in the direct mapping.

This is a plain Debian install with a 6.4-rc1 kernel. The reproducer
is:

   # systemctl start logrotate

Thanks,

        tglx
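
For reference, the page counts in both traces can be reproduced with a few lines of arithmetic. This is only a sketch. It assumes 4K pages and that the flush range is simply the lowest start and the highest end over the alias page and the purged areas, which is what the trace output suggests. The helper names merged_flush_range() and pages() are made up for illustration and are not kernel functions.

```python
PAGE_SIZE = 0x1000  # assumption: 4K pages, matching size:0x1000 in the trace


def merged_flush_range(ranges):
    """Return one (start, end) range covering all given (start, end) ranges."""
    start = min(s for s, _ in ranges)
    end = max(e for _, e in ranges)
    return start, end


def pages(start, end):
    """Number of pages in the half-open range [start, end)."""
    return (end - start) // PAGE_SIZE


# First trace: one alias page plus the lazily purged vmalloc areas.
alias_page = (0x95c8d000, 0x95c8e000)
purge_list = [
    (0xf08a1000, 0xf08a5000),
    (0xf08a5000, 0xf08a9000),
    (0xf08a9000, 0xf08ad000),
    (0xf08ad000, 0xf08b1000),
    (0xf08b3000, 0xf08b7000),
    (0xf08b7000, 0xf08bb000),
    (0xf08bb000, 0xf08bf000),
    (0xf0a15000, 0xf0a17000),
]
all_ranges = [alias_page] + purge_list

flush_start, flush_end = merged_flush_range(all_ranges)
flushed = pages(flush_start, flush_end)            # pages covered by the flush
useful = sum(pages(s, e) for s, e in all_ranges)   # pages that were really unmapped

# VM trace, first free: one direct map page plus a two page vmalloc area.
vm_ranges = [
    (0xffff888144751000, 0xffff888144752000),
    (0xffffc9000058e000, 0xffffc90000590000),
]
vm_flushed = pages(*merged_flush_range(vm_ranges))

print(f"flush {flush_start:#x}-{flush_end:#x}: {flushed} pages, {useful} useful")
print(f"VM flush: {vm_flushed} pages, 3 useful")
```

With the values from the first trace this gives the 372106 flushed pages versus 31 useful ones, and the first VM entry gives the 17312759359 pages shown in the table.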