Date: Tue, 16 May 2023 10:26:57 +0800
From: Baoquan He
To: Uladzislau Rezki
Cc: Thomas Gleixner, Andrew Morton, linux-mm@kvack.org, Christoph Hellwig,
    Lorenzo Stoakes, Peter Zijlstra, John Ogness,
    linux-arm-kernel@lists.infradead.org, Russell King, Mark Rutland,
    Marc Zyngier
Subject: Re: Excessive TLB flush ranges
References: <87a5y5a6kj.ffs@tglx>

On 05/15/23 at 08:17pm, Uladzislau Rezki wrote:
> On Mon, May 15, 2023 at 06:43:40PM +0200, Thomas Gleixner wrote:
> > Folks!
> > 
> > We're observing massive latencies and slowdowns on ARM32 machines due to
> > excessive TLB flush ranges.
> > 
> > Those can be observed when tearing down a process, which has a seccomp
> > BPF filter installed. ARM32 uses the vmalloc area for module space.
> > 
> >   bpf_prog_free_deferred()
> >     vfree()
> >       _vm_unmap_aliases()
> >          collect_per_cpu_vmap_blocks: start:0x95c8d000 end:0x95c8e000 size:0x1000
> >          __purge_vmap_area_lazy(start:0x95c8d000, end:0x95c8e000)
> > 
> >          va_start:0xf08a1000 va_end:0xf08a5000 size:0x00004000 gap:0x5ac13000 (371731 pages)
> >          va_start:0xf08a5000 va_end:0xf08a9000 size:0x00004000 gap:0x00000000 (     0 pages)
> >          va_start:0xf08a9000 va_end:0xf08ad000 size:0x00004000 gap:0x00000000 (     0 pages)
> >          va_start:0xf08ad000 va_end:0xf08b1000 size:0x00004000 gap:0x00000000 (     0 pages)
> >          va_start:0xf08b3000 va_end:0xf08b7000 size:0x00004000 gap:0x00002000 (     2 pages)
> >          va_start:0xf08b7000 va_end:0xf08bb000 size:0x00004000 gap:0x00000000 (     0 pages)
> >          va_start:0xf08bb000 va_end:0xf08bf000 size:0x00004000 gap:0x00000000 (     0 pages)
> >          va_start:0xf0a15000 va_end:0xf0a17000 size:0x00002000 gap:0x00156000 (   342 pages)
> > 
> >       flush_tlb_kernel_range(start:0x95c8d000, end:0xf0a17000)
> > 
> >          Does 372106 flush operations where only 31 are useful
> > 
> > So for all architectures which lack a mechanism to do a full TLB flush
> > in flush_tlb_kernel_range() this takes ages (4-8ms) and slows down
> > realtime processes on the other CPUs by a factor of two and larger.
> > 
> > So while ARM32, CSKY, NIOS, PPC (some variants), _should_ arguably have
> > a fallback to tlb_flush_all() when the range is too large, there is
> > another issue. I've seen a couple of instances where _vm_unmap_aliases()
> > collects one page and the actual va list has only 2 pages, which might
> > be eventually worth to flush one by one.
> > 
> > I'm not sure whether that's worth it as checking for those gaps might be
> > too expensive for the case where a large number of va entries needs to
> > be flushed.
> > 
> > We'll experiment with a tlb_flush_all() fallback on that ARM32 system in
> > the next days and see how that works out.
> > 
> For systems which lack a full TLB flush and to flush a long range is
> a problem(it takes time), probably we can flush VA one by one. Because
> currently we calculate a flush range [min:max] and that range includes
> the space that might not be mapped at all. Like below:

Calculating a [min:max] flush range from the VAs alone would be fine.
However, vm_reset_perms() calculates a flush range over the affected
direct mapping range and then merges it with the VA range. That looks
strange and surprising. If the pages in vm->pages[] come from a low
part of physical memory, the final merged flush will span a tremendous
range.

I wonder why we need to merge the direct map range with the VA range
before doing the flush. Maybe I am misunderstanding it.

> 
> 
>   VA_1                               VA_2
> |....|-------------------------|............|
> 10   12                        60           68
> 
> . mapped;
> - not mapped.
> 
> so we flush from 10 until 68. Instead, probably we can do a flush of VA_1
> range and VA_2 range. On modern systems with many CPUs, it could be a big
> slow down.