Date: Wed, 17 May 2023 18:52:24 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org,
	Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes,
	Peter Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org,
	Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: Re: Excessive TLB flush ranges
References: <87r0rg93z5.ffs@tglx> <87ilcs8zab.ffs@tglx>
	<87fs7w8z6y.ffs@tglx> <874joc8x7d.ffs@tglx> <87r0rg73wp.ffs@tglx>
	<87edng6qu8.ffs@tglx> <87y1ln5md2.ffs@tglx>
In-Reply-To: <87y1ln5md2.ffs@tglx>

On 05/17/23 at 11:38am, Thomas Gleixner wrote:
> On Tue, May 16 2023 at 21:03, Thomas Gleixner wrote:
> >
> > Aside of that, if I read the code correctly then if there is an unmap
> > via vb_free() which does not cover the whole vmap block then vb->dirty
> > is set and every _vm_unmap_aliases() invocation flushes that dirty range
> > over and over until that vmap block is completely freed, no?
>
> Something like the below would cure that.
>
> While it prevents that this is flushed forever it does not cure the
> eventually overly broad flush when the block is completely dirty and
> purged:
>
> Assume a block with 1024 pages, where 1022 pages are already freed and
> TLB flushed. Now the last 2 pages are freed and the block is purged,
> which results in a flush of 1024 pages where 1022 are already done,
> right?

This is a good idea. I have been thinking about how to reply to your last
mail and how to fix this. However, your cure code may not work well;
please see the inline comment below.

One vmap block has 64 pages.
#define VMAP_MAX_ALLOC		BITS_PER_LONG	/* 256K with 4K pages */

>
> Thanks,
>
>         tglx
> ---
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2211,7 +2211,7 @@ static void vb_free(unsigned long addr,
>
>  	spin_lock(&vb->lock);
>
> -	/* Expand dirty range */
> +	/* Expand the not yet TLB flushed dirty range */
>  	vb->dirty_min = min(vb->dirty_min, offset);
>  	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
>
> @@ -2240,13 +2240,17 @@ static void _vm_unmap_aliases(unsigned l
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
>  		spin_lock(&vb->lock);
> -		if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> +		if (vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
>  			unsigned long va_start = vb->va->va_start;
>  			unsigned long s, e;

When vb_free() is invoked, it can leave a vmap_block in one of the three
states below. Your code works well for the 2nd case, but maybe not for
the 1st one. The 2nd one is what we reclaim and put into the purge list
in purge_fragmented_blocks_allcpus().

1)
  |-----|------------|-----------|-------|
  |dirty|still mapped|   dirty   | free  |

2)
  |------------------------------|-------|
  |         dirty                | free  |

3) Handled by free_vmap_block(), and vb is put into the purge list.
  |--------------------------------------|

>
>  			s = va_start + (vb->dirty_min << PAGE_SHIFT);
>  			e = va_start + (vb->dirty_max << PAGE_SHIFT);
>
> +			/* Prevent that this is flushed more than once */
> +			vb->dirty_min = VMAP_BBMAP_BITS;
> +			vb->dirty_max = 0;
> +
>  			start = min(s, start);
>  			end = max(e, end);
>
>
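
To make the bookkeeping in the quoted patch easier to reason about, here
is a small user-space model of it. This is only a sketch, not kernel
code: struct vb_model, the vb_model_*() helpers and BLOCK_PAGES are
made-up names, BLOCK_PAGES merely stands in for VMAP_BBMAP_BITS, and
there is no locking. It assumes the flush pass behaves as in the patch
above, i.e. it takes [dirty_min, dirty_max) as the range and then resets
it.

```c
#include <assert.h>

/* Hypothetical block size in pages; stands in for VMAP_BBMAP_BITS. */
#define BLOCK_PAGES 1024UL

/* Simplified model of the dirty tracking fields of a vmap block. */
struct vb_model {
	unsigned long dirty;     /* number of pages freed so far */
	unsigned long dirty_min; /* first page of the not yet flushed range */
	unsigned long dirty_max; /* end of that range (exclusive) */
};

static void vb_model_init(struct vb_model *vb)
{
	vb->dirty = 0;
	vb->dirty_min = BLOCK_PAGES; /* empty range: min above max */
	vb->dirty_max = 0;
}

/* Models vb_free(): free 'nr' pages starting at page 'offset'. */
static void vb_model_free(struct vb_model *vb, unsigned long offset,
			  unsigned long nr)
{
	assert(offset + nr <= BLOCK_PAGES);

	/* Expand the not yet TLB flushed dirty range */
	if (offset < vb->dirty_min)
		vb->dirty_min = offset;
	if (offset + nr > vb->dirty_max)
		vb->dirty_max = offset + nr;
	vb->dirty += nr;
}

/*
 * Models one pass over the block as in the patched _vm_unmap_aliases().
 * Returns the size in pages of the range that would be flushed, or 0 if
 * nothing is pending or the block is completely dirty.
 */
static unsigned long vb_model_flush(struct vb_model *vb)
{
	unsigned long pages;

	if (!vb->dirty_max || vb->dirty == BLOCK_PAGES)
		return 0;

	pages = vb->dirty_max - vb->dirty_min;

	/* Prevent that this is flushed more than once */
	vb->dirty_min = BLOCK_PAGES;
	vb->dirty_max = 0;

	return pages;
}
```

With a layout like case 1), say pages [0, 5) and [17, 28) freed while
[5, 17) stays mapped, the first vb_model_flush() returns 28 because the
range is a single min/max span that also covers the still mapped pages.
A second call returns 0, and after freeing [28, 30) the next call
returns 2.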