Date: Thu, 18 Mar 2021 18:34:02 +1000
From: Nicholas Piggin <npiggin@gmail.com>
Subject: Re: [PATCH v2 4/6] mm/mremap: Use mmu gather interface instead of flush_tlb_range
To: akpm@linux-foundation.org, "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>, linux-mm@kvack.org
Cc: joel@joelfernandes.org, kaleshsingh@google.com, linuxppc-dev@lists.ozlabs.org, peterz@infradead.org
References: <20210315113824.270796-1-aneesh.kumar@linux.ibm.com> <20210315113824.270796-5-aneesh.kumar@linux.ibm.com>
In-Reply-To: <20210315113824.270796-5-aneesh.kumar@linux.ibm.com>
Message-Id: <1616056158.oq9i3fvoxn.astroid@bobo.none>

Excerpts from Aneesh Kumar K.V's message of March 15, 2021 9:38 pm:
> Some architectures have the concept of a page walk cache, and only the
> mmu gather interface supports flushing it. A fast mremap that moves
> page table pages instead of copying pte entries should flush the page
> walk cache, since the old translation cache is no longer valid. Hence
> switch to mmu gather to flush the TLB and mark tlb.freed_tables = 1.
> No page table pages need to be freed here. With this, the TLB flush is
> done outside the page table lock (ptl).

I would maybe just get archs that implement it to provide a specific
flush_tlb_pwc_range() for it, and let everyone else fall back to
flush_tlb_range() by default. I think that would be simpler for now, at
least in generic code; a rough sketch of what I mean follows the quoted
patch below.

There was some other talk of consolidating the TLB flush APIs; I just
don't know whether using the page/page-table gathering and freeing API
is the best way to go for this.

Thanks,
Nick

>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/mremap.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 574287f9bb39..fafa73b965d3 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -216,6 +216,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  {
>  	spinlock_t *old_ptl, *new_ptl;
>  	struct mm_struct *mm = vma->vm_mm;
> +	struct mmu_gather tlb;
>  	pmd_t pmd;
>
>  	/*
> @@ -244,11 +245,12 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
>  		return false;
>
> +	tlb_gather_mmu(&tlb, mm);
>  	/*
>  	 * We don't have to worry about the ordering of src and dst
>  	 * ptlocks because exclusive mmap_lock prevents deadlock.
>  	 */
> -	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> +	old_ptl = pmd_lock(mm, old_pmd);
>  	new_ptl = pmd_lockptr(mm, new_pmd);
>  	if (new_ptl != old_ptl)
>  		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
> @@ -257,13 +259,23 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  	pmd = *old_pmd;
>  	pmd_clear(old_pmd);
>
> +	/*
> +	 * Mark the range. We are not freeing page table pages nor
> +	 * regular pages. Hence we don't need to call tlb_remove_table()
> +	 * or tlb_remove_page().
> +	 */
> +	tlb_flush_pte_range(&tlb, old_addr, PMD_SIZE);
> +	tlb.freed_tables = 1;
>  	VM_BUG_ON(!pmd_none(*new_pmd));
>  	pmd_populate(mm, new_pmd, (pgtable_t)pmd_page_vaddr(pmd));
>
> -	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
>  	if (new_ptl != old_ptl)
>  		spin_unlock(new_ptl);
>  	spin_unlock(old_ptl);
> +	/*
> +	 * This will invalidate both the old TLB and page table walk caches.
> +	 */
> +	tlb_finish_mmu(&tlb);
>
>  	return true;
>  }
> @@ -282,6 +294,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
>  {
>  	spinlock_t *old_ptl, *new_ptl;
>  	struct mm_struct *mm = vma->vm_mm;
> +	struct mmu_gather tlb;
>  	pud_t pud;
>
>  	/*
> @@ -291,11 +304,12 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
>  	if (WARN_ON_ONCE(!pud_none(*new_pud)))
>  		return false;
>
> +	tlb_gather_mmu(&tlb, mm);
>  	/*
>  	 * We don't have to worry about the ordering of src and dst
>  	 * ptlocks because exclusive mmap_lock prevents deadlock.
>  	 */
> -	old_ptl = pud_lock(vma->vm_mm, old_pud);
> +	old_ptl = pud_lock(mm, old_pud);
>  	new_ptl = pud_lockptr(mm, new_pud);
>  	if (new_ptl != old_ptl)
>  		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
> @@ -304,14 +318,25 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
>  	pud = *old_pud;
>  	pud_clear(old_pud);
>
> +	/*
> +	 * Mark the range. We are not freeing page table pages nor
> +	 * regular pages. Hence we don't need to call tlb_remove_table()
> +	 * or tlb_remove_page().
> +	 */
> +	tlb_flush_pte_range(&tlb, old_addr, PUD_SIZE);
> +	tlb.freed_tables = 1;
>  	VM_BUG_ON(!pud_none(*new_pud));
>
>  	pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
> -	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
> +
>  	if (new_ptl != old_ptl)
>  		spin_unlock(new_ptl);
>  	spin_unlock(old_ptl);
>
> +	/*
> +	 * This will invalidate both the old TLB and page table walk caches.
> +	 */
> +	tlb_finish_mmu(&tlb);
>  	return true;
>  }
>  #else
> --
> 2.29.2
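
A rough sketch of the fallback pattern I mean (illustrative only:
flush_tlb_pwc_range() does not exist today, so the name and the header
placement are assumptions, not a real interface):

/* include/linux/pgtable.h (or similar), generic fallback: */
#ifndef flush_tlb_pwc_range
/*
 * Architectures without a page walk cache need nothing more than a
 * plain range flush.
 */
#define flush_tlb_pwc_range(vma, start, end) \
	flush_tlb_range(vma, start, end)
#endif

An architecture with a page walk cache (e.g. powerpc radix) would then
override this in its asm/tlbflush.h with a variant that also
invalidates the PWC for the range, and move_normal_pmd() keeps a
single call:

	flush_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE);

rather than constructing an mmu_gather just to get the freed_tables
semantics.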