From: Linus Torvalds
Date: Thu, 20 May 2021 20:13:32 -1000
Subject: Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock
To: "Aneesh Kumar K.V"
Cc: Linux-MM, Andrew Morton, Michael Ellerman, linuxppc-dev, Kalesh Singh, Nick Piggin, Joel Fernandes, Christophe Leroy
References: <20210422054323.150993-1-aneesh.kumar@linux.ibm.com> <20210422054323.150993-8-aneesh.kumar@linux.ibm.com> <2eafd7df-65fd-1e2c-90b6-d143557a1fdc@linux.ibm.com>

On Thu, May 20, 2021 at 5:03 PM Aneesh Kumar K.V wrote:
>
> On 5/21/21 8:10 AM, Linus Torvalds wrote:
> >
> > So mremap does need to flush the TLB before releasing the page table
> > lock, because that's the lifetime boundary for the page that got
> > moved.
>
> How will we avoid that happening with
> c49dd340180260c6239e453263a9a244da9a7c85 /
> 2c91bd4a4e2e530582d6fd643ea7b86b27907151? These commits improve mremap
> performance by moving level 3/level 2 page table entries. When doing so,
> we are not holding the level 4 ptl (pte_lock()), but rather pmd_lock()
> or pud_lock(). So if we move pages around without holding the pte lock,
> won't the above issue happen even if we do a TLB flush while holding
> the pmd/pud lock?

Hmm. Interesting.

Your patch (to flush the TLB after clearing the old location, and before
inserting it into the new one) looks like an "obvious" fix.

But I'm putting that "obvious" in quotes, because I'm now wondering if
it actually fixes anything.

Lookie here:

 - CPU1 does a mremap of a pmd or pud.

   It clears the old pmd/pud, flushes the old TLB range, and then
   inserts the pmd/pud at the new location.

 - CPU2 runs the page shrinker, which calls try_to_unmap(), which calls
   try_to_unmap_one().

These are entirely asynchronous, because they have no shared lock. The
mremap uses the pmd lock, while try_to_unmap_one() does the rmap walk,
which takes the pte lock.

Now, imagine that the following ordering happens with the two
operations above, and a CPU3 that does accesses:

 - CPU2 follows (and sees) the old page tables in the old location and
   takes the pte lock

 - the mremap on CPU1 starts: it clears the old pmd, flushes the TLB,
   *and* inserts it in the new place.

 - a user thread on CPU3 accesses the new location and fills its TLB
   entry for the *new* address

 - only now does CPU2 get to the "ptep_get_and_clear()" to remove one
   page

 - CPU2 does a TLB flush and frees the page

End result:

 - both CPU1 _and_ CPU2 have flushed the TLB.

 - but both flushed the *OLD* address

 - the page is freed

 - CPU3 still has the stale TLB entry pointing to the page that is now
   free and might be reused for something else

Am I missing something?

             Linus