From: Linus Torvalds
Date: Tue, 25 May 2021 07:22:34 -1000
Subject: Re: [PATCH v6 updated 9/11] mm/mremap: Fix race between mremap and pageout
To: "Aneesh Kumar K.V"
Cc: Linux-MM, Andrew Morton, Michael Ellerman, linuxppc-dev, Kalesh Singh, Nick Piggin, Joel Fernandes, Christophe Leroy
References: <20210524133818.84955-1-aneesh.kumar@linux.ibm.com> <87pmxf6w4m.fsf@linux.ibm.com>
In-Reply-To: <87pmxf6w4m.fsf@linux.ibm.com>

On Mon, May 24, 2021 at 10:44 PM Aneesh Kumar K.V wrote:
>
> Should we worry about the below race. The window would be small
>
> CPU 1                           CPU 2                           CPU 3
>
> mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one
>
> mmap_write_lock_killable()
>
>                                 addr = old_addr
>
> lock(pmd_ptl)
> pmd = *old_pmd
> pmd_clear(old_pmd)
> flush_tlb_range(old_addr)
>
> lock(pte_ptl)
> *new_pmd = pmd
> unlock(pte_ptl)
>
> unlock(pmd_ptl)
>                                 lock(pte_ptl)
>                                                                 *new_addr = 10; and fills
>                                                                 TLB with new addr
>                                                                 and old pfn
>
>                                 ptep_clear_flush(old_addr)
>                                 old pfn is free.
>                                                                 Stale TLB entry

Hmm. Do you need a third CPU there? What is done above on CPU3 looks
like it might just be CPU1 accessing the new range immediately.

Which doesn't actually sound at all unlikely - so maybe the window is
small, but it sounds like something that could happen.

This looks nasty. The page shrinker has always been problematic
because it basically avoids the normal full set of locks.

I wonder if we could just make the page shrinker try-lock the mmap_sem
and avoid all this that way. It _is_ allowed to fail, after all, and
the page shrinker is "not normal" and should be less of a performance
issue than all the actual normal VM paths.

Does anybody have any good ideas?

> > And new optimization for empty pmd, which seems unrelated to the
> > change and should presumably be separate:
>
> That was added that we can safely do pte_lockptr() below

Oh, because pte_lockptr() doesn't actually use the "old_pmd" pointer
value - it actually *dereferences* the pointer.

That looks like a mis-design. Why does it do that? Why don't we pass
it the pmd value, if that's what it wants?

              Linus
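
[Illustrative sketch, not part of the original message.] The try-lock
idea floated above can be modelled in plain userspace C. Everything
here is hypothetical: struct toy_mm, the toy_* helpers and the
lock_state encoding are invented for illustration and are not kernel
APIs. The toy lock only stands in for mmap_sem; it does not model page
tables, TLBs or the actual race. The point is just the shape of the
idea: the reclaimer never waits, it backs off whenever a writer (the
stand-in for mremap) holds the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an mm with its mmap lock:
 * lock_state == -1: held by a writer, 0: free, n > 0: n readers. */
struct toy_mm {
    atomic_int lock_state;
    int resident_pages;   /* pages the toy reclaimer could free */
};

/* Writer side (stands in for mremap taking the lock for write). */
static bool toy_write_trylock(struct toy_mm *mm)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&mm->lock_state, &expected, -1);
}

static void toy_write_unlock(struct toy_mm *mm)
{
    atomic_store(&mm->lock_state, 0);
}

/* Reader try-lock: succeeds only if no writer holds the lock. */
static bool toy_read_trylock(struct toy_mm *mm)
{
    int cur = atomic_load(&mm->lock_state);
    while (cur >= 0) {
        /* On failure, cur is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&mm->lock_state, &cur, cur + 1))
            return true;
    }
    return false;   /* writer active: fail instead of waiting */
}

static void toy_read_unlock(struct toy_mm *mm)
{
    int prev = atomic_fetch_sub(&mm->lock_state, 1);
    assert(prev > 0);   /* sanity check: we really held a read lock */
    (void)prev;
}

/* Reclaim attempt: returns the number of pages freed, or 0 if it
 * backed off because the lock was held for write. */
static int toy_try_reclaim(struct toy_mm *mm, int want)
{
    if (!toy_read_trylock(mm))
        return 0;   /* allowed to fail: skip this mm for now */

    int freed = want < mm->resident_pages ? want : mm->resident_pages;
    mm->resident_pages -= freed;

    toy_read_unlock(mm);
    return freed;
}
```

With the toy write lock held, toy_try_reclaim(&mm, 4) returns 0 and
leaves resident_pages untouched; once the writer releases the lock,
the same call frees up to 4 pages. The cost of the approach is that
reclaim may make no progress on a busy mm, which is the trade-off the
message argues is acceptable for the page shrinker.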