From: Linus Torvalds
Date: Mon, 27 Jul 2020 17:38:01 -0700
Subject: Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault
To: Yang Shi
Cc: linux-arch, Yu Xu, Catalin Marinas, Andrew Morton, Johannes Weiner,
 Hillf Danton, Hugh Dickins, Josef Bacik, "Kirill A. Shutemov", Linux-MM,
 mm-commits@vger.kernel.org, Will Deacon, Matthew Wilcox
In-Reply-To: <89c6671a-39ba-d1cc-9bac-2e6717916220@linux.alibaba.com>

On Mon, Jul 27, 2020 at 3:43 PM Yang Shi wrote:
>
> With the commit ("mm: drop mmap_sem before calling balance_dirty_pages()
> in write fault") the retried fault may happen much more frequently than
> before since it would drop mmap lock as long as dirty throttling happens.

Sure. And that probably explains why it shows up as a regression.

That said, the fact that it showed up as a regression at all probably
means that the whole spurious TLB flush has always been a low-grade
problem, and the extra retries just made it much more noticeable
because now there was a change to compare against.

The fact that we have that (very questionable) optimization to only do
it for writes kind of reinforces that notion - it has happened before,
it's just never been fixed properly, and it's just never been noticeable
on most machines because this is all a no-op on x86.

I think Catalin's patch - with some way to fix the problem with KVM -
is the way to go.

That said, testing FAULT_FLAG_TRIED and suppressing the spurious TLB
flush for that case is certainly always safe (see the sketch of that
check further down). At worst, we'll take another fault, and then do
the TLB flush at _that_ point when not retrying.

So it's the FAULT_FLAG_WRITE test that I think is bogus, or at least
should be protected by some architecture decision (with a comment about
why it's ok for that architecture, ie the ARM kind of "old PTE's will
never be in the TLB, and if it's not a write fault we know it doesn't
depend on the dirty bit either").

Of course, it may be that on every architecture that requires SW
accessed bits, the "old PTE's will never be in the TLB" rule is true.
Except I think I know at least one architecture where that isn't true.
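For reference, a simplified sketch (from memory; not the exact kernel
source and not the patch text) of the access-flag update path in
handle_pte_fault() in mm/memory.c, with the FAULT_FLAG_TRIED check
discussed above marked:

	entry = pte_mkyoung(entry);
	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
				  vmf->flags & FAULT_FLAG_WRITE)) {
		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
	} else {
		/*
		 * The PTE didn't change, so the fault was likely spurious
		 * (another CPU already updated it).  Today the possibly
		 * stale TLB entry is only flushed for write faults; also
		 * skipping it when FAULT_FLAG_TRIED is set is the
		 * "certainly always safe" variant: at worst we take one
		 * more fault and flush at that point.
		 */
		if ((vmf->flags & FAULT_FLAG_WRITE) &&
		    !(vmf->flags & FAULT_FLAG_TRIED))	/* sketch of the discussed check */
			flush_tlb_fix_spurious_fault(vmf->vma, vmf->address);
	}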
On alpha, the way the accessed bit works is exactly the same way the
dirty bit works - except it's done for reads instead of writes. So on
at least one architecture, access faults and dirty faults are 100%
equivalent, just using the read/write bits respectively.

Of course, alpha doesn't really matter any more. But it's an example of
an architecture where "old" does not necessarily mean "cannot be in the
TLB", and where testing for FAULT_FLAG_WRITE looks buggy.

Again: I think in practice it's really *really* hard to hit the problem
with accessed bits, unlike dirty bits. Normally, PTE's are all
instantiated young if they are in the TLB. You have to kind of work at
it to get an old PTE _and_ then hit the "now access it at exactly the
same time from two different CPU's, and watch one CPU keep taking page
faults forever because it never flushes its TLB entry" case.

Of course, it is so long since I worked with alpha that maybe there's
some other reason this can't happen. Like "PAL-code always flushes the
TLB entry of the faulting address". Which all hardware should do,
dammit. It's all kinds of stupid to cache a faulting TLB entry. The
fault is thousands of times more expensive than a reload would be, even
if it were intentional and done repeatedly (which sounds like an insane
thing to optimize for anyway).

So one way to fix this problem would be to just specify that "every
pagefault handler _must_ flush the local-CPU TLB entry that the fault
happened for, if the architecture doesn't already do that in hardware
or microcode". And then we'd just remove the spurious TLB flush code
entirely.

              Linus
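As an illustration of that last suggestion, an architecture's fault path
would unconditionally drop the local CPU's TLB entry for the faulting
address before returning, so the generic spurious-fault flush could go
away entirely. A hypothetical sketch only - the function and helper
names below are assumed for illustration; local_flush_tlb_page() exists
on some architectures but is not a generic API:

	/*
	 * Hypothetical sketch of the proposed rule: the arch fault path
	 * always flushes this CPU's TLB entry for the faulting address
	 * (unless hardware/PAL-code already guarantees that), so generic
	 * code never needs flush_tlb_fix_spurious_fault() at all.
	 */
	static vm_fault_t arch_do_page_fault(struct vm_area_struct *vma,
					     unsigned long address,
					     unsigned int flags)
	{
		vm_fault_t ret = handle_mm_fault(vma, address, flags);

		/* Never let a stale (old/clean) entry keep re-faulting on this CPU. */
		local_flush_tlb_page(vma, address);	/* assumed arch-local helper */

		return ret;
	}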