From: "Huang, Ying" <ying.huang@intel.com>
To: Peter Xu
Cc: Nadav Amit, Alistair Popple, huang ying, Linux MM, Andrew Morton,
	LKML, "Sierra Guiza, Alejandro (Alex)", Felix Kuehling,
	Jason Gunthorpe, John Hubbard, David Hildenbrand, Ralph Campbell,
	Matthew Wilcox, Karol Herbst, Lyude Paul, Ben Skeggs,
	Logan Gunthorpe, paulus@ozlabs.org, linuxppc-dev@lists.ozlabs.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2 1/2] mm/migrate_device.c: Copy pte dirty bit to page
References: <6e77914685ede036c419fa65b6adc27f25a6c3e9.1660635033.git-series.apopple@nvidia.com>
	<871qtfvdlw.fsf@nvdebian.thelocal>
	<87o7wjtn2g.fsf@nvdebian.thelocal>
	<87tu6bbaq7.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<1D2FB37E-831B-445E-ADDC-C1D3FF0425C1@gmail.com>
Date: Thu, 18 Aug 2022 14:34:45 +0800
In-Reply-To: (Peter Xu's message of "Wed, 17 Aug 2022 15:27:32 -0400")
Message-ID: <87czcyawl6.fsf@yhuang6-desk2.ccr.corp.intel.com>
Peter Xu writes:

> On Wed, Aug 17, 2022 at 02:41:19AM -0700, Nadav Amit wrote:
>> 4. Having multiple TLB flushing infrastructures makes all of these
>> discussions very complicated and unmaintainable. I need to convince myself
>> in every occasion (including this one) whether calls to
>> flush_tlb_batched_pending() and tlb_flush_pending() are needed or not.
>>
>> What I would like to have [3] is a single infrastructure that gets a
>> “ticket” (generation when the batching started), the old PTE and the new PTE
>> and checks whether a TLB flush is needed based on the arch behavior and the
>> current TLB generation. If needed, it would update the “ticket” to the new
>> generation. Andy wanted a ring for pending TLB flushes, but I think it is an
>> overkill with more overhead and complexity than needed.
>>
>> But the current situation in which every TLB flush is a basis for long
>> discussions and prone to bugs is impossible.
>>
>> I hope it helps. Let me know if you want me to revive the patch-set or other
>> feedback.
>>
>> [1] https://lore.kernel.org/all/20220711034615.482895-5-21cnbao@gmail.com/
>> [2] https://lore.kernel.org/all/20220718120212.3180-13-namit@vmware.com/
>> [3] https://lore.kernel.org/all/20210131001132.3368247-16-namit@vmware.com/
>
> I need more reading on tlb code and also [3] which looks useful to me.
> It's definitely sad to make tlb flushing so complicated.  It'll be great if
> things can be sorted out someday.
>
> In this specific case, the only way to do safe tlb batching in my mind is:
>
>	pte_offset_map_lock();
>	arch_enter_lazy_mmu_mode();
>	// If any pending tlb, do it now
>	if (mm_tlb_flush_pending())
>		flush_tlb_range(vma, start, end);
>	else
>		flush_tlb_batched_pending();

I don't think we need the above 4 lines, because we will flush the TLB
before we access the pages.  Can you find any issue if we don't use the
above 4 lines?

Best Regards,
Huang, Ying

>	loop {
>		...
>		pte = ptep_get_and_clear();
>		...
>		if (pte_present())
>			unmapped++;
>		...
>	}
>	if (unmapped)
>		flush_tlb_range(walk->vma, start, end);
>	arch_leave_lazy_mmu_mode();
>	pte_unmap_unlock();
>
> I may miss something, but even if not it already doesn't look pretty.
>
> Thanks,
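
[ Purely to make Nadav's "ticket" idea above concrete, a rough sketch.
  The names here (struct tlb_ticket, mm->tlb_flush_gen,
  mm->tlb_flushed_gen and the tlb_ticket_*() helpers) are invented for
  illustration and are not existing kernel API; pte_needs_flush()
  stands in for the per-arch policy on which old->new PTE transitions
  need a flush at all.  The ordering rules that would make this safe
  are exactly the hard part and are glossed over. ]

/*
 * Hypothetical interface: take a "ticket" (flush generation) when a
 * batch starts, then ask per PTE change whether a flush is still
 * required or has already been covered by a later flush.
 */
struct tlb_ticket {
	u64 gen;		/* flush generation when the batch started */
};

static inline void tlb_ticket_start(struct mm_struct *mm,
				    struct tlb_ticket *ticket)
{
	/* Remember the mm's flush generation at the start of the batch. */
	ticket->gen = READ_ONCE(mm->tlb_flush_gen);
}

static inline bool tlb_ticket_flush_needed(struct mm_struct *mm,
					   struct tlb_ticket *ticket,
					   pte_t oldpte, pte_t newpte)
{
	/* Arch policy: some old->new transitions never need a flush. */
	if (!pte_needs_flush(oldpte, newpte))
		return false;

	/* A flush newer than our ticket already completed: move it forward. */
	if (READ_ONCE(mm->tlb_flushed_gen) >= ticket->gen) {
		ticket->gen = READ_ONCE(mm->tlb_flush_gen);
		return false;
	}

	return true;
}

A caller such as the migrate_device batching loop would take one ticket
before clearing PTEs and consult tlb_ticket_flush_needed() instead of
reasoning separately about flush_tlb_batched_pending() and
mm_tlb_flush_pending(), which is the consolidation Nadav is asking for.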