From: "Huang, Ying"
To: Byungchul Park
Cc: Dave Hansen
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
In-Reply-To: <20240530071847.GA15344@system.software.com> (Byungchul Park's message of "Thu, 30 May 2024 16:18:47 +0900")
References: <20240510065206.76078-1-byungchul@sk.com>
 <982317c0-7faa-45f0-82a1-29978c3c9f4d@intel.com>
 <20240527015732.GA61604@system.software.com>
 <8734q46jc8.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <44e4f2fd-e76e-445d-b618-17a6ec692812@intel.com>
 <20240529050046.GB20307@system.software.com>
 <961f9533-1e0c-416c-b6b0-d46b97127de2@intel.com>
 <20240530005026.GA47476@system.software.com>
 <87a5k814tq.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <20240530071847.GA15344@system.software.com>
Date: Thu, 30 May 2024 16:24:12 +0800
Message-ID: <871q5j1zdf.fsf@yhuang6-desk2.ccr.corp.intel.com>

Byungchul Park writes:

> On Thu, May 30, 2024 at 09:11:45AM +0800, Huang, Ying wrote:
>> Byungchul Park writes:
>>
>> > On Wed, May 29, 2024 at 09:41:22AM -0700, Dave Hansen wrote:
>> >> On 5/28/24 22:00, Byungchul Park wrote:
>> >> > All the code updating ptes already performs TLB flush needed in a safe
>> >> > way if it's inevitable e.g. munmap.  LUF which controls when to flush in
>> >> > a higher level than arch code, just leaves stale ro tlb entries that are
>> >> > currently supposed to be in use.  Could you give a scenario that you are
>> >> > concerned about?
>> >>
>> >> Let's go back to this scenario:
>> >>
>> >>         fd = open("/some/file", O_RDONLY);
>> >>         ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
>> >>         foo1 = *ptr1;
>> >>
>> >> There's a read-only PTE at 'ptr1'.  Right?  The page being pointed to is
>> >> eligible for LUF via the try_to_unmap() paths.  In other words, the page
>> >> might be reclaimed at any time.  If it is reclaimed, the PTE will be
>> >> cleared.
>> >>
>> >> Then, the user might do:
>> >>
>> >>         munmap(ptr1, PAGE_SIZE);
>> >>
>> >> Which will _eventually_ wind up in the zap_pte_range() loop.  But that
>> >> loop will only see pte_none().  It doesn't do _anything_ to the 'struct
>> >> mmu_gather'.
>> >>
>> >> The munmap() then lands in tlb_flush_mmu_tlbonly() where it looks at the
>> >> 'struct mmu_gather':
>> >>
>> >>         if (!(tlb->freed_tables || tlb->cleared_ptes ||
>> >>               tlb->cleared_pmds || tlb->cleared_puds ||
>> >>               tlb->cleared_p4ds))
>> >>                 return;
>> >>
>> >> But since there were no cleared PTEs (or anything else) during the
>> >> unmap, this just returns and doesn't flush the TLB.
>> >>
>> >> We now have an address space with a stale TLB entry at 'ptr1' and not
>> >> even a VMA there.  There's nothing to stop a new VMA from going in,
>> >> installing a *new* PTE, but getting data from the stale TLB entry that
>> >> still hasn't been flushed.
>> >
>> > Thank you for the explanation.  I got you.  I think I could handle the
>> > case through a new flag in vma or something indicating LUF has deferred
>> > necessary TLB flush for it during unmapping so that mmu_gather mechanism
>> > can be aware of it.  Of course, the performance change should be checked
>> > again.  Thoughts?
>>
>> I suggest you start with the simple case.  That is, only support page
>> reclaiming and migration.  A TLB flushing can be enforced during unmap
>> with something similar to flush_tlb_batched_pending().
>
> While reading flush_tlb_batched_pending(mm), I found it already performs
> TLB flush for the target mm, if set_tlb_ubc_flush_pending(mm) has been
> hit at least once since the last flush_tlb_batched_pending(mm).
>
> Since LUF also relies on set_tlb_ubc_flush_pending(mm), it's going to
> perform the required TLB flush in flush_tlb_batched_pending(mm) during
> munmap().  So it looks safe to me with regard to munmap() already.
>
> Is there something that I'm missing?
>
> JFYI, regarding mmap(), I have reworked the fault handler to give up
> LUF when needed in a better way.

If TLB flush is always enforced during munmap(), then your solution can
only avoid TLB flushing for page reclaiming and migration, not unmap.
Or am I missing something?

--
Best Regards,
Huang, Ying
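
For readers following the thread, the stale-entry scenario quoted above can be
sketched as a toy user-space model. This is only an illustration under stated
assumptions, not kernel code: struct toy_mm and every toy_* function are
hypothetical names invented here, a single integer stands in for a PTE and for
a TLB entry, and the flush_pending flag only loosely models the state that
set_tlb_ubc_flush_pending() records and flush_tlb_batched_pending() consumes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_mm {
	int  pte;            /* 0 models pte_none(); otherwise the mapped page */
	int  tlb;            /* 0 models "no cached translation" */
	bool flush_pending;  /* a deferred TLB flush is still outstanding */
};

/* Map a page and read through it, so the translation gets cached. */
void toy_map_and_read(struct toy_mm *mm, int page)
{
	assert(page != 0);
	mm->pte = page;
	mm->tlb = page;
}

/* Reclaim with a deferred flush: the PTE is cleared, the TLB entry stays. */
void toy_reclaim_deferred(struct toy_mm *mm)
{
	mm->pte = 0;
	mm->flush_pending = true;
}

void toy_flush_tlb(struct toy_mm *mm)
{
	mm->tlb = 0;
	mm->flush_pending = false;
}

/*
 * Model of munmap(). With check_pending set, a pending deferred flush is
 * performed first. Without it, the flush happens only if this call cleared
 * a PTE itself, mirroring the early return quoted from
 * tlb_flush_mmu_tlbonly(). Returns true if a flush happened.
 */
bool toy_munmap(struct toy_mm *mm, bool check_pending)
{
	bool cleared_ptes = false;
	bool flushed = false;

	if (check_pending && mm->flush_pending) {
		toy_flush_tlb(mm);
		flushed = true;
	}
	if (mm->pte != 0) {
		mm->pte = 0;
		cleared_ptes = true;
	}
	if (!cleared_ptes)
		return flushed;  /* nothing was gathered, so no flush here */
	toy_flush_tlb(mm);
	return true;
}

/* A cached translation wins over the page table. */
int toy_access(const struct toy_mm *mm)
{
	return mm->tlb != 0 ? mm->tlb : mm->pte;
}

/* map page 1, reclaim it, munmap, map page 2, then read. */
int toy_scenario(bool check_pending)
{
	struct toy_mm mm = {0};

	toy_map_and_read(&mm, 1);
	toy_reclaim_deferred(&mm);
	toy_munmap(&mm, check_pending);
	mm.pte = 2;              /* a new VMA installs a new PTE */
	return toy_access(&mm);
}
```

In this model toy_scenario(false) returns 1, the stale page, while
toy_scenario(true) returns 2. The second case also shows the point of the
reply: once the pending flush is enforced at munmap(), the flush is not
avoided on the unmap path, only deferred from reclaim or migration until then.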