From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81C52C54E68 for ; Thu, 21 Mar 2024 13:38:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 06FDD6B0088; Thu, 21 Mar 2024 09:38:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 020BF6B008A; Thu, 21 Mar 2024 09:38:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E2B126B008C; Thu, 21 Mar 2024 09:38:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id D28C46B0088 for ; Thu, 21 Mar 2024 09:38:18 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id A2EEDA1EBD for ; Thu, 21 Mar 2024 13:38:18 +0000 (UTC) X-FDA: 81921150276.07.82273E5 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf15.hostedemail.com (Postfix) with ESMTP id 71E71A000D for ; Thu, 21 Mar 2024 13:38:16 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=none; spf=pass (imf15.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711028297; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JOeefzUXwircysLTXgg/PAedwBdUQ6yvtxDoSFE1cjY=; b=iLXMQKqdwjThp9k7ShGfdnkLRQRanzBnPIxK0nKAES686ZS1TlbEtO6WAv9YLdDhCdpb7i Z1wKJYyZRz22Hq6gHVxJw3WkFzfVSxDKR+yYI3gOxK+x/z2bmuLNAo/5LNHFgDjULjDN65 CAhW+qF3rwAy2jl4iUpk7hTO00R4nhE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711028297; a=rsa-sha256; cv=none; b=1wbY2+oTeD4D1uEr4HkFQ5OVlaHvdkLyjSG/VstavnAQL1cPo2BpTtuyUOehzYE3Wd70L0 im/MyuTAzfTESMN/OQ08+LeJqEussQPTRi+PdtqDLygKKmXMEzz8mHK/NzBZ3a2z6hkcJw uoh64HwdYobtcKoaYP5FWWmDqVr9OQw= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=none; spf=pass (imf15.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A66FD1007; Thu, 21 Mar 2024 06:38:49 -0700 (PDT) Received: from [10.1.33.177] (XHFQ2J9959.cambridge.arm.com [10.1.33.177]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 464183F762; Thu, 21 Mar 2024 06:38:13 -0700 (PDT) Message-ID: <269375a4-78a3-4c22-8e6e-570368a2c053@arm.com> Date: Thu, 21 Mar 2024 13:38:11 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 6/6] mm: madvise: Avoid split during MADV_PAGEOUT and MADV_COLD Content-Language: en-GB To: Lance Yang Cc: Barry Song <21cnbao@gmail.com>, Andrew Morton , David Hildenbrand , Matthew Wilcox , Huang Ying , Gao Xiang , Yu Zhao , Yang Shi , Michal Hocko , Kefeng Wang , Chris Li , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240311150058.1122862-1-ryan.roberts@arm.com> <20240311150058.1122862-7-ryan.roberts@arm.com> <7ba06704-2090-4eb2-9534-c4d467cc085a@arm.com> From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Stat-Signature: d56z1hms6x4x5jdow4xyqcy1ejbrokub X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 71E71A000D X-Rspam-User: X-HE-Tag: 1711028296-35237 X-HE-Meta: U2FsdGVkX193Dl3yAZT9lKlDOH7CfvZW1CklbWE/iZ/M7Jin1bCzoXy4O7YWxDwoHXPU8ikshh6c/fl6cj2zOoc1/ZwquAuRmiE1Qg6p++MuTIz5b294uszuMZQhgEEy6GZTiN1W99zcZqCobZlveaq6ZOJUCOU2THzCW8NLInjoM6H9ulHF+6y8Q2Ig7rMQzd0OzO/vioje/Q5dDjNM4qV12TPeH3lvihzR6w//RngG9j20STS0dlaUR5eISHqXfcEoa7ApV+sBvIVjExIvI/NPtQdqcO7/LlW1gScm6ulumFqVneL/QiB538QsCWfg1BWb0YLDHbXYX7uT0oG6f2JrsRxNDd0oqa/DjfXBWGe1AhvH5pWWOiKkrapyJH2TPsYHpXtwCIdK84GgfUACWjGARfT4Clj3XwZp1oWBeC+a/SWVBaVPCZsnY2QRKgP0oIsoTjuk3AJK888kLuCKQHoZ83cGYG8kp36qqbumLvbCJChqJnS1xuKKU9dMB9sYqCZDqI3DIgqKrNr5Wb2rGRSpdKwqRtHkgird/TkWH5MpqRPm4vjNYMJAXX8hbZ+EkGlq52oR/l6CxlGqcAd9kumJFCIUWtsw88N56j086M3Av1aY1S/jHRKh30jBT6t+yq5ohVkbD4B/nSLoQk30MwaQ172YOpRBDx7Si81SaXBqJ2G4IzGZBbrf+PUZ94+bhASDvQBa/d3V7jgsufjajjD2KQl8NBh4AeHJdol8AP98Lv5zoYd+8BmjDN7JglzE6k+IgUtJvVtXr3F8/tGWl7QzWm7QFvvY0H300EUAKdVSRvA2hFvamK+HlMlG9P2zv6+MIctQuusBhV0j9WslnmuuBfgsnyYn+HkIvNhkLp8QnixZSKGalC1sMy3w0UOx7+z6u9SuSU6BFzUwwJdvKFiaGa/6t16Cu3EPt6l91KDQWfTiXMXL98DnBpYwKJ8WjWy6ZQ18S2+IC+qSCjh DtKmn6k4 FnOdM6MHMk6AP2ifzvhauVgxudy03F8SHODbEle59wlXcKFHtd5+WKkw3qXYUHTl0JfO8pladBEreJsvAfG4vgR6Asva67x123Jwh5O/imiUhghvmBct9Nkhue7lul7omJ1212LJa6OhOHNjODV0tj53vqIfDFbURY2aln/f9ugB1utff8H709kORZdBPEKnNcicdWKUcJClEK2Q8xTxDdo0gOwWgXUjQKyk7Z+Bu8jvqH5sBh9PLvfmU+5bTwN0QUVWkjiDydbFGVQ6Pyk7UA7BkY9d0mYy0B/jUR+51ftHwrT4qjHyUj2+P2s7/wZx2ICdTfPE6+HgXARa5YdJ67Pa7UQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: >>>>>>>> - VM_BUG_ON_FOLIO(folio_test_large(folio), folio); >>>>>>>> - >>>>>>>> - if (!pageout && pte_young(ptent)) { >>>>>>>> - ptent = ptep_get_and_clear_full(mm, addr, pte, >>>>>>>> - tlb->fullmm); >>>>>>>> - ptent = pte_mkold(ptent); >>>>>>>> - set_pte_at(mm, addr, pte, ptent); >>>>>>>> - tlb_remove_tlb_entry(tlb, pte, addr); >>>>>>>> + if (!pageout) { >>>>>>>> + for (; nr != 0; nr--, pte++, addr += PAGE_SIZE) { >>>>>>>> + if (ptep_test_and_clear_young(vma, addr, pte)) >>>>>>>> + tlb_remove_tlb_entry(tlb, pte, addr); >>>>> >>>>> IIRC, some of the architecture(ex, PPC) don't update TLB with set_pte_at and >>>>> tlb_remove_tlb_entry. So, didn't we consider remapping the PTE with old after >>>>> pte clearing? >>>> >>>> Sorry Lance, I don't understand this question, can you rephrase? Are you saying >>>> there is a good reason to do the original clear-mkold-set for some arches? >>> >>> IIRC, some of the architecture(ex, PPC) don't update TLB with >>> ptep_test_and_clear_young() >>> and tlb_remove_tlb_entry(). Afraid I'm still struggling with this comment. Do you mean to say that powerpc invalidates the TLB entry as part of the call to ptep_test_and_clear_young()? So tlb_remove_tlb_entry() would be redundant here, and likely cause performance degradation on that architecture? IMHO, ptep_test_and_clear_young() really shouldn't be invalidating the TLB entry, that's what ptep_clear_flush_young() is for. But I do see that for some cases of the 32-bit ppc, there appears to be a flush: #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG static inline int __ptep_test_and_clear_young(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { unsigned long old; old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0); if (old & _PAGE_HASHPTE) flush_hash_entry(mm, ptep, addr); <<<<<<<< return (old & _PAGE_ACCESSED) != 0; } #define ptep_test_and_clear_young(__vma, __addr, __ptep) \ __ptep_test_and_clear_young((__vma)->vm_mm, __addr, __ptep) Is that what you are describing? Does any anyone know why flush_hash_entry() is called? I'd say that's a bug in ppc and not a reason not to use ptep_test_and_clear_young() in the common code! Thanks, Ryan >> >> Err, I assumed tlb_remove_tlb_entry() meant "invalidate the TLB entry for this >> address please" - albeit its deferred and batched. I'll look into this. >> >>> >>> In my new patch[1], I use refresh_full_ptes() and >>> tlb_remove_tlb_entries() to batch-update the >>> access and dirty bits. >> >> I want to avoid the per-pte clear-modify-set approach, because this doesn't >> perform well on arm64 when using contpte mappings; it will cause the contpe >> mapping to be unfolded by the first clear that touches the contpte block, then >> refolded by the last set to touch the block. That's expensive. >> ptep_test_and_clear_young() doesn't suffer that problem. > > Thanks for explaining. I got it. > > I think that other architectures will benefit from the per-pte clear-modify-set > approach. IMO, refresh_full_ptes() can be overridden by arm64. > > Thanks, > Lance >> >>> >>> [1] https://lore.kernel.org/linux-mm/20240316102952.39233-1-ioworker0@gmail.com >>> >>> Thanks, >>> Lance >>> >>>> >>>>> >>>>> Thanks, >>>>> Lance >>>>> >>>>> >>>>> >>>>>>>> + } >>>>>>> >>>>>>> This looks so smart. if it is not pageout, we have increased pte >>>>>>> and addr here; so nr is 0 and we don't need to increase again in >>>>>>> for (; addr < end; pte += nr, addr += nr * PAGE_SIZE) >>>>>>> >>>>>>> otherwise, nr won't be 0. so we will increase addr and >>>>>>> pte by nr. >>>>>> >>>>>> Indeed. I'm hoping that Lance is able to follow a similar pattern for >>>>>> madvise_free_pte_range(). >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> /* >>>>>>>> -- >>>>>>>> 2.25.1 >>>>>>>> >>>>>>> >>>>>>> Overall, LGTM, >>>>>>> >>>>>>> Reviewed-by: Barry Song >>>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>> >>