From: Ryan Roberts
Date: Mon, 27 Nov 2023 09:15:04 +0000
Subject: Re: [PATCH v2 00/14] Transparent Contiguous PTEs for User Mappings
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, andreyknvl@gmail.com,
 anshuman.khandual@arm.com, ardb@kernel.org, catalin.marinas@arm.com,
 david@redhat.com, dvyukov@google.com, glider@google.com,
 james.morse@arm.com, jhubbard@nvidia.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, mark.rutland@arm.com, maz@kernel.org,
 oliver.upton@linux.dev, ryabinin.a.a@gmail.com, suzuki.poulose@arm.com,
 vincenzo.frascino@arm.com, wangkefeng.wang@huawei.com, will@kernel.org,
 willy@infradead.org, yuzenghui@huawei.com, yuzhao@google.com,
 ziy@nvidia.com
Message-ID: <234021ba-73c2-474a-82f9-91e1604d5bb5@arm.com>
In-Reply-To: <20231127031813.5576-1-v-songbaohua@oppo.com>
References: <20231115163018.1303287-1-ryan.roberts@arm.com>
 <20231127031813.5576-1-v-songbaohua@oppo.com>

On 27/11/2023 03:18, Barry Song wrote:
>> Ryan Roberts (14):
>>   mm: Batch-copy PTE ranges during fork()
>>   arm64/mm: set_pte(): New layer to manage contig bit
>>   arm64/mm: set_ptes()/set_pte_at(): New layer to manage contig bit
>>   arm64/mm: pte_clear(): New layer to manage contig bit
>>   arm64/mm: ptep_get_and_clear(): New layer to manage contig bit
>>   arm64/mm: ptep_test_and_clear_young(): New layer to manage contig bit
>>   arm64/mm: ptep_clear_flush_young(): New layer to manage contig bit
>>   arm64/mm: ptep_set_wrprotect(): New layer to manage contig bit
>>   arm64/mm: ptep_set_access_flags(): New layer to manage contig bit
>>   arm64/mm: ptep_get(): New layer to manage contig bit
>>   arm64/mm: Split __flush_tlb_range() to elide trailing DSB
>>   arm64/mm: Wire up PTE_CONT for user mappings
>>   arm64/mm: Implement ptep_set_wrprotects() to optimize fork()
>>   arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown
>
> Hi Ryan,
> Not quite sure if I missed something, are we splitting/unfolding CONTPTES
> in the below cases

The general idea is that the core-mm sets the individual ptes (one at a time
if it likes with set_pte_at(), or in a block with set_ptes()), modifies their
permissions (ptep_set_wrprotect(), ptep_set_access_flags()) and clears them
(ptep_clear(), etc). This is exactly the same interface as previously.

BUT, the arm64 implementation of those interfaces will now detect when a set
of adjacent PTEs (a contpte block - so 16 naturally aligned entries when using
4K base pages) are all appropriate for having the CONT_PTE bit set; in this
case the block is "folded". And it will detect when the first PTE in the block
changes such that the CONT_PTE bit must now be unset ("unfolded"). One of the
requirements for folding a contpte block is that all the pages must belong to
the *same* folio (that means it's safe to only track access/dirty for the
contpte block as a whole rather than for each individual pte).

(There are a couple of optimizations that make the reality slightly more
complicated than what I've just explained, but you get the idea.)

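
To make the folding condition concrete, here is a minimal user-space sketch in
C. It is a toy model, not the code from the series: struct toy_pte, its fields
(including folio_id) and the toy_* helpers are all hypothetical names invented
for illustration, and the physical-alignment check is an assumption of the
model rather than something stated above. It only shows the block arithmetic
(16 naturally aligned entries) and the "every entry must qualify" test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONT_PTES 16u  /* entries per contpte block with 4K base pages */

/* Toy PTE: hypothetical layout for illustration, not the arm64 format. */
struct toy_pte {
    bool     valid;
    bool     cont;      /* stands in for the CONT_PTE bit */
    uint64_t pfn;       /* physical frame number */
    uint32_t prot;      /* permission bits, treated as opaque */
    uint32_t folio_id;  /* hypothetical: identifies the owning folio */
};

/* Index of the first entry of the naturally aligned block containing idx. */
static size_t toy_block_start(size_t idx)
{
    return idx & ~(size_t)(CONT_PTES - 1);
}

/*
 * In this model a block may be folded only if all 16 entries are valid,
 * map physically contiguous and aligned frames, share the same
 * permissions and belong to the same folio.
 */
static bool toy_can_fold(const struct toy_pte *tbl, size_t idx)
{
    size_t start = toy_block_start(idx);
    const struct toy_pte *first = &tbl[start];

    if (!first->valid || (first->pfn % CONT_PTES) != 0)
        return false;

    for (size_t i = 0; i < CONT_PTES; i++) {
        const struct toy_pte *p = &tbl[start + i];

        if (!p->valid || p->pfn != first->pfn + i ||
            p->prot != first->prot || p->folio_id != first->folio_id)
            return false;
    }
    return true;
}

/* Set the cont bit on every entry of the block, or on none of them. */
static bool toy_try_fold(struct toy_pte *tbl, size_t idx)
{
    size_t start = toy_block_start(idx);

    if (!toy_can_fold(tbl, idx))
        return false;
    for (size_t i = 0; i < CONT_PTES; i++)
        tbl[start + i].cont = true;
    return true;
}
```

In this model, calling toy_try_fold() after each simulated set_pte_at() leaves
a block untouched until the last qualifying entry is written, at which point
all 16 entries gain the cont bit together.
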

On that basis, I believe the specific cases you describe below are all covered
and safe - please let me know if you think there is a hole here!

>
> 1. madvise(MADV_DONTNEED) on a part of basepages on a CONTPTE large folio

The page will first be unmapped (e.g. ptep_clear() or ptep_get_and_clear(), or
whatever). The implementation of that will cause an unfold, and the CONT_PTE
bit is removed from the whole contpte block. If there is then a subsequent
set_pte_at() to set a swap entry, the implementation will see that it's not
appropriate to re-fold, so the range will remain unfolded.

>
> 2. vma split in a large folio due to various reasons such as mprotect,
> munmap, mlock etc.

I'm not sure if PTEs are explicitly unmapped/remapped when splitting a VMA? I
suspect not, so if the VMA is split in the middle of a currently folded contpte
block, it will remain folded. But this is safe and continues to work correctly.
The VMA arrangement is not important; it is just important that a single folio
is mapped contiguously across the whole block.

>
> 3. try_to_unmap_one() to reclaim a folio, ptes are scanned one by one
> rather than being as a whole.

Yes, as per 1; the arm64 implementation will notice when the first entry is
cleared and unfold the contpte block.

>
> In hardware, we need to make sure CONTPTE follow the rule - always 16
> contiguous physical address with CONTPTE set. if one of them run away
> from the 16 ptes group and PTEs become inconsistent, some terrible
> errors/faults can happen in HW. for example

Yes, the implementation obeys all these rules; see contpte_try_fold() and
contpte_try_unfold(). The fold/unfold operation is only done when all
requirements are met, and we perform it in a manner that is conformant to the
architecture requirements (see contpte_fold() - being renamed to
contpte_convert() in the next version).

Thanks for the review!

Thanks,
Ryan

>
> case0:
> addr0 PTE - has no CONTPTE
> addr0+4kb PTE - has CONTPTE
> ....
> addr0+60kb PTE - has CONTPTE
>
> case 1:
> addr0 PTE - has no CONTPTE
> addr0+4kb PTE - has CONTPTE
> ....
> addr0+60kb PTE - has swap
>
> Inconsistent 16 PTEs will lead to crash even in the firmware based on
> our observation.
>
> Thanks
> Barry
>
>
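
The all-or-nothing rule behind case0 and case1 above can be modelled with a
small C sketch. Again this is a toy, not the kernel code: struct toy_pte,
toy_block_consistent() and toy_clear() are hypothetical names, and the model
deliberately omits the ordering and TLB maintenance that a real unfold has to
perform to stay within the architecture's rules. It shows only that clearing
one entry of a folded block must first drop the cont bit from all 16 entries.

```c
#include <stdbool.h>
#include <stddef.h>

#define CONT_PTES 16u  /* entries per contpte block with 4K base pages */

/* Toy PTE: hypothetical, reduced to the two bits this example needs. */
struct toy_pte {
    bool valid;
    bool cont;  /* stands in for the CONT_PTE bit */
};

/*
 * A block is consistent in this model if either no entry has the cont
 * bit, or all 16 entries are valid and have it.
 */
static bool toy_block_consistent(const struct toy_pte *blk)
{
    size_t ncont = 0;

    for (size_t i = 0; i < CONT_PTES; i++) {
        if (!blk[i].cont)
            continue;
        if (!blk[i].valid)
            return false;
        ncont++;
    }
    return ncont == 0 || ncont == CONT_PTES;
}

/*
 * Clear entry i. If the block is folded, unfold it first by removing the
 * cont bit from every entry, so the block never becomes partially marked.
 */
static void toy_clear(struct toy_pte *blk, size_t i)
{
    if (blk[i].cont) {
        for (size_t j = 0; j < CONT_PTES; j++)
            blk[j].cont = false;
    }
    blk[i].valid = false;
}
```

With this model, a block built to match case0 or case1 fails
toy_block_consistent(), while a folded block that has one entry removed via
toy_clear() still passes, because the unfold happens before the entry is
invalidated.
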