From: Yang Shi
Date: Mon, 27 Nov 2023 19:13:41 -0800
Subject: Re: [PATCH v2 00/14] Transparent Contiguous PTEs for User Mappings
To: Ryan Roberts
Cc: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, andreyknvl@gmail.com, anshuman.khandual@arm.com, ardb@kernel.org, catalin.marinas@arm.com, david@redhat.com, dvyukov@google.com, glider@google.com, james.morse@arm.com, jhubbard@nvidia.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mark.rutland@arm.com, maz@kernel.org, oliver.upton@linux.dev, ryabinin.a.a@gmail.com, suzuki.poulose@arm.com, vincenzo.frascino@arm.com, wangkefeng.wang@huawei.com, will@kernel.org, willy@infradead.org, yuzenghui@huawei.com, yuzhao@google.com, ziy@nvidia.com
In-Reply-To: <234021ba-73c2-474a-82f9-91e1604d5bb5@arm.com>
References: <20231115163018.1303287-1-ryan.roberts@arm.com> <20231127031813.5576-1-v-songbaohua@oppo.com> <234021ba-73c2-474a-82f9-91e1604d5bb5@arm.com>

On Mon, Nov 27, 2023 at 1:15 AM Ryan Roberts wrote:
>
> On 27/11/2023 03:18, Barry Song wrote:
> >> Ryan Roberts (14):
> >>   mm: Batch-copy PTE ranges during fork()
> >>   arm64/mm: set_pte(): New layer to manage contig bit
> >>   arm64/mm: set_ptes()/set_pte_at(): New layer to manage contig bit
> >>   arm64/mm: pte_clear(): New layer to manage contig bit
> >>   arm64/mm: ptep_get_and_clear(): New layer to manage contig bit
> >>   arm64/mm: ptep_test_and_clear_young(): New layer to manage contig bit
> >>   arm64/mm: ptep_clear_flush_young(): New layer to manage contig bit
> >>   arm64/mm: ptep_set_wrprotect(): New layer to manage contig bit
> >>   arm64/mm: ptep_set_access_flags(): New layer to manage contig bit
> >>   arm64/mm: ptep_get(): New layer to manage contig bit
> >>   arm64/mm: Split __flush_tlb_range() to elide trailing DSB
> >>   arm64/mm: Wire up PTE_CONT for user mappings
> >>   arm64/mm: Implement ptep_set_wrprotects() to optimize fork()
> >>   arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown
> >
> > Hi Ryan,
> > Not quite sure if I missed something: are we splitting/unfolding CONTPTEs
> > in the cases below?
>
> The general idea is that the core-mm sets the individual ptes (one at a time if
> it likes with set_pte_at(), or in a block with set_ptes()), modifies their
> permissions (ptep_set_wrprotect(), ptep_set_access_flags()) and clears them
> (ptep_clear(), etc.). This is exactly the same interface as before.
>
> BUT, the arm64 implementation of those interfaces will now detect when a set of
> adjacent PTEs (a contpte block, so 16 naturally aligned entries when using 4K
> base pages) are all appropriate for having the CONT_PTE bit set; in this case
> the block is "folded". And it will detect when the first PTE in the block
> changes such that the CONT_PTE bit must now be unset ("unfolded"). One of the
> requirements for folding a contpte block is that all the pages must belong to
> the *same* folio (which means it's safe to only track access/dirty for the
> contpte block as a whole rather than for each individual pte).
>
> (there are a couple of optimizations that make the reality slightly more
> complicated than what I've just explained, but you get the idea).
>
> On that basis, I believe the specific cases you describe below are all
> covered and safe; please let me know if you think there is a hole here!
>
> >
> > 1. madvise(MADV_DONTNEED) on part of the base pages of a CONTPTE large folio
>
> The page will first be unmapped (e.g. ptep_clear() or ptep_get_and_clear(), or
> whatever). The implementation of that will cause an unfold and the CONT_PTE bit
> is removed from the whole contpte block. If there is then a subsequent
> set_pte_at() to set a swap entry, the implementation will see that it's not
> appropriate to re-fold, so the range will remain unfolded.
>
> >
> > 2. vma split in a large folio due to various reasons such as mprotect,
> > munmap, mlock etc.
>
> I'm not sure if PTEs are explicitly unmapped/remapped when splitting a VMA? I
> suspect not, so if the VMA is split in the middle of a currently folded contpte
> block, it will remain folded. But this is safe and continues to work correctly.
> The VMA arrangement is not important; it is just important that a single folio
> is mapped contiguously across the whole block.

Even with different permissions, for example read-only vs read-write?
mprotect() may change the permissions, and a contpte block whose entries
have different permissions should be misprogramming per the ARM ARM.

>
> >
> > 3. try_to_unmap_one() to reclaim a folio, ptes are scanned one by one
> > rather than as a whole.
>
> Yes, as per 1; the arm64 implementation will notice when the first entry is
> cleared and unfold the contpte block.
>
> >
> > In hardware, we need to make sure CONTPTE follows the rule: always 16
> > contiguous physical addresses with CONTPTE set. If one of them runs away
> > from the 16-PTE group and the PTEs become inconsistent, some terrible
> > errors/faults can happen in HW. For example:
>
> Yes, the implementation obeys all these rules; see contpte_try_fold() and
> contpte_try_unfold().
> The fold/unfold operation is only done when all
> requirements are met, and we perform it in a manner that is conformant to the
> architecture requirements (see contpte_fold(), being renamed to
> contpte_convert() in the next version).
>
> Thanks for the review!
>
> Thanks,
> Ryan
>
> >
> > case 0:
> > addr0 PTE - has no CONTPTE
> > addr0+4kb PTE - has CONTPTE
> > ....
> > addr0+60kb PTE - has CONTPTE
> >
> > case 1:
> > addr0 PTE - has no CONTPTE
> > addr0+4kb PTE - has CONTPTE
> > ....
> > addr0+60kb PTE - has swap
> >
> > Inconsistent 16 PTEs will lead to a crash, even in the firmware, based on
> > our observation.
> >
> > Thanks
> > Barry