Subject: Re: [PATCH v2 0/5] Fix lazy mmu mode
From: Ryan Roberts <ryan.roberts@arm.com>
Date: Mon, 14 Apr 2025 14:22:53 +0100
To: Alexander Gordeev
Cc: Andrew Morton, "David S. Miller", Andreas Larsson, Juergen Gross,
    Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, "H. Peter Anvin", "Matthew Wilcox (Oracle)",
    Catalin Marinas, linux-mm@kvack.org, sparclinux@vger.kernel.org,
    xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Message-ID: <5b0609c9-95ee-4e48-bb6d-98f57c5d2c31@arm.com>
In-Reply-To: <912c7a32-b39c-494f-a29c-4865cd92aeba@agordeev.local>
References: <20250303141542.3371656-1-ryan.roberts@arm.com>
    <912c7a32-b39c-494f-a29c-4865cd92aeba@agordeev.local>

On 10/04/2025 17:07, Alexander Gordeev wrote:
> On Mon, Mar 03, 2025 at 02:15:34PM +0000, Ryan Roberts wrote:
>
> Hi Ryan,
>
>> I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As
>> part of that, I will extend lazy mmu mode to cover kernel mappings in
>> vmalloc table walkers.
>> While lazy mmu mode is already used for kernel mappings in a few places,
>> this will extend its use significantly.
>>
>> Having reviewed the existing lazy mmu implementations in powerpc, sparc
>> and x86, it looks like there are a bunch of bugs, some of which may be
>> more likely to trigger once I extend the use of lazy mmu.
>
> Do you have any idea about generic code issues as a result of not adhering
> to the originally stated requirement:
>
> /*
>  ...
>  * the PTE updates which happen during this window. Note that using this
>  * interface requires that read hazards be removed from the code. A read
>  * hazard could result in the direct mode hypervisor case, since the actual
>  * write to the page tables may not yet have taken place, so reads through
>  * a raw PTE pointer after it has been modified are not guaranteed to be
>  * up to date.
>  ...
>  */
>
> I tried to follow a few code paths, and at least this one does not look so
> good:
>
> copy_pte_range(..., src_pte, ...)
>   ret = copy_nonpresent_pte(..., src_pte, ...)
>     try_restore_exclusive_pte(..., src_pte, ...)  // is_device_exclusive_entry(entry)
>       restore_exclusive_pte(..., ptep, ...)
>         set_pte_at(..., ptep, ...)
>           set_pte(ptep, pte);                     // save in lazy mmu mode
>
>   // ret == -ENOENT
>
>   ptent = ptep_get(src_pte);                     // lazy mmu save is not observed
>   ret = copy_present_ptes(..., ptent, ...);      // wrong ptent used
>
> I am not aware whether the effort to have "read hazards be removed from
> the code" has ever been made, so I don't know whether the generic code is
> safe in this regard.
>
> What is your take on this?

Hmm, that looks like a bug to me, at least based on the stated requirements.
Although this is not a "read through a raw PTE *pointer*", it is a
ptep_get(). The arch code can override that, so I guess it has an
opportunity to flush there, but I don't think any arches are currently
doing that.

Probably the simplest fix is to add arch_flush_lazy_mmu_mode() before the
ptep_get()?
It won't be a problem in practice for arm64, since the pgtables are always
updated immediately; I just want to use these hooks to defer/batch barriers
in certain cases. And this is a pre-existing issue for the arches that use
lazy mmu with device-exclusive mappings, which my extending lazy mmu into
vmalloc won't exacerbate.

Would you be willing/able to submit a fix?

Thanks,
Ryan

>
> Thanks!