Date: Mon, 14 Apr 2025 16:04:38 +0200
From: Alexander Gordeev <agordeev@linux.ibm.com>
To: Ryan Roberts
Cc: Andrew Morton, "David S. Miller", Andreas Larsson, Juergen Gross,
    Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, "H. Peter Anvin", "Matthew Wilcox (Oracle)",
    Catalin Marinas, linux-mm@kvack.org, sparclinux@vger.kernel.org,
    xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/5] Fix lazy mmu mode
In-Reply-To: <5b0609c9-95ee-4e48-bb6d-98f57c5d2c31@arm.com>
References: <20250303141542.3371656-1-ryan.roberts@arm.com>
 <912c7a32-b39c-494f-a29c-4865cd92aeba@agordeev.local>
 <5b0609c9-95ee-4e48-bb6d-98f57c5d2c31@arm.com>

On Mon, Apr 14, 2025 at 02:22:53PM +0100, Ryan Roberts wrote:
> On 10/04/2025 17:07, Alexander Gordeev wrote:
> >> I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As part
> >> of that, I will extend lazy mmu mode to cover kernel mappings in vmalloc table
> >> walkers. While lazy mmu mode is already used for kernel mappings in a few
> >> places, this will extend its use significantly.
> >>
> >> Having reviewed the existing lazy mmu implementations in powerpc, sparc and x86,
> >> it looks like there are a bunch of bugs, some of which may be more likely to
> >> trigger once I extend the use of lazy mmu.
> >
> > Do you have any idea about generic code issues as a result of not adhering to
> > the originally stated requirement:
> >
> > /*
> > ...
> >  * the PTE updates which happen during this window. Note that using this
> >  * interface requires that read hazards be removed from the code. A read
> >  * hazard could result in the direct mode hypervisor case, since the actual
> >  * write to the page tables may not yet have taken place, so reads though
> >  * a raw PTE pointer after it has been modified are not guaranteed to be
> >  * up to date.
> > ...
> >  */
> >
> > I tried to follow a few code paths and at least this one does not look so good:
> >
> > copy_pte_range(..., src_pte, ...)
> >   ret = copy_nonpresent_pte(..., src_pte, ...)
> >     try_restore_exclusive_pte(..., src_pte, ...)   // is_device_exclusive_entry(entry)
> >       restore_exclusive_pte(..., ptep, ...)
> >         set_pte_at(..., ptep, ...)
> >           set_pte(ptep, pte);                      // save in lazy mmu mode
> >
> >   // ret == -ENOENT
> >
> >   ptent = ptep_get(src_pte);                       // lazy mmu save is not observed
> >   ret = copy_present_ptes(..., ptent, ...);        // wrong ptent used
> >
> > I am not aware whether the effort to "read hazards be removed from the code"
> > has ever been made and the generic code is safe in this regard.
> >
> > What is your take on this?
>
> Hmm, that looks like a bug to me, at least based on the stated requirements.
> Although this is not a "read through a raw PTE *pointer*", it is a ptep_get().
> The arch code can override that so I guess it has an opportunity to flush. But I
> don't think any arches are currently doing that.
>
> Probably the simplest fix is to add arch_flush_lazy_mmu_mode() before the
> ptep_get()?

Wouldn't that completely defeat the very idea of lazy mmu mode? (One would then
flush on every PTE iteration of a page table walk.)

> It won't be a problem in practice for arm64, since the pgtables are always
> updated immediately. I just want to use these hooks to defer/batch barriers in
> certain cases.
>
> And this is a pre-existing issue for the arches that use lazy mmu with
> device-exclusive mappings, which my extending lazy mmu into vmalloc won't
> exacerbate.
>
> Would you be willing/able to submit a fix?

Well, we have a dozen lazy mmu cases and I would guess this is not the only
piece of code that is affected. I was thinking about a debug feature that could
help spot all the troubled locations (a rough sketch of what I mean is at the
bottom of this mail). Then we could assess and decide whether it is feasible to
fix them. Just turning the code above into a PTE read-modify-update pattern is
quite an exercise...

> Thanks,
> Ryan
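For illustration, the kind of debug instrumentation I have in mind might look
roughly like the sketch below. It is not against any real tree: the task_struct
fields and the *_dbg helper names are made up, and the hooks would still have
to be wired into the generic lazy mmu enter/leave helpers, set_pte_at() and the
ptep_get() callers. It merely shows the idea of warning when a raw PTE is read
while an update may still be queued:

/*
 * Illustrative sketch only.  Track, per task, whether we are inside a
 * lazy mmu section and whether a PTE update has been queued since the
 * last flush, then warn when a raw PTE is read in that window.
 */
#include <linux/sched.h>
#include <linux/mmdebug.h>
#include <linux/pgtable.h>

static inline void lazy_mmu_mode_enter_dbg(void)
{
        current->lazy_mmu_depth++;              /* hypothetical field */
        arch_enter_lazy_mmu_mode();
}

static inline void lazy_mmu_mode_leave_dbg(void)
{
        arch_leave_lazy_mmu_mode();
        current->lazy_mmu_pending = false;      /* hypothetical field */
        current->lazy_mmu_depth--;
}

/* Would need to be called from set_pte_at() and friends. */
static inline void lazy_mmu_note_pte_update(void)
{
        if (current->lazy_mmu_depth)
                current->lazy_mmu_pending = true;
}

/*
 * Checked read: if an update is still queued in the arch batch, the
 * value read through the raw pointer may be stale -- exactly the read
 * hazard described in the comment quoted above.
 */
static inline pte_t ptep_get_dbg(pte_t *ptep)
{
        VM_WARN_ON_ONCE(current->lazy_mmu_pending);
        return ptep_get(ptep);
}

(Whether per-task or per-CPU tracking is the right choice would depend on how
the arch implementations batch their updates.)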