From: Uladzislau Rezki <urezki@gmail.com>
Date: Fri, 13 Jun 2025 12:32:06 +0200
To: Alexander Gordeev
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org,
 linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
 Hugh Dickins, Nicholas Piggin, Guenter Roeck, Juergen Gross,
 Jeremy Fitzhardinge, Ryan Roberts
Subject: Re: [PATCH 2/6] mm: Lock kernel page tables before entering lazy MMU mode
References: <7bd3a45dbc375dc2c15cebae09cb2bb972d6039f.1749747752.git.agordeev@linux.ibm.com>
In-Reply-To: <7bd3a45dbc375dc2c15cebae09cb2bb972d6039f.1749747752.git.agordeev@linux.ibm.com>

On Thu, Jun 12, 2025 at 07:36:09PM +0200, Alexander Gordeev wrote:
> As a follow-up to commit 691ee97e1a9d ("mm: fix lazy mmu docs and
> usage") take a step forward and protect with a lock not only user,
> but also kernel mappings before entering the lazy MMU mode. With
> that the semantics of arch_enter|leave_lazy_mmu_mode() callbacks
> is consolidated, which allows further simplifications.
>
> The effect of this consolidation is not fully preemptible (Real-Time)
> kernels can not enter the context switch while the lazy MMU mode is
> active - which is easier to comprehend.
>
> Signed-off-by: Alexander Gordeev
> ---
>  include/linux/pgtable.h | 12 ++++++------
>  mm/kasan/shadow.c       |  5 -----
>  mm/memory.c             |  5 ++++-
>  mm/vmalloc.c            |  6 ++++++
>  4 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 0b6e1f781d86..33bf2b13c219 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -224,12 +224,12 @@ static inline int pmd_dirty(pmd_t pmd)
>   * a raw PTE pointer after it has been modified are not guaranteed to be
>   * up to date.
>   *
> - * In the general case, no lock is guaranteed to be held between entry and exit
> - * of the lazy mode. So the implementation must assume preemption may be enabled
> - * and cpu migration is possible; it must take steps to be robust against this.
> - * (In practice, for user PTE updates, the appropriate page table lock(s) are
> - * held, but for kernel PTE updates, no lock is held). Nesting is not permitted
> - * and the mode cannot be used in interrupt context.
> + * For PREEMPT_RT kernels implementation must assume that preemption may
> + * be enabled and cpu migration is possible between entry and exit of the
> + * lazy MMU mode; it must take steps to be robust against this. There is
> + * no such assumption for non-PREEMPT_RT kernels, since both kernel and
> + * user page tables are protected with a spinlock while in lazy MMU mode.
> + * Nesting is not permitted and the mode cannot be used in interrupt context.
>   */
>  #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>  #define arch_enter_lazy_mmu_mode()	do {} while (0)
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index d2c70cd2afb1..45115bd770a9 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -313,12 +313,10 @@ static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
>  	__memset(page_to_virt(page), KASAN_VMALLOC_INVALID, PAGE_SIZE);
>  	pte = pfn_pte(page_to_pfn(page), PAGE_KERNEL);
>
> -	spin_lock(&init_mm.page_table_lock);
>  	if (likely(pte_none(ptep_get(ptep)))) {
>  		set_pte_at(&init_mm, addr, ptep, pte);
>  		data->pages[index] = NULL;
>  	}
> -	spin_unlock(&init_mm.page_table_lock);
>
>  	return 0;
>  }
> @@ -465,13 +463,10 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
>
>  	page = (unsigned long)__va(pte_pfn(ptep_get(ptep)) << PAGE_SHIFT);
>
> -	spin_lock(&init_mm.page_table_lock);
> -
>  	if (likely(!pte_none(ptep_get(ptep)))) {
>  		pte_clear(&init_mm, addr, ptep);
>  		free_page(page);
>  	}
> -	spin_unlock(&init_mm.page_table_lock);
>
>  	return 0;
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index 71b3d3f98999..1ddc532b1f13 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3017,6 +3017,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>  		pte = pte_offset_kernel(pmd, addr);
>  		if (!pte)
>  			return err;
> +		spin_lock(&init_mm.page_table_lock);
>  	} else {
>  		if (create)
>  			pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
> @@ -3042,7 +3043,9 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>
>  	arch_leave_lazy_mmu_mode();
>
> -	if (mm != &init_mm)
> +	if (mm == &init_mm)
> +		spin_unlock(&init_mm.page_table_lock);
> +	else
>  		pte_unmap_unlock(mapped_pte, ptl);
>
>  	*mask |= PGTBL_PTE_MODIFIED;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ab986dd09b6a..57b11000ae36 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -105,6 +105,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	if (!pte)
>  		return -ENOMEM;
>
> +	spin_lock(&init_mm.page_table_lock);
>
This is not good. We introduce another bottleneck.

--
Uladzislau Rezki
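
[ Illustration, not part of the original mail: a minimal userspace
  sketch of the serialization being objected to above. A pthread
  mutex stands in for init_mm.page_table_lock and the worker for
  vmap_pte_range(); the thread and PTE counts are arbitrary. Even
  though each worker populates a disjoint range, all of them funnel
  through the one global lock and so run strictly one at a time. ]

#include <pthread.h>
#include <stdio.h>

#define NR_WORKERS	4
#define NR_PTES		512	/* arbitrary range size */

/* Stand-in for init_mm.page_table_lock: one lock for all kernel ranges. */
static pthread_mutex_t page_table_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long fake_pte[NR_WORKERS][NR_PTES];	/* disjoint "page tables" */

/* Stand-in for vmap_pte_range(): populate one disjoint PTE range. */
static void *vmap_worker(void *arg)
{
	unsigned long *pte = fake_pte[(long)arg];
	int i;

	/*
	 * With the patch, every caller takes the same global lock, so
	 * workers mapping unrelated ranges still serialize right here.
	 */
	pthread_mutex_lock(&page_table_lock);
	for (i = 0; i < NR_PTES; i++)
		pte[i] = 1;	/* plays the role of set_pte_at(&init_mm, ...) */
	pthread_mutex_unlock(&page_table_lock);
	return NULL;
}

int main(void)
{
	pthread_t t[NR_WORKERS];
	long i;

	for (i = 0; i < NR_WORKERS; i++)
		pthread_create(&t[i], NULL, vmap_worker, (void *)i);
	for (i = 0; i < NR_WORKERS; i++)
		pthread_join(t[i], NULL);
	puts("all ranges populated, one worker at a time");
	return 0;
}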