From: Bo Li
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, luto@kernel.org,
 kees@kernel.org, akpm@linux-foundation.org, david@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, peterz@infradead.org
Cc: dietmar.eggemann@arm.com, hpa@zytor.com, acme@kernel.org,
 namhyung@kernel.org, mark.rutland@arm.com,
 alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com,
 adrian.hunter@intel.com, kan.liang@linux.intel.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
 rppt@kernel.org, surenb@google.com, mhocko@suse.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, jannh@google.com,
 pfalcato@suse.de, riel@surriel.com, harry.yoo@oracle.com,
 linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 duanxiongchun@bytedance.com, yinhongbo@bytedance.com,
 dengliang.1214@bytedance.com, xieyongji@bytedance.com,
 chaiwen.cc@bytedance.com, songmuchun@bytedance.com, yuanzhu@bytedance.com,
 chengguozhu@bytedance.com, sunjiadong.lff@bytedance.com, Bo Li
Subject: [RFC v2 29/35] RPAL: fix race condition in pkru update
Date: Fri, 30 May 2025 17:27:57 +0800
Message-Id: <7fbb84a57fc8046738c7196031a3fd97ea8334e2.1748594841.git.libo.gcs85@bytedance.com>

When setting up MPK, RPAL uses IPIs to notify tasks running on each core
in the thread group to modify their PKRU values and update the PKEY fields
in all VMA page tables. A race condition exists here: when updating PKRU,
the page table updates may not yet be complete.
In such cases, code paths that call pkru_write_default() (e.g., during
signal handling) must not restrict PKRU to a single PKEY, because the
resulting permissions could not cover both the old and the new PKEY
settings in the page tables.

This patch introduces a pku_on state with values PKU_ON_FALSE,
PKU_ON_INIT, and PKU_ON_FINISH, representing the states before, during,
and after page table PKEY updates, respectively. For RPAL services, all
calls to pkru_write_default() are replaced with rpal_pkru_write_default().

- Before page table setup (PKU_ON_FALSE), rpal_pkru_write_default()
  directly calls pkru_write_default().
- During page table setup (PKU_ON_INIT), rpal_pkru_write_default()
  enables permissions for all PKEYs, ensuring the task can access both
  old and new page tables simultaneously.
- After page table setup completes (PKU_ON_FINISH),
  rpal_pkru_write_default() tightens permissions to match the updated
  page tables.

For newly allocated page tables, the new PKEY is only used when pku_on is
PKU_ON_FINISH. The mmap lock is used to ensure that no race conditions
occur during this process.
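
The three states described above can be modelled in user space as a small
sketch. This is not kernel code: model_pkru_default(), model_pkey_to_pkru(),
MODEL_DEFAULT_PKRU and the example key 3 are hypothetical names and values
chosen for illustration, and the real rpal_pkey_to_pkru() may compute a
different mask. The sketch only assumes the x86 PKRU layout (two bits per
key: access-disable at bit 2k, write-disable at bit 2k+1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same three states as the enum this patch adds to include/linux/rpal.h. */
enum { PKU_ON_FALSE, PKU_ON_INIT, PKU_ON_FINISH };

/* x86 PKRU layout: bit 2k = access-disable, bit 2k+1 = write-disable. */
#define PKRU_AD(k) (1u << (2 * (k)))
#define PKRU_WD(k) (1u << (2 * (k) + 1))

/* Assumption for the model: every key except key 0 is access-disabled. */
#define MODEL_DEFAULT_PKRU 0x55555554u

/* Hypothetical stand-in for rpal_pkey_to_pkru(): open key 0 and 'pkey'. */
uint32_t model_pkey_to_pkru(int pkey)
{
	return MODEL_DEFAULT_PKRU & ~(PKRU_AD(pkey) | PKRU_WD(pkey));
}

/* Value that a rpal_pkru_write_default()-like helper would load. */
uint32_t model_pkru_default(int pku_on, int pkey)
{
	switch (pku_on) {
	case PKU_ON_INIT:	/* page tables in flux: allow every key */
		return 0;
	case PKU_ON_FINISH:	/* page tables updated: tighten to 'pkey' */
		return model_pkey_to_pkru(pkey);
	default:		/* PKU_ON_FALSE: unchanged default */
		return MODEL_DEFAULT_PKRU;
	}
}

bool model_can_access(uint32_t pkru, int key)
{
	return !(pkru & PKRU_AD(key));
}

/* Checks the property the commit message relies on, for example key 3. */
void model_selfcheck(void)
{
	const int old_key = 0, new_key = 3, other_key = 5;
	uint32_t init = model_pkru_default(PKU_ON_INIT, new_key);
	uint32_t done = model_pkru_default(PKU_ON_FINISH, new_key);
	uint32_t off = model_pkru_default(PKU_ON_FALSE, new_key);

	/* During the update both old-key and new-key pages stay reachable. */
	assert(model_can_access(init, old_key));
	assert(model_can_access(init, new_key));
	/* Afterwards only the service key (and key 0) remain open. */
	assert(model_can_access(done, new_key));
	assert(!model_can_access(done, other_key));
	/* Before setup the new key is still closed. */
	assert(!model_can_access(off, new_key));
}
```

Calling model_selfcheck() from a test main() exercises all three states.
In this model, restricting PKRU to the new key while the state is still
PKU_ON_INIT would be safe only because key 0 is left open; the patch avoids
depending on any such detail by opening every key until the page table
update has finished.
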

Signed-off-by: Bo Li
---
 arch/x86/kernel/cpu/common.c |  4 ++--
 arch/x86/kernel/fpu/core.c   |  4 ++--
 arch/x86/kernel/process.c    |  4 ++--
 arch/x86/rpal/pku.c          | 14 +++++++++++++-
 arch/x86/rpal/service.c      |  2 +-
 include/linux/rpal.h         |  9 ++++++++-
 mm/mmap.c                    |  2 +-
 mm/mprotect.c                |  1 +
 mm/vma.c                     |  2 +-
 9 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2678453cdf76..d21f44873b86 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -534,8 +534,8 @@ static __always_inline void setup_pku(struct cpuinfo_x86 *c)
 	cr4_set_bits(X86_CR4_PKE);
 	/* Load the default PKRU value */
 #ifdef CONFIG_RPAL_PKU
-	if (rpal_current_service() && rpal_current_service()->pku_on)
-		write_pkru(rpal_pkey_to_pkru(rpal_current_service()->pkey));
+	if (rpal_current_service())
+		rpal_pkru_write_default();
 	else
 #endif
 	pkru_write_default();
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 251b1ddee726..4b413af0b179 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -748,8 +748,8 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
 		frstor(&init_fpstate.regs.fsave);
 
 #ifdef CONFIG_RPAL_PKU
-	if (rpal_current_service() && rpal_current_service()->pku_on)
-		write_pkru(rpal_pkey_to_pkru(rpal_current_service()->pkey));
+	if (rpal_current_service())
+		rpal_pkru_write_default();
 	else
 #endif
 	pkru_write_default();
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b74de35218f9..898a9e0b23e7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -286,8 +286,8 @@ static void pkru_flush_thread(void)
 	 * the hardware right here (similar to context switch).
 	 */
 #ifdef CONFIG_RPAL_PKU
-	if (rpal_current_service() && rpal_current_service()->pku_on)
-		write_pkru(rpal_pkey_to_pkru(rpal_current_service()->pkey));
+	if (rpal_current_service())
+		rpal_pkru_write_default();
 	else
 #endif
 	pkru_write_default();
diff --git a/arch/x86/rpal/pku.c b/arch/x86/rpal/pku.c
index 26cef324f41f..8e530931fb23 100644
--- a/arch/x86/rpal/pku.c
+++ b/arch/x86/rpal/pku.c
@@ -161,7 +161,7 @@ int rpal_pkey_setup(struct rpal_service *rs, int pkey)
 	rs->pkey = pkey;
 	/* others must see rs->pkey before rs->pku_on */
 	barrier();
-	rs->pku_on = true;
+	rs->pku_on = PKU_ON_INIT;
 	mmap_write_unlock(current->mm);
 	rpal_set_group_pkru(val, RPAL_PKRU_UNION);
 	err = do_rpal_mprotect_pkey(rs->base, RPAL_ADDR_SPACE_SIZE, pkey);
@@ -182,3 +182,15 @@ int rpal_alloc_pkey(struct rpal_service *rs, int pkey)
 
 	return ret;
 }
+
+void rpal_pkru_write_default(void)
+{
+	struct rpal_service *cur = rpal_current_service();
+
+	if (cur->pku_on == PKU_ON_INIT)
+		write_pkru(0);
+	else if (cur->pku_on == PKU_ON_FINISH)
+		write_pkru(rpal_pkey_to_pkru(rpal_current_service()->pkey));
+	else
+		pkru_write_default();
+}
diff --git a/arch/x86/rpal/service.c b/arch/x86/rpal/service.c
index 7a83e85cf096..9fd568fa9a29 100644
--- a/arch/x86/rpal/service.c
+++ b/arch/x86/rpal/service.c
@@ -210,7 +210,7 @@ struct rpal_service *rpal_register_service(void)
 	init_waitqueue_head(&rs->rpd.rpal_waitqueue);
 #ifdef CONFIG_RPAL_PKU
 	rs->pkey = -1;
-	rs->pku_on = false;
+	rs->pku_on = PKU_ON_FALSE;
 	rpal_service_pku_init();
 #endif
 
diff --git a/include/linux/rpal.h b/include/linux/rpal.h
index 7657e6c6393b..16a3c80383f7 100644
--- a/include/linux/rpal.h
+++ b/include/linux/rpal.h
@@ -138,6 +138,12 @@ enum rpal_capability {
 	RPAL_CAP_PKU
 };
 
+enum {
+	PKU_ON_FALSE,
+	PKU_ON_INIT,
+	PKU_ON_FINISH,
+};
+
 struct rpal_critical_section {
 	unsigned long ret_begin;
 	unsigned long ret_end;
@@ -245,7 +251,7 @@ struct rpal_service {
 
 #ifdef CONFIG_RPAL_PKU
 	/* pkey */
-	bool pku_on;
+	int pku_on;
 	int pkey;
 #endif
 
@@ -599,6 +605,7 @@ __rpal_switch_to(struct task_struct *prev_p, struct task_struct *next_p);
 asmlinkage __visible void rpal_schedule_tail(struct task_struct *prev);
 int do_rpal_mprotect_pkey(unsigned long start, size_t len, int pkey);
 void rpal_set_pku_schedule_tail(struct task_struct *prev);
+void rpal_pkru_write_default(void);
 int rpal_ep_autoremove_wake_function(wait_queue_entry_t *curr,
 				     unsigned int mode, int wake_flags,
 				     void *key);
diff --git a/mm/mmap.c b/mm/mmap.c
index d36ea4ea2bd0..85a4a33491ab 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -404,7 +404,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	do {
 		struct rpal_service *cur = rpal_current_service();
 
-		if (cur && cur->pku_on)
+		if (cur && cur->pku_on == PKU_ON_FINISH)
 			pkey = cur->pkey;
 	} while (0);
 #endif
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e9ae828e377d..ac162180553e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -938,6 +938,7 @@ int do_rpal_mprotect_pkey(unsigned long start, size_t len, int pkey)
 	}
 	tlb_finish_mmu(&tlb);
 
+	rpal_current_service()->pku_on = PKU_ON_FINISH;
 out:
 	mmap_write_unlock(current->mm);
 	return error;
diff --git a/mm/vma.c b/mm/vma.c
index fa9d8f694e6e..57ec99a5969d 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2632,7 +2632,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		struct rpal_service *cur = rpal_current_service();
 		unsigned long vma_pkey_mask;
 
-		if (cur && cur->pku_on) {
+		if (cur && cur->pku_on == PKU_ON_FINISH) {
 			vma_pkey_mask = VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 |
 					VM_PKEY_BIT3;
 			flags &= ~vma_pkey_mask;
-- 
2.20.1