From: Mike Rapoport <rppt@kernel.org>
To: Andrew Morton
Cc: Andy Lutomirski, Borislav Petkov, Christophe Leroy, Daniel Gomez,
	Dave Hansen, Ingo Molnar, "Liam R. Howlett", Luis Chamberlain,
	Mark Rutland, Masami Hiramatsu, Mike Rapoport, "H. Peter Anvin",
	Peter Zijlstra, Petr Pavlu, Sami Tolvanen, Steven Rostedt,
	Thomas Gleixner, Yann Ylavic, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-modules@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH v2 3/8] execmem: rework execmem_cache_free()
Date: Wed, 9 Jul 2025 16:49:28 +0300
Message-ID: <20250709134933.3848895-4-rppt@kernel.org>
In-Reply-To: <20250709134933.3848895-1-rppt@kernel.org>
References: <20250709134933.3848895-1-rppt@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Currently execmem_cache_free() ignores potential allocation failures that
may happen in execmem_cache_add(). Besides, it uses text poking to fill the
memory with trapping instructions before returning it to the cache,
although it would be more efficient to make that memory writable, update it
using memcpy() and then restore ROX protection.

Rework execmem_cache_free() so that in case of an error it will defer
freeing of the memory to a delayed work.

With this, the happy fast path will now change permissions to RW, fill the
memory with trapping instructions using memcpy(), restore ROX permissions,
add the memory back to the free cache and clear the relevant entry in
busy_areas.

If any step in the fast path fails, the entry in busy_areas will be marked
as pending_free. These entries will be handled by a delayed work and freed
asynchronously.
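Marking an entry as pending_free boils down to tagging the pointer that
busy_areas already stores with a spare bit, so the failure path only
rewrites a slot that already exists in the tree (as the comment in the
patch notes). The following stand-alone sketch only illustrates that
tagging round trip; it is not the kernel code: it hard-codes the flag bit
to bit 11 (what (1 << (PAGE_SHIFT - 1)) evaluates to with 4K pages) and
assumes the tagged addresses never have that bit set on their own.

#include <assert.h>
#include <stdio.h>

/* illustrative value; the patch derives it from PAGE_SHIFT */
#define PENDING_FREE_MASK	(1UL << 11)

static int is_pending_free(void *ptr)
{
	return ((unsigned long)ptr & PENDING_FREE_MASK) != 0;
}

static void *pending_free_set(void *ptr)
{
	return (void *)((unsigned long)ptr | PENDING_FREE_MASK);
}

static void *pending_free_clear(void *ptr)
{
	return (void *)((unsigned long)ptr & ~PENDING_FREE_MASK);
}

int main(void)
{
	void *area = (void *)0x7f0000400000UL;	/* page-aligned, bit 11 clear */
	void *tagged = pending_free_set(area);

	assert(!is_pending_free(area));
	assert(is_pending_free(tagged));
	assert(pending_free_clear(tagged) == area);

	printf("pending_free tagging round trip ok\n");
	return 0;
}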
To make the fast path faster, use __GFP_NORETRY for memory allocations and
let the asynchronous handler try harder with GFP_KERNEL (a generic sketch
of this try-then-defer pattern follows the patch).

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 mm/execmem.c | 125 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 102 insertions(+), 23 deletions(-)

diff --git a/mm/execmem.c b/mm/execmem.c
index 6b040fbc5f4f..4670e97f8e4e 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -93,8 +93,15 @@ struct execmem_cache {
 	struct mutex mutex;
 	struct maple_tree busy_areas;
 	struct maple_tree free_areas;
+	unsigned int pending_free_cnt;	/* protected by mutex */
 };
 
+/* delay to schedule asynchronous free if fast path free fails */
+#define FREE_DELAY		(msecs_to_jiffies(10))
+
+/* mark entries in busy_areas that should be freed asynchronously */
+#define PENDING_FREE_MASK	(1 << (PAGE_SHIFT - 1))
+
 static struct execmem_cache execmem_cache = {
 	.mutex = __MUTEX_INITIALIZER(execmem_cache.mutex),
 	.busy_areas = MTREE_INIT_EXT(busy_areas, MT_FLAGS_LOCK_EXTERN,
@@ -155,20 +162,17 @@ static void execmem_cache_clean(struct work_struct *work)
 
 static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
 
-static int execmem_cache_add(void *ptr, size_t size)
+static int execmem_cache_add_locked(void *ptr, size_t size, gfp_t gfp_mask)
 {
 	struct maple_tree *free_areas = &execmem_cache.free_areas;
-	struct mutex *mutex = &execmem_cache.mutex;
 	unsigned long addr = (unsigned long)ptr;
 	MA_STATE(mas, free_areas, addr - 1, addr + 1);
 	unsigned long lower, upper;
 	void *area = NULL;
-	int err;
 
 	lower = addr;
 	upper = addr + size - 1;
 
-	mutex_lock(mutex);
 	area = mas_walk(&mas);
 	if (area && mas.last == addr - 1)
 		lower = mas.index;
@@ -178,12 +182,14 @@ static int execmem_cache_add(void *ptr, size_t size)
 		upper = mas.last;
 
 	mas_set_range(&mas, lower, upper);
-	err = mas_store_gfp(&mas, (void *)lower, GFP_KERNEL);
-	mutex_unlock(mutex);
-	if (err)
-		return err;
+	return mas_store_gfp(&mas, (void *)lower, gfp_mask);
+}
 
-	return 0;
+static int execmem_cache_add(void *ptr, size_t size, gfp_t gfp_mask)
+{
+	guard(mutex)(&execmem_cache.mutex);
+
+	return execmem_cache_add_locked(ptr, size, gfp_mask);
 }
 
 static bool within_range(struct execmem_range *range, struct ma_state *mas,
@@ -278,7 +284,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	if (err)
 		goto err_free_mem;
 
-	err = execmem_cache_add(p, alloc_size);
+	err = execmem_cache_add(p, alloc_size, GFP_KERNEL);
 	if (err)
 		goto err_reset_direct_map;
 
@@ -307,29 +313,102 @@ static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
 	return __execmem_cache_alloc(range, size);
 }
 
+static inline bool is_pending_free(void *ptr)
+{
+	return ((unsigned long)ptr & PENDING_FREE_MASK);
+}
+
+static inline void *pending_free_set(void *ptr)
+{
+	return (void *)((unsigned long)ptr | PENDING_FREE_MASK);
+}
+
+static inline void *pending_free_clear(void *ptr)
+{
+	return (void *)((unsigned long)ptr & ~PENDING_FREE_MASK);
+}
+
+static int execmem_force_rw(void *ptr, size_t size);
+
+static int __execmem_cache_free(struct ma_state *mas, void *ptr, gfp_t gfp_mask)
+{
+	size_t size = mas_range_len(mas);
+	int err;
+
+	err = execmem_force_rw(ptr, size);
+	if (err)
+		return err;
+
+	execmem_fill_trapping_insns(ptr, size, /* writable = */ true);
+	execmem_restore_rox(ptr, size);
+
+	err = execmem_cache_add_locked(ptr, size, gfp_mask);
+	if (err)
+		return err;
+
+	mas_store_gfp(mas, NULL, gfp_mask);
+	return 0;
+}
+
+static void execmem_cache_free_slow(struct work_struct *work);
+static DECLARE_DELAYED_WORK(execmem_cache_free_work, execmem_cache_free_slow);
+
+static void execmem_cache_free_slow(struct work_struct *work)
+{
+	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
+	MA_STATE(mas, busy_areas, 0, ULONG_MAX);
+	void *area;
+
+	guard(mutex)(&execmem_cache.mutex);
+
+	if (!execmem_cache.pending_free_cnt)
+		return;
+
+	mas_for_each(&mas, area, ULONG_MAX) {
+		if (!is_pending_free(area))
+			continue;
+
+		area = pending_free_clear(area);
+		if (__execmem_cache_free(&mas, area, GFP_KERNEL))
+			continue;
+
+		execmem_cache.pending_free_cnt--;
+	}
+
+	if (execmem_cache.pending_free_cnt)
+		schedule_delayed_work(&execmem_cache_free_work, FREE_DELAY);
+	else
+		schedule_work(&execmem_cache_clean_work);
+}
+
 static bool execmem_cache_free(void *ptr)
 {
 	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
-	struct mutex *mutex = &execmem_cache.mutex;
 	unsigned long addr = (unsigned long)ptr;
 	MA_STATE(mas, busy_areas, addr, addr);
-	size_t size;
 	void *area;
+	int err;
+
+	guard(mutex)(&execmem_cache.mutex);
 
-	mutex_lock(mutex);
 	area = mas_walk(&mas);
-	if (!area) {
-		mutex_unlock(mutex);
+	if (!area)
 		return false;
-	}
-	size = mas_range_len(&mas);
-	mas_store_gfp(&mas, NULL, GFP_KERNEL);
-	mutex_unlock(mutex);
-
-	execmem_fill_trapping_insns(ptr, size, /* writable = */ false);
-
-	execmem_cache_add(ptr, size);
+	err = __execmem_cache_free(&mas, area, GFP_KERNEL | __GFP_NORETRY);
+	if (err) {
+		/*
+		 * mas points to exact slot we've got the area from, nothing
+		 * else can modify the tree because of the mutex, so there
+		 * won't be any allocations in mas_store_gfp() and it will just
+		 * change the pointer.
+		 */
+		area = pending_free_set(area);
+		mas_store_gfp(&mas, area, GFP_KERNEL);
+		execmem_cache.pending_free_cnt++;
+		schedule_delayed_work(&execmem_cache_free_work, FREE_DELAY);
+		return true;
+	}
 
 	schedule_work(&execmem_cache_clean_work);
 
 	return true;
-- 
2.47.2
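As mentioned in the changelog, the gfp split (__GFP_NORETRY on the caller's
fast path, plain GFP_KERNEL in the delayed work) is an instance of a generic
try-then-defer pattern. Below is a minimal, hypothetical sketch of that
pattern in isolation; the names (pending_cnt, try_release(), retry_fn(),
fast_path_free()) and the kmalloc() stand-in for the maple tree node
allocation are invented for the example and are not part of the patch.

#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(lock);
static unsigned int pending_cnt;	/* deferred items, protected by lock */

static void retry_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(retry_work, retry_fn);

/* Releasing an item may itself need memory, e.g. for a tree node. */
static int try_release(gfp_t gfp)
{
	void *node = kmalloc(64, gfp);

	if (!node)
		return -ENOMEM;
	/* ... update the bookkeeping that needed 'node' ... */
	kfree(node);
	return 0;
}

static void retry_fn(struct work_struct *work)
{
	guard(mutex)(&lock);

	/* the worker may sleep, so it can afford full GFP_KERNEL attempts */
	while (pending_cnt && !try_release(GFP_KERNEL))
		pending_cnt--;

	/* still failing: back off and try again later */
	if (pending_cnt)
		schedule_delayed_work(&retry_work, msecs_to_jiffies(10));
}

static void fast_path_free(void)
{
	guard(mutex)(&lock);

	/* cheap attempt that must not stall the caller */
	if (!try_release(GFP_KERNEL | __GFP_NORETRY))
		return;

	pending_cnt++;	/* defer to the asynchronous worker */
	schedule_delayed_work(&retry_work, msecs_to_jiffies(10));
}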