Date: Fri, 8 Nov 2024 17:00:10 +0100
Subject: Re: [PATCH 1/1] sched/numa: Fix memory leak due to the overwritten vma->numab_state
To: Adrian Huang, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Andrew Morton
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Raghavendra K T, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Adrian Huang, Jiwei Sun
References: <20241108133139.25326-1-ahuang12@lenovo.com>
From: Vlastimil Babka
In-Reply-To: <20241108133139.25326-1-ahuang12@lenovo.com>

On 11/8/24 14:31, Adrian Huang wrote:
> From: Adrian Huang
>
> [Problem Description]
> When running the hackbench program of LTP, the following memory leak is
> reported by kmemleak.
>
> # /opt/ltp/testcases/bin/hackbench 20 thread 1000
> Running with 20*40 (== 800) tasks.
>
> # dmesg | grep kmemleak
> ...
> kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
> # cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff888cd8ca2c40 (size 64):
>   comm "hackbench", pid 17142, jiffies 4299780315
>   hex dump (first 32 bytes):
>     ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00  .tI.....L.I.....
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace (crc bff18fd4):
>     [] __kmalloc_cache_noprof+0x2f9/0x3f0
>     [] task_numa_work+0x725/0xa00
>     [] task_work_run+0x58/0x90
>     [] syscall_exit_to_user_mode+0x1c8/0x1e0
>     [] do_syscall_64+0x85/0x150
>     [] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> ...
>
> This issue can be consistently reproduced on three different servers:
>   * a 448-core server
>   * a 256-core server
>   * a 192-core server
>
> [Root Cause]
> Since multiple threads are created by the hackbench program (along with
> the command argument 'thread'), a shared vma might be accessed by two or
> more cores simultaneously.
> When two or more cores observe that
> vma->numab_state is NULL at the same time, each allocates a new
> vma_numab_state and the later assignment overwrites (and leaks) the
> earlier one.
>
> Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
> cannot reproduce the issue, because with fork() each process gets its
> own (COW) copy of the VMAs, so no vma is shared. This was verified with
> 200+ test runs.
>
> [Solution]
> Introduce a lock to make access to vma->numab_state atomic.
>
> Fixes: ef6a22b70f6d ("sched/numa: apply the scan delay to every new vma")
> Reported-by: Jiwei Sun
> Signed-off-by: Adrian Huang

Could this be achieved without the new lock, by a cmpxchg attempt to install
vma->numab_state that will free the allocated vma_numab_state if it fails?

Thanks,
Vlastimil

> ---
>  include/linux/mm.h       |  1 +
>  include/linux/mm_types.h |  1 +
>  kernel/sched/fair.c      | 17 ++++++++++++++++-
>  3 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 61fff5d34ed5..a08e31ac53de 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -673,6 +673,7 @@ struct vm_operations_struct {
>  static inline void vma_numab_state_init(struct vm_area_struct *vma)
>  {
>  	vma->numab_state = NULL;
> +	mutex_init(&vma->numab_state_lock);
>  }
>  static inline void vma_numab_state_free(struct vm_area_struct *vma)
>  {
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6e3bdf8e38bc..77eee89a89f5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -768,6 +768,7 @@ struct vm_area_struct {
>  #endif
>  #ifdef CONFIG_NUMA_BALANCING
>  	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
> +	struct mutex numab_state_lock;		/* NUMA Balancing state lock */
>  #endif
>  	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
>  } __randomize_layout;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c157d4860a3b..53e6383cd94e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3397,12 +3397,24 @@ static void task_numa_work(struct callback_head *work)
>  			continue;
>  		}
>
> +		/*
> +		 * In case of the shared vma, the vma->numab_state will be
> +		 * overwritten if two or more cores observe vma->numab_state
> +		 * is NULL at the same time. Make sure that only one core
> +		 * allocates memory for vma->numab_state. This can prevent
> +		 * the memory leak.
> +		 */
> +		if (!mutex_trylock(&vma->numab_state_lock))
> +			continue;
> +
>  		/* Initialise new per-VMA NUMAB state. */
>  		if (!vma->numab_state) {
>  			vma->numab_state = kzalloc(sizeof(struct vma_numab_state),
>  				GFP_KERNEL);
> -			if (!vma->numab_state)
> +			if (!vma->numab_state) {
> +				mutex_unlock(&vma->numab_state_lock);
>  				continue;
> +			}
>
>  			vma->numab_state->start_scan_seq = mm->numa_scan_seq;
>
> @@ -3428,6 +3440,7 @@ static void task_numa_work(struct callback_head *work)
>  		if (mm->numa_scan_seq && time_before(jiffies,
>  						vma->numab_state->next_scan)) {
>  			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_SCAN_DELAY);
> +			mutex_unlock(&vma->numab_state_lock);
>  			continue;
>  		}
>
> @@ -3440,6 +3453,8 @@ static void task_numa_work(struct callback_head *work)
>  			vma->numab_state->pids_active[1] = 0;
>  		}
>
> +		mutex_unlock(&vma->numab_state_lock);
> +
>  		/* Do not rescan VMAs twice within the same sequence. */
>  		if (vma->numab_state->prev_scan_seq == mm->numa_scan_seq) {
>  			mm->numa_scan_offset = vma->vm_end;