Message-ID: <6dad8c2f-b896-3cc0-26c1-37f5fff406bd@linux.alibaba.com>
Date: Wed, 5 Apr 2023 00:08:47 +0800
From: Rongwei Wang
To: akpm@linux-foundation.org, bagasdotme@gmail.com, willy@infradead.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
References: <20230401221920.57986-1-rongwei.wang@linux.alibaba.com> <20230404154716.23058-1-rongwei.wang@linux.alibaba.com>
In-Reply-To: <20230404154716.23058-1-rongwei.wang@linux.alibaba.com>

Hello,

I have fixed up a few things based on patch v1.
Also, to help readers and reviewers reproduce this bug, here is a
reproducer:

swap_bomb.sh

#!/usr/bin/env bash
stress-ng -a 1 --class vm -t 12h --metrics --times -x bigheap,stackmmap,mlock,vm-splice,mmapaddr,mmapfixed,mmapfork,mmaphuge,mmapmany,mprotect,mremap,msync,msyncmany,physpage,tmpfs,vm-addr,vm-rw,brk,vm-segv,userfaultfd,malloc,stack,munmap,dev-shm,bad-altstack,shm-sysv,pageswap,madvise,vm,shm,env,mmap --verify -v &
stress-ng -a 1 --class vm -t 12h --metrics --times -x bigheap,stackmmap,mlock,vm-splice,mmapaddr,mmapfixed,mmapfork,mmaphuge,mmapmany,mprotect,mremap,msync,msyncmany,physpage,tmpfs,vm-addr,vm-rw,brk,vm-segv,userfaultfd,malloc,stack,munmap,dev-shm,bad-altstack,shm-sysv,pageswap,madvise,vm,shm,env,mmap --verify -v &
stress-ng -a 1 --class vm -t 12h --metrics --times -x bigheap,stackmmap,mlock,vm-splice,mmapaddr,mmapfixed,mmapfork,mmaphuge,mmapmany,mprotect,mremap,msync,msyncmany,physpage,tmpfs,vm-addr,vm-rw,brk,vm-segv,userfaultfd,malloc,stack,munmap,dev-shm,bad-altstack,shm-sysv,pageswap,madvise,vm,shm,env,mmap --verify -v &
stress-ng -a 1 --class vm -t 12h --metrics --times -x bigheap,stackmmap,mlock,vm-splice,mmapaddr,mmapfixed,mmapfork,mmaphuge,mmapmany,mprotect,mremap,msync,msyncmany,physpage,tmpfs,vm-addr,vm-rw,brk,vm-segv,userfaultfd,malloc,stack,munmap,dev-shm,bad-altstack,shm-sysv,pageswap,madvise,vm,shm,env,mmap --verify -v

madvise_shared.c

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* value from the kernel UAPI headers (Linux 5.4+) */
#endif

#define MSIZE (1024 * 1024 * 2)

int main(void)
{
	char *shm_addr;
	unsigned long i;

	while (1) {
		// Map a shared anonymous memory segment
		shm_addr = mmap(NULL, MSIZE, PROT_READ | PROT_WRITE,
				MAP_SHARED | MAP_ANONYMOUS, -1, 0);
		if (shm_addr == MAP_FAILED) {
			perror("Failed to map shared memory segment");
			exit(EXIT_FAILURE);
		}

		// Touch every byte so the pages are populated
		for (i = 0; i < MSIZE; i++)
			shm_addr[i] = 1;

		// Ask the kernel to reclaim (page out) these pages
		if (madvise(shm_addr, MSIZE, MADV_PAGEOUT) == -1) {
			perror("Failed to advise kernel on shared memory usage");
			exit(EXIT_FAILURE);
		}

		// Touch the pages again, which may fault them back in
		for (i = 0; i < MSIZE; i++)
			shm_addr[i] = 1;

		// Page them out a second time
		if (madvise(shm_addr, MSIZE, MADV_PAGEOUT) == -1) {
			perror("Failed to advise kernel on shared memory usage");
			exit(EXIT_FAILURE);
		}

		// Report the mapping address (%p expects a void pointer)
		printf("Hello, shared memory: %p\n", (void *)shm_addr);

		// Unmap the shared memory segment
		if (munmap(shm_addr, MSIZE) == -1) {
			perror("Failed to unmap shared memory segment");
			exit(EXIT_FAILURE);
		}
	}

	return 0;
}

The bug reproduces more quickly (within about 2~5 minutes) when more
instances of swap_bomb.sh and madvise_shared run concurrently.

Thanks.

change log:
v1 -> v2
* fix up the commit message and add assert_spin_locked(&p->lock) inside
  __del_from_avail_list() (suggested by Matthew Wilcox and Bagas Sanjaya)

On 4/4/23 11:47 PM, Rongwei Wang wrote:
> The si->lock must be held when deleting the si from
> the available list. Otherwise, another thread can
> re-add the si to the available list, which can lead
> to memory corruption. The only place we have found
> where this happens is in the swapoff path. This case
> can be described as below:
>
> core 0                       core 1
> swapoff
>
> del_from_avail_list(si)      waiting
>
> try lock si->lock            acquire swap_avail_lock
>                              and re-add si into
>                              swap_avail_head
>
> acquire si->lock but
> missing that si has already
> been added again, and continuing
> to clear SWP_WRITEOK, etc.
>
> A massive number of warning messages can easily be
> triggered inside get_swap_pages() in some special
> cases, for example, by calling madvise(MADV_PAGEOUT)
> on blocks of touched memory concurrently while running
> many swapon-swapoff operations (e.g. stress-ng-swap).
>
> However, in the worst case, this scenario can cause a
> panic. In swapoff(), the memory used by si may be kept
> in swap_info[] after the swap is turned off. This
> means the memory corruption is not triggered
> immediately, but only once that si is allocated and
> reset for a new swap in the swapon path. The resulting
> panic message (with CONFIG_PLIST_DEBUG enabled):
>
> ------------[ cut here ]------------
> top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
> prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
> next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
> WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
> Modules linked in: rfkill(E) crct10dif_ce(E)...
> CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
> Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> pc : plist_check_prev_next_node+0x50/0x70
> lr : plist_check_prev_next_node+0x50/0x70
> sp : ffff0018009d3c30
> x29: ffff0018009d3c40 x28: ffff800011b32a98
> x27: 0000000000000000 x26: ffff001803908000
> x25: ffff8000128ea088 x24: ffff800011b32a48
> x23: 0000000000000028 x22: ffff001800875c00
> x21: ffff800010f9e520 x20: ffff001800875c00
> x19: ffff001800fdc6e0 x18: 0000000000000030
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0736076307640766 x14: 0730073007380731
> x13: 0736076307640766 x12: 0730073007380731
> x11: 000000000004058d x10: 0000000085a85b76
> x9 : ffff8000101436e4 x8 : ffff800011c8ce08
> x7 : 0000000000000000 x6 : 0000000000000001
> x5 : ffff0017df9ed338 x4 : 0000000000000001
> x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
> x1 : 0000000000000000 x0 : 0000000000000000
> Call trace:
>  plist_check_prev_next_node+0x50/0x70
>  plist_check_head+0x80/0xf0
>  plist_add+0x28/0x140
>  add_to_avail_list+0x9c/0xf0
>  _enable_swap_info+0x78/0xb4
>  __do_sys_swapon+0x918/0xa10
>  __arm64_sys_swapon+0x20/0x30
>  el0_svc_common+0x8c/0x220
>  do_el0_svc+0x2c/0x90
>  el0_svc+0x1c/0x30
>  el0_sync_handler+0xa8/0xb0
>  el0_sync+0x148/0x180
> irq event stamp: 2082270
>
> Now si->lock is taken before calling
> del_from_avail_list(), so other threads see the
> deletion of si and the clearing of SWP_WRITEOK
> together and will not reinsert it.
>
> This problem exists in versions after stable 5.10.y.
>
> Cc: stable@vger.kernel.org
> Tested-by: Yongchen Yin
> Signed-off-by: Rongwei Wang
> ---
>  mm/swapfile.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 62ba2bf577d7..2c718f45745f 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -679,6 +679,7 @@ static void __del_from_avail_list(struct swap_info_struct *p)
>  {
>  	int nid;
>  
> +	assert_spin_locked(&p->lock);
>  	for_each_node(nid)
>  		plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
>  }
> @@ -2434,8 +2435,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  		spin_unlock(&swap_lock);
>  		goto out_dput;
>  	}
> -	del_from_avail_list(p);
>  	spin_lock(&p->lock);
> +	del_from_avail_list(p);
>  	if (p->prio < 0) {
>  		struct swap_info_struct *si = p;
>  		int nid;