From: Kefeng Wang
Date: Thu, 3 Aug 2023 23:04:18 +0800
Subject: Re: Fwd: crash/hang in mm/swapfile.c:718 add_to_avail_list when exercising stress-ng
To: Aaron Lu, Bagas Sanjaya, Colin Ian King
CC: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, Linux Memory Management List
Message-ID: <526b8b11-9d7e-0980-f7c8-6ad4222e2f92@huawei.com>
In-Reply-To: <20230803134106.GA130558@ziqianlu-dell>

On 2023/8/3 21:41, Aaron Lu wrote:
> On Thu, Aug 03, 2023 at 02:06:46PM +0800, Aaron Lu wrote:
>> On Wed, Aug 02, 2023 at 07:54:38PM +0700, Bagas Sanjaya wrote:
>>> Hi,
>>>
>>> I notice a bug report on Bugzilla [1]. Quoting from it:
>>>
>>>> How to reproduce:
>>>>
>>>> Had 24 CPU Alderlake 16GB debian12 system running with default kernel (from makecondig) on 6.5-rc4, exercised with no swap to start with.
>>>>
>>>> using stress-ng tip commit 0f2ef02e9bc5abb3419c44be056d5fa3c97e0137
>>>> (see https://github.com/ColinIanKing/stress-ng )
>>>>
>>>> build and run stress-ng for say 60 minutes:
>>>>
>>>> ./stress-ng --cpu-online 50 --brk 50 --swap 50 --vmstat 1 -t 60m
>>>>
>>>> Will hang in mm/swapfile.c:718 add_to_avail_list+0x93/0xa0
>>>>
>>>> See attached file for an image of the console on the hang (I'm trying to get the full stack dump).
>>>
>>> See Bugzilla for the full thread and attached console image.
>>>
>>> FWIW, I have to forward this bug report to the mailing lists because
>>> Thorsten noted that many developers don't take a look on Bugzilla
>>> (see the BZ thread).
>>
>> Thanks.
>>
>> I can reproduce this issue using below cmdline:
>> $ sudo ./stress-ng --brk 50 --swap 5 --vmstat 1 -t 60m
>>
>> I'll investigate what is happening.
>
> Hi Colin,
>
> Can you try the below diff on top of v6.5-rc4? It works for me here
> although I got the warn in a different place in get_swap_pages():
>
> 	WARN(!si->highest_bit,
> 	     "swap_info %d in list but !highest_bit\n",
> 	     si->type);
>
> I think the warn you got in add_to_avail_list() due to the swap device
> is already in the list is similar, see below explanation.
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 8e6dde68b389..cb7e93ec1933 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2330,7 +2330,8 @@ static void _enable_swap_info(struct swap_info_struct *p)
>  	 * swap_info_struct.
>  	 */
>  	plist_add(&p->list, &swap_active_head);
> -	add_to_avail_list(p);
> +	if (p->highest_bit)
> +		add_to_avail_list(p);
>  }

There is already a patch for this in linux-next:

commit bdfc7028681ddbce5ab08f4888d157a981060544
Author: Ma Wupeng
Date:   Tue Jun 27 20:08:33 2023 +0800

    swap: stop add to avail list if swap is full

>
>  static void enable_swap_info(struct swap_info_struct *p, int prio,
>
> The finding is, if a swap device failed to be swapoff, then it will be
> reinsert_swap_info() -> _enable_swap_info() -> add_to_avail_list(). The
> problem is, this swap device may run out of space with its highest_bit
> being 0 and shouldn't be added to avail list. In your case, once its
> highest_bit becomes non-zero, it will go through add_to_avail_list()
> and since it's already in the list, thus the warn.
>
> If it works for you, I'll prepare a patch.

Thanks.