Message-ID: <6fa20ee8-7471-017d-55c1-e4dbe127b81a@leemhuis.info>
Date: Sat, 11 Feb 2023 15:04:44 +0100
Subject: Re: [PATCH] Fix page corruption caused by racy check in __free_pages
From: "Linux regression tracking #adding (Thorsten Leemhuis)"
Reply-To: Linux regressions mailing list
To: David Chen, "linux-kernel@vger.kernel.org"
Cc: Andrew Morton, "Matthew Wilcox (Oracle)", "linux-mm@kvack.org", "stable@vger.kernel.org", Linux kernel regressions list

[TLDR: I'm adding this report to the list of tracked Linux kernel regressions; the
text you find below is based on a few template paragraphs you might have
encountered already in similar form. See link in footer if these mails
annoy you.]

On 09.02.23 18:48, David Chen wrote:
> When we upgraded our kernel, we started seeing some page corruption like
> the following consistently:
>
>  BUG: Bad page state in process ganesha.nfsd pfn:1304ca
>  page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
>  flags: 0x17ffffc0000000()
>  raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
>  raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
>  page dumped because: nonzero mapcount
>  CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1
>  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
>  Call Trace:
>   dump_stack+0x74/0x96
>   bad_page.cold+0x63/0x94
>   check_new_page_bad+0x6d/0x80
>   rmqueue+0x46e/0x970
>   get_page_from_freelist+0xcb/0x3f0
>   ? _cond_resched+0x19/0x40
>   __alloc_pages_nodemask+0x164/0x300
>   alloc_pages_current+0x87/0xf0
>   skb_page_frag_refill+0x84/0x110
>   ...
>
> Sometimes, it would also show up as corruption in the free list pointer and
> cause crashes.
>
> After bisecting the issue, we found the issue started from e320d3012d25:
>
> 	if (put_page_testzero(page))
> 		free_the_page(page, order);
> 	else if (!PageHead(page))
> 		while (order-- > 0)
> 			free_the_page(page + (1 << order), order);
>
> So the problem is the check PageHead is racy because at this point we
> already dropped our reference to the page. So even if we came in with
> compound page, the page can already be freed and PageHead can return
> false and we will end up freeing all the tail pages causing double free.
>
> Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
> Cc: Andrew Morton
> Cc: Matthew Wilcox (Oracle)
> Cc: linux-mm@kvack.org
> Cc: stable@vger.kernel.org
> Signed-off-by: Chunwei Chen

Thanks for the report and the patch. To be sure the issue doesn't fall
through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel
regression tracking bot:

#regzbot ^introduced e320d3012d25
#regzbot title mm: page corruption caused by racy check in __free_pages
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
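
For readers following the thread, the race described in the quoted report
can be illustrated with a small single-threaded user-space model. This is
only a sketch under stated assumptions, not kernel code: struct page_model,
put_testzero(), free_block(), other_holder_put(), race_hook and both
free_pages_*() variants are hypothetical stand-ins, and the "other CPU" is
simulated by a hook that runs right after our reference is dropped. The
second variant samples the head flag before dropping the reference, which
is one way to avoid reading the page after the reference is gone; the diff
of the submitted patch is not part of this mail, so this may or may not
match it line for line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ORDER 2u                    /* hypothetical: one order-2 block */
#define MODEL_PAGES (1u << MODEL_ORDER)

/* Hypothetical stand-in for the first struct page of the block. */
struct page_model {
	int refcount;
	bool head;                        /* models PageHead() */
};

static struct page_model block_head;
static int free_count[MODEL_PAGES];       /* how often each page was freed */
static void (*race_hook)(void);           /* simulated concurrent CPU */

/* Models free_the_page(): frees 2^order pages starting at index first. */
static void free_block(unsigned int first, unsigned int order)
{
	for (unsigned int i = first; i < first + (1u << order); i++)
		free_count[i]++;
}

/* Simulated other holder: drops the last reference and frees the whole
 * compound page, which also clears the head flag. */
static void other_holder_put(void)
{
	if (--block_head.refcount == 0) {
		block_head.head = false;
		free_block(0, MODEL_ORDER);
	}
}

/* Models put_page_testzero(); the hook fires inside the race window,
 * after our reference is gone. */
static bool put_testzero(struct page_model *p)
{
	bool zero = (--p->refcount == 0);

	if (race_hook)
		race_hook();
	return zero;
}

/* Shape of the code quoted in the report: head flag read after the put. */
static void free_pages_racy(struct page_model *p, unsigned int order)
{
	if (put_testzero(p))
		free_block(0, order);
	else if (!p->head)
		while (order-- > 0)
			free_block(1u << order, order);
}

/* Sketch of a fix: sample the head flag while we still hold a reference. */
static void free_pages_snapshot(struct page_model *p, unsigned int order)
{
	bool head = p->head;

	if (put_testzero(p))
		free_block(0, order);
	else if (!head)
		while (order-- > 0)
			free_block(1u << order, order);
}

/* Runs one variant against a compound page with two reference holders and
 * returns the highest per-page free count (1 = fine, 2 = double free). */
static int max_frees_after(void (*fn)(struct page_model *, unsigned int))
{
	int max = 0;

	assert(fn != NULL);
	memset(free_count, 0, sizeof(free_count));
	block_head.refcount = 2;
	block_head.head = true;
	race_hook = other_holder_put;
	fn(&block_head, MODEL_ORDER);
	for (unsigned int i = 0; i < MODEL_PAGES; i++)
		if (free_count[i] > max)
			max = free_count[i];
	return max;
}
```

In this model, max_frees_after(free_pages_racy) reports that the tail pages
were freed twice, while max_frees_after(free_pages_snapshot) reports every
page freed exactly once.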