From: Ryan Roberts
To: Matthew Wilcox
Cc: Andrew Morton, linux-mm@kvack.org
Date: Wed, 6 Mar 2024 16:19:48 +0000
Subject: Re: [PATCH v3 10/18] mm: Allow non-hugetlb large folios to be batch processed
Message-ID: <85cc26ed-6386-4d6b-b680-1e5fba07843f@arm.com>
References: <20240227174254.710559-1-willy@infradead.org> <20240227174254.710559-11-willy@infradead.org> <367a14f7-340e-4b29-90ae-bc3fcefdd5f4@arm.com>

On 06/03/2024 16:09, Matthew Wilcox wrote:
> On Wed, Mar 06, 2024 at 01:42:06PM +0000, Ryan Roberts wrote:
>> When running some swap tests with this change (which is in mm-stable)
>> present, I see BadThings(TM). Usually I see a "bad page state"
>> followed by a delay of a few seconds, followed by an oops or NULL
>> pointer deref. Bisect points to this change, and if I revert it,
>> the problem goes away.
>
> That oops is really messed up ;-( We've clearly got two CPUs oopsing at
> the same time and it's all interleaved. That said, I can pick some
> nuggets out of it.
>
>> [ 76.239466] BUG: Bad page state in process usemem pfn:2554a0
>> [ 76.240196] kernel BUG at include/linux/mm.h:1120!
>
> These are the two different BUGs being called simultaneously ...
>
> The first one is bad_page() in page_alloc.c and the second is
> put_page_testzero()
>         VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
>
> I'm sure it's significant that both of these are the same page (pfn
> 2554a0). Feels like we have two CPUs calling put_folio() at the same
> time, and one of them underflows. It probably doesn't matter which call
> trace ends up in bad_page() and which in put_page_testzero().
>
> One of them is coming from deferred_split_scan(), which is weird because
> we can see the folio_try_get() earlier in the function. So whatever
> this folio was, we found it on the deferred split list, got its refcount,
> moved it to the local list, either failed to get the lock, or
> successfully got the lock, split it, unlocked it and put it.
>
> (I can see this was invoked from page fault -> memcg shrinking. That's
> probably irrelevant but explains some of the functions in the backtrace)
>
> The other call trace comes from migrate_folio_done() where we're putting
> the _source_ folio. That was called from migrate_pages_batch() which
> was called from kcompactd.
>
> Um. Where do we handle the deferred list in the migration code?
>
>
> I've also tried looking at this from a different angle -- what is it
> about this commit that produces this problem? It's a fairly small
> commit:
>
> -               if (folio_test_large(folio)) {
> +               /* hugetlb has its own memcg */
> +               if (folio_test_hugetlb(folio)) {
>                         if (lruvec) {
>                                 unlock_page_lruvec_irqrestore(lruvec, flags);
>                                 lruvec = NULL;
>                         }
> -                       __folio_put_large(folio);
> +                       free_huge_folio(folio);
>
> So all that's changed is that large non-hugetlb folios do not call
> __folio_put_large().
> As a reminder, that function does:
>
>         if (!folio_test_hugetlb(folio))
>                 page_cache_release(folio);
>         destroy_large_folio(folio);
>
> and destroy_large_folio() does:
>         if (folio_test_large_rmappable(folio))
>                 folio_undo_large_rmappable(folio);
>
>         mem_cgroup_uncharge(folio);
>         free_the_page(&folio->page, folio_order(folio));
>
> So after my patch, instead of calling (in order):
>
>         page_cache_release(folio);
>         folio_undo_large_rmappable(folio);
>         mem_cgroup_uncharge(folio);
>         free_unref_page()
>
> it calls:
>
>         __page_cache_release(folio, &lruvec, &flags);
>         mem_cgroup_uncharge_folios()
>         folio_undo_large_rmappable(folio);
>
> So have I simply widened the window for this race

Yes, that's the conclusion I'm coming to. I have reverted this patch and am still
seeing what looks like the same problem, very occasionally. (I was just about to
let you know when I saw this reply.) It's much harder to reproduce now... great.

The original oops I reported against your RFC is here:
https://lore.kernel.org/linux-mm/eeaf36cf-8e29-4de2-9e5a-9ec2a5e30c61@arm.com/

Looks like I had UBSAN enabled for that run. Let me turn on all the bells and
whistles and see if I can get it to reproduce reliably enough to bisect.

Assuming the original oops and this one are related, the problem is lurking
somewhere in this series, even if not in this patch itself.

I'll come back to you shortly...

> , whatever it is
> exactly? Something involving mis-handling of the deferred list?