Date: Tue, 18 Oct 2022 16:39:23 +0300
From: "Kirill A. Shutemov"
To: Brian Foster
Cc: linux-mm@kvack.org, Matthew Wilcox
Subject: Re: [PATCH] mm/huge_memory: don't clear active swapcache entry from page->private
Message-ID: <20221018133923.4wdzrgbmbbv6iz6v@box.shutemov.name>
In-Reply-To: <20220906190602.1626037-1-bfoster@redhat.com>

On Tue, Sep 06, 2022 at 03:06:02PM -0400, Brian Foster wrote:
> If a swap cache resident hugepage is passed into
> __split_huge_page(), the tail pages are incrementally split off and
> each offset in the swap cache covered by the hugepage is updated to
> point to the associated subpage instead of the original head page.
> As a final step, each subpage is individually passed to
> free_page_and_swap_cache() to free the associated swap cache entry
> and release the page. This eventually lands in
> delete_from_swap_cache(), which refers to page->private for the
> swp_entry_t, which in turn encodes the swap address space and page
> offset information.
>
> The problem here is that the earlier call to
> __split_huge_page_tail() clears page->private of each tail page in
> the hugepage. This means that the swap entry passed to
> __delete_from_swap_cache() is zeroed, resulting in a bogus address
> space and offset tuple for the swapcache update. If DEBUG_VM is
> enabled, this results in a BUG() in the latter function upon
> detection of the old value in the swap address space not matching
> the page being removed.
>
> The ramifications are less clear if DEBUG_VM is not enabled. In the
> particular stress-ng workload that reproduces this problem, this
> reliably occurs via MADV_PAGEOUT, which eventually triggers swap
> cache reclaim before the madvise() call returns.
> The swap cache reclaim sequence attempts to reuse the entry that
> should have been freed by the delete operation, but since that failed
> to correctly update the swap address space, swap cache reclaim
> attempts to look up the already freed page still stored at said
> offset and falls into a tight loop in find_get_page() ->
> __filemap_get_folio() due to repeated folio_try_get_rcu()
> (reference count update) failures. This leads to a soft lockup BUG
> and never seems to recover.
>
> To avoid this problem, update __split_huge_page_tail() to not clear
> page->private when the associated page has the swap cache flag set.
> Note that this flag is transferred to the tail page by the preceding
> ->flags update.
>
> Fixes: b653db77350c7 ("mm: Clear page->private when splitting or migrating a page")
> Signed-off-by: Brian Foster

stable@ ?

> ---
>
> Original bug report is here [1]. I figure there's probably at least a
> couple of different ways to fix this problem, but I started with what
> seemed most straightforward. Thoughts appreciated.
>
> Brian
>
> [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
>
>  mm/huge_memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e9414ee57c5b..c2ddbb81a743 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2445,7 +2445,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
>  			page_tail);
>  	page_tail->mapping = head->mapping;
>  	page_tail->index = head->index + tail;
> -	page_tail->private = 0;
> +	if (!PageSwapCache(page_tail))
> +		page_tail->private = 0;

The patch looks good to me, but this check deserves a comment.

Otherwise:

Acked-by: Kirill A. Shutemov

>
>  	/* Page flags must be visible before we make the page non-compound. */
>  	smp_wmb();
> --
> 2.37.1
>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov