Date: Mon, 17 Oct 2022 12:14:52 -0400
From: Brian Foster
To: linux-mm@kvack.org
Cc: Matthew Wilcox, Oleksandr Natalenko
Subject: Re: [PATCH] mm/huge_memory: don't clear active swapcache entry from page->private
References: <20220906190602.1626037-1-bfoster@redhat.com>
In-Reply-To: <20220906190602.1626037-1-bfoster@redhat.com>

On Tue, Sep 06, 2022 at 03:06:02PM -0400, Brian Foster wrote:
> If a swap cache resident hugepage is passed into
> __split_huge_page(), the tail pages are incrementally split off and
> each offset in the swap cache covered by the hugepage is updated to
> point to the associated subpage instead of the original head page.
> As a final step, each subpage is individually passed to
> free_page_and_swap_cache() to free the associated swap cache entry
> and release the page. This eventually lands in
> delete_from_swap_cache(), which refers to page->private for the
> swp_entry_t, which in turn encodes the swap address space and page
> offset information.
>
> The problem here is that the earlier call to
> __split_huge_page_tail() clears page->private of each tail page in
> the hugepage. This means that the swap entry passed to
> __delete_from_swap_cache() is zeroed, resulting in a bogus address
> space and offset tuple for the swapcache update. If DEBUG_VM is
> enabled, this results in a BUG() in the latter function upon
> detection of the old value in the swap address space not matching
> the page being removed.
>
> The ramifications are less clear if DEBUG_VM is not enabled. In the
> particular stress-ng workload that reproduces this problem, this
> reliably occurs via MADV_PAGEOUT, which eventually triggers swap
> cache reclaim before the madvise() call returns.
> The swap cache reclaim sequence attempts to reuse the entry that
> should have been freed by the delete operation, but since that failed
> to correctly update the swap address space, swap cache reclaim
> attempts to look up the already freed page still stored at said
> offset and falls into a tight loop in find_get_page() ->
> __filemap_get_folio() due to repetitive folio_try_get_rcu()
> (reference count update) failures. This leads to a soft lockup BUG
> and never seems to recover.
>
> To avoid this problem, update __split_huge_page_tail() to not clear
> page->private when the associated page has the swap cache flag set.
> Note that this flag is transferred to the tail page by the preceding
> ->flags update.
>
> Fixes: b653db77350c7 ("mm: Clear page->private when splitting or migrating a page")
> Signed-off-by: Brian Foster
> ---
>
> Original bug report is here [1]. I figure there's probably at least a
> couple different ways to fix this problem, but I started with what
> seemed most straightforward. Thoughts appreciated..
>

Ping? I can still reproduce this on latest kernels as of last week or
so..

Brian

> Brian
>
> [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
>
>  mm/huge_memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e9414ee57c5b..c2ddbb81a743 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2445,7 +2445,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
>  			page_tail);
>  	page_tail->mapping = head->mapping;
>  	page_tail->index = head->index + tail;
> -	page_tail->private = 0;
> +	if (!PageSwapCache(page_tail))
> +		page_tail->private = 0;
>
>  	/* Page flags must be visible before we make the page non-compound. */
>  	smp_wmb();
> --
> 2.37.1
>
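
For anyone following along without a reproducer, the failure mode in the
commit log can be illustrated with a small user-space toy model. This is
only a sketch under loud assumptions: struct toy_page, the toy_* helpers,
the slot array and the sizes are all made up for illustration and are not
kernel APIs. The model assumes each subpage records its own swap offset
before the split, and the "patched" flag mimics the proposed
PageSwapCache() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SUBPAGES 4	/* toy value; a real hugepage has many more subpages */
#define NR_SLOTS    16	/* size of the toy swap cache */

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct toy_page {
	bool swapcache;		/* models the swap cache page flag */
	unsigned long priv;	/* models page->private (swap offset) */
};

/* Toy swap cache: slot i points at the page cached at swap offset i. */
static struct toy_page *toy_swap_cache[NR_SLOTS];

/* Cache a toy hugepage at offset base: every slot points at the head. */
static void toy_add_to_swap_cache(struct toy_page *pages, unsigned long base)
{
	assert(base + NR_SUBPAGES <= NR_SLOTS);
	pages[0].swapcache = true;
	for (unsigned long i = 0; i < NR_SUBPAGES; i++) {
		pages[i].priv = base + i;
		toy_swap_cache[base + i] = &pages[0];
	}
}

/* Split off the tails; "patched" keeps priv on swap cache pages. */
static void toy_split(struct toy_page *pages, unsigned long base, bool patched)
{
	for (unsigned long i = NR_SUBPAGES - 1; i >= 1; i--) {
		struct toy_page *tail = &pages[i];

		tail->swapcache = pages[0].swapcache;	/* flags copied first */
		if (!patched || !tail->swapcache)
			tail->priv = 0;
		toy_swap_cache[base + i] = tail;	/* slot now names the subpage */
	}
}

/* Returns false where the real code would trip its DEBUG_VM check. */
static bool toy_delete_from_swap_cache(struct toy_page *page)
{
	unsigned long off = page->priv;

	if (off >= NR_SLOTS || toy_swap_cache[off] != page)
		return false;
	toy_swap_cache[off] = NULL;
	return true;
}

/* Run add -> split -> delete and count the deletes that went wrong. */
int toy_run(bool patched)
{
	static struct toy_page pages[NR_SUBPAGES];
	const unsigned long base = 8;	/* nonzero, so offset 0 is unrelated */
	int failed = 0;

	for (size_t i = 0; i < NR_SLOTS; i++)
		toy_swap_cache[i] = NULL;
	for (size_t i = 0; i < NR_SUBPAGES; i++) {
		pages[i].swapcache = false;
		pages[i].priv = 0;
	}

	toy_add_to_swap_cache(pages, base);
	toy_split(pages, base, patched);
	for (size_t i = 0; i < NR_SUBPAGES; i++)
		if (!toy_delete_from_swap_cache(&pages[i]))
			failed++;
	return failed;
}
```

In this model, toy_run(false) reports NR_SUBPAGES - 1 failed deletes,
because every tail page looks up offset 0 instead of its own slot, and
the stale slots are left behind. toy_run(true) reports none.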