Date: Tue, 18 Oct 2022 13:41:03 -0400
From: Brian Foster
To: "Kirill A. Shutemov"
Cc: linux-mm@kvack.org, Matthew Wilcox
Subject: Re: [PATCH] mm/huge_memory: don't clear active swapcache entry from page->private
References: <20220906190602.1626037-1-bfoster@redhat.com>
 <20221018133923.4wdzrgbmbbv6iz6v@box.shutemov.name>
In-Reply-To: <20221018133923.4wdzrgbmbbv6iz6v@box.shutemov.name>

On Tue, Oct 18, 2022 at 04:39:23PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 06, 2022 at 03:06:02PM -0400, Brian Foster wrote:
> > If a swap cache resident hugepage is passed into
> > __split_huge_page(), the tail pages are incrementally split off and
> > each offset in the swap cache covered by the hugepage is updated to
> > point to the associated subpage instead of the original head page.
> > As a final step, each subpage is individually passed to
> > free_page_and_swap_cache() to free the associated swap cache entry
> > and release the page. This eventually lands in
> > delete_from_swap_cache(), which refers to page->private for the
> > swp_entry_t, which in turn encodes the swap address space and page
> > offset information.
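(As a quick aside for anyone reading along: the decode on the delete
side looks roughly like the fragment below. This is a simplified
sketch, not the literal kernel code -- it assumes a swapcache page
"page" is in scope and only uses existing helpers from
<linux/swapops.h> and <linux/swap.h>.)

	/* A swapcache page keeps its swp_entry_t in page->private. */
	swp_entry_t entry = { .val = page_private(page) };

	/* The entry selects the swap address space and the slot to update. */
	struct address_space *mapping = swap_address_space(entry);
	pgoff_t offset = swp_offset(entry);

	/*
	 * If ->private has already been cleared, entry.val is 0, so both
	 * 'mapping' and 'offset' are bogus and the stale swapcache entry
	 * at the real offset is never removed.
	 */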
> > 
> > The problem here is that the earlier call to
> > __split_huge_page_tail() clears page->private of each tail page in
> > the hugepage. This means that the swap entry passed to
> > __delete_from_swap_cache() is zeroed, resulting in a bogus address
> > space and offset tuple for the swapcache update. If DEBUG_VM is
> > enabled, this results in a BUG() in the latter function upon
> > detection of the old value in the swap address space not matching
> > the page being removed.
> > 
> > The ramifications are less clear if DEBUG_VM is not enabled. In the
> > particular stress-ng workload that reproduces this problem, this
> > reliably occurs via MADV_PAGEOUT, which eventually triggers swap
> > cache reclaim before the madvise() call returns. The swap cache
> > reclaim sequence attempts to reuse the entry that should have been
> > freed by the delete operation, but since that failed to correctly
> > update the swap address space, swap cache reclaim attempts to look
> > up the already freed page still stored at said offset and falls into
> > a tight loop in find_get_page() -> __filemap_get_folio() due to
> > repetitive folio_try_get_rcu() (reference count update) failures.
> > This leads to a soft lockup BUG and never seems to recover.
> > 
> > To avoid this problem, update __split_huge_page_tail() to not clear
> > page->private when the associated page has the swap cache flag set.
> > Note that this flag is transferred to the tail page by the preceding
> > ->flags update.
> > 
> > Fixes: b653db77350c7 ("mm: Clear page->private when splitting or migrating a page")
> > Signed-off-by: Brian Foster
> 
> stable@ ?
> 

Ok.

> > ---
> > 
> > Original bug report is here [1]. I figure there's probably at least a
> > couple different ways to fix this problem, but I started with what
> > seemed most straightforward. Thoughts appreciated..
> > 
> > Brian
> > 
> > [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
> > 
> >  mm/huge_memory.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e9414ee57c5b..c2ddbb81a743 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2445,7 +2445,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
> >  			page_tail);
> >  	page_tail->mapping = head->mapping;
> >  	page_tail->index = head->index + tail;
> > -	page_tail->private = 0;
> > +	if (!PageSwapCache(page_tail))
> > +		page_tail->private = 0;
> 
> The patch looks good to me, but this check deserves a comment.
> 

Sure... not sure how descriptive a comment you're looking for.
Something like the following perhaps?

"If the hugepage is in swapcache, page_tail->private tracks the
swp_entry_t of the tail page. We can't clear it until the tail page is
removed from swapcache."

I'll wait a bit for any further comment on Andrew's question [1] in the
thread for the issue reported by Oleksandr (which so far also appears
to be resolved by this patch). Barring further feedback, I'll plan a v2
that includes something like the above.

> Otherwise:
> 
> Acked-by: Kirill A. Shutemov
> 

Thanks!

Brian

[1] https://lore.kernel.org/linux-mm/20221017152423.37a126325b4330e71cf8f869@linux-foundation.org/

> > 
> >  	/* Page flags must be visible before we make the page non-compound. */
> >  	smp_wmb();
> > -- 
> > 2.37.1
> > 
> > 
> 
> -- 
>  Kiryl Shutsemau / Kirill A. Shutemov
> 
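A closing sketch for anyone assembling the pieces of this thread:
combining the check from the posted patch with the comment text
proposed above, the v2 hunk Brian describes might look something like
the fragment below. This is only an illustration of the plan discussed
in this mail, not necessarily what any eventual v2 actually looked
like.

	page_tail->mapping = head->mapping;
	page_tail->index = head->index + tail;
	/*
	 * If the hugepage is in swapcache, page_tail->private tracks the
	 * swp_entry_t of the tail page. We can't clear it until the tail
	 * page is removed from swapcache.
	 */
	if (!PageSwapCache(page_tail))
		page_tail->private = 0;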