From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23652CA9EC7 for ; Thu, 31 Oct 2019 01:51:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CAF4420874 for ; Thu, 31 Oct 2019 01:51:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="WOf6Siy0" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CAF4420874 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 649E86B0003; Wed, 30 Oct 2019 21:51:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5F9FE6B0005; Wed, 30 Oct 2019 21:51:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4E8CA6B0007; Wed, 30 Oct 2019 21:51:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0009.hostedemail.com [216.40.44.9]) by kanga.kvack.org (Postfix) with ESMTP id 2D72C6B0003 for ; Wed, 30 Oct 2019 21:51:18 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id 837174DDC for ; Thu, 31 Oct 2019 01:51:17 +0000 (UTC) X-FDA: 76102402194.13.bean94_62857be2ff950 X-HE-Tag: bean94_62857be2ff950 X-Filterd-Recvd-Size: 9117 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Thu, 31 Oct 2019 01:51:16 +0000 (UTC) Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B9D752080F; Thu, 31 Oct 2019 01:51:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1572486676; bh=Ira/fcpSBOCsTOPMzOdka+hh9vzaHqUeWH2/D3rkkDo=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=WOf6Siy0xdgZObo+i0Dk/9ByB2UhObvzYRMbyPXAi7Pm7VG0JQoaCvM2PQS4hNNVl FDuze12f99cfM4I7p5yGSkZ4koAyTPIVIRl9pHD5RehKEgwHHZITAVpIoUsh6OsMhd mcVFA/tPVK7WTM3ZBxjhMaHImOKJ1+YcnJqPZvq4= Date: Wed, 30 Oct 2019 18:51:15 -0700 From: Andrew Morton To: Song Liu Cc: , , "Kirill A . Shutemov" , Hugh Dickins , William Kucharski , Johannes Weiner , Matthew Wilcox Subject: Re: [PATCH] mm/thp: flush file for !is_shmem PageDirty() case in collapse_file() Message-Id: <20191030185115.ad780e74c66b6286789559fd@linux-foundation.org> In-Reply-To: <20191030200736.3455046-1-songliubraving@fb.com> References: <20191030200736.3455046-1-songliubraving@fb.com> X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, 30 Oct 2019 13:07:36 -0700 Song Liu wrote: > For non-shmem file THPs, khugepaged only collapses read only .text mapping > (VM_DENYWRITE). These pages should not be dirty except the case where the > file hasn't been flushed since first write. > > Call filemap_flush() in collapse_file() to accelerate the write back in > such cases. > > Also add warning if PageDirty() triggered for pages from readahead path. > > Reported-and-tested-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com If sysbot reported something then we definitely want to be able to review that report when reviewing the patch! So please do include a Link: for that sort of thing. A bit of sleuthing leads me to http://lkml.kernel.org/r/000000000000c50fd70595fdd5b2@google.com which shows that this patch is in fact a fix against mmthp-recheck-each-page-before-collapsing-file-thp.patch. This very important information wasn't in the changelog. And this patch doesn't really fix the sysbot hang, does it? The patch restores the flush which was removed in order to fix the sysbot hang. In general, please do try to include all this sort of information when preparing changelogs. > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1601,6 +1601,33 @@ static void collapse_file(struct mm_struct *mm, > result = SCAN_FAIL; > goto xa_unlocked; > } > + if (WARN_ON_ONCE(PageDirty(page))) { I'm not understanding what guarantees this. Can't another process which has the file open for writing come in and dirty the page after the readahead has completed and before this process locks the page? > + /* > + * page from readahead should not > + * be dirty. Show warning if this > + * somehow happens. > + */ > + result = SCAN_FAIL; > + goto out_unlock; > + } > + } else if (PageDirty(page)) { > + /* > + * khugepaged only works on read-only fd, > + * so this page is dirty because it hasn't > + * been flushed since first write. There > + * won't be new dirty pages. > + * > + * Trigger async flush here and hope the > + * writeback is done when khugepaged > + * revisits this page. > + * > + * This is a one-off situation. We are not > + * forcing writeback in loop. > + */ > + xas_unlock_irq(&xas); > + filemap_flush(mapping); > + result = SCAN_FAIL; > + goto xa_unlocked; > } else if (trylock_page(page)) { > get_page(page); > xas_unlock_irq(&xas); The patch mmthp-recheck-each-page-before-collapsing-file-thp.patch has undergone quite a bit of churn so I don't think it should be mainlined without more testing and review. But it fixes a significant issue. So could the appropriate developers please take some time to recheck and retest it all? To that end, here's the combination of mmthp-recheck-each-page-before-collapsing-file-thp.patch and this patch: From: Song Liu Subject: mm,thp: recheck each page before collapsing file THP In collapse_file(), for !is_shmem case, current check cannot guarantee the locked page is up-to-date. Specifically, xas_unlock_irq() should not be called before lock_page() and get_page(); and it is necessary to recheck PageUptodate() after locking the page. With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text may contain corrupted data. This is because khugepaged mistakenly collapses some not up-to-date sub pages into a huge page, and assumes the huge page is up-to-date. This will NOT corrupt data in the disk, because the page is read-only and never written back. Fix this by properly checking PageUptodate() after locking the page. This check replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);". Also, move PageDirty() check after locking the page. Current khugepaged should not try to collapse dirty file THP, because it is limited to read-only .text. Add a warning with the PageDirty() check as it should not happen. This warning is added after page_mapping() check, because if the page is truncated, it might be dirty. [songliubraving@fb.com: flush file for !is_shmem PageDirty() case in collapse_file()] Link: http://lkml.kernel.org/r/20191030200736.3455046-1-songliubraving@fb.com [songliubraving@fb.com: v4] Link: http://lkml.kernel.org/r/20191022191006.411277-1-songliubraving@fb.com [songliubraving@fb.com: fix deadlock in collapse_file()] Link: http://lkml.kernel.org/r/20191028221414.3685035-1-songliubraving@fb.com Link: http://lkml.kernel.org/r/20191018180345.4188310-1-songliubraving@fb.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Song Liu Acked-by: Johannes Weiner Cc: Kirill A. Shutemov Cc: Hugh Dickins Cc: William Kucharski Signed-off-by: Andrew Morton --- mm/khugepaged.c | 49 +++++++++++++++++++++++++++++++++++++--------- 1 file changed, 40 insertions(+), 9 deletions(-) --- a/mm/khugepaged.c~mmthp-recheck-each-page-before-collapsing-file-thp +++ a/mm/khugepaged.c @@ -1601,17 +1601,33 @@ static void collapse_file(struct mm_stru result = SCAN_FAIL; goto xa_unlocked; } - } else if (!PageUptodate(page)) { - xas_unlock_irq(&xas); - wait_on_page_locked(page); - if (!trylock_page(page)) { - result = SCAN_PAGE_LOCK; - goto xa_unlocked; + if (WARN_ON_ONCE(PageDirty(page))) { + /* + * page from readahead should not + * be dirty. Show warning if this + * somehow happens. + */ + result = SCAN_FAIL; + goto out_unlock; } - get_page(page); } else if (PageDirty(page)) { + /* + * khugepaged only works on read-only fd, + * so this page is dirty because it hasn't + * been flushed since first write. There + * won't be new dirty pages. + * + * Trigger async flush here and hope the + * writeback is done when khugepaged + * revisits this page. + * + * This is a one-off situation. We are not + * forcing writeback in loop. + */ + xas_unlock_irq(&xas); + filemap_flush(mapping); result = SCAN_FAIL; - goto xa_locked; + goto xa_unlocked; } else if (trylock_page(page)) { get_page(page); xas_unlock_irq(&xas); @@ -1626,7 +1642,12 @@ static void collapse_file(struct mm_stru * without racing with truncate. */ VM_BUG_ON_PAGE(!PageLocked(page), page); - VM_BUG_ON_PAGE(!PageUptodate(page), page); + + /* double check the page is up to date */ + if (unlikely(!PageUptodate(page))) { + result = SCAN_FAIL; + goto out_unlock; + } /* * If file was truncated then extended, or hole-punched, before @@ -1642,6 +1663,16 @@ static void collapse_file(struct mm_stru goto out_unlock; } + if (!is_shmem && PageDirty(page)) { + /* + * khugepaged only works on read-only fd, so this + * page is dirty because it hasn't been flushed + * since first write. + */ + result = SCAN_FAIL; + goto out_unlock; + } + if (isolate_lru_page(page)) { result = SCAN_DEL_PAGE_LRU; goto out_unlock; _