Date: Sat, 10 Oct 2020 16:09:01 +0100
From: Matthew Wilcox
To: Hugh Dickins
Cc: Andrew Morton, Linus Torvalds, Song Liu, "Kirill A. Shutemov",
    Yang Shi, Denis Lisov, Qian Cai, Suren Baghdasaryan, David Rientjes,
    Minchan Kim, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm/khugepaged: fix filemap page_to_pgoff(page) != offset
Message-ID: <20201010150901.GX20115@casper.infradead.org>

On Fri, Oct 09, 2020 at 08:07:59PM -0700, Hugh Dickins wrote:
> There have been elusive reports of filemap_fault() hitting its
> VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
> with CONFIG_READ_ONLY_THP_FOR_FS=y.
>
> Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
> CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
> without NUMA reuses the same huge page after collapse_file() failed
> (whereas NUMA targets its allocation to the respective node each time).
> And most of us were usually testing with CONFIG_NUMA=y kernels.

Good catch.  There have been at least three bugs in recent times which
can cause this VM_BUG_ON_PAGE() to trigger: this one; one where swapping
out a THP led to all 512 entries pointing to the same non-huge page on
swapin (fixed in -mm); and one that I introduced for a few weeks in -mm,
where failing to split a THP would lead to random tree corruption
because a non-zeroed node was freed to the slab cache.

There may yet be a fourth.  I've seen it occasionally in recent testing,
so I'll add this patch and see if it disappears.

> Instead, non-NUMA khugepaged_prealloc_page() release the old page
> if anyone else has a reference to it (1% of cases when I tested).

I think this is a good way to fix the problem.
We could also change khugepaged to insert a frozen page, ensuring that
find_get_entry() would spin until the collapse has succeeded or the page
was removed from the cache again.  But I have no problem with this
approach.

I want to note that this is also a silent data corruption for reads.
generic_file_buffered_read() holds a reference to the page, so this
patch will fix that case too, but before it the read could copy the
wrong data to userspace.

Reviewed-by: Matthew Wilcox (Oracle)