From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5402C001DF for ; Wed, 16 Aug 2023 17:05:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 06CC08D0049; Wed, 16 Aug 2023 13:05:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F37B98D0001; Wed, 16 Aug 2023 13:05:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DB1128D0049; Wed, 16 Aug 2023 13:05:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id C71318D0001 for ; Wed, 16 Aug 2023 13:05:41 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 7967D140F4B for ; Wed, 16 Aug 2023 17:05:41 +0000 (UTC) X-FDA: 81130594482.17.2D1C6D8 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf05.hostedemail.com (Postfix) with ESMTP id 887B71001E1 for ; Wed, 16 Aug 2023 17:03:35 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=lSlrScPf; dmarc=none; spf=none (imf05.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692205416; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ybl5E+HH35TJ4OJlyeraQdmrFz9TQ6b1jqK4gcch/+o=; b=GDKC49sI8+vnRAvkwA20wWhqePDFGKdxNZ4oHeWvtQ9k47fY2SuL0c+6df1i9K3kmucYrT YvmX/gPdGWoKUNmBSgO8Nqnc1nzP1UnpXgJ5PgTkQ/oMEVIEFEcuBF9BhrGBs4U7cliQxp rRsrUdhp5bW4FKZ8bkzaWxxdGuQQHr8= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=lSlrScPf; dmarc=none; spf=none (imf05.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692205416; a=rsa-sha256; cv=none; b=ON7p53y13wgz5r1CSfd/dtTE8gt1RqckeCb61xWhuCLlnF3xmdd3Jvf+vhQ4WwHQxuMSn/ tEajIgT3i0rIUNNQYy6IR+J/7KZ8vpjTWviGfmwSJcuWDMfbYwAleDiWjRUbaR91ssrJTc hBjmAiwSRIeut9XIQP+PcWBji3n25IU= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=ybl5E+HH35TJ4OJlyeraQdmrFz9TQ6b1jqK4gcch/+o=; b=lSlrScPflY5V5alz4Q7fiD4GZb RbXBtv9j9UxtlCNlQzeQPqeDQisj3vC9o+38gDJy0Wa3gTUBxg5nANAI8MbdjIk26jIDQLGZ9jnnx LyEQgC0/xtQgFDVC70h9/u6RDEDioi8m5YUJK9oTA0zz12g2pM6nqO54do/ouzZqRgAcMIDJrIVH5 aZEv/RJxRkopAa74FxDdoHSjJ8+WFb/yzsHy+tS72UMFD2kJqfHshdaLNovApBFNWgGAwKMekVIhA 8XYWK2BPX17nIlZmEwYEy7EShOQVzeEC4sLO4+H3ffqGk9foAx+yLdEbxYBmhCGWe5gUbM7/9tWJ7 hq9uc2Yw==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1qWJve-00FrK6-CR; Wed, 16 Aug 2023 17:03:30 +0000 Date: Wed, 16 Aug 2023 18:03:30 +0100 From: Matthew Wilcox To: Jens Axboe Cc: Andrew Morton , "Kirill A . Shutemov" , Hugh Dickins , linux-mm@kvack.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: Fix rare user data corruption when using THP Message-ID: References: <20230814144100.596749-1-willy@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230814144100.596749-1-willy@infradead.org> X-Rspamd-Queue-Id: 887B71001E1 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: 1hj8j3prkf778gmcjkn6jrisnhbhtoss X-HE-Tag: 1692205415-88058 X-HE-Meta: U2FsdGVkX18hCvkdEeE7TEvSVHseP4hW+mlx15J6vmVXHqp7fO/UY+2I2/C6pFwD4o2wX9lchcKz5ZJWYDpCmZfAImH8ds8HBd2g1T9NRHpAWaYhkP0yxHQrodiVxd7NCtny0HAN9F/RXAumRaF97Hum/2EcA2uQtQpDUJCOtP3ayEJGQ22t3KBh2avwsCLiWDzc+2fyMMKFM9SptS78/pI2yjWS/chw/nWuKdztBM7VfwTSw+rcfAP1imAxkwUltbDBuXpCiWRu6iblsZ3fYgx8QUNuu9cf9JeNeCnsH7gk54HxN72CVvGnrlgjpzqRIsSYGLItBlf5cxqajCEe5nkWN5uEClDI0Bh+f78sZ52X0f3g7NFMItg/T0aiVJ24VVCZppiKiI2jDj5Xiroe8MlJc6XshJDFmnUCWzQutpBOLhVjf9P6c/WAiyCfgBO99zrbGObUDE8rNUDFhiM1QVYBxNGNoaLzxvC/H51NfBebSkUuJDxvyVbGbFHDwQeSKUvcb5aGUvBpBIGF7QWuC3Ii1THoSR9Qgfy2E8KxW2eYyxvkJBiDPcUUV5WKFOlvWtH4hfWUvWLdA+krbqi52+WQ0KcI3ltyoMT95+vSd+ZFr3Xk43ruM07a/2FEMMLeUpU4APb/VIntjp6WwzarM/007q0AxFofykoSJx1RW2WAkGxfCbLKCMYEdcqQFFVPI5Pf17TgY1t122HMhX4M9rHNg2xRzywuElHgeWBHqK06YjaHhehmFVxzR0HoXjYDOD7j5yios5Lig14x9uSUd8d97/xrtjy4Wsng4aBRIGq73fx6QliR5p9anFpyUTRRfUlCfVMZnMQ8iDyQ+BiwEMitNhD07BZZEhO/LTf+/SAg8VINOU6FT24M43JqGr4sHbSPhUMyK3cpDaN5QOEt1wQLzMUHN8IDG+I93OOkOdQQH+sdiUolhLVLEzMRe47xqhKrXtz33CuyxP53eLK rL9ToTmA +4it8rgkmRPVIlfbn8xIi5dO0FwSCS1Gk+O0kj9bhBFn7cSQO5lurOFfYAAI2xXjzCl0Wy15ZksTZyklXE1B0poewzCUADJR3MVI0McC3bzX/XmaD2WHLAfUL/6dX0paiozqWyif9w0u9G6TWRSTwgxkW53IPO7qKL5gQJAHPZksFrF3lppD6VEp0TzBWKbSDmvuE2MqlM4UqE3Myp9u9tlOYyX3US5A/lrqbyu+PTSkLy9PT+as7zLebp20gehF1TF4nQK2C3CNQxt1RtLTNLbS3hhJVSPj8BkuboJfbPIRvTnTdZ1DPc8B5DCVEL3MDd8Cu X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Perhaps if I make the subject line more bold and flashy I'll get more reaction? On Mon, Aug 14, 2023 at 03:41:00PM +0100, Matthew Wilcox (Oracle) wrote: > The special casing was originally added in pre-git history; reproducing > the commit log here: > > > commit a318a92567d77 > > Author: Andrew Morton > > Date: Sun Sep 21 01:42:22 2003 -0700 > > > > [PATCH] Speed up direct-io hugetlbpage handling > > > > This patch short-circuits all the direct-io page dirtying logic for > > higher-order pages. Without this, we pointlessly bounce BIOs up to > > keventd all the time. > > In the last twenty years, compound pages have become used for more than > just hugetlb. Rewrite these functions to operate on folios instead > of pages and remove the special case for hugetlbfs; I don't think > it's needed any more (and if it is, we can put it back in as a call > to folio_test_hugetlb()). > > This was found by inspection; as far as I can tell, this bug can lead > to pages used as the destination of a direct I/O read not being marked > as dirty. If those pages are then reclaimed by the MM without being > dirtied for some other reason, they won't be written out. Then when > they're faulted back in, they will not contain the data they should. > It'll take a pretty unusual setup to produce this problem with several > races all going the wrong way. > > This problem predates the folio work; it could for example have been > triggered by mmaping a THP in tmpfs and using that as the target of an > O_DIRECT read. > > Fixes: 800d8c63b2e98 ("shmem: add huge pages support") > Cc: stable@vger.kernel.org > Signed-off-by: Matthew Wilcox (Oracle) > --- > block/bio.c | 46 ++++++++++++++++++++++++---------------------- > 1 file changed, 24 insertions(+), 22 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index 8672179213b9..f46d8ec71fbd 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -1171,13 +1171,22 @@ EXPORT_SYMBOL(bio_add_folio); > > void __bio_release_pages(struct bio *bio, bool mark_dirty) > { > - struct bvec_iter_all iter_all; > - struct bio_vec *bvec; > + struct folio_iter fi; > + > + bio_for_each_folio_all(fi, bio) { > + struct page *page; > + size_t done = 0; > > - bio_for_each_segment_all(bvec, bio, iter_all) { > - if (mark_dirty && !PageCompound(bvec->bv_page)) > - set_page_dirty_lock(bvec->bv_page); > - bio_release_page(bio, bvec->bv_page); > + if (mark_dirty) { > + folio_lock(fi.folio); > + folio_mark_dirty(fi.folio); > + folio_unlock(fi.folio); > + } > + page = folio_page(fi.folio, fi.offset / PAGE_SIZE); > + do { > + bio_release_page(bio, page++); > + done += PAGE_SIZE; > + } while (done < fi.length); > } > } > EXPORT_SYMBOL_GPL(__bio_release_pages); > @@ -1455,18 +1464,12 @@ EXPORT_SYMBOL(bio_free_pages); > * bio_set_pages_dirty() and bio_check_pages_dirty() are support functions > * for performing direct-IO in BIOs. > * > - * The problem is that we cannot run set_page_dirty() from interrupt context > + * The problem is that we cannot run folio_mark_dirty() from interrupt context > * because the required locks are not interrupt-safe. So what we can do is to > * mark the pages dirty _before_ performing IO. And in interrupt context, > * check that the pages are still dirty. If so, fine. If not, redirty them > * in process context. > * > - * We special-case compound pages here: normally this means reads into hugetlb > - * pages. The logic in here doesn't really work right for compound pages > - * because the VM does not uniformly chase down the head page in all cases. > - * But dirtiness of compound pages is pretty meaningless anyway: the VM doesn't > - * handle them at all. So we skip compound pages here at an early stage. > - * > * Note that this code is very hard to test under normal circumstances because > * direct-io pins the pages with get_user_pages(). This makes > * is_page_cache_freeable return false, and the VM will not clean the pages. > @@ -1482,12 +1485,12 @@ EXPORT_SYMBOL(bio_free_pages); > */ > void bio_set_pages_dirty(struct bio *bio) > { > - struct bio_vec *bvec; > - struct bvec_iter_all iter_all; > + struct folio_iter fi; > > - bio_for_each_segment_all(bvec, bio, iter_all) { > - if (!PageCompound(bvec->bv_page)) > - set_page_dirty_lock(bvec->bv_page); > + bio_for_each_folio_all(fi, bio) { > + folio_lock(fi.folio); > + folio_mark_dirty(fi.folio); > + folio_unlock(fi.folio); > } > } > > @@ -1530,12 +1533,11 @@ static void bio_dirty_fn(struct work_struct *work) > > void bio_check_pages_dirty(struct bio *bio) > { > - struct bio_vec *bvec; > + struct folio_iter fi; > unsigned long flags; > - struct bvec_iter_all iter_all; > > - bio_for_each_segment_all(bvec, bio, iter_all) { > - if (!PageDirty(bvec->bv_page) && !PageCompound(bvec->bv_page)) > + bio_for_each_folio_all(fi, bio) { > + if (!folio_test_dirty(fi.folio)) > goto defer; > } > > -- > 2.40.1 >