From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72272C55179 for ; Tue, 27 Oct 2020 13:35:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A69C3218AC for ; Tue, 27 Oct 2020 13:35:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="QCPIzvRs" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A69C3218AC Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1FD316B0062; Tue, 27 Oct 2020 09:35:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1ADA16B006C; Tue, 27 Oct 2020 09:35:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0C4266B006E; Tue, 27 Oct 2020 09:35:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0129.hostedemail.com [216.40.44.129]) by kanga.kvack.org (Postfix) with ESMTP id C5DA16B0062 for ; Tue, 27 Oct 2020 09:35:56 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 6C1C28249980 for ; Tue, 27 Oct 2020 13:35:56 +0000 (UTC) X-FDA: 77417803512.24.wine10_01157612727c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id 4CEFD1A4A0 for ; Tue, 27 Oct 2020 13:35:56 +0000 (UTC) X-HE-Tag: wine10_01157612727c X-Filterd-Recvd-Size: 8014 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Tue, 27 Oct 2020 13:35:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=DulvVhWl5RiRgKJt0heQV46kS6rAIR2TQU6S5c4RDPg=; b=QCPIzvRsSEgHjBJ/N120pVMOZq U3QhOrTRCD9UVnFIMWZzoJnOKKjUJjSAonyFwf933tsCL8qxVpzQZ1dHO0D49+v9y8lUxRjHiCDxo LLe3F6W8iVb6lP9ZPLpmIAGfl3h2bJKmfyrVs9cLHGogUqNOY1ee+/N7QLLo/jbdB3U8SfCY6pFlE swfS73m3LX0NOTgPxx3sMZQe3wPUDziZLdEHEro8gF8GvLqcKUfjb6zP7J8mfHY0Rbtnp2SWlDttu qscalcVYeSQM/8GeeqNXPMqWX505t4umG8wuHtqXg0D/Vw1+R96LvuMDw8CYdlu+4cXDau4a704FY n1xtFMGA==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1kXP8h-0006j7-U1; Tue, 27 Oct 2020 13:35:52 +0000 Date: Tue, 27 Oct 2020 13:35:51 +0000 From: Matthew Wilcox To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, axboe@kernel.dk, kent.overstreet@gmail.com Subject: Re: + fs-break-generic_file_buffered_read-up-into-multiple-functions.patch added to -mm tree Message-ID: <20201027133551.GW20115@casper.infradead.org> References: <20201025220817.XxXVE%akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201025220817.XxXVE%akpm@linux-foundation.org> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, Oct 25, 2020 at 03:08:17PM -0700, akpm@linux-foundation.org wrote: > The patch titled > Subject: mm/filemap/c: freak generic_file_buffered_read up into multiple functions > has been added to the -mm tree. Its filename is > fs-break-generic_file_buffered_read-up-into-multiple-functions.patch Can we back this out? It really makes the THP patchset unhappy. I think we can do something like this afterwards, but doing it this way round is ridiculously hard. Long explanation ... If we readahead a THP and hit a read error, the iomap code currently tries to split the page to handle the error on a more fine-grained boundary. We can't split a page while someone else holds a reference to it, and generic/273 has 500 threads livelocking on the same page, all looking it up, trying to lock it, sleeping, eventually getting the lock, discovering somebody else has the lock, and retrying. I changed from using wait_on_page_locked_killable() to put_and_wait_on_page_locked(), so the threads all sleep without holding a reference to the page. It worked great, and was a nice simplification mm/filemap.c | 69 ++++++++++++++++++------------------------------------------- 1 file changed, 20 insertions(+), 49 deletions(-) With this reorganisation, not only do we try to take the page lock a long way down the chain, we've also got a batch of pages that we're holding a refcount on, so we need to call put_page() on the rest of the batch and then call put_and_wait_on_page_locked() on the one page that we need to wait for. I'm sure it's possible to make this work, but it's going to be completely unlike what I've been testing. I basically have to start again on doing these changes. commit 9fd2f26c6d00c7f0c6c730ec10a421780fedaa79 Author: Matthew Wilcox (Oracle) Date: Mon Oct 12 16:36:11 2020 -0400 mm/filemap: Don't hold a page reference while waiting for unlock In the upcoming THP patch series, if we find a !Uptodate page, it is because of a read error. In this case, we want to split the THP into smaller pages so we can handle the error in as granular a fashion as possible. But xfstests generic/273 defeats this strategy by having 500 threads all sleeping on the same page, each waiting for their turn to split the page. None of them will ever succeed because splitting a page requires that you hold the only reference to it. To fix this, use put_and_wait_on_page_locked() to sleep without holding a reference to the page. Each of the readers will then go back and retry the page lookup after the page is unlocked. Since we now get the page lock a little earlier in generic_file_buffered_read(), we can eliminate a number of duplicate checks. The original intent (commit ebded02788b5 ("avoid unnecessary calls to lock_page when waiting for IO to complete during a read") behind getting the page lock later was to avoid re-locking the page after it has been brought uptodate by another thread. We will still avoid that because we go through the normal lookup path again after the winning thread has brought the page uptodate. Using the "fail 10% of readaheads" patch, which will induce the !Uptodate case, I can detect no significant difference by applying this patch with the generic/273 test. Signed-off-by: Matthew Wilcox (Oracle) diff --git a/mm/filemap.c b/mm/filemap.c index e496833d3ede..a6a2132087cd 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1358,14 +1358,6 @@ static int __wait_on_page_locked_async(struct page *page, return ret; } -static int wait_on_page_locked_async(struct page *page, - struct wait_page_queue *wait) -{ - if (!PageLocked(page)) - return 0; - return __wait_on_page_locked_async(compound_head(page), wait, false); -} - /** * put_and_wait_on_page_locked - Drop a reference and wait for it to be unlocked * @page: The page to wait for. @@ -2279,34 +2271,37 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb, put_page(page); goto out; } - error = wait_on_page_locked_async(page, - iocb->ki_waitq); + error = lock_page_async(page, iocb->ki_waitq); + if (error) + goto readpage_error; + } else if (iocb->ki_flags & IOCB_NOWAIT) { + put_page(page); + goto would_block; } else { - if (iocb->ki_flags & IOCB_NOWAIT) { - put_page(page); - goto would_block; + if (!trylock_page(page)) { + put_and_wait_on_page_locked(page, + TASK_KILLABLE); + continue; } - error = wait_on_page_locked_killable(page); } - if (unlikely(error)) - goto readpage_error; if (PageUptodate(page)) - goto page_ok; + goto uptodate; + if (!page->mapping) { + unlock_page(page); + put_page(page); + continue; + } if (inode->i_blkbits == PAGE_SHIFT || !mapping->a_ops->is_partially_uptodate) - goto page_not_up_to_date; + goto readpage; /* pipes can't handle partially uptodate pages */ if (unlikely(iov_iter_is_pipe(iter))) - goto page_not_up_to_date; - if (!trylock_page(page)) - goto page_not_up_to_date; - /* Did it get truncated before we got the lock? */ - if (!page->mapping) - goto page_not_up_to_date_locked; + goto readpage; if (!mapping->a_ops->is_partially_uptodate(page, offset, iter->count)) - goto page_not_up_to_date_locked; + goto readpage; +uptodate: unlock_page(page); } page_ok: @@ -2372,30 +2367,6 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb, goto out; } continue; - -page_not_up_to_date: - /* Get exclusive access to the page ... */ - if (iocb->ki_flags & IOCB_WAITQ) - error = lock_page_async(page, iocb->ki_waitq); - else - error = lock_page_killable(page); - if (unlikely(error)) - goto readpage_error; - -page_not_up_to_date_locked: - /* Did it get truncated before we got the lock? */ - if (!page->mapping) { - unlock_page(page); - put_page(page); - continue; - } - - /* Did somebody else fill it already? */ - if (PageUptodate(page)) { - unlock_page(page); - goto page_ok; - } - readpage: if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT)) { unlock_page(page);