From: Kent Overstreet
To: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, viro@zeniv.linux.org.uk, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: Kent Overstreet
Subject: [PATCH v2 2/2] fs: generic_file_buffered_read() now uses find_get_pages_contig
Date: Tue, 9 Jun 2020 21:36:42 -0400
Message-Id: <20200610013642.4171512-2-kent.overstreet@gmail.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20200610001036.3904844-1-kent.overstreet@gmail.com>
References: <20200610001036.3904844-1-kent.overstreet@gmail.com>

Convert generic_file_buffered_read() to get pages to read from in
batches, and then copy data to userspace from many pages at once - in
particular, we now don't touch any cachelines that might be contended
while we're in the loop to copy data to userspace.

This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance
improvement (1%, within the margin of error).

Signed-off-by: Kent Overstreet
---
 mm/filemap.c | 276 +++++++++++++++++++++++++++++----------------------
 1 file changed, 155 insertions(+), 121 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 206d51a1c9..4fb0e5a238 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2051,67 +2051,6 @@ static void shrink_readahead_size_eio(struct file_ra_state *ra)
 	ra->ra_pages /= 4;
 }
 
-static int generic_file_buffered_read_page_ok(struct kiocb *iocb,
-			   struct iov_iter *iter,
-			   struct page *page)
-{
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &iocb->ki_filp->f_ra;
-	unsigned offset = iocb->ki_pos & ~PAGE_MASK;
-	unsigned bytes, copied;
-	loff_t isize, end_offset;
-
-	BUG_ON(iocb->ki_pos >> PAGE_SHIFT != page->index);
-
-	/*
-	 * i_size must be checked after we know the page is Uptodate.
-	 *
-	 * Checking i_size after the check allows us to calculate
-	 * the correct value for "bytes", which means the zero-filled
-	 * part of the page is not copied back to userspace (unless
-	 * another truncate extends the file - this is desired though).
-	 */
-
-	isize = i_size_read(inode);
-	if (unlikely(iocb->ki_pos >= isize))
-		return 1;
-
-	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
-
-	bytes = min_t(loff_t, end_offset - iocb->ki_pos, PAGE_SIZE - offset);
-
-	/* If users can be writing to this page using arbitrary
-	 * virtual addresses, take care about potential aliasing
-	 * before reading the page on the kernel side.
-	 */
-	if (mapping_writably_mapped(mapping))
-		flush_dcache_page(page);
-
-	/*
-	 * Ok, we have the page, and it's up-to-date, so
-	 * now we can copy it to user space...
-	 */
-
-	copied = copy_page_to_iter(page, offset, bytes, iter);
-
-	iocb->ki_pos += copied;
-
-	/*
-	 * When a sequential read accesses a page several times,
-	 * only mark it as accessed the first time.
-	 */
-	if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT)
-		mark_page_accessed(page);
-
-	ra->prev_pos = iocb->ki_pos;
-
-	if (copied < bytes)
-		return -EFAULT;
-
-	return !iov_iter_count(iter) || iocb->ki_pos == isize;
-}
-
 static struct page *
 generic_file_buffered_read_readpage(struct file *filp,
 				    struct address_space *mapping,
@@ -2255,6 +2194,79 @@ generic_file_buffered_read_no_cached_page(struct kiocb *iocb,
 	return generic_file_buffered_read_readpage(filp, mapping, page);
 }
 
+static int generic_file_buffered_read_get_pages(struct kiocb *iocb,
+						struct iov_iter *iter,
+						struct page **pages,
+						unsigned nr)
+{
+	struct file *filp = iocb->ki_filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct file_ra_state *ra = &filp->f_ra;
+	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
+	pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
+	int i, j, ret, err = 0;
+
+	nr = min_t(unsigned long, last_index - index, nr);
+find_page:
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		return -EAGAIN;
+
+	page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
+
+	ret = find_get_pages_contig(mapping, index, nr, pages);
+	if (ret)
+		goto got_pages;
+
+	pages[0] = generic_file_buffered_read_no_cached_page(iocb, iter);
+	err = PTR_ERR_OR_ZERO(pages[0]);
+	ret = !IS_ERR_OR_NULL(pages[0]);
+got_pages:
+	for (i = 0; i < ret; i++) {
+		struct page *page = pages[i];
+		pgoff_t pg_index = index +i;
+		loff_t pg_pos = max(iocb->ki_pos,
+				    (loff_t) pg_index << PAGE_SHIFT);
+		loff_t pg_count = iocb->ki_pos + iter->count - pg_pos;
+
+		if (PageReadahead(page))
+			page_cache_async_readahead(mapping, ra, filp, page,
+					pg_index, last_index - pg_index);
+
+		if (!PageUptodate(page)) {
+			if (iocb->ki_flags & IOCB_NOWAIT) {
+				for (j = i; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = -EAGAIN;
+				break;
+			}
+
+			page = generic_file_buffered_read_pagenotuptodate(filp,
+						iter, page, pg_pos, pg_count);
+			if (IS_ERR_OR_NULL(page)) {
+				for (j = i + 1; j < ret; j++)
+					put_page(pages[j]);
+				ret = i;
+				err = PTR_ERR_OR_ZERO(page);
+				break;
+			}
+		}
+	}
+
+	if (likely(ret))
+		return ret;
+	if (err)
+		return err;
+	goto find_page;
+}
+
 /**
  * generic_file_buffered_read - generic file read routine
  * @iocb:	the iocb to read
@@ -2275,86 +2287,108 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		struct iov_iter *iter, ssize_t written)
 {
 	struct file *filp = iocb->ki_filp;
+	struct file_ra_state *ra = &filp->f_ra;
 	struct address_space *mapping = filp->f_mapping;
 	struct inode *inode = mapping->host;
-	struct file_ra_state *ra = &filp->f_ra;
 	size_t orig_count = iov_iter_count(iter);
-	pgoff_t last_index;
-	int error = 0;
+	struct page *page_array[8], **pages;
+	unsigned nr_pages = ARRAY_SIZE(page_array);
+	unsigned read_nr_pages = ((iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT) -
+		(iocb->ki_pos >> PAGE_SHIFT);
+	int i, pg_nr, error = 0;
+	bool writably_mapped;
+	loff_t isize, end_offset;
 
 	if (unlikely(iocb->ki_pos >= inode->i_sb->s_maxbytes))
 		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
-	last_index = (iocb->ki_pos + iter->count + PAGE_SIZE-1) >> PAGE_SHIFT;
-
-	for (;;) {
-		pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
-		struct page *page;
+	if (read_nr_pages > nr_pages &&
+	    (pages = kmalloc_array(read_nr_pages, sizeof(void *), GFP_KERNEL)))
+		nr_pages = read_nr_pages;
+	else
+		pages = page_array;
 
+	do {
 		cond_resched();
-find_page:
-		if (fatal_signal_pending(current)) {
-			error = -EINTR;
-			goto out;
-		}
 
-		page = find_get_page(mapping, index);
-		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
-			page_cache_sync_readahead(mapping,
-					ra, filp,
-					index, last_index - index);
-			page = find_get_page(mapping, index);
-			if (unlikely(page == NULL)) {
-				page = generic_file_buffered_read_no_cached_page(iocb, iter);
-				if (!page)
-					goto find_page;
-				if (IS_ERR(page)) {
-					error = PTR_ERR(page);
-					goto out;
-				}
-			}
-		}
-		if (PageReadahead(page)) {
-			page_cache_async_readahead(mapping,
-					ra, filp, page,
-					index, last_index - index);
+		i = 0;
+		pg_nr = generic_file_buffered_read_get_pages(iocb, iter,
+							     pages, nr_pages);
+		if (pg_nr < 0) {
+			error = pg_nr;
+			break;
 		}
-		if (!PageUptodate(page)) {
-			if (iocb->ki_flags & IOCB_NOWAIT) {
-				put_page(page);
-				error = -EAGAIN;
-				goto out;
-			}
 
-			page = generic_file_buffered_read_pagenotuptodate(filp,
-					iter, page, iocb->ki_pos, iter->count);
-			if (!page)
-				goto find_page;
-			if (IS_ERR(page)) {
-				error = PTR_ERR(page);
-				goto out;
-			}
-		}
+		/*
+		 * i_size must be checked after we know the pages are Uptodate.
+		 *
+		 * Checking i_size after the check allows us to calculate
+		 * the correct value for "nr", which means the zero-filled
+		 * part of the page is not copied back to userspace (unless
+		 * another truncate extends the file - this is desired though).
+		 */
+		isize = i_size_read(inode);
+		if (unlikely(iocb->ki_pos >= isize))
+			goto put_pages;
 
-		error = generic_file_buffered_read_page_ok(iocb, iter, page);
-		put_page(page);
+		end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
 
-		if (error) {
-			if (error > 0)
-				error = 0;
-			goto out;
+		while ((iocb->ki_pos >> PAGE_SHIFT) + pg_nr >
+		       (end_offset + PAGE_SIZE - 1) >> PAGE_SHIFT)
+			put_page(pages[--pg_nr]);
+
+		/*
+		 * Once we start copying data, we don't want to be touching any
+		 * cachelines that might be contended:
+		 */
+		writably_mapped = mapping_writably_mapped(mapping);
+
+		/*
+		 * When a sequential read accesses a page several times, only
+		 * mark it as accessed the first time.
+		 */
+		if (iocb->ki_pos >> PAGE_SHIFT !=
+		    ra->prev_pos >> PAGE_SHIFT)
+			mark_page_accessed(pages[0]);
+		for (i = 1; i < pg_nr; i++)
+			mark_page_accessed(pages[i]);
+
+		for (i = 0; i < pg_nr; i++) {
+			unsigned offset = iocb->ki_pos & ~PAGE_MASK;
+			unsigned bytes = min_t(loff_t, end_offset - iocb->ki_pos,
+					       PAGE_SIZE - offset);
+			unsigned copied;
+
+			/*
+			 * If users can be writing to this page using arbitrary
+			 * virtual addresses, take care about potential aliasing
+			 * before reading the page on the kernel side.
+			 */
+			if (writably_mapped)
+				flush_dcache_page(pages[i]);
+
+			copied = copy_page_to_iter(pages[i], offset, bytes, iter);
+
+			iocb->ki_pos += copied;
+			ra->prev_pos = iocb->ki_pos;
+
+			if (copied < bytes) {
+				error = -EFAULT;
+				break;
+			}
 		}
-	}
+put_pages:
+		for (i = 0; i < pg_nr; i++)
+			put_page(pages[i]);
+	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
 
-would_block:
-	error = -EAGAIN;
-out:
 	file_accessed(filp);
 	written += orig_count - iov_iter_count(iter);
 
+	if (pages != page_array)
+		kfree(pages);
+
 	return written ? written : error;
 }
 
-- 
2.27.0