From: Martin Steigerwald
To: Jens Axboe
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com
Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
Date: Thu, 12 Dec 2019 11:44:59 +0100
Message-ID: <63049728.ylUViGSH3C@merkaba>
In-Reply-To: <20191211152943.2933-1-axboe@kernel.dk>
References: <20191211152943.2933-1-axboe@kernel.dk>

Hi Jens.

Jens Axboe - 11.12.19, 16:29:38 CET:
> Recently someone asked me how io_uring buffered IO compares to mmaped
> IO in terms of performance. So I ran some tests with buffered IO, and
> found the experience to be somewhat painful. The test case is pretty
> basic: random reads over a dataset that is 10x the size of RAM.
> Performance starts out fine, then the page cache fills up and we hit
> a throughput cliff. CPU usage of the IO threads goes up, and kswapd
> spends 100% of a core trying to keep up. Seeing that, I was reminded
> of the many complaints I hear about buffered IO, and the fact that
> most of the folks complaining will ultimately bite the bullet and
> move to O_DIRECT just to get the kernel out of the way.
>
> But I don't think it needs to be like that. Switching to O_DIRECT
> isn't always easily doable.
> The buffers have different lifetimes,
> sizes and alignment constraints, etc. On top of that, mixing buffered
> and O_DIRECT can be painful.
>
> Seems to me that we have an opportunity to provide something that sits
> somewhere in between buffered and O_DIRECT, and this is where
> RWF_UNCACHED enters the picture. If this flag is set on IO, we get
> the following behavior:
>
> - If the data is in cache, it remains in cache and the copy (in or
>   out) is served to/from that.
>
> - If the data is NOT in cache, we add it while performing the IO. When
>   the IO is done, we remove it again.
>
> With this, I can do 100% smooth buffered reads or writes without
> pushing the kernel to the state where kswapd is sweating bullets. In
> fact it doesn't even register.

A question from a user and Linux performance trainer perspective: how
does this compare with posix_fadvise() with POSIX_FADV_DONTNEED, which
for example the nocache¹ command is using? Excerpt from the manpage
posix_fadvise(2):

    POSIX_FADV_DONTNEED
        The specified data will not be accessed in the near future.

        POSIX_FADV_DONTNEED attempts to free cached pages associated
        with the specified region. This is useful, for example, while
        streaming large files. A program may periodically request the
        kernel to free cached data that has already been used, so that
        more useful cached pages are not discarded instead.

[1] packaged in Debian as nocache, or available here:
https://github.com/Feh/nocache

In any case, it would be nice to have some such option in rsync… I
still did not change my backup script to call rsync via nocache.

Thanks,
Martin

> Comments appreciated! This should work on any standard file system,
> using either the generic helpers or iomap. I have tested ext4 and xfs
> for the right read/write behavior, but no further validation has been
> done yet.
> Patches are against current git, and can also be found here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=buffered-uncached
>
>  fs/ceph/file.c          |  2 +-
>  fs/dax.c                |  2 +-
>  fs/ext4/file.c          |  2 +-
>  fs/iomap/apply.c        | 26 ++++++++++-
>  fs/iomap/buffered-io.c  | 54 ++++++++++++++++-------
>  fs/iomap/direct-io.c    |  3 +-
>  fs/iomap/fiemap.c       |  5 ++-
>  fs/iomap/seek.c         |  6 ++-
>  fs/iomap/swapfile.c     |  2 +-
>  fs/nfs/file.c           |  2 +-
>  include/linux/fs.h      |  7 ++-
>  include/linux/iomap.h   | 10 ++++-
>  include/uapi/linux/fs.h |  5 ++-
>  mm/filemap.c            | 95 ++++++++++++++++++++++++++++++++++++-----
>  14 files changed, 181 insertions(+), 40 deletions(-)
>
> Changes since v2:
> - Rework the write side according to Chinner's suggestions. Much
>   cleaner this way. It does mean that we invalidate the full write
>   region if just ONE page (or more) had to be created, where before it
>   was more granular. I don't think that's a concern, and on the plus
>   side, we now no longer have to chunk invalidations into 15/16 pages
>   at a time.
> - Cleanups
>
> Changes since v1:
> - Switch to pagevecs for write_drop_cached_pages()
> - Use page_offset() instead of manual shift
> - Ensure we hold a reference on the page between calling ->write_end()
>   and checking the mapping on the locked page
> - Fix XFS multi-page streamed writes, we'd drop the UNCACHED flag
>   after the first page

-- 
Martin