From: Martin Steigerwald
To: Jens Axboe
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com
Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
Date: Thu, 12 Dec 2019 22:45:47 +0100
Message-ID: <2091494.0NDvsO6yje@merkaba>
In-Reply-To: <7bf74660-874e-6fd7-7a41-f908ccab694e@kernel.dk>
References: <20191211152943.2933-1-axboe@kernel.dk> <63049728.ylUViGSH3C@merkaba> <7bf74660-874e-6fd7-7a41-f908ccab694e@kernel.dk>

Jens Axboe - 12.12.19, 16:16:31 CET:
> On 12/12/19 3:44 AM, Martin Steigerwald wrote:
> > Jens Axboe - 11.12.19, 16:29:38 CET:
> >> Recently someone asked me how io_uring buffered IO compares to
> >> mmap'ed IO in terms of performance. So I ran some tests with
> >> buffered IO, and found the experience to be somewhat painful.
> >> The test case is pretty basic, random reads over a dataset that's
> >> 10x the size of RAM. Performance starts out fine, and then the page
> >> cache fills up and we hit a throughput cliff. CPU usage of the IO
> >> threads goes up, and we have kswapd spending 100% of a core trying
> >> to keep up. Seeing that, I was reminded of the many complaints I
> >> hear about buffered IO, and the fact that most of the folks
> >> complaining will ultimately bite the bullet and move to O_DIRECT to
> >> just get the kernel out of the way.
> >>
> >> But I don't think it needs to be like that. Switching to O_DIRECT
> >> isn't always easily doable. The buffers have different lifetimes,
> >> size and alignment constraints, etc. On top of that, mixing buffered
> >> and O_DIRECT can be painful.
> >>
> >> Seems to me that we have an opportunity to provide something that
> >> sits somewhere in between buffered and O_DIRECT, and this is where
> >> RWF_UNCACHED enters the picture. If this flag is set on IO, we get
> >> the following behavior:
> >>
> >> - If the data is in cache, it remains in cache and the copy (in or
> >>   out) is served to/from that.
> >>
> >> - If the data is NOT in cache, we add it while performing the IO.
> >>   When the IO is done, we remove it again.
> >>
> >> With this, I can do 100% smooth buffered reads or writes without
> >> pushing the kernel to the state where kswapd is sweating bullets.
> >> In fact it doesn't even register.
> >
> > A question from a user or Linux Performance trainer perspective:
> >
> > How does this compare with posix_fadvise() with POSIX_FADV_DONTNEED,
> > which for example the nocache¹ command is using? Excerpt from the
> > posix_fadvise(2) manpage:
> >
> >        POSIX_FADV_DONTNEED
> >               The specified data will not be accessed in the near
> >               future.
> >
> >               POSIX_FADV_DONTNEED attempts to free cached pages
> >               associated with the specified region. This is useful,
> >               for example, while streaming large files. A program
> >               may periodically request the kernel to free cached
> >               data that has already been used, so that more useful
> >               cached pages are not discarded instead.
> >
> > [1] packaged in Debian as nocache or available here:
> >     https://github.com/Feh/nocache
> >
> > In any case, it would be nice to have some option in rsync… I still
> > did not change my backup script to call rsync via nocache.
>
> I don't know the nocache tool, but I'm guessing it just does the
> writes (or reads) and then uses FADV_DONTNEED to drop behind those
> pages? That's fine for slower use cases, but it won't work very well
> for fast IO. The write side currently works pretty much like that
> internally, whereas the read side doesn't use the page cache at all.

Yes, it does that. And yeah, I saw you changed the read side to bypass
the cache entirely.

Also, as I understand it, this is primarily for asynchronous I/O using
io_uring?

-- 
Martin
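
For reference, a minimal sketch in C of the drop-behind pattern discussed
above, i.e. roughly what a tool like nocache approximates: read a file in
chunks through the page cache and ask the kernel to reclaim the pages
already consumed with POSIX_FADV_DONTNEED. The file path and chunk size
are made up for illustration.

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical large input file, stands in for e.g. a backup source. */
	int fd = open("/path/to/large/file", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	static char buf[1 << 20];	/* 1 MiB read chunk */
	off_t done = 0;			/* bytes consumed so far */
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		done += n;
		/* Tell the kernel the pages behind us will not be needed
		 * again, so they can be reclaimed instead of more useful
		 * page cache. nocache does this periodically; here it is
		 * done after every chunk for simplicity. */
		posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
	}

	close(fd);
	return 0;
}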
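
And a hedged sketch of what using the flag proposed in this series could
look like from userspace, assuming it ends up exposed as a per-call RWF_*
flag for preadv2()/pwritev2() as the cover letter describes. The
RWF_UNCACHED constant below is only a placeholder, not the real ABI value,
and the file path is again made up.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000040	/* placeholder value for illustration only */
#endif

int main(void)
{
	int fd = open("/path/to/large/file", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	static char buf[1 << 20];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

	/* Buffered read that should not leave new pages behind: data already
	 * in the page cache is served from it, data that had to be added for
	 * this IO is dropped again once the copy to userspace completes. */
	ssize_t n = preadv2(fd, &iov, 1, 0, RWF_UNCACHED);
	if (n < 0)
		perror("preadv2");

	close(fd);
	return 0;
}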