Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
From: Jens Axboe
To: Martin Steigerwald
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com
Date: Thu, 12 Dec 2019 15:15:33 -0700
Message-ID: <05adab5c-1405-f4a3-b14f-3242fa5ce8fc@kernel.dk>
In-Reply-To: <2091494.0NDvsO6yje@merkaba>

On 12/12/19 2:45 PM, Martin Steigerwald wrote:
> Jens Axboe - 12.12.19, 16:16:31 CET:
>> On 12/12/19 3:44 AM, Martin Steigerwald wrote:
>>> Jens Axboe - 11.12.19, 16:29:38 CET:
>>>> Recently someone asked me how io_uring buffered IO compares to
>>>> mmapped IO in terms of performance. So I ran some tests with
>>>> buffered IO, and found the experience to be somewhat painful. The
>>>> test case is pretty basic, random reads over a dataset that's 10x
>>>> the size of RAM. Performance starts out fine, and then the page
>>>> cache fills up and we hit a throughput cliff. CPU usage of the IO
>>>> threads goes up, and we have kswapd spending 100% of a core trying
>>>> to keep up.
>>>> Seeing that, I was reminded of the many complaints I hear about
>>>> buffered IO, and the fact that most of the folks complaining will
>>>> ultimately bite the bullet and move to O_DIRECT to just get the
>>>> kernel out of the way.
>>>>
>>>> But I don't think it needs to be like that. Switching to O_DIRECT
>>>> isn't always easily doable. The buffers have different lifetimes,
>>>> size and alignment constraints, etc. On top of that, mixing buffered
>>>> and O_DIRECT can be painful.
>>>>
>>>> Seems to me that we have an opportunity to provide something that
>>>> sits somewhere in between buffered and O_DIRECT, and this is where
>>>> RWF_UNCACHED enters the picture. If this flag is set on IO, we get
>>>> the following behavior:
>>>>
>>>> - If the data is in cache, it remains in cache and the copy (in or
>>>>   out) is served to/from that.
>>>>
>>>> - If the data is NOT in cache, we add it while performing the IO.
>>>>   When the IO is done, we remove it again.
>>>>
>>>> With this, I can do 100% smooth buffered reads or writes without
>>>> pushing the kernel to the state where kswapd is sweating bullets. In
>>>> fact it doesn't even register.
>>>
>>> A question from a user or Linux Performance trainer perspective:
>>>
>>> How does this compare with posix_fadvise() with POSIX_FADV_DONTNEED
>>> that for example the nocache¹ command is using? Excerpt from manpage
>>> posix_fadvise(2):
>>>
>>>     POSIX_FADV_DONTNEED
>>>         The specified data will not be accessed in the near future.
>>>
>>>         POSIX_FADV_DONTNEED attempts to free cached pages associated
>>>         with the specified region. This is useful, for example, while
>>>         streaming large files. A program may periodically request the
>>>         kernel to free cached data that has already been used, so that
>>>         more useful cached pages are not discarded instead.
>>>
>>> [1] packaged in Debian as nocache or available here:
>>> https://github.com/Feh/nocache
>>>
>>> In any case, it would be nice to have some option in rsync… I still
>>> did not change my backup script to call rsync via nocache.
>>
>> I don't know the nocache tool, but I'm guessing it just does the
>> writes (or reads) and then uses FADV_DONTNEED to drop behind those
>> pages? That's fine for slower use cases, it won't work very well for
>> fast IO. The write side currently works pretty much like that
>> internally, whereas the read side doesn't use the page cache at all.
>
> Yes, it does that. And yeah I saw you changed the read side to bypass
> the cache entirely.
>
> Also as I understand it this is primarily for asynchronous use via
> io_uring?

Or preadv2/pwritev2, they also allow passing in RWF_* flags.

-- 
Jens Axboe
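
To make the two approaches discussed above concrete, here is a minimal
sketch in C. It is an illustration under stated assumptions, not code from
the patchset. write_and_drop() shows the drop-behind pattern that nocache is
guessed to use in this thread (write, flush, then POSIX_FADV_DONTNEED), and
read_with_flags() shows how a per-call RWF_* flag is handed to preadv2().
Both helper names are hypothetical. RWF_UNCACHED itself is deliberately not
defined here, since it is only proposed by this patchset and released
headers may not provide it; the caller passes whatever flag value its kernel
headers define, or 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* preadv2() is a GNU extension in glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `len` bytes at offset `off`, passing per-call
 * RWF_* flags to preadv2(). On a kernel carrying the patchset, `rwf_flags`
 * could hold the proposed RWF_UNCACHED. If the flags are rejected, retry
 * as an ordinary buffered read.
 */
static ssize_t read_with_flags(int fd, void *buf, size_t len, off_t off,
                               int rwf_flags)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t ret = preadv2(fd, &iov, 1, off, rwf_flags);

    if (ret < 0 && rwf_flags != 0 &&
        (errno == EOPNOTSUPP || errno == EINVAL))
        ret = preadv2(fd, &iov, 1, off, 0);
    return ret;
}

/*
 * Hypothetical helper: drop-behind write. The data goes through the page
 * cache as usual, is flushed, and the kernel is then advised that the
 * range will not be needed again.
 */
static ssize_t write_and_drop(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t ret = pwrite(fd, buf, len, off);

    if (ret <= 0)
        return ret;

    /* Dirty pages cannot be dropped, so flush them first. */
    if (fdatasync(fd) < 0)
        return -1;

    /* Advice only: a failure here does not affect the written data. */
    (void)posix_fadvise(fd, off, ret, POSIX_FADV_DONTNEED);
    return ret;
}
```

The drop-behind variant needs an extra flush and an extra system call per
chunk, which is consistent with the remark above that it is fine for slower
use cases but not for fast IO; the per-call flag keeps the decision inside
the read or write itself.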