Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
From: Jens Axboe
To: Linus Torvalds
Cc: Linux-MM, linux-fsdevel, linux-block, Matthew Wilcox, Chris Mason,
    Dave Chinner, Johannes Weiner
References: <20191211152943.2933-1-axboe@kernel.dk>
    <0d4e3954-c467-30a7-5a8e-7c4180275533@kernel.dk>
Message-ID: <00a5c8b7-215a-7615-156d-d8f3dbb1cd3a@kernel.dk>
Date: Wed, 11 Dec 2019 19:03:40 -0700

On 12/11/19 6:09 PM, Jens Axboe wrote:
> On 12/11/19 4:41 PM, Jens Axboe wrote:
>> On 12/11/19 1:18 PM, Linus Torvalds wrote:
>>> On Wed, Dec 11, 2019 at 12:08 PM Jens Axboe wrote:
>>>>
>>>> $ cat /proc/meminfo | grep -i active
>>>> Active:           134136 kB
>>>> Inactive:       28683916 kB
>>>> Active(anon):      97064 kB
>>>> Inactive(anon):        4 kB
>>>> Active(file):      37072 kB
>>>> Inactive(file): 28683912 kB
>>>
>>> Yeah, that should not put pressure on some swap activity. We have 28
>>> GB of basically free inactive file data, and the VM is doing something
>>> very very bad if it then doesn't just quickly free it with no real
>>> drama.
>>>
>>> In fact, I don't think it should even trigger kswapd at all, it should
>>> all be direct reclaim. Of course, some of the mm people hate that with
>>> a passion, but this does look like a prime example of why it should
>>> just be done.
>>
>> For giggles, I ran just a single thread on the file set. We're only
>> doing about 100K IOPS at that point, yet when the page cache fills,
>> kswapd still eats 10% cpu. That seems like a lot for something that
>> slow.
>
> Warning, the below is from the really crazy department...
>
> Anyway, I took a closer look at the profiles for the uncached case.
> We're spending a lot of time doing memsets (this is the xa_node init,
> from the radix tree constructor), and call_rcu for the node free later
> on. All wasted time, and something that meant we weren't as close to the
> performance of O_DIRECT as we could be.
>
> So Chris and I started talking about this, and pondered "what would
> happen if we simply bypassed the page cache completely?". Case in point,
> see below incremental patch. We still do the page cache lookup, and use
> that page to copy from if it's there. If the page isn't there, allocate
> one and do IO to it, but DON'T add it to the page cache. With that,
> we're almost at O_DIRECT levels of performance for the 4k read case,
> within 1-2%. I think 512b would look awesome, but we're reading full
> pages, so that won't really help us much. Compared to the previous
> uncached method, this is 30% faster on this device. That's substantial.
>
> Obviously this has issues with truncate that would need to be resolved,
> and it's definitely dirtier. But the performance is very enticing...

Tested and cleaned a bit, and added truncate protection through
inode_dio_begin()/inode_dio_end().

https://git.kernel.dk/cgit/linux-block/commit/?h=buffered-uncached&id=6dac80bc340dabdcbfb4230b9331e52510acca87

This is much faster than the previous page cache dance, and I _think_
we're ok as long as we block truncate and hole punching.

Comments?

-- 
Jens Axboe
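
For readers following along, the read path described in the quoted text (look up the page cache, copy from a cached page if present, otherwise read into a temporary page that is never inserted) can be sketched as a toy model. This is only an illustration under stated assumptions, not the patch linked above: `ToyFile`, `buffered_read`, `uncached_read`, and `device_reads` are hypothetical names, the "page cache" is a plain dictionary, and truncate/hole-punch protection is not modeled at all.

```python
PAGE_SIZE = 4096  # toy page size, matching the 4k read case discussed above


class ToyFile:
    """Hypothetical stand-in for a file plus its page cache (not kernel code)."""

    def __init__(self, data: bytes):
        self.data = data
        self.page_cache = {}   # page index -> page contents
        self.device_reads = 0  # counts simulated IOs to the backing device

    def _read_from_device(self, index: int) -> bytes:
        # Simulated IO: slice one page out of the backing data.
        self.device_reads += 1
        start = index * PAGE_SIZE
        return self.data[start:start + PAGE_SIZE]

    def buffered_read(self, index: int) -> bytes:
        # Ordinary buffered read: a miss populates the cache.
        page = self.page_cache.get(index)
        if page is None:
            page = self._read_from_device(index)
            self.page_cache[index] = page
        return page

    def uncached_read(self, index: int) -> bytes:
        # Still do the lookup and copy from the cached page if it is there.
        page = self.page_cache.get(index)
        if page is not None:
            return page
        # Miss: read into a temporary page and do NOT add it to the cache.
        return self._read_from_device(index)


f = ToyFile(bytes(range(256)) * 64)  # 16 KiB of data, i.e. four toy pages
f.buffered_read(0)                   # page 0 is now cached
first = f.uncached_read(0)           # hit: served from the cache, no IO
second = f.uncached_read(1)          # miss: one IO, cache left untouched
print(len(f.page_cache), f.device_reads)
```

In this model a repeated uncached read of a page that is not cached goes to the device every time, which is the trade-off being made: no cache insertion or later reclaim work, at the cost of no reuse.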