From: Jens Axboe
Date: Mon, 13 Jan 2025 08:34:18 -0700
Subject: Re: [PATCHSET v8 0/12] Uncached buffered IO
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org, kirill@shutemov.name, bfoster@redhat.com
Message-ID: <3cba2c9e-4136-4199-84a6-ddd6ad302875@kernel.dk>
In-Reply-To: <20250107193532.f8518eb71a469b023b6a9220@linux-foundation.org>

(sorry missed this reply!)

On 1/7/25 8:35 PM, Andrew Morton wrote:
> On Fri, 20 Dec 2024 08:47:38 -0700 Jens Axboe wrote:
>
>> So here's a new approach to the same concept, but using the page cache
>> as synchronization.
>> Due to excessive bike shedding on the naming, this
>> is now named RWF_DONTCACHE, and is less special in that it's just page
>> cache IO, except it prunes the ranges once IO is completed.
>>
>> Why do this, you may ask? The tldr is that device speeds are only
>> getting faster, while reclaim is not. Doing normal buffered IO can be
>> very unpredictable, and suck up a lot of resources on the reclaim side.
>> This leads people to use O_DIRECT as a work-around, which has its own
>> set of restrictions in terms of size, offset, and length of IO. It's
>> also inherently synchronous, and now you need async IO as well. While
>> the latter isn't necessarily a big problem as we have good options
>> available there, it also should not be a requirement when all you want
>> to do is read or write some data without caching.
>
> Of course, we're doing something here which userspace could itself do:
> drop the pagecache after reading it (with appropriate chunk sizing) and
> for writes, sync the written area then invalidate it. Possible
> added benefits from using separate threads for this.
>
> I suggest that diligence requires that we at least justify an in-kernel
> approach at this time, please.

Conceptually, yes. But you'd end up doing extra work to do it. Some of
that is not so expensive, like system calls, and some of it more so,
like LRU manipulation. Beyond that, I do think it makes sense to expose
this as a generic facility, rather than requiring applications to kick
writeback, reclaim, etc. manually.

> And there's a possible middle-ground implementation where the kernel
> itself kicks off threads to do the drop-behind just before the read or
> write syscall returns, which will probably be simpler. Can we please
> describe why this also isn't acceptable?

That's more of an implementation detail. I didn't test anything like
that, though we surely could. If it's better, there's no reason why it
can't just be changed to do that.
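For comparison, the userspace alternative Andrew describes (chunked IO, then for writes sync the range and invalidate it) can be sketched with existing Linux syscalls. This is not from the patchset: the helper names write_dropbehind() and read_dropbehind() and the chunk parameter are hypothetical. It uses pread()/pwrite(), sync_file_range() to get written pages clean, and posix_fadvise(POSIX_FADV_DONTNEED) to drop them. It costs extra system calls per chunk, and DONTNEED drops clean cached pages in the range whether or not they were cached before the call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at offset start, chunk bytes at a
 * time, pushing each chunk to storage and then dropping it from the page
 * cache. Returns 0 on success, -1 with errno set on failure. */
static int write_dropbehind(int fd, const char *buf, size_t len,
			    off_t start, size_t chunk)
{
	size_t done = 0;

	assert(chunk > 0);
	while (done < len) {
		size_t n = len - done < chunk ? len - done : chunk;
		off_t off = start + (off_t)done;
		ssize_t ret = pwrite(fd, buf + done, n, off);
		int err;

		if (ret < 0 && errno == EINTR)
			continue;
		if (ret <= 0) {
			if (ret == 0)
				errno = EIO;
			return -1;
		}
		/* Write the range back and wait, so its pages are clean... */
		if (sync_file_range(fd, off, ret,
				    SYNC_FILE_RANGE_WAIT_BEFORE |
				    SYNC_FILE_RANGE_WRITE |
				    SYNC_FILE_RANGE_WAIT_AFTER) < 0)
			return -1;
		/* ...because DONTNEED only drops clean, unmapped pages. */
		err = posix_fadvise(fd, off, ret, POSIX_FADV_DONTNEED);
		if (err) {
			errno = err; /* posix_fadvise returns the error number */
			return -1;
		}
		done += (size_t)ret;
	}
	return 0;
}

/* Hypothetical helper: read up to len bytes from offset start in chunks,
 * dropping each chunk from the page cache once it has been copied out.
 * Returns bytes read (short at EOF), or -1 with errno set. */
static ssize_t read_dropbehind(int fd, char *buf, size_t len,
			       off_t start, size_t chunk)
{
	size_t done = 0;

	assert(chunk > 0);
	while (done < len) {
		size_t n = len - done < chunk ? len - done : chunk;
		off_t off = start + (off_t)done;
		ssize_t ret = pread(fd, buf + done, n, off);
		int err;

		if (ret < 0 && errno == EINTR)
			continue;
		if (ret < 0)
			return -1;
		if (ret == 0)
			break; /* EOF */
		/* Unlike RWF_DONTCACHE, this also drops pages that were
		 * already cached before the read. */
		err = posix_fadvise(fd, off, ret, POSIX_FADV_DONTNEED);
		if (err) {
			errno = err;
			return -1;
		}
		done += (size_t)ret;
	}
	return (ssize_t)done;
}
```

Every chunk here pays for a pwrite()/pread() plus one or two extra syscalls, and the invalidation runs in the caller's context; moving the sync/invalidate steps to a helper thread would roughly correspond to the middle-ground idea above.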
My gut tells me you want the task/CPU that just did the page cache
additions to do the pruning too, as that should be more efficient than
having a kworker or similar do it.

> Also, it seems wrong for a read(RWF_DONTCACHE) to drop cache if it was
> already present. Because it was presumably present for a reason. Does
> this implementation already take care of this? To make an application
> which does read(/etc/passwd, RWF_DONTCACHE) less annoying?

The implementation doesn't drop pages that were already present, only
pages that were created and added to the page cache by the operation
itself. So that part should already work as you expect.

> Also, consuming a new page flag isn't a minor thing. It would be nice
> to see some justification around this, and some description of how many
> we have left.

For sure, though various discussions on this have already happened, and
Kirill has already posted patches for unifying some of this. It's not
something I wanted to tackle, as I think that should be left to people
more familiar with the page/folio flags and their (sometimes odd)
interactions.

-- 
Jens Axboe
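For illustration, an application could opt in to this per call with preadv2() and RWF_DONTCACHE, falling back to plain buffered IO when the flag is rejected. This is a sketch under assumptions: the helper stream_read_dontcache() is hypothetical, the 0x80 fallback value for RWF_DONTCACHE is taken from recent mainline uapi headers and should be checked against the headers you build with, and treating EOPNOTSUPP as "not supported here" reflects how the kernel rejects RWF flags it doesn't know or a filesystem doesn't support. Per the reply above, pages already in the page cache before the read are left alone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: value from recent mainline include/uapi/linux/fs.h; only used
 * when the libc headers don't provide RWF_DONTCACHE themselves. */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/* Hypothetical helper: read a whole file front to back in chunk-sized
 * pieces using uncached buffered reads where supported. Returns total
 * bytes read, or -1 with errno set. */
static ssize_t stream_read_dontcache(int fd, size_t chunk)
{
	int flags = RWF_DONTCACHE;
	off_t off = 0;
	ssize_t ret;
	char *buf;

	assert(chunk > 0);
	buf = malloc(chunk);
	if (!buf)
		return -1;

	for (;;) {
		struct iovec iov = { .iov_base = buf, .iov_len = chunk };

		ret = preadv2(fd, &iov, 1, off, flags);
		if (ret < 0 && errno == EOPNOTSUPP && flags) {
			/* Old kernel or unsupported filesystem: plain buffered IO. */
			flags = 0;
			continue;
		}
		if (ret < 0 && errno == EINTR)
			continue;
		if (ret <= 0)
			break; /* EOF or error */
		/* With RWF_DONTCACHE, pages this read had to add are pruned
		 * once the IO completes; already-cached pages stay put. */
		off += ret;
	}

	if (ret < 0) {
		int saved = errno;

		free(buf);
		errno = saved;
		return -1;
	}
	free(buf);
	return (ssize_t)off;
}
```

A call like stream_read_dontcache(fd, 1 << 20) would scan a file in 1 MiB reads; on kernels and filesystems without support it quietly degrades to ordinary cached reads, so the caller doesn't need separate code paths.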