From: Joanne Koong
Date: Wed, 16 Apr 2025 09:43:12 -0700
Subject: Re: [PATCH v7 3/3] fuse: remove tmp folio for writebacks and internal rb tree
To: Jingbo Xu
Cc: miklos@szeredi.hu, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, shakeel.butt@linux.dev, david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org, kernel-team@meta.com, Miklos Szeredi
References: <20250404181443.1363005-1-joannelkoong@gmail.com> <20250404181443.1363005-4-joannelkoong@gmail.com> <7e9b1a40-4708-42a8-b8fc-44fa50227e5b@linux.alibaba.com> <9a3cfb55-faae-4551-9bef-b9650432848a@linux.alibaba.com>

On Tue, Apr 15, 2025 at 6:40 PM Jingbo Xu wrote:
>
> On 4/15/25 11:59 PM, Joanne Koong wrote:
> > On Tue, Apr 15, 2025 at 12:49 AM Jingbo Xu wrote:
> >>
> >> Hi Joanne,
> >>
> >> Sorry for the late reply...
> >
> > Hi Jingbo,
> >
> > No worries at all.
> >>
> >>
> >> On 4/11/25 12:11 AM, Joanne Koong wrote:
> >>> On Thu, Apr 10, 2025 at 8:11 AM Jingbo Xu wrote:
> >>>>
> >>>> On 4/10/25 11:07 PM, Joanne Koong wrote:
> >>>>> On Wed, Apr 9, 2025 at 7:12 PM Jingbo Xu wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 4/10/25 7:47 AM, Joanne Koong wrote:
> >>>>>>> On Tue, Apr 8, 2025 at 7:43 PM Jingbo Xu wrote:
> >>>>>>>>
> >>>>>>>> Hi Joanne,
> >>>>>>>>
> >>>>>>>> On 4/5/25 2:14 AM, Joanne Koong wrote:
> >>>>>>>>> In the current FUSE writeback design (see commit 3be5a52b30aa
> >>>>>>>>> ("fuse: support writable mmap")), a temp page is allocated for every
> >>>>>>>>> dirty page to be written back, the contents of the dirty page are copied over
> >>>>>>>>> to the temp page, and the temp page gets handed to the server to write back.
> >>>>>>>>>
> >>>>>>>>> This is done so that writeback may be immediately cleared on the dirty page,
> >>>>>>>>> and this in turn is done in order to mitigate the following deadlock scenario
> >>>>>>>>> that may arise if reclaim waits on writeback on the dirty page to complete:
> >>>>>>>>> * single-threaded FUSE server is in the middle of handling a request
> >>>>>>>>>   that needs a memory allocation
> >>>>>>>>> * memory allocation triggers direct reclaim
> >>>>>>>>> * direct reclaim waits on a folio under writeback
> >>>>>>>>> * the FUSE server can't write back the folio since it's stuck in
> >>>>>>>>>   direct reclaim
> >>>>>>>>>
> >>>>>>>>> With a recent change that added AS_WRITEBACK_INDETERMINATE and mitigates
> >>>>>>>>> the situations described above, FUSE writeback does not need to use
> >>>>>>>>> temp pages if it sets AS_WRITEBACK_INDETERMINATE on its inode mappings.
> >>>>>>>>>
> >>>>>>>>> This commit sets AS_WRITEBACK_INDETERMINATE on the inode mappings
> >>>>>>>>> and removes the temporary pages + extra copying and the internal rb
> >>>>>>>>> tree.
> >>>>>>>>>
> >>>>>>>>> fio benchmarks --
> >>>>>>>>> (using averages observed from 10 runs, throwing away outliers)
> >>>>>>>>>
> >>>>>>>>> Setup:
> >>>>>>>>> sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> >>>>>>>>> ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
> >>>>>>>>>
> >>>>>>>>> fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> >>>>>>>>> --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
> >>>>>>>>>
> >>>>>>>>>         bs =  1k          4k            1M
> >>>>>>>>> Before   351 MiB/s   1818 MiB/s    1851 MiB/s
> >>>>>>>>> After    341 MiB/s   2246 MiB/s    2685 MiB/s
> >>>>>>>>> % diff       -3%          23%           45%
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Joanne Koong
> >>>>>>>>> Reviewed-by: Jingbo Xu
> >>>>>>>>> Acked-by: Miklos Szeredi
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi Jingbo,
> >>>>>>>
> >>>>>>> Thanks for sharing your analysis for this.
> >>>>>>>
> >>>>>>>> Overall this patch LGTM.
> >>>>>>>>
> >>>>>>>> Apart from that, IMO the fi->writectr and fi->queued_writes mechanism is
> >>>>>>>> also unneeded then, at least the DIRECT IO routine (i.e.
> >>>>>>>
> >>>>>>> I took a look at fi->writectr and fi->queued_writes and my
> >>>>>>> understanding is that we do still need this. For example, for
> >>>>>>> truncates (I'm looking at fuse_do_setattr()), I think we still need to
> >>>>>>> prevent concurrent writeback or else the setattr request and the
> >>>>>>> writeback request could race which would result in a mismatch between
> >>>>>>> the file's reported size and the actual data written to disk.
> >>>>>>
> >>>>>> I haven't looked into the truncate routine yet. I will see it later.
> >>>>>>
> >>>>>>>
> >>>>>>>> fuse_direct_io()) doesn't need fuse_sync_writes() anymore.
> >>>>>>>> That is
> >>>>>>>> because after removing the temp page, the DIRECT IO routine has already
> >>>>>>>> been waiting for all inflight WRITE requests, see
> >>>>>>>>
> >>>>>>>> # DIRECT read
> >>>>>>>> generic_file_read_iter
> >>>>>>>>   kiocb_write_and_wait
> >>>>>>>>     filemap_write_and_wait_range
> >>>>>>>
> >>>>>>> Where do you see generic_file_read_iter() getting called for direct io reads?
> >>>>>>
> >>>>>> # DIRECT read
> >>>>>> fuse_file_read_iter
> >>>>>>   fuse_cache_read_iter
> >>>>>>     generic_file_read_iter
> >>>>>>       kiocb_write_and_wait
> >>>>>>         filemap_write_and_wait_range
> >>>>>>       a_ops->direct_IO(),i.e. fuse_direct_IO()
> >>>>>>
> >>>>>
> >>>>> Oh I see, I thought files opened with O_DIRECT automatically call the
> >>>>> .direct_IO handler for reads/writes but you're right, it first goes
> >>>>> through .read_iter / .write_iter handlers, and the .direct_IO handler
> >>>>> only gets invoked through generic_file_read_iter() /
> >>>>> generic_file_direct_write() in mm/filemap.c
> >>>>>
> >>>>> There's two paths for direct io in FUSE:
> >>>>> a) fuse server sets fi->direct_io = true when a file is opened, which
> >>>>> will set the FOPEN_DIRECT_IO bit in ff->open_flags on the kernel side
> >>>>> b) fuse server doesn't set fi->direct_io = true, but the client opens
> >>>>> the file with O_DIRECT
> >>>>>
> >>>>> We only go through the stack trace you listed above for the b) case.
> >>>>> For the a) case, we'll hit
> >>>>>
> >>>>>         if (ff->open_flags & FOPEN_DIRECT_IO)
> >>>>>                 return fuse_direct_read_iter(iocb, to);
> >>>>>
> >>>>> and
> >>>>>
> >>>>>         if (ff->open_flags & FOPEN_DIRECT_IO)
> >>>>>                 return fuse_direct_write_iter(iocb, from);
> >>>>>
> >>>>> which will invoke fuse_direct_IO() / fuse_direct_io() without going
> >>>>> through the kiocb_write_and_wait() -> filemap_write_and_wait_range() /
> >>>>> kiocb_invalidate_pages() -> filemap_write_and_wait_range() you listed
> >>>>> above.
> >>>>>
> >>>>> So for the a) case I think we'd still need the fuse_sync_writes() in
> >>>>> case there's still pending writeback.
> >>>>>
> >>>>> Do you agree with this analysis or am I missing something here?
> >>>>
> >>>> Yeah, that's true. But instead of calling fuse_sync_writes(), we can
> >>>> call filemap_wait_range() or something similar here.
> >>>>
> >>>
> >>> Agreed. Actually, the more I look at this, the more I think we can
> >>> replace all fuse_sync_writes() and get rid of it entirely.
> >>
> >>
> >> I have seen your latest reply that this cleaning up won't be included in
> >> this series, which is okay.
> >>
> >>
> >>> fuse_sync_writes() is called in:
> >>>
> >>> fuse_fsync():
> >>>         /*
> >>>          * Start writeback against all dirty pages of the inode, then
> >>>          * wait for all outstanding writes, before sending the FSYNC
> >>>          * request.
> >>>          */
> >>>         err = file_write_and_wait_range(file, start, end);
> >>>         if (err)
> >>>                 goto out;
> >>>
> >>>         fuse_sync_writes(inode);
> >>>
> >>>         /*
> >>>          * Due to implementation of fuse writeback
> >>>          * file_write_and_wait_range() does not catch errors.
> >>>          * We have to do this directly after fuse_sync_writes()
> >>>          */
> >>>         err = file_check_and_advance_wb_err(file);
> >>>         if (err)
> >>>                 goto out;
> >>>
> >>>
> >>> We can get rid of the fuse_sync_writes() and
> >>> file_check_and_advance_wb_err() entirely since now without temp pages,
> >>> the file_write_and_wait_range() call actually ensures that writeback
> >>> is completed
> >>>
> >>>
> >>>
> >>> fuse_writeback_range():
> >>>         static int fuse_writeback_range(struct inode *inode, loff_t start,
> >>>                                         loff_t end)
> >>>         {
> >>>                 int err = filemap_write_and_wait_range(inode->i_mapping,
> >>>                                                        start, LLONG_MAX);
> >>>
> >>>                 if (!err)
> >>>                         fuse_sync_writes(inode);
> >>>
> >>>                 return err;
> >>>         }
> >>>
> >>>
> >>> We can replace fuse_writeback_range() entirely with
> >>> filemap_write_and_wait_range().
> >>>
> >>>
> >>>
> >>> fuse_direct_io():
> >>>         if (fopen_direct_io && fc->direct_io_allow_mmap) {
> >>>                 res = filemap_write_and_wait_range(mapping, pos,
> >>>                                                    pos + count - 1);
> >>>                 if (res) {
> >>>                         fuse_io_free(ia);
> >>>                         return res;
> >>>                 }
> >>>         }
> >>>         if (!cuse && filemap_range_has_writeback(mapping, pos,
> >>>                                                  (pos + count - 1))) {
> >>>                 if (!write)
> >>>                         inode_lock(inode);
> >>>                 fuse_sync_writes(inode);
> >>>                 if (!write)
> >>>                         inode_unlock(inode);
> >>>         }
> >>>
> >>>
> >>> I think this can just replaced with
> >>>         if (fopen_direct_io && (fc->direct_io_allow_mmap || !cuse)) {
> >>>                 res = filemap_write_and_wait_range(mapping,
> >>>                                                    pos, pos + count - 1);
> >>>                 if (res) {
> >>>                         fuse_io_free(ia);
> >>>                         return res;
> >>>                 }
> >>>         }
> >>
> >> Alright. But I would prefer doing this filemap_write_and_wait_range() in
> >> fuse_direct_write_iter() rather than fuse_direct_io() if possible.
> >>
> >>> since for the !fopen_direct_io case, it will already go through
> >>> filemap_write_and_wait_range(), as you mentioned in your previous
> >>> message. I think this also fixes a bug (?) in the original code - in
> >>> the fopen_direct_io && !fc->direct_io_allow_mmap case, I think we
> >>> still need to write out dirty pages first, which we don't currently
> >>> do.
> >>
> >> Nope. In case of fopen_direct_io && !fc->direct_io_allow_mmap, there
> >> won't be any page cache at all, right?
> >>
> >
> > Isn't there still a page cache if the file was previously opened
> > without direct io and then the client opens another handle to that
> > file with direct io? In that case, the pages could still be dirty in
> > the page cache and would need to be written back first, no?
>
> Do you mean that when the inode is firstly opened, FOPEN_DIRECT_IO is
> not set by the FUSE server, while it is secondly opened, the flag is set?
>
> Though the behavior of the FUSE daemon is quite confusing in this case,
> it is completely possible in real life.
> So yes we'd better add
> filemap_write_and_wait_range() unconditionally in fopen_direct_io case.
>

I think this behavior on the server side is pretty common. From what
I've seen on most servers, when handling the open the server sets
fi->direct_io depending on whether the client opened with O_DIRECT, e.g.

        if (fi->flags & O_DIRECT)
                fi->direct_io = 1;

If a client opens a file without O_DIRECT and then opens the same file
with O_DIRECT, then we run into this case. Though I'm not sure how
common it generally is for clients to do this.

Thanks,
Joanne

>
> --
> Thanks,
> Jingbo
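
The two-open scenario discussed above can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not kernel or libfuse code: struct toy_file_info, struct toy_inode,
TOY_O_DIRECT and every toy_* function are hypothetical stand-ins
invented for the example. The "flush" is a plain counter reset that
stands in for a filemap_write_and_wait_range() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the O_DIRECT open flag. */
#define TOY_O_DIRECT 0x1

/* Hypothetical stand-in for the per-open info a FUSE server fills in. */
struct toy_file_info {
    int flags;              /* open flags passed by the client */
    unsigned int direct_io; /* set by the server to request direct I/O */
};

/* Hypothetical toy inode: only tracks how many cached pages are dirty. */
struct toy_inode {
    unsigned int dirty_pages;
};

/* Server-side open handler following the pattern quoted in the mail. */
static void toy_server_open(struct toy_file_info *fi)
{
    assert(fi != NULL);
    if (fi->flags & TOY_O_DIRECT)
        fi->direct_io = 1;
}

/* A buffered write through a non-direct handle dirties the page cache. */
static void toy_buffered_write(struct toy_inode *inode, unsigned int npages)
{
    inode->dirty_pages += npages;
}

/*
 * Models the "write back unconditionally for direct-io handles" idea:
 * before direct I/O on a handle with direct_io set, flush whatever an
 * earlier buffered handle left dirty. Returns the number of pages flushed.
 */
static unsigned int toy_direct_io_prepare(struct toy_inode *inode,
                                          const struct toy_file_info *fi)
{
    unsigned int flushed = 0;

    if (fi->direct_io) {
        flushed = inode->dirty_pages; /* stands in for the real writeback */
        inode->dirty_pages = 0;
    }
    return flushed;
}
```

In this model, a first open without TOY_O_DIRECT leaves direct_io
clear, and a buffered write through it dirties pages. A second open of
the same inode with TOY_O_DIRECT sets direct_io, and the flush step then
finds those dirty pages even though the direct handle never used the
page cache itself.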