From: tavianator@tavianator.com
To: cel@kernel.org
Cc: akpm@linux-foundation.org, brauner@kernel.org, chuck.lever@oracle.com,
    hughd@google.com, jlayton@redhat.com, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, viro@zeniv.linux.org.uk
Subject: Re: [PATCH v7 3/3] shmem: stable directory offsets
Date: Mon, 13 Nov 2023 13:06:16 -0500
Message-ID: <20231113180616.2831430-1-tavianator@tavianator.com>
In-Reply-To: <168814734331.530310.3911190551060453102.stgit@manet.1015granger.net>
References: <168814734331.530310.3911190551060453102.stgit@manet.1015granger.net>

On Fri, 30 Jun 2023 at 13:49:03 -0400, Chuck Lever wrote:
> From: Chuck Lever
>
> The current cursor-based directory offset mechanism doesn't work
> when a tmpfs filesystem is exported via NFS. This is because NFS
> clients do not open directories. Each server-side READDIR operation
> has to open the directory, read it, then close it. The cursor state
> for that directory, being associated strictly with the opened
> struct file, is thus discarded after each NFS READDIR operation.
>
> Directory offsets are cached not only by NFS clients, but also by
> user space libraries on those clients. Essentially there is no way
> to invalidate those caches when directory offsets have changed on
> an NFS server after the offset-to-dentry mapping changes. Thus the
> whole application stack depends on unchanging directory offsets.
>
> The solution we've come up with is to make the directory offset for
> each file in a tmpfs filesystem stable for the life of the directory
> entry it represents.
>
> shmem_readdir() and shmem_dir_llseek() now use an xarray to map each
> directory offset (an loff_t integer) to the memory address of a
> struct dentry.

I believe this patch is responsible for a tmpfs behaviour change when
a directory is modified while being read.
The following test program

    #include <dirent.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        const char *tmp = "/tmp";
        if (argc >= 2)
            tmp = argv[1];

        /* Create a scratch directory containing a single file, "bar" */
        char *dir_path;
        if (asprintf(&dir_path, "%s/foo.XXXXXX", tmp) < 0)
            err(EXIT_FAILURE, "asprintf()");

        if (!mkdtemp(dir_path))
            err(EXIT_FAILURE, "mkdtemp(%s)", dir_path);

        char *file_path;
        if (asprintf(&file_path, "%s/bar", dir_path) < 0)
            err(EXIT_FAILURE, "asprintf()");

        if (creat(file_path, 0644) < 0)
            err(EXIT_FAILURE, "creat(%s)", file_path);

        DIR *dir = opendir(dir_path);
        if (!dir)
            err(EXIT_FAILURE, "opendir(%s)", dir_path);

        struct dirent *de;
        while ((de = readdir(dir))) {
            printf("readdir(): %s/%s\n", dir_path, de->d_name);
            if (de->d_name[0] == '.')
                continue;

            /* Replace "bar" while the directory stream is still open */
            if (unlink(file_path) != 0)
                err(EXIT_FAILURE, "unlink(%s)", file_path);

            if (creat(file_path, 0644) < 0)
                err(EXIT_FAILURE, "creat(%s)", file_path);
        }

        return EXIT_SUCCESS;
    }

when run on Linux 6.5, doesn't print the new directory entry:

    tavianator@graphene $ uname -a
    Linux graphene 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
    tavianator@graphene $ gcc -Wall foo.c -o foo
    tavianator@graphene $ ./foo
    readdir(): /tmp/foo.wgmdmm/.
    readdir(): /tmp/foo.wgmdmm/..
    readdir(): /tmp/foo.wgmdmm/bar

But on Linux 6.6, readdir() never stops:

    tavianator@tachyon $ uname -a
    Linux tachyon 6.6.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 08 Nov 2023 16:05:38 +0000 x86_64 GNU/Linux
    tavianator@tachyon $ gcc foo.c -o foo
    tavianator@tachyon $ ./foo
    readdir(): /tmp/foo.XnIRqj/.
    readdir(): /tmp/foo.XnIRqj/..
    readdir(): /tmp/foo.XnIRqj/bar
    readdir(): /tmp/foo.XnIRqj/bar
    readdir(): /tmp/foo.XnIRqj/bar
    readdir(): /tmp/foo.XnIRqj/bar
    readdir(): /tmp/foo.XnIRqj/bar
    readdir(): /tmp/foo.XnIRqj/bar
    readdir(): /tmp/foo.XnIRqj/bar
    readdir(): /tmp/foo.XnIRqj/bar
    ...
    foo: creat(/tmp/foo.TTL6Fg/bar): Too many open files

POSIX says[1]

> If a file is removed from or added to the directory after the most recent
> call to opendir() or rewinddir(), whether a subsequent call to readdir()
> returns an entry for that file is unspecified.

so this isn't necessarily a *bug*, but I just wanted to point out the
behaviour change. I only noticed it because it broke one of my tests in
bfs[2] (in a non-default build configuration).

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
[2]: https://github.com/tavianator/bfs/blob/main/tests/gnu/ignore_readdir_race_notdir.sh
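
For what it's worth, here is one plausible way to picture the loop, as a toy
userspace model and not the kernel code. It assumes that every newly created
entry is handed the next unused offset, that "." and ".." occupy offsets 0
and 1, and that a reader resumes at the first live offset after the last one
it returned. All names (toy_dir, toy_create, toy_unlink, toy_readdir,
toy_replay) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 64

/* Toy directory: every live entry keeps the offset it got at creation. */
struct toy_dir {
    long offsets[MAX_ENTRIES]; /* offsets of live entries, ascending */
    size_t count;              /* number of live entries */
    long next_offset;          /* next offset to hand out (assumed never reused) */
};

static void toy_init(struct toy_dir *d) {
    d->count = 0;
    d->next_offset = 2; /* assumption: 0 and 1 stand for "." and ".." */
}

/* Create an entry and return the offset assigned to it. */
static long toy_create(struct toy_dir *d) {
    assert(d->count < MAX_ENTRIES);
    d->offsets[d->count++] = d->next_offset;
    return d->next_offset++;
}

/* Remove the entry that lives at offset off, if there is one. */
static void toy_unlink(struct toy_dir *d, long off) {
    for (size_t i = 0; i < d->count; i++) {
        if (d->offsets[i] == off) {
            for (size_t j = i + 1; j < d->count; j++)
                d->offsets[j - 1] = d->offsets[j];
            d->count--;
            return;
        }
    }
}

/* Return the first live offset strictly after pos, or -1 at end of directory. */
static long toy_readdir(const struct toy_dir *d, long pos) {
    for (size_t i = 0; i < d->count; i++) {
        if (d->offsets[i] > pos)
            return d->offsets[i];
    }
    return -1;
}

/*
 * Replay the read/unlink/create loop from the test program.
 * Returns how many entries were read, giving up after limit reads.
 */
static int toy_replay(int limit) {
    struct toy_dir d;
    toy_init(&d);

    long cur = toy_create(&d); /* the initial "bar" */
    long pos = 1;              /* pretend "." and ".." were already returned */
    int reads = 0;

    while (reads < limit) {
        long off = toy_readdir(&d, pos);
        if (off < 0)
            break; /* end of directory */
        reads++;
        pos = off;

        /* unlink() + creat(): the replacement lands at a higher offset */
        toy_unlink(&d, cur);
        cur = toy_create(&d);
    }
    return reads;
}
```

Under those assumptions the replacement entry always sits just past the
reader's position, so toy_replay() only stops when it hits its limit. That
would match the 6.6 output, where the real program ends only once creat()
runs out of file descriptors.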