From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D1C6C31D97 for ; Wed, 3 Jul 2024 13:53:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E7DDF6B0082; Wed, 3 Jul 2024 09:53:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DBD3B6B0088; Wed, 3 Jul 2024 09:53:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A5C536B0083; Wed, 3 Jul 2024 09:53:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 7843C6B0082 for ; Wed, 3 Jul 2024 09:53:22 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 0322280390 for ; Wed, 3 Jul 2024 13:53:21 +0000 (UTC) X-FDA: 82298583444.05.4E4310E Received: from mail-ed1-f44.google.com (mail-ed1-f44.google.com [209.85.208.44]) by imf30.hostedemail.com (Postfix) with ESMTP id 1CB7F8001A for ; Wed, 3 Jul 2024 13:53:19 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=FD8wgIFf; spf=pass (imf30.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.44 as permitted sender) smtp.mailfrom=mjguzik@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720014788; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WNsoF1dP6vGJ9qPRYJ4Nm9afRtYjJpZP/prTQMZ/X8o=; b=PPEyQsXEbNDOSg3DWwiHcv6OQsNNiRgW0V1H2Y64Aj4x3F5PXhJ9YCBacI7ike5E/Wth5E kXCKgvyMyPfB2N2oJhRDrSzDJ1KdeQntmX7KdY3sSitqTNfu0QeE2FcXR7oJ0Hl4Yd6EDW BQ5kkz1HxAkrkNVhryLh926VKqXHPC4= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=FD8wgIFf; spf=pass (imf30.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.44 as permitted sender) smtp.mailfrom=mjguzik@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720014788; a=rsa-sha256; cv=none; b=cN4kweAi/J5c/20trVb3RtMND82ulfpxNIB5MnwYPw3Vm6DiJpGDOaBDA6D6I/eAeOvQvq KoIjk05hpoW3voL6ta+aRa17cfGtdYllqrSUXDlZyco9GqqR0QYYyPgaZ4r4Q2kcmFjuV1 kszezCv7m+rZ7qOwb9MdZGzCV9tyZDs= Received: by mail-ed1-f44.google.com with SMTP id 4fb4d7f45d1cf-58be2b8b6b2so2374183a12.3 for ; Wed, 03 Jul 2024 06:53:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720014798; x=1720619598; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=WNsoF1dP6vGJ9qPRYJ4Nm9afRtYjJpZP/prTQMZ/X8o=; b=FD8wgIFfYo2diSzqmgY2SpY42UBuXOI3rAKiHnHnP/VN85S/5PQnnsnJpNrR6bDwv9 o6xUWlVQhbOJg+Xp9FRUTz4ln9bnMLnnll6B8IpCFtJnwodWkg6ZCxL/6x4wHJn9spw3 bt2HUHRzh+9l/3dL+CnSAqCtyqv8zkpsKrPz8UUKivgquT0WPewPlWKwDONaNja3A5G5 /oR/CHvxp4ZOcVU4qI8w1YjWzPX6I6spkYxXCSPzyWEEUOf8PWGUu9/ubnjZ09VMXQF/ BzK5jrzWrn38ZUlRxKXSFy43NV+VapC6EDFWGtQoxHFqhx5XgwL8aYKaha2aN0H6UKc3 +iqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720014798; x=1720619598; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=WNsoF1dP6vGJ9qPRYJ4Nm9afRtYjJpZP/prTQMZ/X8o=; b=cm0W+Qi38HLqNB/MLxZl5PKjLHBb2TsHn1qHsAVlcRAIcUXUcfCGdVK11qbRcyHHVC 41cBEpLqdvTz00VrgruXdC2hzENmEOOZJBjAq1VCi6OX8ZjI/jqqFfw9iUNomOHUj9za IvLVfvvguw1oIh9VapsmnRCOeNHtWhmQBXNSyQO2oTZ+x5o4sFm56K2FoGMZ9RrTP7lK yfJveUtg12WZwWO0yhcZZUg8QSNbZvJSFcJSy+D/9dJnYlwMIn6gIV1nAHIfhM5blaIq wRdwZ/NBG1P6afcA842Xzk5pg50pJHJP57rTwqWI50aA/ZLdW6Bb3WRuntj5mXhJ5TGQ qw+w== X-Forwarded-Encrypted: i=1; AJvYcCVJ5FtuGrWYUYWoecSXsQT1dvkae9GSg5q8yTuhHIfrLrao7UQ0pinBMIlWx8ZiSGH2uo5tRkMV7zrCJytQW/khVdU= X-Gm-Message-State: AOJu0YxD1movAqYkpeJErsuPXeQK4zMUiJ+Ag3CiYGqQtG4hUyCGPU/M 1CNfZZbGxW3KiS81pzhfqmyEcge0DqRR85iLmTvMh+2kuA5J3r4p X-Google-Smtp-Source: AGHT+IEFwG2a2heX4Ew995cCwgqt2PoyU07ntTaJh3FewhD2gMwaGCACyn6wl2SdSLJNhHXGxlQByg== X-Received: by 2002:a17:907:7676:b0:a70:7a10:de0b with SMTP id a640c23a62f3a-a75144301c5mr741197566b.21.1720014798072; Wed, 03 Jul 2024 06:53:18 -0700 (PDT) Received: from f (cst-prg-31-46.cust.vodafone.cz. [46.135.31.46]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a72aafb95b2sm509091766b.96.2024.07.03.06.53.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Jul 2024 06:53:17 -0700 (PDT) Date: Wed, 3 Jul 2024 15:53:08 +0200 From: Mateusz Guzik To: Linus Torvalds Cc: Christian Brauner , kernel test robot , oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List , linux-kernel@vger.kernel.org, ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression Message-ID: References: <3g3arsrwnyvv562v2rsfv2ms4ht4mk45vwdkvssxkrjhfjtpdz@umyx5tl2du7o> <6knlkefvujkry65gx6636u6e7rivqrn5kqjovs4ctjg7xtzrmo@2zd4wjx6zcym> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1CB7F8001A X-Stat-Signature: cxf78xrtep891ifppdabc1esjqex1max X-HE-Tag: 1720014799-688096 X-HE-Meta: U2FsdGVkX18gYbHe4Jd5xKjBebxUNO5Yxmn5EQXFKrLHn1+4KYGhHAS7tI9ag4zLGn6bJaw2+uVijlUQWpWlLR3OnyWcXJLcgiC3nSTVyXDTz65BffnqL537yQRjJVJcbgaKyxNOOhRdaLpLOUSS82yrYg7aG37owr1d+2Be8vMZHyR5fDYraLItVOtWz1Ol29G4xZBaaAxlqxxfNLwmpAsZWaobOx5DMyDa/jYDGoSlvy24YRmmVfjfuN0gdLGvsT0Ge/3m0n7melFlXAEQMn4lXmqlLF3wk5C+0AMnoHq5ax70RIXXmI1SwZw6auvxR+Q09ySm5w2wQGYtnXiOIHEgNSpY7XVcT3JIbePWLCONK6T5Y6SYnS8bSZWD+lWldx6nqCIn6rO4G9gHZvLjyWcy9qGA/FeM6nThkMCc/tEch9k1sPxOu1lP0vk1tnZ97cI7vJjWqPXWZ8Ci/SsDaVxtN/4c4MZtsrEzBxfrja9ttZevgJRCbxgl7e6vpbiX0xwSqnBw94qjhDH6IoJXjvNgzRY1EMII1L5RrFRkJDMn8L9skBzoNsxQMkjHv1yJB9ESKU7v/CXnCiD/wljaPzNucn+mQEbOBJ2hPPYt6kH9Wc9O1WpMeSwa6383fXR6nbFHNIRFXL+nmn9jkSrsN4SpFAI0fQmgjhxX6xd4P+jpMe+eCA749W/xiUOIyjw5f04g60jrqUjs/R94XcsNqeTepNHYjQ5z85zf1FqLmsLViLrVeMZ2bL4c0njVuL/dvrkbbPSMWvPoukqi6KujRO8j43NIPnvs0HlHfgYDVT4duE/b5DQ5bs0ZjQk1c+eL0Oitjgbj3ITke5GbdRu+Q1dnMyNNLOrZdClqr85NVdCZh2DLfTF6+SO9fiNhAN4ks9rZ6g7elyNR/qU5wjrtInz81j4EpIXe8hr4Z1XULSSgx+l98ROFLyAjbnDCMcwFn1uSdVyoA8S8Md0VHQt 2TcFaqVy txJ6Cr6/DLu4Xhh7bmi9tqbHczf2ZOFBl+0PN8VJqlzHriHdh5vGgitGSX9a6/RsHSHuqMZ0l/KiCFotQlQpr/xx45GVZjySELnZ/aEqhVGM9fkuEAvrsLW0tj+FNJTm/G4UFkksWfcnWy38UZI+LtUG7eLsQ53fnFbOKZ+PY3hmFDuHM4F4Aaud6ixUA3dogGu//u6gdES5LiYLv/qqYSuFYkJgw43qU1EnqdQhu1Ds/65SEscu5bkL1vxdfFxiaB8VwhEf4iOXsfyyCEaBq9F6yUfAGsBIAeP2MWnVbywV8tOGLXv0sIGpBPdPkQ0/Rpv4Xhz/W+9/wOmD58DY0RBk8U7songZexrLOiFtX6n2k40I= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jul 02, 2024 at 03:14:54PM -0700, Linus Torvalds wrote: > On Tue, 2 Jul 2024 at 14:15, Mateusz Guzik wrote: > > > > I asked you something in the previous e-mail though (with some nastiness > > of the problem pointed out) concerning handling of slow vs fastpath, > > here it is again: > > > > [..]for example did you know xfs does not honor rcu grace periods when > > recycling inodes? > > I don't think it should matter. > > Yes, the vfs layer will access the inode, but then the sequence number > checks should always verify that whatever access we did is then > validated after-the-fact. > [..] > Or do you see something I'm missing? I'm not loving how we deal with > dentry->d_inode, but considering that the whole point of the RCU > lookup is that we don't hold any locks, I think it's almost > unavoidable. > Now I'm confused mate. Based on the convo so far I would expect you would consider the xfs thing a no-go for the machinery. You were rightfully pointing out the relationship dentry<->inode is not stable and care is needed to grab the pointer, and even then the pointer may be wrong by the time one finishes the work. I presume you are also worried about callbacks not taking proper steps when looking at the inode itself -- after all they can be racing with teardown and have to handle it gracefully by returning an error. This much comes with territory. Inode changing identities adds potential trouble which does not need to be there. Certain things remain stable, for example the inode type (regular, fifo etc.). This is where xfs not respecting grace periods comes in and removes it. It may happen to be for stat et al the way they can be implemented at the moment this does not matter, but then it may start being a factor in the future. Say a type-dependent union showed up and needs to be looked at: union { fifo_pointer_t *fifo_data; dev_t i_rdev; ...; }; area pointed to by fifo_data is rcu-protected. Then this: if (IS_FIFO(inode)) { fifo_pointer_t *f = rcu_dereference(inode->fifo_data); if (!f) return -ENODICE; /* use f here */ } while may look safe in terms of taking precatuions and bailing from the fast path happens to be wrong. Suppose the inode got reused and is now representing a device, i_rdev is some funky value. This can race to read that int and trap dereferencing what it found there. This would not happen if the grace period was respected. This is a bug class which does not need to be there and there may be other possible bugs I did not think of. There is also potential trouble with security modules as they unfortunately have a hook for getattr. They presumably can be massaged into handling a possibly dying inode (or failing with -ENODICE), I would not put any stake into handling an inode which becomes something else as they deal with it. As a side note this is where I should also note the *current* LSM hooks are racy as is. Suppose you can stat a particular file now, but a chown to 1234 means you can't. Then your stat call can race against chown + other ops. You can end up in a state where LSMs gave a green light and only then the state changed to one you are not allowed to look at. This would be solvable with per-inode sequence counters, but is arguably beyond the scope of this patch and I'm not interested in working on it. [I read the rest of the mail and have no immediate commentary, we will see after I take a stab at implementing this]