From: James Houghton <jthoughton@google.com>
Date: Wed, 6 Mar 2024 09:57:17 -0800
Subject: Re: [PATCH] mm: Add an explicit smp_wmb() to UFFDIO_CONTINUE
To: Peter Xu
Cc: Axel Rasmussen, Andrew Morton, Muchun Song, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240306001511.932348-1-jthoughton@google.com>
On Tue, Mar 5, 2024 at 7:41 PM Peter Xu wrote:
>
> On Wed, Mar 06, 2024 at 12:15:10AM +0000, James Houghton wrote:
> > I've tried to see if I can legitimately get a user to read stale data,
> > and a few attempts with this test[2] have been unsuccessful.
>
> AFAICT that won't easily reproduce even if the problem existed, as we
> contain so many implicit memory barriers here and there.  E.g.
> right at the
> entry of ioctl(), mmget_not_zero() already contains a full ordering
> constraint:
>
>         /**
>          * atomic_inc_not_zero() - atomic increment unless zero with full ordering
>          * @v: pointer to atomic_t

Oh yes, of course. Thanks for pointing this out. So we definitely
don't need a Fixes.

> I was expecting the syscall routine will guarantee an ordering already but
> indeed I can't find any.  I also checked up Intel's spec and SYSCALL inst
> document only has one paragraph on ordering:
>
>     Instruction ordering. Instructions following a SYSCALL may be
>     fetched from memory before earlier instructions complete execution,
>     but they will not execute (even speculatively) until all
>     instructions prior to the SYSCALL have completed execution (the
>     later instructions may execute before data stored by the earlier
>     instructions have become globally visible).
>
> I guess it implies a hardware reordering is indeed possible in this case?

That's my understanding as well.

> >
> > [1]: commit 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
> > [2]: https://gist.github.com/48ca/38d0665b0f1a6319a56507dc73a173f9
> >
> >  mm/hugetlb.c     | 15 +++++++++------
> >  mm/userfaultfd.c | 18 ++++++++++++++++++
> >  2 files changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index bb17e5c22759..533bf6b2d94d 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6779,12 +6779,15 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> >  		}
> >  	}
> >
> > -	/*
> > -	 * The memory barrier inside __folio_mark_uptodate makes sure that
> > -	 * preceding stores to the page contents become visible before
> > -	 * the set_pte_at() write.
> > -	 */
> > -	__folio_mark_uptodate(folio);
> > +	if (!is_continue) {
> > +		/*
> > +		 * The memory barrier inside __folio_mark_uptodate makes sure
> > +		 * that preceding stores to the page contents become visible
> > +		 * before the set_pte_at() write.
> > +		 */
> > +		__folio_mark_uptodate(folio);
>
> Can we move the comment above the "if", explaining both conditions?

Yes, I'll do that. I think the explanation for
WARN_ON_ONCE(!folio_test_uptodate(folio)) is: we only need to
`__folio_mark_uptodate(folio)` if we have allocated a new folio, and
HugeTLB pages will always be Uptodate if they are in the pagecache.

We could even drop the WARN_ON_ONCE.

> > +	} else
> > +		WARN_ON_ONCE(!folio_test_uptodate(folio));
> >
> >  	/* Add shared, newly allocated pages to the page cache. */
> >  	if (vm_shared && !is_continue) {
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 503ea77c81aa..d515b640ca48 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -531,6 +531,10 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
> >  		goto out_unlock;
> >  	}
> >
> > +	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> > +		/* See the comment in mfill_atomic. */
> > +		smp_wmb();
> > +
> >  	while (src_addr < src_start + len) {
> >  		BUG_ON(dst_addr >= dst_start + len);
> >
> > @@ -743,6 +747,20 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
> >  	    uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> >  		goto out_unlock;
> >
> > +	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> > +		/*
> > +		 * A caller might reasonably assume that UFFDIO_CONTINUE
> > +		 * contains a wmb() to ensure that any writes to the
> > +		 * about-to-be-mapped page by the thread doing the
> > +		 * UFFDIO_CONTINUE are guaranteed to be visible to subsequent
> > +		 * reads of the page through the newly mapped address.
> > +		 *
> > +		 * For MFILL_ATOMIC_COPY, the wmb() is done for each COPYed
> > +		 * page. We can do the wmb() now for CONTINUE as the user has
> > +		 * already prepared the page contents.
> > +		 */
> > +		smp_wmb();
> > +
>
> Why you did it twice separately? Can we still share the code?
>
> I'm wildly guessing: I don't worry on an extra wmb() in failure paths, as
> that's never a performance concern to make failure slightly slower, IMHO.

You're right, I shouldn't care so much about making the slow paths a
little bit slower. I'll move the wmb earlier so that we only need it
in one place.

>
> Thanks,

Thanks Peter!