From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 4 Apr 2024 17:04:36 -0400
From: Peter Xu
To: Suren Baghdasaryan
Cc: Matthew Wilcox, Lokesh Gidra, akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com, aarcange@redhat.com,
	david@redhat.com, zhengqi.arch@bytedance.com, kaleshsingh@google.com,
	ngeoffray@google.com
Subject: Re: [PATCH] userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
References: <20240404171726.2302435-1-lokeshgidra@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

On Thu, Apr 04, 2024 at 01:55:07PM -0700, Suren Baghdasaryan wrote:
> On Thu, Apr 4, 2024 at 1:32 PM Peter Xu wrote:
> >
> > On Thu, Apr 04, 2024 at 06:21:50PM +0100, Matthew Wilcox wrote:
> > > On Thu, Apr 04, 2024 at 10:17:26AM -0700, Lokesh Gidra wrote:
> > > > -	folio_move_anon_rmap(src_folio, dst_vma);
> > > > -	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
> > > > -
> > > >  	src_pmdval = pmdp_huge_clear_flush(src_vma, src_addr, src_pmd);
> > > >  	/* Folio got pinned from under us. Put it back and fail the move. */
> > > >  	if (folio_maybe_dma_pinned(src_folio)) {
> > > > @@ -2270,6 +2267,9 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
> > > >  		goto unlock_ptls;
> > > >  	}
> > > >
> > > > +	folio_move_anon_rmap(src_folio, dst_vma);
> > > > +	WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
> > > > +
> > >
> > > This use of WRITE_ONCE scares me. We hold the folio locked. Why do
> > > we need to use WRITE_ONCE? Who's looking at folio->index without
> > > holding the folio lock?
> >
> > Seems true, but maybe suitable for a separate patch to clean it even so?
> > We also have the other pte level which has the same WRITE_ONCE(), so if we
> > want to drop we may want to drop both.
>
> Yes, I'll do that separately and will remove WRITE_ONCE() in both places.

Thanks, Suren.

Besides, any comments on the below?

It's definitely a generic per-vma question too (besides my willingness to
remove that userfault-specific code..), so comments are welcome.

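To make the question concrete, the two conditions being compared can be
modelled in a few lines of userspace C. This is only a sketch: struct toy_vma,
TOY_VM_SHARED and the toy_bail_*() helpers are hypothetical stand-ins, not
kernel definitions, and a NULL vm_ops is used to model vma_is_anonymous()
returning true.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel definitions. */
#define TOY_VM_SHARED 0x1UL

struct toy_vma {
	unsigned long vm_flags;	/* TOY_VM_SHARED is set for shared mappings */
	const void *vm_ops;	/* NULL models vma_is_anonymous() being true */
	const void *anon_vma;	/* NULL models "anon_vma not assigned yet" */
};

/* Models the existing check: only anonymous VMAs without anon_vma bail out. */
static bool toy_bail_current(const struct toy_vma *vma)
{
	return vma->vm_ops == NULL && vma->anon_vma == NULL;
}

/* Models the proposed check: any private VMA without anon_vma bails out. */
static bool toy_bail_proposed(const struct toy_vma *vma)
{
	return !(vma->vm_flags & TOY_VM_SHARED) && vma->anon_vma == NULL;
}
```

In this model the two predicates disagree for a private file-backed VMA whose
anon_vma is still NULL: toy_bail_current() lets it through, while
toy_bail_proposed() sends it to the mmap_lock fallback, which is the case the
userfault-only check in lock_vma() covers today.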
>
> >
> > I just got to start reading some the new move codes (Lokesh, apologies on
> > not be able to provide feedbacks previously..), but then I found one thing
> > unclear, on special handling of private file mappings only in userfault
> > context, and I didn't know why:
> >
> > lock_vma():
> > 	if (vma) {
> > 		/*
> > 		 * lock_vma_under_rcu() only checks anon_vma for private
> > 		 * anonymous mappings. But we need to ensure it is assigned in
> > 		 * private file-backed vmas as well.
> > 		 */
> > 		if (!(vma->vm_flags & VM_SHARED) && unlikely(!vma->anon_vma))
> > 			vma_end_read(vma);
> > 		else
> > 			return vma;
> > 	}
> >
> > AFAIU even for generic users of lock_vma_under_rcu(), anon_vma must be
> > stable to be used. Here it's weird to become an userfault specific
> > operation to me.
> >
> > I was surprised how it worked for private file maps on faults, then I had a
> > check and it seems we postponed such check until vmf_anon_prepare(), which
> > is the CoW path already, so we do as I expected, but seems unnecessary to
> > that point?
> >
> > Would something like below make it much cleaner for us? As I just don't
> > yet see why userfault is special here.
> >
> > Thanks,
> >
> > ===8<===
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 984b138f85b4..d5cf1d31c671 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3213,10 +3213,8 @@ vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> >
> >  	if (likely(vma->anon_vma))
> >  		return 0;
> > -	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > -		vma_end_read(vma);
> > -		return VM_FAULT_RETRY;
> > -	}
> > +	/* We shouldn't try a per-vma fault at all if anon_vma isn't solid */
> > +	WARN_ON_ONCE(vmf->flags & FAULT_FLAG_VMA_LOCK);
> >  	if (__anon_vma_prepare(vma))
> >  		return VM_FAULT_OOM;
> >  	return 0;
> > @@ -5817,9 +5815,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> >  	 * find_mergeable_anon_vma uses adjacent vmas which are not locked.
> >  	 * This check must happen after vma_start_read(); otherwise, a
> >  	 * concurrent mremap() with MREMAP_DONTUNMAP could dissociate the VMA
> > -	 * from its anon_vma.
> > +	 * from its anon_vma. This applies to both anon or private file maps.
> >  	 */
> > -	if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma))
> > +	if (unlikely(!(vma->vm_flags & VM_SHARED) && !vma->anon_vma))
> >  		goto inval_end_read;
> >
> >  	/* Check since vm_start/vm_end might change before we lock the VMA */
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index f6267afe65d1..61f21da77dcd 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -72,17 +72,8 @@ static struct vm_area_struct *lock_vma(struct mm_struct *mm,
> >  	struct vm_area_struct *vma;
> >
> >  	vma = lock_vma_under_rcu(mm, address);
> > -	if (vma) {
> > -		/*
> > -		 * lock_vma_under_rcu() only checks anon_vma for private
> > -		 * anonymous mappings. But we need to ensure it is assigned in
> > -		 * private file-backed vmas as well.
> > -		 */
> > -		if (!(vma->vm_flags & VM_SHARED) && unlikely(!vma->anon_vma))
> > -			vma_end_read(vma);
> > -		else
> > -			return vma;
> > -	}
> > +	if (vma)
> > +		return vma;
> >
> >  	mmap_read_lock(mm);
> >  	vma = find_vma_and_prepare_anon(mm, address);
> > --
> > 2.44.0
> >
> >
> > --
> > Peter Xu
> >
>

-- 
Peter Xu