From mboxrd@z Thu Jan 1 00:00:00 1970
From: Suren Baghdasaryan <surenb@google.com>
Date: Mon, 5 Feb 2024 13:46:54 -0800
Subject: Re: [PATCH v2 3/3] userfaultfd: use per-vma locks in userfaultfd operations
To: "Liam R. Howlett", Lokesh Gidra, Suren Baghdasaryan, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, kernel-team@android.com, aarcange@redhat.com, peterx@redhat.com, david@redhat.com, axelrasmussen@google.com, bgeffon@google.com, willy@infradead.org, jannh@google.com, kaleshsingh@google.com, ngeoffray@google.com, timmurray@google.com, rppt@kernel.org
In-Reply-To: <20240131214104.rgw3x5vuap43xubi@revolver>
References: <20240129193512.123145-1-lokeshgidra@google.com> <20240129193512.123145-4-lokeshgidra@google.com> <20240129203626.uq5tdic4z5qua5qy@revolver> <20240130025803.2go3xekza5qubxgz@revolver> <20240131214104.rgw3x5vuap43xubi@revolver>

On Wed, Jan 31, 2024 at 1:41 PM Liam R.
Howlett wrote:
>
> * Lokesh Gidra [240130 21:49]:
> > On Mon, Jan 29, 2024 at 6:58 PM Liam R. Howlett wrote:
> > >
> > > * Lokesh Gidra [240129 19:28]:
> > > > On Mon, Jan 29, 2024 at 12:53 PM Suren Baghdasaryan wrote:
> > > > >
> ...
> > > > Your suggestion is definitely simpler and easier to follow, but due to
> > > > the overflow situation that Suren pointed out, I would still need to
> > > > keep the locking/boolean dance, no? IIUC, even if I were to return
> > > > EAGAIN to userspace, there is no guarantee that subsequent ioctls
> > > > on the same vma will succeed due to the same overflow, until someone
> > > > acquires and releases mmap_lock in write mode.
> > > > Also, it is sometimes insufficient to know whether we managed to lock
> > > > the vma or not. For instance, lock_vma_under_rcu() checks whether
> > > > anon_vma (for an anonymous vma) exists. If not, it bails out.
> > > > So it seems to me that we have to provide some fallback in the
> > > > userfaultfd operations which executes with mmap_lock in read mode.
> > >
> > > Fair enough, what if we didn't use the sequence number and just locked
> > > the vma directly?
> >
> > Looks good to me, unless someone else has any objections.
> >
> > > /* This will wait on the vma lock, so once we return it's locked */
> > > void vma_acquire_read_lock(struct vm_area_struct *vma)
> > > {
> > >         mmap_assert_locked(vma->vm_mm);
> > >         down_read(&vma->vm_lock->lock);
> > > }
> > >
> > > struct vm_area_struct *lock_vma(struct mm_struct *mm,
> > >                 unsigned long addr)     /* or some better name.. */
> > > {
> > >         struct vm_area_struct *vma;
> > >
> > >         vma = lock_vma_under_rcu(mm, addr);
> > >         if (vma)
> > >                 return vma;
> > >
> > >         mmap_read_lock(mm);
> > >         /* mm sequence cannot change, no mm writers anyways.
> > >          * find_mergeable_anon_vma is only a concern in the page fault
> > >          * path.
> > >          * start/end won't change under the mmap_lock.
> > >          * vma won't become detached as we have the mmap_lock in read.
> > >          * We are now sure no writes will change the VMA,
> > >          * so let's make sure no other context is isolating the vma.
> > >          */
> > >         vma = lookup_vma(mm, addr);
> > >         if (vma)
> >
> > We can take care of anon_vma as well here, right? I can take a bool
> > parameter ('prepare_anon' or something) and then:
> >
> >         if (vma) {
> >                 if (prepare_anon && vma_is_anonymous(vma) &&
> >                     anon_vma_prepare(vma)) {
> >                         vma = ERR_PTR(-ENOMEM);
> >                         goto out_unlock;
> >                 }
> > >                 vma_acquire_read_lock(vma);
> >         }
> > out_unlock:
> > >         mmap_read_unlock(mm);
> > >         return vma;
> > > }
>
> Do you need this? I didn't think this was happening in the code as
> written? If you need it, I would suggest making it happen always and
> ditching the flag until a user needs this variant, but document what's
> going on in here, or even give it a better name.

I think yes, you do need this. I can see calls to anon_vma_prepare()
under mmap_read_lock() protection in both mfill_atomic_hugetlb() and in
mfill_atomic(). This means that, just like in the page fault path, we
modify vma->anon_vma under mmap_read_lock protection, which guarantees
that adjacent VMAs won't change. This is important because
__anon_vma_prepare() uses find_mergeable_anon_vma(), which needs the
neighboring VMAs to be stable. The per-VMA lock guarantees the stability
of the VMA we locked, but not of its neighbors, so holding the per-VMA
lock while calling anon_vma_prepare() is not enough. The solution Lokesh
suggests would call anon_vma_prepare() under mmap_read_lock and would
therefore avoid the issue.

> Thanks,
> Liam