From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD6EFC4828D for ; Wed, 7 Feb 2024 21:22:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 38B0C6B0082; Wed, 7 Feb 2024 16:22:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 33ABC6B0083; Wed, 7 Feb 2024 16:22:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 229E16B0087; Wed, 7 Feb 2024 16:22:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 111216B0082 for ; Wed, 7 Feb 2024 16:22:45 -0500 (EST) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id AF10912095E for ; Wed, 7 Feb 2024 21:22:44 +0000 (UTC) X-FDA: 81766282248.08.CFB39F5 Received: from mail-ed1-f51.google.com (mail-ed1-f51.google.com [209.85.208.51]) by imf06.hostedemail.com (Postfix) with ESMTP id D3E1A180008 for ; Wed, 7 Feb 2024 21:22:42 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=mM+k0UrR; spf=pass (imf06.hostedemail.com: domain of jannh@google.com designates 209.85.208.51 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707340962; a=rsa-sha256; cv=none; b=365GNJXLua3mG06r6qMsozA0/1m3CO3mNK58U5KY0QJuqViAv7jAQmYHzHiZfmB9UZ85tI g3o6Du7n2KDymlwYeEyjYFGRyXI0RmZI3XYUC4gUelLaBH/SsnzsJQ7gDasR2Cg+1ytnBH rTfQFqQfZzO5jZgXRid/4hwMmu5jYbg= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=mM+k0UrR; spf=pass (imf06.hostedemail.com: domain of jannh@google.com designates 209.85.208.51 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707340962; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nOqzPRykbp6YnfN+MvcqOmBBzcnuz5hTg+hCiKSQFW4=; b=g5tH9e/Yl230zQn3WyokBG7+EELTjGjjNc5LVOirK/n/9SPIDWEcoQ6UyMc9vQAbjEhy84 3tj23Q1wXGzYvrNI1RSl5Sypgg5pYb8KDKea0yVS9ePcwwwNbBajBwbSBSocltAcHLGow7 yra976gbZ51W0pNg1V/Peli5ZGom1G8= Received: by mail-ed1-f51.google.com with SMTP id 4fb4d7f45d1cf-56037115bb8so3464a12.0 for ; Wed, 07 Feb 2024 13:22:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1707340961; x=1707945761; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=nOqzPRykbp6YnfN+MvcqOmBBzcnuz5hTg+hCiKSQFW4=; b=mM+k0UrRr0Osl8PMmLAay/hoT90VVodTh8MH5xtt4ZtGUTzcHBoMOVFCcmsECWrPjJ x4dahYPx+ZKKn/ixswn8WnrlKSpCDqyIaqpvSGqesGtm8oOfLEZ7A+iFcAKKdOBdZaF6 sRmRgBlc3Zj8dD9kaZ0HGsXp/HseHi/SbSrzhb7p3oJfj8HpEX0w8NVb05ZMeSBLN9AP pW4HEwuLc/Qp/gm88YeWz+h4a3otevrQyLy1ZiDaBFyHZR94FIJXkkDkG8nBzH/CzlNU Y6kQtXddEh+4uCM9bFgNh/neRumK2X8AsEl1tcvXUWcMhm78CY88hAoQOJuUgRab8qI9 Micw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707340961; x=1707945761; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nOqzPRykbp6YnfN+MvcqOmBBzcnuz5hTg+hCiKSQFW4=; b=XXzTsIRZL8v9prSfiXC7aaAcz1IerRNmKJliv6Y0+aURMuBFU73/54TkHFoEsOChyA buOUI7ykFZN3LuUy18FDxotvWyE9/fSPZB9e2rWfJQlKnpBIobdUi2JkNhm+ExKpl8RO fRhZnJ7irsHo0Ni4cnOXeqxYJd/LU7sAVlZ7FgVwA1Ek33B7uTAKMFAxdSAihiFANC3k LIwHiyHisGyygYLLptXE5h5qzVzHKIqpHar7RQRSaVGU8at14OTWhQ26CwH7zeq90kKP EmUp0c4CjOnXNysn4ihkl11A7Tdm9spC5sxobaLxvHkagqCbnttyyD+brpAFxlDtN75m d2mw== X-Forwarded-Encrypted: i=1; AJvYcCXfiqcA28hx2fKKR7U2CQr4OQ3//f5iLLNyK/VT+oiv6K7k76LwDJuRmQzcVj8jzie7YLDlsznEyVTLiF69MDAsbG0= X-Gm-Message-State: AOJu0YwjphuRw834VkbJEmXHH8jNqi3S4N/Tu0Y6bKr63oBnLE1VrrFa s/AMkBSPNkYNQEoTd7mcReP1rT+uLGCAoHLssfpqFF5o5y7ZXcFwKzVmJPgH2WWl6ZBhoZfDgsT 6BVftsXKC/3PoMhVaauHQ/Cb8rTRShPmwRpwR X-Google-Smtp-Source: AGHT+IFZLDOLewby3fPcJ2/xTe7Q7BYva9t0F0dbjHitWJFDydBe0Rgh+gs9IznsVWYrixoO8DjRgm80BJu8J4yc2Tw= X-Received: by 2002:a50:d64e:0:b0:560:f37e:2d5d with SMTP id c14-20020a50d64e000000b00560f37e2d5dmr101860edj.5.1707340961021; Wed, 07 Feb 2024 13:22:41 -0800 (PST) MIME-Version: 1.0 References: <20240206010919.1109005-1-lokeshgidra@google.com> <20240206010919.1109005-4-lokeshgidra@google.com> In-Reply-To: From: Jann Horn Date: Wed, 7 Feb 2024 22:22:03 +0100 Message-ID: Subject: Re: [PATCH v3 3/3] userfaultfd: use per-vma locks in userfaultfd operations To: Lokesh Gidra Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, surenb@google.com, kernel-team@android.com, aarcange@redhat.com, peterx@redhat.com, david@redhat.com, axelrasmussen@google.com, bgeffon@google.com, willy@infradead.org, kaleshsingh@google.com, ngeoffray@google.com, timmurray@google.com, rppt@kernel.org, Liam.Howlett@oracle.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: D3E1A180008 X-Stat-Signature: fzcikrfpxi85msx8tc91j89cp8pzdusi X-Rspam-User: X-HE-Tag: 1707340962-198348 X-HE-Meta: U2FsdGVkX19GdFMJQ7dTtkSZLTZr+CN4HL1NIMIUUXpPX2xwX7eNQ/r3yR/cS+XmWGx93UiZC5+tvNm4wxces7fDLvGKmAov9rmeC+zF3o7cMG6l3i2p55r/ocupCYBqfCueR0GcV+jbdc3YV3/kwB2H1KXel9BBAKoPCq0jhektbHP813dQp9IHWZjEgJ8vE25hVFM8X8Sp/lg9q9lhWUgoGwcLtqlLAhfNZ6rbfsvjKL36YwUhecu5YadhaA32NkdjdUpNsFnVWIAbrLMM6y2NB7sLc9w5P/T1INXfaLQoOsvGVSQ4wrxtUvaUkNTWRFvycaaoDKgrl/Dosy3tCPjwODL3l9dOtM+y4+siRgU1nNbX093CQS3fxD1jRndZU0txNHiBJ/XXpg2b0lU4AxswDBaYM1OlRf9OFENmyuCA4iND4Sw4i+i4AiYwb1rgz/JWmdFoiU+rcPtXWM76tJQDMrNTF9n2RHi6IAknh/uZkPR8ZNUUktU14F1wR8JD8S1T+AnmKyy/rAS3Hs4IrDxIL44IPjj714VFKLMrLCS6lXyrQQnh4ssLRRg5m/zloBcu3cYPt/g4ujq4TPUv4TZLkz2F+LeIou5xWnzNw/q/XMguadL+fxlO5I0YenZ4Ho/XK4Ms4cogCYmBpRp5pVhK/YoGcur6HSqNa+G4Xd317W3JBZXvZ+mCiHoO7CtPbAzcDrXCa+EjOdCtZnBgPvmogjWt5dL83PSZ0uejHg+clL0OUYE/B2XC5S1aHUvHHO//sFdAgdQ4Hz2EaD/qAorBViNzp1WYUY2+4CuImPuppHuYGo1LPxrzgPhLg+OznKBrtmNFrE5awCTpo9Dz8zKQDK1SVorUcWMPeaa35O4HZXyeOxwpGzOw29Il71JAAkILjhVf4Ke22bU1jjTmaSLmGIf6o3GW4jsWYmqgZzrW/GNG3N63TFLOnR+M2PH+G7q9fHbSAiGXiR54qds D7PJAEou /AAlx1DpcQpZg5lgl3AxseNAYkKx0PnMwHXCQZI88P8hFqH3DdxVH/Tzj0FxY5GLRJ/L026QhuBaT4Qp/+ye9bljniMB0G7VDDuVTHkjb68JbCYlg68pNOc/lAPe96yYcXQvZXTaOt6OLJsYoOr902md5qiylVLvLmBvPHMa0VScAjvse0uOWg+fO3hmXka8qnErUDo2sJ+1PZvdo1fxoBw/Kvy+5SPzo02m+R/ql0S+TCz8vVyP7N03/LA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Feb 7, 2024 at 7:50=E2=80=AFPM Lokesh Gidra wrote: > On Tue, Feb 6, 2024 at 10:28=E2=80=AFAM Jann Horn wrot= e: > > On Tue, Feb 6, 2024 at 2:09=E2=80=AFAM Lokesh Gidra wrote: > > > All userfaultfd operations, except write-protect, opportunistically u= se > > > per-vma locks to lock vmas. On failure, attempt again inside mmap_loc= k > > > critical section. > > > > > > Write-protect operation requires mmap_lock as it iterates over multip= le > > > vmas. > > > > > > Signed-off-by: Lokesh Gidra > > [...] > > > diff --git a/mm/memory.c b/mm/memory.c > > > index b05fd28dbce1..393ab3b0d6f3 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > [...] > > > +/* > > > + * lock_vma() - Lookup and lock VMA corresponding to @address. > > > + * @prepare_anon: If true, then prepare the VMA (if anonymous) with = anon_vma. > > > + * > > > + * Should be called without holding mmap_lock. VMA should be unlocke= d after use > > > + * with unlock_vma(). > > > + * > > > + * Return: A locked VMA containing @address, NULL of no VMA is found= , or > > > + * -ENOMEM if anon_vma couldn't be allocated. > > > + */ > > > +struct vm_area_struct *lock_vma(struct mm_struct *mm, > > > + unsigned long address, > > > + bool prepare_anon) > > > +{ > > > + struct vm_area_struct *vma; > > > + > > > + vma =3D lock_vma_under_rcu(mm, address); > > > + > > > + if (vma) > > > + return vma; > > > + > > > + mmap_read_lock(mm); > > > + vma =3D vma_lookup(mm, address); > > > + if (vma) { > > > + if (prepare_anon && vma_is_anonymous(vma) && > > > + anon_vma_prepare(vma)) > > > + vma =3D ERR_PTR(-ENOMEM); > > > + else > > > + vma_acquire_read_lock(vma); > > > > This new code only calls anon_vma_prepare() for VMAs where > > vma_is_anonymous() is true (meaning they are private anonymous). > > > > [...] > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > > index 74aad0831e40..64e22e467e4f 100644 > > > --- a/mm/userfaultfd.c > > > +++ b/mm/userfaultfd.c > > > @@ -19,20 +19,25 @@ > > > #include > > > #include "internal.h" > > > > > > -static __always_inline > > > -struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm, > > > - unsigned long dst_start, > > > - unsigned long len) > > > +/* Search for VMA and make sure it is valid. */ > > > +static struct vm_area_struct *find_and_lock_dst_vma(struct mm_struct= *dst_mm, > > > + unsigned long dst= _start, > > > + unsigned long len= ) > > > { > > > - /* > > > - * Make sure that the dst range is both valid and fully withi= n a > > > - * single existing vma. > > > - */ > > > struct vm_area_struct *dst_vma; > > > > > > - dst_vma =3D find_vma(dst_mm, dst_start); > > > - if (!range_in_vma(dst_vma, dst_start, dst_start + len)) > > > - return NULL; > > > + /* Ensure anon_vma is assigned for anonymous vma */ > > > + dst_vma =3D lock_vma(dst_mm, dst_start, true); > > > > lock_vma() is now used by find_and_lock_dst_vma(), which is used by > > mfill_atomic(). > > > > > + if (!dst_vma) > > > + return ERR_PTR(-ENOENT); > > > + > > > + if (PTR_ERR(dst_vma) =3D=3D -ENOMEM) > > > + return dst_vma; > > > + > > > + /* Make sure that the dst range is fully within dst_vma. */ > > > + if (dst_start + len > dst_vma->vm_end) > > > + goto out_unlock; > > > > > > /* > > > * Check the vma is registered in uffd, this is required to > > [...] > > > @@ -597,7 +599,15 @@ static __always_inline ssize_t mfill_atomic(stru= ct userfaultfd_ctx *ctx, > > > copied =3D 0; > > > folio =3D NULL; > > > retry: > > > - mmap_read_lock(dst_mm); > > > + /* > > > + * Make sure the vma is not shared, that the dst range is > > > + * both valid and fully within a single existing vma. > > > + */ > > > + dst_vma =3D find_and_lock_dst_vma(dst_mm, dst_start, len); > > > + if (IS_ERR(dst_vma)) { > > > + err =3D PTR_ERR(dst_vma); > > > + goto out; > > > + } > > > > > > /* > > > * If memory mappings are changing because of non-cooperative > > > @@ -609,15 +619,6 @@ static __always_inline ssize_t mfill_atomic(stru= ct userfaultfd_ctx *ctx, > > > if (atomic_read(&ctx->mmap_changing)) > > > goto out_unlock; > > > > > > - /* > > > - * Make sure the vma is not shared, that the dst range is > > > - * both valid and fully within a single existing vma. > > > - */ > > > - err =3D -ENOENT; > > > - dst_vma =3D find_dst_vma(dst_mm, dst_start, len); > > > - if (!dst_vma) > > > - goto out_unlock; > > > - > > > err =3D -EINVAL; > > > /* > > > * shmem_zero_setup is invoked in mmap for MAP_ANONYMOUS|MAP_= SHARED but > > > @@ -647,16 +648,6 @@ static __always_inline ssize_t mfill_atomic(stru= ct userfaultfd_ctx *ctx, > > > uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE)) > > > goto out_unlock; > > > > > > - /* > > > - * Ensure the dst_vma has a anon_vma or this page > > > - * would get a NULL anon_vma when moved in the > > > - * dst_vma. > > > - */ > > > - err =3D -ENOMEM; > > > - if (!(dst_vma->vm_flags & VM_SHARED) && > > > - unlikely(anon_vma_prepare(dst_vma))) > > > - goto out_unlock; > > > > But the check mfill_atomic() used to do was different, it checked for V= M_SHARED. > > Thanks so much for catching this. > > > > Each VMA has one of these three types: > > > > 1. shared (marked by VM_SHARED; does not have an anon_vma) > > 2. private file-backed (needs to have anon_vma when storing PTEs) > > 3. private anonymous (what vma_is_anonymous() detects; needs to have > > anon_vma when storing PTEs) > > As in the case of mfill_atomic(), it seems to me that checking for > VM_SHARED flag will cover both (2) and (3) right? Yes, I think that should work.