From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE997C4829E for ; Mon, 12 Feb 2024 18:08:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6655D6B0071; Mon, 12 Feb 2024 13:08:30 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 615046B0072; Mon, 12 Feb 2024 13:08:30 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B5DF6B0074; Mon, 12 Feb 2024 13:08:30 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 3C79E6B0071 for ; Mon, 12 Feb 2024 13:08:30 -0500 (EST) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id D2A92A0930 for ; Mon, 12 Feb 2024 18:08:29 +0000 (UTC) X-FDA: 81783936738.03.EA37F21 Received: from mail-wr1-f42.google.com (mail-wr1-f42.google.com [209.85.221.42]) by imf09.hostedemail.com (Postfix) with ESMTP id 09D40140037 for ; Mon, 12 Feb 2024 18:08:27 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="ph0+KAt/"; spf=pass (imf09.hostedemail.com: domain of lokeshgidra@google.com designates 209.85.221.42 as permitted sender) smtp.mailfrom=lokeshgidra@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707761308; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=M8eG2vfx+alRR2jEJWaH33x2v1+Lh70lXZyOGEZrheI=; b=c3fKt0bUfjoRxCKrjy2xhheFNZPr2I6Bee6K+MsCgBQ6ZVR6H+VU/S+3iTdwfIYUt5pOAh L5rBWN1Yb94tkES+PyhYHX9pPm6R3y6OMuEwYu1oevoSnHQy1RL99DvsdFJLW9LPGRm9T1 Mu8l2TD0c7Lx2lOHl8JxcgRz9+z1kwI= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="ph0+KAt/"; spf=pass (imf09.hostedemail.com: domain of lokeshgidra@google.com designates 209.85.221.42 as permitted sender) smtp.mailfrom=lokeshgidra@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707761308; a=rsa-sha256; cv=none; b=j6/BSVujta4MhPTTqTxp02IF2g1oW5i88KrZC7Jp1C7drysD5ljVLfSI5uzGpRUwdonwAq 6NwThIOneMQBV2ezQen8Yisd5HkEpaYsnq5XoPwho4vOsoQVHQ0Jvks16hJV+/InqpNzwm +H63zPbUezGNjv/rtWigdArb8PQUXqo= Received: by mail-wr1-f42.google.com with SMTP id ffacd0b85a97d-33b189ae5e8so1478763f8f.2 for ; Mon, 12 Feb 2024 10:08:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1707761306; x=1708366106; darn=kvack.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=M8eG2vfx+alRR2jEJWaH33x2v1+Lh70lXZyOGEZrheI=; b=ph0+KAt/2Yi5q561C4ggYoPSKh2TjHghoJpyo/zM83eJuwDjUmWvZNAKqTIV+ZhWQL tUGChZT65UNo5C/UhYNLaqundB7L+juuBPTwho374zQlexd/k6v+kS17qbRKxF7rXE4R UCwctfmT+EGqLlGmmQuQl7eEYx7a/awc5n8E2620QXmzpE8YOgLlk5RGjax67rV2wgRY pbJZ9daJYe1FwZ9BSPTeJmXimEjDpoQ/6xaAzInBwWWm45CkvaICIUsUt73oq90UjkEp kOhDO4QSuXwdel3ICQxzmiZCJ/sSzZPdP4ZTAJtSQua/JVa+Z12niHmGPjluY/np/IHA MrZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707761306; x=1708366106; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=M8eG2vfx+alRR2jEJWaH33x2v1+Lh70lXZyOGEZrheI=; b=sRODMc3wFkVp6hk6bV6mpk1dHaWAlmsokznl4R9P9gpI+GYneaigO4Ue68Y5VsbTbl 0QcBzytGPSK+siuiBhkoyZYfUdE6PlQOqIS8fTZxa2nhZoQ6NJgKnPIBJx8LGJZMZDNP zqgpLEwcsmTr1XiYfdPgm4eHxtGW226imBsuYe1Hpt/5HSgs+jn71znJIn0UbPn8RGcC DzDvjVCQv2cQ3ysk+nZVTKi28EWMDh6Qv36+kCB6lXhVoEhtpXLm5sHCz9nc2Kt4UINb US2T6mavXdInyVskrwpRrHdI19c4cBCrJHLB7hwJz0KPGLAEw86DY5Kl+of8m9LyG6V4 80gQ== X-Gm-Message-State: AOJu0YxpHaxkG9bsQroxF+Fpjw2pSWXgw2Bgh8BQmGbkGqRq1Aa9Mcyw 5YaqZY33VcxOhu9AKaXgSP/RvDfzTlqnVoDAQ6kvauqbIszlohofxKIJyd0B7NpLozjiOTjCXwo S/lJPZiVZojlwq8qEGb6dIkLenOXRvnW7zoyz X-Google-Smtp-Source: AGHT+IGvSi87FXWNJLShT2MHSY/26nRA1F0T3W3Z1QBasH8tQNph0vir2QtvDBXT/NWKQhu2BqbqDAPSLJ4Fg4qtFQM= X-Received: by 2002:adf:cd8d:0:b0:33b:66bf:769d with SMTP id q13-20020adfcd8d000000b0033b66bf769dmr5366876wrj.61.1707761306190; Mon, 12 Feb 2024 10:08:26 -0800 (PST) MIME-Version: 1.0 References: <20240208212204.2043140-1-lokeshgidra@google.com> <20240208212204.2043140-4-lokeshgidra@google.com> <20240209030654.lxh4krmxmiuszhab@revolver> <20240209190605.7gokzhg7afy7ibyf@revolver> <20240209193110.ltfdc6nolpoa2ccv@revolver> <20240212151959.vnpqzvpvztabxpiv@revolver> In-Reply-To: <20240212151959.vnpqzvpvztabxpiv@revolver> From: Lokesh Gidra Date: Mon, 12 Feb 2024 10:08:13 -0800 Message-ID: Subject: Re: [PATCH v4 3/3] userfaultfd: use per-vma locks in userfaultfd operations To: "Liam R. Howlett" , Lokesh Gidra , akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, surenb@google.com, kernel-team@android.com, aarcange@redhat.com, peterx@redhat.com, david@redhat.com, axelrasmussen@google.com, bgeffon@google.com, willy@infradead.org, jannh@google.com, kaleshsingh@google.com, ngeoffray@google.com, timmurray@google.com, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 09D40140037 X-Rspam-User: X-Stat-Signature: nceehu9hiu5suwxu5bs99dc7oqp3asin X-Rspamd-Server: rspam01 X-HE-Tag: 1707761307-485924 X-HE-Meta: U2FsdGVkX181qC7OFKx13DDfDC1BIWGPG0kNxGUUyZ6xvQcrBC7VGaH+82dvf3J7plYdxPRxnDV62ndG3KQ1+Cuy019fB/lYlmAkVLBrgNUD07LpCX4LHNJLey3LIldMtdXz/Z+gsf43Og2vR1QMhE4WeRY5b+B9jpFrz255XZ+VVRnS/N8tXrgF6cd2m6ZyFYGuMRWpbDybbNxeVXEHG+yjw/z6aq6gjpOfvgyeaeJLamBRXayZYWejKwpVRpLk0TbmgDFEn3sIT835jdUD5W2qGrNyS5rIxdI+Da79rjbdrKoFxWo0xG/aXR5PAU/VS2EkmhhGNzSK7MFjq5aV+Xhj/zzzYPABidd1E0SwV+h2vGDnXUGuClSQz0PXgcfYxNh+JrFx7KEw/ZPFYzkTh9sOknY6qvyR81i5fkSfeQQ5vH7iiIJWlttBoyHKuiUpo65k1XTUq/slJhZRT/OjU2tJuyeDEhWzyytKDAzlbt8lN0QrTvYIj6k/q5loqMzn9Rh3ytuIeKWLqCjsLQl1dJqLXFaURPhYpjRGt2M0soh0V5p9Xpah5EI9bTDBKp7mRMvcO61p0WyZGBo9qE2Cj/L4fQCEK1gj1VC/y3sHkmVJVtzr0DaG9qmOo6zLhneRb4f+en9PAURGhlKze+3rYR9kwlFGOTLLwyXNGkxB1URM2yAwDs+C2QGDr9i/rUzi9NgbGACEEOf/fmWOHOYBfGRvf8HQJbMIImF3nk9ji/9thJ/8E+PibvuSoLT5EgExrxaXk7fCUULYkOll/Duy/OGMPX4dlRgvsosVai4ogGKn+Py4QAqVfdU9+I79zzNkf4jZ5IqqugBwKhY77EbTGFKwKio3Q4UY0VnNLb+CgiIpIZjq37lJ9QbNMFzbcEKZ6+y0tfavBz7XxB/s5p5CWTmE2W0vSX1AdlApD7WlnU+ELyDSlSI2CXtY5eEYa5ViMvYo3GAxQXer3B4vvA5 4zZ6x+Zl omB836F4lwJ2+w+hjpyuHmEFPshrDrBlNoRXCsUWAYhlEqGvkBU4RztS7qZAfSpOQU4Oj1lOmDX0B0OnTl8WslzFgGEBgKMd5S1vhlPsKEtmfa4EwkhiNT6x+KpRn3Wu/Rv6jwxlXGmTFLId5if5VpqV2V9BfXrA6plmWILAPPQNG3YZ1w7iI+6VsTjn37wtP69jC3YfAXnOmMzziwgiqIf2btjCfP0jHMQernsM09yxeIq1RfTZmj6p5mxeOKi+JnuIMEXREU2Hw+M4FDllDSoEgmg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Feb 12, 2024 at 7:20=E2=80=AFAM Liam R. Howlett wrote: > > * Lokesh Gidra [240209 15:59]: > > On Fri, Feb 9, 2024 at 11:31=E2=80=AFAM Liam R. Howlett wrote: > ... > > > > > > > > > +static struct vm_area_struct *lock_vma(struct mm_struct *m= m, > > > > > > > > + unsigned long address, > > > > > > > > + bool prepare_anon) > > > > > > > > +{ > > > > > > > > + struct vm_area_struct *vma; > > > > > > > > + > > > > > > > > + vma =3D lock_vma_under_rcu(mm, address); > > > > > > > > + if (vma) { > > > > > > > > + /* > > > > > > > > + * lock_vma_under_rcu() only checks anon_vma = for private > > > > > > > > + * anonymous mappings. But we need to ensure = it is assigned in > > > > > > > > + * private file-backed vmas as well. > > > > > > > > + */ > > > > > > > > + if (prepare_anon && !(vma->vm_flags & VM_SHAR= ED) && > > > > > > > > + !vma->anon_vma) > > > > > > > > + vma_end_read(vma); > > > > > > > > + else > > > > > > > > + return vma; > > > > > > > > + } > > > > > > > > + > > > > > > > > + mmap_read_lock(mm); > > > > > > > > + vma =3D vma_lookup(mm, address); > > > > > > > > + if (vma) { > > > > > > > > + if (prepare_anon && !(vma->vm_flags & VM_SHAR= ED) && > > > > > > > > + anon_vma_prepare(vma)) { > > > > > > > > + vma =3D ERR_PTR(-ENOMEM); > > > > > > > > + } else { > > > > > > > > + /* > > > > > > > > + * We cannot use vma_start_read() as = it may fail due to > > > > > > > > + * false locked (see comment in vma_s= tart_read()). We > > > > > > > > + * can avoid that by directly locking= vm_lock under > > > > > > > > + * mmap_lock, which guarantees that n= obody can lock the > > > > > > > > + * vma for write (vma_start_write()) = under us. > > > > > > > > + */ > > > > > > > > + down_read(&vma->vm_lock->lock); > > > > > > > > + } > > > > > > > > + } > > > > > > > > + > > > > > > > > + mmap_read_unlock(mm); > > > > > > > > + return vma; > > > > > > > > +} > > > > > > > > + > > > > > > > > +static void unlock_vma(struct vm_area_struct *vma) > > > > > > > > +{ > > > > > > > > + vma_end_read(vma); > > > > > > > > +} > > > > > > > > + > ... > > > > > The current implementation has a deadlock problem: > > > > thread 1 > > thread 2 > > > > vma_start_read(dst_vma) > > > > > > mmap_write_lock() > > > > vma_start_write(src_vma) > > vma_start_read(src_vma) fails > > mmap_read_lock() blocks > > > > > > vma_start_write(dst_vma) > > blocks > > > > > > I think the solution is to implement it this way: > > > > long find_and_lock_vmas(...) > > { > > dst_vma =3D lock_vma(mm, dst_start, true); > > if (IS_ERR(dst_vma)) > > return PTR_ERR(vma); > > > > src_vma =3D lock_vma_under_rcu(mm, src_start); > > if (!src_vma) { > > long err; > > if (mmap_read_trylock(mm)) { > > src_vma =3D vma_lookup(mm, = src_start); > > if (src_vma) { > > > > down_read(&src_vma->vm_lock->lock); > > err =3D 0; > > } else { > > err =3D -ENOENT; > > } > > mmap_read_unlock(mm); > > } else { > > vma_end_read(dst_vma); > > err =3D lock_mm_and_find_vma= s(...); > > if (!err) { > > Right now lock_mm_and_find_vmas() doesn't use the per-vma locking, so > you'd have to lock those here too. I'm sure you realise that, but this > is very convoluted. That's right. I realized that after I sent this email. > > > mmap_read_unlo= ck(mm); > > } > > } > > return err; > > On contention you will now abort vs block. Is it? On contention mmap_read_trylock() will fail and we do the whole operation using lock_mm_and_find_vmas() which blocks on mmap_lock. Am I missing something? > > > } > > return 0; > > } > > > > Of course this would need defining lock_mm_and_find_vmas() regardless > > of CONFIG_PER_VMA_LOCK. I can also remove the prepare_anon condition > > in lock_vma(). > > You are adding a lot of complexity for a relatively rare case, which is > probably not worth optimising. > > I think you'd be better served by something like : > > if (likely(src_vma)) > return 0; > > /* Undo any locking */ > vma_end_read(dst_vma); > > /* Fall back to locking both in mmap_lock critical section */ Agreed on reduced complexity. But as Suren pointed out in one of his replies that lock_vma_under_rcu() may fail due to seq overflow. That's why lock_vma() uses vma_lookup() followed by direct down_read() on vma-lock. IMHO what we need here is exactly lock_mm_and_find_vmas() and the code can be further simplified as follows: err =3D lock_mm_and_find_vmas(...); if (!err) { down_read(dst_vma...); if (dst_vma !=3D src_vma) down_read(src_vma....); mmap_read_unlock(mm); } return err; (another thing I'm fixing in the next version is avoiding locking the vma twice if both src_start and dst_start are in same vma) > mmap_read_lock(); > /* > * probably worth creating an inline function for a single vma > * find/populate that can be used in lock_vma() that does the anon vma > * population? > */ Good idea. This can simplify both lock_vma() as well as lock_mm_and_find_vm= as(). > dst_vm =3D lock_vma_populate_anon(); > ... > > src_vma =3D lock_vma_under_rcu(mm, src_start); > ... > > > mmap_read_unlock(); > return 0; > > src_failed: > vma_end_read(dst_vma); > dst_failed: > mmap_read_unlock(); > return err; > > Thanks, > Liam > > ... >