From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13E00C27C52 for ; Thu, 6 Jun 2024 17:33:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 830876B00AF; Thu, 6 Jun 2024 13:33:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7E1046B00B0; Thu, 6 Jun 2024 13:33:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 681186B00B1; Thu, 6 Jun 2024 13:33:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 49ADC6B00AF for ; Thu, 6 Jun 2024 13:33:38 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id F3374407C1 for ; Thu, 6 Jun 2024 17:33:37 +0000 (UTC) X-FDA: 82201160874.17.6014134 Received: from mail-wm1-f53.google.com (mail-wm1-f53.google.com [209.85.128.53]) by imf24.hostedemail.com (Postfix) with ESMTP id 2E954180008 for ; Thu, 6 Jun 2024 17:33:35 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=SWWD3GOD; spf=pass (imf24.hostedemail.com: domain of surenb@google.com designates 209.85.128.53 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717695216; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=mPjsJlKDBXzcFKJiIo/4f6GGzkeINELRnLiFwHcGpOo=; b=iPBk4KEkr0ZvfD83X56rxRJVCmKUe1Lsff7xjexEksXHQo2MJeaRep4EW4rpRA099rNvab GJBGKgaqD3xttQm/JR15ZAY+r1rkxPAX4zN4QyDthQUpBvJIvDwr9o+mc5EcwN7NFsnMoX cskPsnd28xZatzj+FAF7iynwzUKXjNI= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=SWWD3GOD; spf=pass (imf24.hostedemail.com: domain of surenb@google.com designates 209.85.128.53 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717695216; a=rsa-sha256; cv=none; b=Tk8hh6YLHvSndBHQp01GJzUiVAsjS1hMXGR45CyO4tUaU+X02s2yGO5nuSL+UwLRCG8Rdg 1f30ZbAOaCL8eamAjniWlaJVc+uL6wPr3tj8Ap1u7SE5qxmHzI0PgTlGVh3RAhL67ZWmxb OglXriEvdI0VjGL8OFMXvQLGYMDvkZI= Received: by mail-wm1-f53.google.com with SMTP id 5b1f17b1804b1-42158db02c3so13483435e9.2 for ; Thu, 06 Jun 2024 10:33:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717695215; x=1718300015; darn=kvack.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=mPjsJlKDBXzcFKJiIo/4f6GGzkeINELRnLiFwHcGpOo=; b=SWWD3GODdEVITBHyTVyDyGlY9rLi+mtPYFhnfAP6iV9O16oSrcEyqsEvKg17Np2RjQ yiwpfpLxD6knrIpl1T+v1uklTUraw1lvI2Euepj2rbWC6UFH71+vxnQxAs0RZ7HahqFH XQGKxxiXe2dJOCMH0a4qF9as3fvcUquQ+RiNxXwAAHrwnKG9VOqAHkthrMrVOYIOd2jB 99nhGoO3A3c6IbVgAHTnXH7Z93nJlkk3lMUfUHhTZpkpMePBYRVx17YMSp7dKGF2N1A9 m+yA9Oir/zBa+YcgQlyvCuCIDo40o9wD/MDltvBWDGswGXtJ9bbDFXKIZDD7HryAGMkf 8gkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717695215; x=1718300015; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mPjsJlKDBXzcFKJiIo/4f6GGzkeINELRnLiFwHcGpOo=; b=jk2cRNzBKI+b7ZpVo/ZJFoT2lIIaOU6WMxoG0HSHvfobU0Ij8ZUo+kyH/SIl5cvgKp Mz859i+mzAvxenjxbHrYsMBja58OMHIKoEgDOhx0BM0B414iWHnyfYIHxfvQzHV1w/0D DAJGIBv4LFN7jV61TSyyEaaYq8BzyrOZbkbnQrp2gfbH6u1uuY6wRFAA1VvW6WQLMTd1 Pl1NPuRZGzX6VmrGEg+YjbBc3oqrlEp4jqPIhbjmGvG2k+drWHb+ijRdZgo3ZrHfEPHI bDfEqKNyYsm2uFI9sIkCW8/K4uchxTnvIms1cHn0ItkBzJbPNsURiSUeUo+XUT68yMsp L9bQ== X-Forwarded-Encrypted: i=1; AJvYcCWczC/Kh3690K36hEAid3MapwQpTuOhBcFi/VvOqTMgEibY+1XQ23jmxB6eCakdsjRb1ApKCRS7KFD8hxryG5FsJ7Y= X-Gm-Message-State: AOJu0YxlGB5o74f0RpRPE84P/YiwbAYH0mR+LthjllYd+RpNjBrpfY9Z kZcfVO98nKQOoxO0okCbd/GBi/l6rqEqM5uKSckKLrAFyi4BRUmixPp3gyqfdbadgCuAZMNmlnY f2A2rX3uVykE7taP50tvoy/D7BsU3wz7/E5Wi X-Google-Smtp-Source: AGHT+IE4QPvkBQBYH5qBQiqsMSACkO+0MruSkYzXUyRwuzkvF/JqqLhtKdRfSDNl8JmBwmG1H52WpeDjd6vt0pJCRZ4= X-Received: by 2002:a05:600c:54ca:b0:420:309a:fe63 with SMTP id 5b1f17b1804b1-42164a030a5mr3279955e9.22.1717695214069; Thu, 06 Jun 2024 10:33:34 -0700 (PDT) MIME-Version: 1.0 References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-5-andrii@kernel.org> In-Reply-To: From: Suren Baghdasaryan Date: Thu, 6 Jun 2024 10:33:19 -0700 Message-ID: Subject: Re: [PATCH v3 4/9] fs/procfs: use per-VMA RCU-protected locking in PROCMAP_QUERY API To: "Liam R. Howlett" , Andrii Nakryiko , Suren Baghdasaryan , Andrii Nakryiko , linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 2E954180008 X-Stat-Signature: f35gjmuff56mym47xct6tmn71ashgrwj X-HE-Tag: 1717695215-412340 X-HE-Meta: U2FsdGVkX19qUE8bsgxnskfrwtzxm2vG4p3xU7mwwmnBAxOdRNlT0XgGQaSy1OUMBimFPARYG8HRRYsL4bn1jQLOl57Cp93z4KYOnvnMICQK0i+hvgP3XdUq2ilwDEbJSNq+bocYV07UqJHhzRYdoKXWtzg5nmEvVGyrhqbqZfm/Lqd9X6Hb5OSM1dvg7n+uD6of5i9a8Wf/4n7Pck6D5CUVj0wVVxJEFRS+toGvtIxZFzN4/99Dd2mtnAdVaJxOas8NMEDqIO67l8uJfsAASQaBsciLD+cq42MS2C/GdcO08QGoFEiTAs1EN0CB7TLXybWlJHJxOMSXOquHP2264NrT1AmbSFpdN99DdMCBtwjlF73t6E3MJDF/0TP9iWBOibHHWYu+IbAJHEy6uY3ju1Ps8u+feEMaOGrGzOKaFznMXYKqU8RfSgDg51dMPsyYVKsrtpqqLdyUFWdPdON7llg9XoUn7F2YTqi7uaoz8sKE69KvCMr1EbDGOwxegtXcWTFqo0HXFcJ+TMORfW3eqRGa85b0HNhe+N9IOkzNs79EIZC+73GTRqxEoi5jihZWiBdwPOO4SukviZBOLS7wRWWT8/nwV/E9hI/2WG/ulILakmtjEaOlQ6/cK01otAJ2RvJYt0hfM1EFjpeEM2LxEmzqhm6yE8hcTwyDRaYnzii2ZR3VBUpfbJ+sgU9ViecpZgp868QQTKOwfxfBZI6y0o8GfpJPWE6+rd6SqpQFYtCvPC4INxhiIme2Rmje2Arw2iiiTBGpmkmq4K0jEOKDstLr/I4IhD1+oUbFrYZSM9BK6gz7/+n5w7IkaTW6B4QNR6CGQL+r7lqYrgUgSZf0L65Zw9zRP14OZsuw5jozZ79ViIL9m3mV+M+N9pbzoSxOhMgfLToHtRhHErjyQgYtp4XtJwWnbo0xzq8m16g5oB8vBP5vsXHEtOF5Pllaapz8A5DB1sT/y7nW/ltwuF5 7iY0bx18 gialNhPUKey0/DEkV0aHd4bj7fy2K7cozS0F83EWicE80ziDoIu0b2tn+4rQj63DtRpBkjBb31LZ4kj6iWJhHbno+HQNHlTDnL3Yk5hTqbZcqzZd1JsStp+R3WC8l4y2l6vyq015z727di+GhtzyxgMvXf2Z4mlt7UNdKOkm8Uj08/jIXlNWnBTIgOUkZM8X1rsBJiQIGbz6NXYgDOyznA5FsilitVcaPSCp69lHA62TgwKocPmniWnfPzNyQOltbyckimmxQ+O1CAwhuQOCwKUOiS6//lhjieRjesbBByfYvyX8gt+BPLxvteMzFf0JC14gqdSNzoOX79pnDK6puiaIwS0YiOBfxreh98y7DfUdaOGYxCDBcUtWOQYO31ab5LzXKcTNmCqiPRiEzFNjSB+pOs7FRVDG619mT8FZYs5IpgtBlCDT6XyKiI8qREkPU/j4/DeIvcljgY3J6be5vst7r7twSY8RAcaTa/jHcXaE+bWL2rq3YfmDF+ix4RtmMyX4DGdynM/0flXrNykSOMIbUgOyLhvCGkMkP32165VQKNpmKSe4Dw0XlkSkdntJMvuIfRwGHPRydxadvwrbCGhyvpA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jun 6, 2024 at 10:15=E2=80=AFAM Liam R. Howlett wrote: > > * Andrii Nakryiko [240606 12:52]: > > On Wed, Jun 5, 2024 at 4:16=E2=80=AFPM Suren Baghdasaryan wrote: > > > > > > On Tue, Jun 4, 2024 at 5:25=E2=80=AFPM Andrii Nakryiko wrote: > > > > > > > > Attempt to use RCU-protected per-VMA lock when looking up requested= VMA > > > > as much as possible, only falling back to mmap_lock if per-VMA lock > > > > failed. This is done so that querying of VMAs doesn't interfere wit= h > > > > other critical tasks, like page fault handling. > > > > > > > > This has been suggested by mm folks, and we make use of a newly add= ed > > > > internal API that works like find_vma(), but tries to use per-VMA l= ock. > > > > > > > > We have two sets of setup/query/teardown helper functions with diff= erent > > > > implementations depending on availability of per-VMA lock (conditio= ned > > > > on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties. > > > > > > > > When per-VMA lock is available, lookup is done under RCU, attemptin= g to > > > > take a per-VMA lock. If that fails, we fallback to mmap_lock, but t= hen > > > > proceed to unconditionally grab per-VMA lock again, dropping mmap_l= ock > > > > immediately. In this configuration mmap_lock is never helf for long= , > > > > minimizing disruptions while querying. > > > > > > > > When per-VMA lock is compiled out, we take mmap_lock once, query VM= As > > > > using find_vma() API, and then unlock mmap_lock at the very end onc= e as > > > > well. In this setup we avoid locking/unlocking mmap_lock on every l= ooked > > > > up VMA (depending on query parameters we might need to iterate a fe= w of > > > > them). > > > > > > > > Signed-off-by: Andrii Nakryiko > > > > --- > > > > fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++= ++++ > > > > 1 file changed, 46 insertions(+) > > > > > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > > index 614fbe5d0667..140032ffc551 100644 > > > > --- a/fs/proc/task_mmu.c > > > > +++ b/fs/proc/task_mmu.c > > > > @@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, = struct file *file) > > > > PROCMAP_QUERY_VMA_FLAGS \ > > > > ) > > > > > > > > +#ifdef CONFIG_PER_VMA_LOCK > > > > +static int query_vma_setup(struct mm_struct *mm) > > > > +{ > > > > + /* in the presence of per-VMA lock we don't need any setup/= teardown */ > > > > + return 0; > > > > +} > > > > + > > > > +static void query_vma_teardown(struct mm_struct *mm, struct vm_are= a_struct *vma) > > > > +{ > > > > + /* in the presence of per-VMA lock we need to unlock vma, i= f present */ > > > > + if (vma) > > > > + vma_end_read(vma); > > > > +} > > > > + > > > > +static struct vm_area_struct *query_vma_find_by_addr(struct mm_str= uct *mm, unsigned long addr) > > > > +{ > > > > + struct vm_area_struct *vma; > > > > + > > > > + /* try to use less disruptive per-VMA lock */ > > > > + vma =3D find_and_lock_vma_rcu(mm, addr); > > > > + if (IS_ERR(vma)) { > > > > + /* failed to take per-VMA lock, fallback to mmap_lo= ck */ > > > > + if (mmap_read_lock_killable(mm)) > > > > + return ERR_PTR(-EINTR); > > > > + > > > > + vma =3D find_vma(mm, addr); > > > > + if (vma) { > > > > + /* > > > > + * We cannot use vma_start_read() as it may= fail due to > > > > + * false locked (see comment in vma_start_r= ead()). We > > > > + * can avoid that by directly locking vm_lo= ck under > > > > + * mmap_lock, which guarantees that nobody = can lock the > > > > + * vma for write (vma_start_write()) under = us. > > > > + */ > > > > + down_read(&vma->vm_lock->lock); > > > > > > Hi Andrii, > > > The above pattern of locking VMA under mmap_lock and then dropping > > > mmap_lock is becoming more common. Matthew had an RFC proposal for an > > > API to do this here: > > > https://lore.kernel.org/all/ZivhG0yrbpFqORDw@casper.infradead.org/. I= t > > > might be worth reviving that discussion. > > > > Sure, it would be nice to have generic and blessed primitives to use > > here. But the good news is that once this is all figured out by you mm > > folks, it should be easy to make use of those primitives here, right? > > > > > > > > > + } > > > > + > > > > + mmap_read_unlock(mm); > > > > > > Later on in your code you are calling get_vma_name() which might call > > > anon_vma_name() to retrieve user-defined VMA name. After this patch > > > this operation will be done without holding mmap_lock, however per > > > https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types= .h#L582 > > > this function has to be called with mmap_lock held for read. Indeed > > > with debug flags enabled you should hit this assertion: > > > https://elixir.bootlin.com/linux/latest/source/mm/madvise.c#L96. > > The documentation on the first link says to hold the lock or take a > reference, but then we assert the lock. If you take a reference to the > anon vma name, then we will trigger the assert. Either the > documentation needs changing or the assert is incorrect - or I'm missing > something? I think the documentation is correct. It says that at the time of calling anon_vma_name() the mmap_lock should be locked (hence the assertion). Then the user can raise anon_vma_name refcount, drop mmap_lock and safely continue using anon_vma_name object. IOW this is fine: mmap_read_lock(vma->mm); anon_name =3D anon_vma_name(vma); anon_vma_name_get(anon_name); mmap_read_unlock(vma->mm); // keep using anon_name anon_vma_name_put(anon_name); > > > > > Sigh... Ok, what's the suggestion then? Should it be some variant of > > mmap_assert_locked() || vma_assert_locked() logic, or it's not so > > simple? > > > > Maybe I should just drop the CONFIG_PER_VMA_LOCK changes for now until > > all these gotchas are figured out for /proc//maps anyway, and > > then we can adapt both text-based and ioctl-based /proc//maps > > APIs on top of whatever the final approach will end up being the right > > one? > > > > Liam, any objections to this? The whole point of this patch set is to > > add a new API, not all the CONFIG_PER_VMA_LOCK gotchas. My > > implementation is structured in a way that should be easily amenable > > to CONFIG_PER_VMA_LOCK changes, but if there are a few more subtle > > things that need to be figured for existing text-based > > /proc//maps anyways, I think it would be best to use mmap_lock > > for now for this new API, and then adopt the same final > > CONFIG_PER_VMA_LOCK-aware solution. > > The reason I was hoping to have the new interface use the per-vma > locking from the start is to ensure the guarantees that we provide to > the users would not change. We'd also avoid shifting to yet another > mmap_lock users. > > I also didn't think it would complicate your series too much, so I > understand why you want to revert to the old locking semantics. I'm > fine with you continuing with the series on the old lock. Thanks for > trying to make this work. > > Regards, > Liam