From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34F27C27C52 for ; Thu, 6 Jun 2024 18:03:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 81E066B009A; Thu, 6 Jun 2024 14:03:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7CE696B00A4; Thu, 6 Jun 2024 14:03:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 66EF96B00A5; Thu, 6 Jun 2024 14:03:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 4B59F6B009A for ; Thu, 6 Jun 2024 14:03:26 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id B4FFE1A1059 for ; Thu, 6 Jun 2024 18:03:25 +0000 (UTC) X-FDA: 82201235970.13.DC7CC66 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) by imf06.hostedemail.com (Postfix) with ESMTP id DFD6818001F for ; Thu, 6 Jun 2024 18:03:23 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=WbSnmdtq; spf=pass (imf06.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.214.182 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717697003; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=D642LaXYHltFUziB57voG7OLuiRYrwbp9evgeaa63eE=; b=QQUbH3EKXdSuc3hL4hZtqdxiFI5XjyHWwP4Rp4gSD7oROv9ONZY6C6eoW3Srcdpzfww2F6 dSw9BSm29QXW2cPvFVm/ij+D8tYLtRLe2xTd7fYXH6C/bN+sNJApcKvBmYbUSF+JQzc3BP CQCjTUuluwTU/fe91r3KFo31rmqsPD4= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=WbSnmdtq; spf=pass (imf06.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.214.182 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717697003; a=rsa-sha256; cv=none; b=O3p8Y/olL8T5hg8WMjnXUND6qAMED5+JCtwkRrrecZO0K9F1sb2OnWMvUYSVJz8PqOwEmO //AJGo30hey2z02oYos0pfxb6ASGnHGJyEXF3S+jy7BnnmdVekQvbUQeZK5qZKZvFjQglR VzdV9mNf+9Pio+yq7Ri9OW7Xw4vSqv8= Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-1f6b941844dso10428445ad.2 for ; Thu, 06 Jun 2024 11:03:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717697003; x=1718301803; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=D642LaXYHltFUziB57voG7OLuiRYrwbp9evgeaa63eE=; b=WbSnmdtqsymWPh3KwG6GWklYMgub1v3ZTKdFf43tKLmZ8bfoDgs27o0sXmZhmCKtHJ OyeUdqnvSrfB9dcL7BJ9bOzmHeGO4DXiWfS7rasnLGadJ/idxnzP7u0ipddauEECrPQB uRO2Ome0n7VKM+6hC5J/3DW7HozMPuzDrjaRCVHgwPBPC5T8lDCnY0sP1g3KWJi0aRge KxaQ5FDiaeDdxM4iY+6UpUpuWdEpeuILVT+ftdd/OvR5ck13nQSTxzVFP84Y3HyV4kWn jy8ZNiQjjv+Sd41UpApsXaPmWkhKhztEtt8hiN+Q40i7T0kKFfnr6pkoq5fCV9AIeHFV WDVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717697003; x=1718301803; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=D642LaXYHltFUziB57voG7OLuiRYrwbp9evgeaa63eE=; b=JNYwHUPqg9RpY6hrb229w8KPEITUBvzvd/Xpnr1p8G3Sp3Tv4aWLhzPPvBDRoxu6O/ MMLZNOIpgosWihnCFIrDzQQnAVTP3ntrdTxOZxNFkMz84iVqwOz1IQltuVQX1AQU4U96 lJ0MVpe1ejdOqLN8fijQl02tCftoKKLNYMkskaZXcwB57/FuTnlrVckm4dz+JBBox609 wTBhdHWNXuAJBWxQ1QUWE3xYpa1ArNBl7+JViEkU+y6hi5PpeDwhDuamr4OfQ3gsrYig w7p0+pxiQGR6YppuyeA49dG+FPBNAO2Nd/Kf0jEFlcic//FFutVBg1EToVDgG9z4NzF2 SUhw== X-Forwarded-Encrypted: i=1; AJvYcCWfC08TbK8q9GvNp0g3tR3t2MtP1sTzPsQK7/mrvFmQpnSVjhAUteM2wd7nfXzTKqPc9vu/Uirc3qZeayRjMjB9GBc= X-Gm-Message-State: AOJu0YymdxGJiZ8nlGLV5R92Ch8BZPPH4dAIFtzoP6TyoDmAFwe+dIjO b5cI6yzDdAXtMh31HkizyfpQC4BPX1mQn3wGHsqGvVS3e7/237KX3SX/7l8wUZ9gDoopzc/ZtjJ J6ALzXSPCeFqPXV9tiby3+1jFv4Y= X-Google-Smtp-Source: AGHT+IEdFmXNqU52VSDHeMcceD3GDuoJ/fU3V0V+QCiiKaqL658RtlSpMoscDJxCTnJiBX1QCuYykKwDDo83/qhYQrc= X-Received: by 2002:a17:90a:5207:b0:2c2:1d0f:3c6e with SMTP id 98e67ed59e1d1-2c2bcc6b446mr245234a91.37.1717697002522; Thu, 06 Jun 2024 11:03:22 -0700 (PDT) MIME-Version: 1.0 References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-5-andrii@kernel.org> In-Reply-To: From: Andrii Nakryiko Date: Thu, 6 Jun 2024 11:03:10 -0700 Message-ID: Subject: Re: [PATCH v3 4/9] fs/procfs: use per-VMA RCU-protected locking in PROCMAP_QUERY API To: Suren Baghdasaryan Cc: Andrii Nakryiko , linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, liam.howlett@oracle.com, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: DFD6818001F X-Stat-Signature: gwuztezfg6b54mqhbykgbnowt7t75eyq X-Rspam-User: X-HE-Tag: 1717697003-251629 X-HE-Meta: U2FsdGVkX18gsNApgm9X4C94DhH7MayQPD3va/ARImP4rwWjjJHCqQP6e+tuIRZTbuElXmF5r4esOBrgqWogapDGK4jzc4oC892LVqskRI+tmaEL3GhzYNkVO+/FQQfhdX6hTOFedS3oKhUEKBORVffjMWolK7uQ8nhGJhAiVwIsgGpQm9UuWElSa9CVqML5sZnl++hRsTvneOryWQK73xWQmKF6WXPukX3jWyJQvddW+FGJZQolEdAH/lv68/MyvSoT5bFUBXZobmPuNxKSwC1Epj71ug+sz8ryZKS/fKSEqQ+A09HdrAQxOZRC+RU97bZXPHFSeVtlagpUOSQUxYrsQ0GSpWXNw8Nq+ZWrfb7N/E6QXycCfiOUKUNrXEA7YIky7MHlvszIbEvY9Sn9lao4qGMMS4Hwb6QVFJpO1Ymd1cI8u4vZ6I3swExQdIgqG4I5DUGYTUrbm/mhAHgB/sJGu7luqmw9nm1LflwIwp3xjhTPIvhJ44Wu4c1PHwsLed+lC6HgkwEs16HVQCaGW1/QqndcLb1z8ZPXmRdK2SK5ZbvUo9L9fdZbbMaWfrEw+q+Z2h7TiAEzGanUC3iLuVOriObGdfW2/Dg03XdxfQDOo4wFrq7NBEBASs4tidt20ENfxRsDg7rrSlEuz4uONwH9ozGMaUh9EwzvRMzq7iOqbYCC1gnof6uyQJWJVtY09F2FJCquI3k6QTQHKnTwEqUhApPDIW7yJw9zRqztXSSgd5Xc56pP8Fn4vMATltG5wgD83V9vfJXKLQuclDlvDd4wGwEOHYI7VZHZUHucp4EuBV1msppHjocMZ+O937Q8qrMoVvddFj9hv6BAljY0NttG9TJKF7WtBfeAx+Cq8PaI0vwBxOEecLTa2hpgr87kH/Ms5NyaVpxM+13WqbHVrNwIaR7g5xPBjLVfw382Qhfo1ny29T7TIiLgvOTypclOY6qI9dgtNKTRjtFHl+M I8TlbCao ADdafVKK6Qd73KEYKxRQo5WRqOfVNKTwQfR6Eh7Uoig+A9p81zoBix5/nHLlCE5SZgGAnjwTA7ZzZtiZwmZSOZQr/GDMTqsQl8SpBTwJnxdNNtpYFDA/qWEHJl9eMxuhSzOlLuyOJfo31o7S9pda7VT5IPGjxQoxBDFn6IraqHGoZdqsugs+wWfy7yPgzhI9G070gfmTIMEl4+o6gClKa3QpD5WKObSKqs3wgNP+q1MKsQegQAGfhMADtRllMyZ1RM7dzFk/+QqtrtxVdQO1Y1tIzpOqVe5eu1yOgbwV0WMV2v+ucBw12cU4asbuYnT0uYmDu/IcRaTqNVXid7DkFrbHy5cVLZbvwKIVbQ9p4R4Jo2uckjmRgfFI3CRhpWPYPowdN78ScE17l650dZrHY6uVWAF5SslTSM4FPCgVfgMNTQczhT+5lFX4BdR4pn/WraMgh95DAQ6TNox8zBAs0X2DyylB+wR/6iyFfoMXoU+ojG52ym+FXVITmWp6DP4dno3VLeZLrR/WSYywqO7srM9Tn6u+1M3YwQbM2tgY0vJ478gOL+r8z779uTDkxfm5hPSZz9T4W9hZEsg+cIwYrXSeKPlgtIVHQ422xn4pjSIF25j2GROeYM/SLj5qiJw70ITkH15re2FqVaQE+2R5HF3/yCg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jun 6, 2024 at 10:12=E2=80=AFAM Suren Baghdasaryan wrote: > > On Thu, Jun 6, 2024 at 9:52=E2=80=AFAM Andrii Nakryiko > wrote: > > > > On Wed, Jun 5, 2024 at 4:16=E2=80=AFPM Suren Baghdasaryan wrote: > > > > > > On Tue, Jun 4, 2024 at 5:25=E2=80=AFPM Andrii Nakryiko wrote: > > > > > > > > Attempt to use RCU-protected per-VMA lock when looking up requested= VMA > > > > as much as possible, only falling back to mmap_lock if per-VMA lock > > > > failed. This is done so that querying of VMAs doesn't interfere wit= h > > > > other critical tasks, like page fault handling. > > > > > > > > This has been suggested by mm folks, and we make use of a newly add= ed > > > > internal API that works like find_vma(), but tries to use per-VMA l= ock. > > > > > > > > We have two sets of setup/query/teardown helper functions with diff= erent > > > > implementations depending on availability of per-VMA lock (conditio= ned > > > > on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties. > > > > > > > > When per-VMA lock is available, lookup is done under RCU, attemptin= g to > > > > take a per-VMA lock. If that fails, we fallback to mmap_lock, but t= hen > > > > proceed to unconditionally grab per-VMA lock again, dropping mmap_l= ock > > > > immediately. In this configuration mmap_lock is never helf for long= , > > > > minimizing disruptions while querying. > > > > > > > > When per-VMA lock is compiled out, we take mmap_lock once, query VM= As > > > > using find_vma() API, and then unlock mmap_lock at the very end onc= e as > > > > well. In this setup we avoid locking/unlocking mmap_lock on every l= ooked > > > > up VMA (depending on query parameters we might need to iterate a fe= w of > > > > them). > > > > > > > > Signed-off-by: Andrii Nakryiko > > > > --- > > > > fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++= ++++ > > > > 1 file changed, 46 insertions(+) > > > > > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > > index 614fbe5d0667..140032ffc551 100644 > > > > --- a/fs/proc/task_mmu.c > > > > +++ b/fs/proc/task_mmu.c > > > > @@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, = struct file *file) > > > > PROCMAP_QUERY_VMA_FLAGS \ > > > > ) > > > > > > > > +#ifdef CONFIG_PER_VMA_LOCK > > > > +static int query_vma_setup(struct mm_struct *mm) > > > > +{ > > > > + /* in the presence of per-VMA lock we don't need any setup/= teardown */ > > > > + return 0; > > > > +} > > > > + > > > > +static void query_vma_teardown(struct mm_struct *mm, struct vm_are= a_struct *vma) > > > > +{ > > > > + /* in the presence of per-VMA lock we need to unlock vma, i= f present */ > > > > + if (vma) > > > > + vma_end_read(vma); > > > > +} > > > > + > > > > +static struct vm_area_struct *query_vma_find_by_addr(struct mm_str= uct *mm, unsigned long addr) > > > > +{ > > > > + struct vm_area_struct *vma; > > > > + > > > > + /* try to use less disruptive per-VMA lock */ > > > > + vma =3D find_and_lock_vma_rcu(mm, addr); > > > > + if (IS_ERR(vma)) { > > > > + /* failed to take per-VMA lock, fallback to mmap_lo= ck */ > > > > + if (mmap_read_lock_killable(mm)) > > > > + return ERR_PTR(-EINTR); > > > > + > > > > + vma =3D find_vma(mm, addr); > > > > + if (vma) { > > > > + /* > > > > + * We cannot use vma_start_read() as it may= fail due to > > > > + * false locked (see comment in vma_start_r= ead()). We > > > > + * can avoid that by directly locking vm_lo= ck under > > > > + * mmap_lock, which guarantees that nobody = can lock the > > > > + * vma for write (vma_start_write()) under = us. > > > > + */ > > > > + down_read(&vma->vm_lock->lock); > > > > > > Hi Andrii, > > > The above pattern of locking VMA under mmap_lock and then dropping > > > mmap_lock is becoming more common. Matthew had an RFC proposal for an > > > API to do this here: > > > https://lore.kernel.org/all/ZivhG0yrbpFqORDw@casper.infradead.org/. I= t > > > might be worth reviving that discussion. > > > > Sure, it would be nice to have generic and blessed primitives to use > > here. But the good news is that once this is all figured out by you mm > > folks, it should be easy to make use of those primitives here, right? > > > > > > > > > + } > > > > + > > > > + mmap_read_unlock(mm); > > > > > > Later on in your code you are calling get_vma_name() which might call > > > anon_vma_name() to retrieve user-defined VMA name. After this patch > > > this operation will be done without holding mmap_lock, however per > > > https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types= .h#L582 > > > this function has to be called with mmap_lock held for read. Indeed > > > with debug flags enabled you should hit this assertion: > > > https://elixir.bootlin.com/linux/latest/source/mm/madvise.c#L96. > > > > Sigh... Ok, what's the suggestion then? Should it be some variant of > > mmap_assert_locked() || vma_assert_locked() logic, or it's not so > > simple? > > > > Maybe I should just drop the CONFIG_PER_VMA_LOCK changes for now until > > all these gotchas are figured out for /proc//maps anyway, and > > then we can adapt both text-based and ioctl-based /proc//maps > > APIs on top of whatever the final approach will end up being the right > > one? > > > > Liam, any objections to this? The whole point of this patch set is to > > add a new API, not all the CONFIG_PER_VMA_LOCK gotchas. My > > implementation is structured in a way that should be easily amenable > > to CONFIG_PER_VMA_LOCK changes, but if there are a few more subtle > > things that need to be figured for existing text-based > > /proc//maps anyways, I think it would be best to use mmap_lock > > for now for this new API, and then adopt the same final > > CONFIG_PER_VMA_LOCK-aware solution. > > I agree that you should start simple, using mmap_lock first and then > work on improvements. Would the proposed solution become useless with > coarse mmap_lock'ing? Sorry, it's not clear what you mean by "proposed solution"? If you mean this new ioctl-based API, no it's still very useful and fast even if we take mmap_lock. But if you mean vm_lock, then I'd say that due to anon_vma_name() complication it makes vm_lock not attractive anymore, because vma_name will be requested pretty much always. And if we need to take mmap_lock anyways, then what's the point of per-VMA lock, right? I'd like to be a good citizen here and help you guys not add new mmap_lock users (and adopt per-VMA lock more widely), but I'm not sure I can solve the anon_vma_name() conundrum, unfortunately. Ultimately, I do care the most about having this new API available for my customers to take advantage of, of course. > > > > > > > > > > + } > > > > + > > > > + return vma; > > > > +} > > > > +#else > > > > static int query_vma_setup(struct mm_struct *mm) > > > > { > > > > return mmap_read_lock_killable(mm); > > > > @@ -402,6 +445,7 @@ static struct vm_area_struct *query_vma_find_by= _addr(struct mm_struct *mm, unsig > > > > { > > > > return find_vma(mm, addr); > > > > } > > > > +#endif > > > > > > > > static struct vm_area_struct *query_matching_vma(struct mm_struct = *mm, > > > > unsigned long addr= , u32 flags) > > > > @@ -441,8 +485,10 @@ static struct vm_area_struct *query_matching_v= ma(struct mm_struct *mm, > > > > skip_vma: > > > > /* > > > > * If the user needs closest matching VMA, keep iterating. > > > > + * But before we proceed we might need to unlock current VM= A. > > > > */ > > > > addr =3D vma->vm_end; > > > > + vma_end_read(vma); /* no-op under !CONFIG_PER_VMA_LOCK */ > > > > if (flags & PROCMAP_QUERY_COVERING_OR_NEXT_VMA) > > > > goto next_vma; > > > > no_vma: > > > > -- > > > > 2.43.0 > > > >