From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 829D0C27C54 for ; Thu, 6 Jun 2024 17:12:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E39B16B0095; Thu, 6 Jun 2024 13:12:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DEAAC6B00A1; Thu, 6 Jun 2024 13:12:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CB0C96B00A3; Thu, 6 Jun 2024 13:12:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id AD57F6B0095 for ; Thu, 6 Jun 2024 13:12:38 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 1FD418163C for ; Thu, 6 Jun 2024 17:12:38 +0000 (UTC) X-FDA: 82201107996.04.4924758 Received: from mail-yb1-f182.google.com (mail-yb1-f182.google.com [209.85.219.182]) by imf08.hostedemail.com (Postfix) with ESMTP id 4A98116000A for ; Thu, 6 Jun 2024 17:12:35 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=w4cJFGmA; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of surenb@google.com designates 209.85.219.182 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717693955; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=kZKE81qoxM89lbOangpx2QgEODfNB1onCrSNSnUNWZg=; b=MZ7ubUmll9enMcvARZR7KRUel3QiKo+IJ2AzQzc+xsrvXlFh2tREuYIaqdxpak2TNTsteX lFZZJmV9hyc+OPyNxdhkrYcKbfYUrKMlNuuBWykHZvxpwCgF9VRmT6rNEJQ+2NqwlJ46w3 AgL/KZ9K5nrM/1+gt/aHlPR+oeWrYNI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717693955; a=rsa-sha256; cv=none; b=pdbNS7J2J9he6Wa0vnhDLkmWvtc0gVLKYAcAXfUnDI+Seo/nS9nrEBHXgEMhRlK973Nexb vJnH85CFGSu3BWHbrIairOEFdWqhD55qHNFr5QXWJBS6PcfB3krkCyj+ePdWtm9COHUKNx q0u/BUNqcWEEDZDvd/idV+TE4qaIrQ4= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=w4cJFGmA; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of surenb@google.com designates 209.85.219.182 as permitted sender) smtp.mailfrom=surenb@google.com Received: by mail-yb1-f182.google.com with SMTP id 3f1490d57ef6-dfa4876a5bbso1222281276.2 for ; Thu, 06 Jun 2024 10:12:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717693954; x=1718298754; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=kZKE81qoxM89lbOangpx2QgEODfNB1onCrSNSnUNWZg=; b=w4cJFGmAXlmrZ/9orKHv0Ptb354WG87vIGw+IdWws0Ft4fsaMRxUzldC8pSdp++M+/ awV8qMxdHmuhQ06jQJEGS9lphEHXAB9JChcPkguQiGwJL2Sucy3/hDn6RMH5rGznZ9b0 lfUJHfJB6bkaaIeWwX/s+yDI/cqrC1i18JeWTFgWKu5y2UZLP7CgZ/eJMFGfcxVDZoVF Xl4AxTj9QxuHnPq7Y45UE04uinarPX6o+8MMDinR1nfa1jTZ1PjHol4GBf61GUy+uJ66 KCusKsH36Fgs9d9UBWp9hOvxZAEgEuB88Nzj/viwPMTEDyUhZm3GjCbxEhVmF7OvuVKb 95Bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717693954; x=1718298754; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kZKE81qoxM89lbOangpx2QgEODfNB1onCrSNSnUNWZg=; b=dJ9CkeCKT6xozDl4NPoKl3KisPjLXEeEfIPH29UT1JvryGayrE5ZedT9vlg88qRLk3 3YbF1t3ssqTf+w2u3mLlQbIiuqSzhqv9yDTCDkbgxYFpsUePeq7yKxvMo1aLkENtW1rc opjGeCt3JV1rbBdHRVw/4y7/e/FY8jqtf7eiLwwsEJj1M/U23NsGm5uUA1nCznzY+c/t 83rziNntCxn03vP/3hEtSOcZfpt4XR9Lgs1UNlV51Id38GVGstnww1iIKYpb2Msgb8ax ER5h1NDf3tIToAmbFsC2FixdEkSYtgNgdPrOiWc7j1NYCKG4xeNauPEv9i3wHTIvOEt9 uYYg== X-Forwarded-Encrypted: i=1; AJvYcCUSK7oRLCwoICerMQ9A4JdHt9X9Q22MOAcgGEN4ZqDP0IchXiLGIPK7N9y1zPIN8nQMu5D7pQH/Mbj0sEnwE+qkPfY= X-Gm-Message-State: AOJu0YyTGPQoB0mf0KgzXJR6nW0nluvOEU1mO1RI51vRM784D6aIf9kp 1ui4YU3t8aNEPbcIGYyb5gRDh6is98BlOM4OpCNBoGTiBKI91qH6lnJKfPvWvAqPez2DLgVUvfM YMB60v/tFpESGEZ6FO0XTDhKsAKXytbLsxpJj X-Google-Smtp-Source: AGHT+IH1cpi2LYBIddbBM2tFPkjXrGUnnuJw36AgrAa/4/yXidELcXd58akGUTBX9aNDWCS+xMYyUK+XSoCTEqGqMzk= X-Received: by 2002:a25:c744:0:b0:dfa:49f9:d334 with SMTP id 3f1490d57ef6-dfaf6645fd6mr6214276.48.1717693953710; Thu, 06 Jun 2024 10:12:33 -0700 (PDT) MIME-Version: 1.0 References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-5-andrii@kernel.org> In-Reply-To: From: Suren Baghdasaryan Date: Thu, 6 Jun 2024 10:12:19 -0700 Message-ID: Subject: Re: [PATCH v3 4/9] fs/procfs: use per-VMA RCU-protected locking in PROCMAP_QUERY API To: Andrii Nakryiko Cc: Andrii Nakryiko , linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, liam.howlett@oracle.com, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: by5tsierdozqz3kegngmsba8b1ee64h3 X-Rspamd-Queue-Id: 4A98116000A X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1717693955-280751 X-HE-Meta: U2FsdGVkX1/j8bCM6Szs+u/1oobxIAOOEkdOHKcuLNoEmUKEG4wOcUbG1TvEB1ansg64WeG+xcM+FD749TNQjlr0b82YSTvWjgrMqvB7oQlVl0DWuHCq5RFVgHoUnC/S3nuGT4nQy4eqceK6b6uaLNbZZlREIuVVbifnzKRgxu2vdCsTRxLMWvP2v0a+szfMruF5SVNxax/8VY62aEgy6Sw2GfMpNvjqv8+7wb78hXY/+2ivFdW84U6COP0pxB1biE+bn0c9ZnKZelpowDgX3i6iocwMg9LShtuL8d7yTDKlgft1gAcQ1nXuNU96m/uD+qPA+AF81Hl6GoLDyzGHl/zuXd8JE+sQkYTl0Y0IFD6MO/8Hfa7g6MBL2dZZHUSI494xncqSOrQXCOYSEB/zrE+01a64yz0l93zSL6TElqC3ijkkKHJ9f7ExriztD7FEKK1FrfPaOrAI3tNtmpkwOnY0BDHjGuNwetA5/eoZ4lb+aR5KPCv8yTngAkCK8EsWTVW5cVx9J7V/RqxYx/qoURgq9u2UAwiVKdGASQrscIw+srYTYTwJSFCc/SGIHYFAeJd7sZ1B7GH2a7Fuy6v6e9keXENgajzJvYGilCBbE8uapDRXn9c5nwo2nt2hlo7/gWPBkVvxB6VU0HoRCu0DvRpyeDgvJdakGTw6pS7Nmskz9Xg5Vadbg5VubnV31QgyTzoBupEqHA9jZBZf3QkQQVvIQW2MSBR/1JYkDmhhrsKHFU+oYgj1VeZ7V9yZpeKJjxlOxo8us6iNgPvzxxIOlgnqSU8UKORz5leeUAVN7l8y3qzM6k400qoMTH7N37Qi9yW7H9CjJAf88OaaymHQhvO1IqtBY/LTmXqKsyL41p40JPbHkfH6EtSEvlRJfdj09Fk48AU0qUXTE49rlWTg9AhIcivHqWzXQtlT4LtwfOGhI/w4odvH+tshit7qvNYstcVm93004eEb2kiJVgw cyHb0WoO SGahfbt7h54JgZKbms+WmLFo5Bi61mVtzaixV5aafk0MNhn0CmXRhu44Kii+E30tWvkWuedL1VqL2XyNr0iiLkpBFS2O2Jd5hbBpONYp69TtUjB9PL2mPFHzGkC5loGxRMoYRCx8n6sogY0snlW8EBgC4EaIFpx/s8AY5wwcQr/CSWrxdRY2WacFWyK2wDLCFG8DQQaSTuibfK1NQ+TNBaESR492xxMuTTivbMxMwRsTFkLA9Ac3fSDgJtDUMG3Rv6Mw3cTM0xQ2QhUqGRKy4Buvc9BWnEgWcafYRAv0bvVWcmvK8xurqZt6SMEPbw1lBPJMMFvRt5fFdq88/qwEovxgDVP/ARYtUOkg+LTWPv32+DR2ORK+IPIoAwmB0TXdjwr6pl+y5Gml+sEhKrf6Kka7/2ZdNvHE2KeplI+zuVkwOyrpkF3U2S9JuhucQOSpW9pdKlXPBKasQMCFs0L6ouW34ZHBylx6EZCgx2DQYcDfshmZQ4D+C/zfBHhEH2WLdnr4Z1SfHkL4KCmHg9g/IkkBk8qVosEau4K+sUHy0Su+Xp0kZXVXuZdNY9brB9RDO6Uxy X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jun 6, 2024 at 9:52=E2=80=AFAM Andrii Nakryiko wrote: > > On Wed, Jun 5, 2024 at 4:16=E2=80=AFPM Suren Baghdasaryan wrote: > > > > On Tue, Jun 4, 2024 at 5:25=E2=80=AFPM Andrii Nakryiko wrote: > > > > > > Attempt to use RCU-protected per-VMA lock when looking up requested V= MA > > > as much as possible, only falling back to mmap_lock if per-VMA lock > > > failed. This is done so that querying of VMAs doesn't interfere with > > > other critical tasks, like page fault handling. > > > > > > This has been suggested by mm folks, and we make use of a newly added > > > internal API that works like find_vma(), but tries to use per-VMA loc= k. > > > > > > We have two sets of setup/query/teardown helper functions with differ= ent > > > implementations depending on availability of per-VMA lock (conditione= d > > > on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties. > > > > > > When per-VMA lock is available, lookup is done under RCU, attempting = to > > > take a per-VMA lock. If that fails, we fallback to mmap_lock, but the= n > > > proceed to unconditionally grab per-VMA lock again, dropping mmap_loc= k > > > immediately. In this configuration mmap_lock is never helf for long, > > > minimizing disruptions while querying. > > > > > > When per-VMA lock is compiled out, we take mmap_lock once, query VMAs > > > using find_vma() API, and then unlock mmap_lock at the very end once = as > > > well. In this setup we avoid locking/unlocking mmap_lock on every loo= ked > > > up VMA (depending on query parameters we might need to iterate a few = of > > > them). > > > > > > Signed-off-by: Andrii Nakryiko > > > --- > > > fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++++= ++ > > > 1 file changed, 46 insertions(+) > > > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > index 614fbe5d0667..140032ffc551 100644 > > > --- a/fs/proc/task_mmu.c > > > +++ b/fs/proc/task_mmu.c > > > @@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, st= ruct file *file) > > > PROCMAP_QUERY_VMA_FLAGS \ > > > ) > > > > > > +#ifdef CONFIG_PER_VMA_LOCK > > > +static int query_vma_setup(struct mm_struct *mm) > > > +{ > > > + /* in the presence of per-VMA lock we don't need any setup/te= ardown */ > > > + return 0; > > > +} > > > + > > > +static void query_vma_teardown(struct mm_struct *mm, struct vm_area_= struct *vma) > > > +{ > > > + /* in the presence of per-VMA lock we need to unlock vma, if = present */ > > > + if (vma) > > > + vma_end_read(vma); > > > +} > > > + > > > +static struct vm_area_struct *query_vma_find_by_addr(struct mm_struc= t *mm, unsigned long addr) > > > +{ > > > + struct vm_area_struct *vma; > > > + > > > + /* try to use less disruptive per-VMA lock */ > > > + vma =3D find_and_lock_vma_rcu(mm, addr); > > > + if (IS_ERR(vma)) { > > > + /* failed to take per-VMA lock, fallback to mmap_lock= */ > > > + if (mmap_read_lock_killable(mm)) > > > + return ERR_PTR(-EINTR); > > > + > > > + vma =3D find_vma(mm, addr); > > > + if (vma) { > > > + /* > > > + * We cannot use vma_start_read() as it may f= ail due to > > > + * false locked (see comment in vma_start_rea= d()). We > > > + * can avoid that by directly locking vm_lock= under > > > + * mmap_lock, which guarantees that nobody ca= n lock the > > > + * vma for write (vma_start_write()) under us= . > > > + */ > > > + down_read(&vma->vm_lock->lock); > > > > Hi Andrii, > > The above pattern of locking VMA under mmap_lock and then dropping > > mmap_lock is becoming more common. Matthew had an RFC proposal for an > > API to do this here: > > https://lore.kernel.org/all/ZivhG0yrbpFqORDw@casper.infradead.org/. It > > might be worth reviving that discussion. > > Sure, it would be nice to have generic and blessed primitives to use > here. But the good news is that once this is all figured out by you mm > folks, it should be easy to make use of those primitives here, right? > > > > > > + } > > > + > > > + mmap_read_unlock(mm); > > > > Later on in your code you are calling get_vma_name() which might call > > anon_vma_name() to retrieve user-defined VMA name. After this patch > > this operation will be done without holding mmap_lock, however per > > https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types.h= #L582 > > this function has to be called with mmap_lock held for read. Indeed > > with debug flags enabled you should hit this assertion: > > https://elixir.bootlin.com/linux/latest/source/mm/madvise.c#L96. > > Sigh... Ok, what's the suggestion then? Should it be some variant of > mmap_assert_locked() || vma_assert_locked() logic, or it's not so > simple? > > Maybe I should just drop the CONFIG_PER_VMA_LOCK changes for now until > all these gotchas are figured out for /proc//maps anyway, and > then we can adapt both text-based and ioctl-based /proc//maps > APIs on top of whatever the final approach will end up being the right > one? > > Liam, any objections to this? The whole point of this patch set is to > add a new API, not all the CONFIG_PER_VMA_LOCK gotchas. My > implementation is structured in a way that should be easily amenable > to CONFIG_PER_VMA_LOCK changes, but if there are a few more subtle > things that need to be figured for existing text-based > /proc//maps anyways, I think it would be best to use mmap_lock > for now for this new API, and then adopt the same final > CONFIG_PER_VMA_LOCK-aware solution. I agree that you should start simple, using mmap_lock first and then work on improvements. Would the proposed solution become useless with coarse mmap_lock'ing? > > > > > > + } > > > + > > > + return vma; > > > +} > > > +#else > > > static int query_vma_setup(struct mm_struct *mm) > > > { > > > return mmap_read_lock_killable(mm); > > > @@ -402,6 +445,7 @@ static struct vm_area_struct *query_vma_find_by_a= ddr(struct mm_struct *mm, unsig > > > { > > > return find_vma(mm, addr); > > > } > > > +#endif > > > > > > static struct vm_area_struct *query_matching_vma(struct mm_struct *m= m, > > > unsigned long addr, = u32 flags) > > > @@ -441,8 +485,10 @@ static struct vm_area_struct *query_matching_vma= (struct mm_struct *mm, > > > skip_vma: > > > /* > > > * If the user needs closest matching VMA, keep iterating. > > > + * But before we proceed we might need to unlock current VMA. > > > */ > > > addr =3D vma->vm_end; > > > + vma_end_read(vma); /* no-op under !CONFIG_PER_VMA_LOCK */ > > > if (flags & PROCMAP_QUERY_COVERING_OR_NEXT_VMA) > > > goto next_vma; > > > no_vma: > > > -- > > > 2.43.0 > > >