From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF039C27C54 for ; Thu, 6 Jun 2024 16:51:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 660956B00A2; Thu, 6 Jun 2024 12:51:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5E9F66B00A4; Thu, 6 Jun 2024 12:51:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 489F76B00A6; Thu, 6 Jun 2024 12:51:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 28A1D6B00A2 for ; Thu, 6 Jun 2024 12:51:58 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id C88FDA0BAA for ; Thu, 6 Jun 2024 16:51:57 +0000 (UTC) X-FDA: 82201055874.07.C272664 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) by imf26.hostedemail.com (Postfix) with ESMTP id F398F14000D for ; Thu, 6 Jun 2024 16:51:55 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=JCakPVV0; spf=pass (imf26.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717692716; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=pRcNGWjjh7UdvSzMfC+vxv6k260+2UboBQ1hJWD5TG8=; b=CMXKMQLfF+nsj0DQppCCpM2mK6K7XmrafJ1LoAsAAj04JJjQ3vgcU9CAUkSz74lxmtL0rO qaMLxTwKTfzIp8GyPz2qAq1HJmyJkrhFOk4UCmJYwCzc7tv00mbwb+ruz+wPX8dbR9SzGg NpIL9VtEMkyGBiBBIxfCMlYBhohW4GU= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=JCakPVV0; spf=pass (imf26.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717692716; a=rsa-sha256; cv=none; b=efLTRvpCaHtUIS0Npyz3hSeM6Y2iHAhUvO4vhbAjfLkft5VZYEvL2etqagtDYDoKoWgfd3 rxO82TxK6teFkAdeH3mgfwyBjdc+iytwhrUPMsEURjOOVzJsE88RV0FQxKO3eDNSe8wzZp waIChPlPnERLSzgCxU7asC59K0rHnQI= Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-6bce380eb96so824889a12.0 for ; Thu, 06 Jun 2024 09:51:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717692715; x=1718297515; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=pRcNGWjjh7UdvSzMfC+vxv6k260+2UboBQ1hJWD5TG8=; b=JCakPVV0E4XRr37d8auzX2xiyq9W9iTZWQq3ccxPbIbmaoiDmdzbVT1imyaFi6m+Cl r/Gp4xPrYmJNCwq7UXuTCnOx60DeQM7EmkozrVnvGrWqlfofAt/dA4kj8ffx9lIXIduI 8FuUEelMlUeQxp6XNdJ1t58Il9oeUXkDFfY2dVdjHc85gUhuu/v3YiUKRLT9wD1Icbrm Giur0NcUUukeC4hUqR2FMOHXod3b/jmhtu8x2StQnUBKJag0EmbLua0VR6UcQQUTIYdC S11pToJ6fvKEAsRY/MHdjn0QEaP+Ypl6l7wZnzLZBRrzE3grFWbz2xH8gAqef/PToURJ 5c0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717692715; x=1718297515; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pRcNGWjjh7UdvSzMfC+vxv6k260+2UboBQ1hJWD5TG8=; b=Yq3l82pAOWWkse7AjIOtHGjpMrSUpqmKjE2ARG1BAeKjPYImG53ioAGiZ6tZ8Tbo+2 KylUZ38RAGbi+/WAVbPA702OMa4d8aEugWZuotfvih0AI3D5E76oz2VVkotIZEm51Onh F4+dpsFmyE4tN0MiomZ/h6CPpUd33BFKPRT76eQQDBuAmLjS8Jz8P8u0cYtpXacDiz5I fwtBv+UwzfnaSplgff5dCFcZVQMH03pdPDz5MiJ6iESXYu4CpCriRt2P/POg7v0FI5y+ 75zYvt7heRi4snynRqd8nAMu+WIyAuQnwUtx8dFxvSZp6LtIPo8BMg5R5Y6x2+bh3DZF C1ow== X-Forwarded-Encrypted: i=1; AJvYcCWJ5SZIc6Ah/UJKM9pSwmrVYihSgf6XR1mq1e62wBAOKLlxNqD52lU/rVr2YY2fzg47dDzn7+Cea9xWPiHRaop73zc= X-Gm-Message-State: AOJu0Yy5rNskY01mduLFGI+QTWifRJCnlJClU3zOGLCYnazAZMXeg8Bv SgLpqai/BDEH7DxrkiLrUBHyEp6tft/KyO6+NPwH1oIK9tkoXhoPP4vHuJhFAbClUjddorc2PYa qK83J9Rilwz/ZZ5aRa51TV4Em+y0= X-Google-Smtp-Source: AGHT+IG+iriY1C6m2KcPqevYlxmvmTV9onEU6KUKB7+3t3oIqLC1nn1err2v+gcspD5d//2VeMoAebazpM8D9YLNI3g= X-Received: by 2002:a17:90b:1008:b0:2c1:9c10:8b87 with SMTP id 98e67ed59e1d1-2c2bcc0bed1mr76660a91.29.1717692714697; Thu, 06 Jun 2024 09:51:54 -0700 (PDT) MIME-Version: 1.0 References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-5-andrii@kernel.org> In-Reply-To: From: Andrii Nakryiko Date: Thu, 6 Jun 2024 09:51:41 -0700 Message-ID: Subject: Re: [PATCH v3 4/9] fs/procfs: use per-VMA RCU-protected locking in PROCMAP_QUERY API To: Suren Baghdasaryan Cc: Andrii Nakryiko , linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, liam.howlett@oracle.com, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: F398F14000D X-Stat-Signature: 8o1f7rs7b86u9yghmsgxwzfted49gbjx X-Rspam-User: X-HE-Tag: 1717692715-358847 X-HE-Meta: U2FsdGVkX18RcR0LsFhNLrWs6QSpWxY+Gk7DoGR0eZ2T0exWjzjg2pb90h8szskAfB8CLqLzPdpsoAzXgnsuH0JqutglNI8QoVhl239E7796zCPTGoJ/IgHyIvHLBPh4V61WDoACwSOlYW9MfzZbEDlCmBhHL14MXph4jNbf6on+4BaN5nNTsDJtbyBUtcx2ZhDWizjka/fUIw8rxEp1KQnD2h9UV8y0NH0aZuD5VsXzD8aJzPIoiR+jpb4QcDTL2XA/bLrxtJP+VAUCSfCgyofJcZ1OUjPo38L7y69XEuondKCQDmin1wcRfieErzYIX6EdT5RjKcIfo1TxlpjHR9U4+HZkUXvBDvZSa1Us2NvIbssE0ayzmCVoXB6wdsUQryuQPzr4q1H0Eo4AQrfphaHDy++gNYTW+XnqHn6dF3nXb0JZFUliA8bslZOgk3kelr3bFCt5q97YQYgs4XnniVh3gf3TBNUqgTCyIUOfCbz+rzFLkAmlfz9rchLrxT2OvmCYiV5j6aH+axB8hKlUgtax1W60Obdr7sWR2SEup9QhK3iqytlnmzlRSJBrWv6QPYRgSm4cBqSVusV7qJdH+9wCp/BmjgNODosa8x9xEVcK+7QAycPXglDin3ajC8uKilpzNOCZwbijaX4EDKGGUUyBCmmEUSCw7nOsUyHT/7qI/xdXl3mUk8IC47YGouwxS4i3sOy+jPOOeOE0RFwe7wDi/7+6e0///AOzqXLkWmukquBXK8Kny9k68FTIhZwon+w1BkIR6rDyG/rh94yM5fSCJYy94Jd0bZegA0LzgqofiTGBuxiVpamDPpTXiv7YPcM+6h7Ft29TayYIFWIFJHGSEsCxeaG11eqla1VzuqRjOeYiZ0EnmP66PD1cO5Sp/sv3mOMCKKuMhK/qCwrPELaibYCJRr8/4vMB+tvTUs5ers/gjXQfeVt8RhW+IrpvobKrrN7qgzr0kAWZg9s soLRNg1A aUDKDLzOvPk7zYEHIBe71ONWWRDNuOvO2KvrdEXCqCFkU/CEQmFi3VDRI0FcsbWULeBiG8W3Gg/3yDENz3Z+2sqHX2+ZMajncI+WibJlLByoEhvjQYBtpl8ospXMpHhDUNCFK4ts/c8PrstfWMclyNsIygahjt+d2g0Bi4T1XS9qThG/4V3wiwAiEZRJNvnmlPDbEonMz97YIGtNgNFTY9/frKX/Pk+vmSWEkK3HyHSvtGs6rXS7neCqMKQez7EuLEJtuNgmKR63gin+s2Z4VkwH4du2DXbtHyipBhi2Yann+6f9oSbBmAc9RH1Yb3S//l4hDOjTAUnGQo4cg8bHRGf0qZLgEovNQNXKyiK6v3wGuaAbPtwAGI58dV5qbmH/m+kPQa4babvibtEUDZSjFYgVNozjv6r8LebdrZWiayxR2yJzxxmvcYQgLR+0njZAOpD5dWxMbvdLZynd3tFlNPhLjX9JIf7TrySwXbuNdI6tSGVlpb1t4Kg+LBV5L0lNSbemTo6lScxfpWBT5WPWT41fOPWQmtH3nVMhUrdqr93caM8sTXkk0gzZPrvv56bexUqLEgML3DYLIdCTvrgO1bt2hjl2GY7TtN5gCYGHvzC+7+T8ExQuiRIJxvq1PdqSSzf6E X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jun 5, 2024 at 4:16=E2=80=AFPM Suren Baghdasaryan wrote: > > On Tue, Jun 4, 2024 at 5:25=E2=80=AFPM Andrii Nakryiko wrote: > > > > Attempt to use RCU-protected per-VMA lock when looking up requested VMA > > as much as possible, only falling back to mmap_lock if per-VMA lock > > failed. This is done so that querying of VMAs doesn't interfere with > > other critical tasks, like page fault handling. > > > > This has been suggested by mm folks, and we make use of a newly added > > internal API that works like find_vma(), but tries to use per-VMA lock. > > > > We have two sets of setup/query/teardown helper functions with differen= t > > implementations depending on availability of per-VMA lock (conditioned > > on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties. > > > > When per-VMA lock is available, lookup is done under RCU, attempting to > > take a per-VMA lock. If that fails, we fallback to mmap_lock, but then > > proceed to unconditionally grab per-VMA lock again, dropping mmap_lock > > immediately. In this configuration mmap_lock is never helf for long, > > minimizing disruptions while querying. > > > > When per-VMA lock is compiled out, we take mmap_lock once, query VMAs > > using find_vma() API, and then unlock mmap_lock at the very end once as > > well. In this setup we avoid locking/unlocking mmap_lock on every looke= d > > up VMA (depending on query parameters we might need to iterate a few of > > them). > > > > Signed-off-by: Andrii Nakryiko > > --- > > fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 46 insertions(+) > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > index 614fbe5d0667..140032ffc551 100644 > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > > @@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, stru= ct file *file) > > PROCMAP_QUERY_VMA_FLAGS \ > > ) > > > > +#ifdef CONFIG_PER_VMA_LOCK > > +static int query_vma_setup(struct mm_struct *mm) > > +{ > > + /* in the presence of per-VMA lock we don't need any setup/tear= down */ > > + return 0; > > +} > > + > > +static void query_vma_teardown(struct mm_struct *mm, struct vm_area_st= ruct *vma) > > +{ > > + /* in the presence of per-VMA lock we need to unlock vma, if pr= esent */ > > + if (vma) > > + vma_end_read(vma); > > +} > > + > > +static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct = *mm, unsigned long addr) > > +{ > > + struct vm_area_struct *vma; > > + > > + /* try to use less disruptive per-VMA lock */ > > + vma =3D find_and_lock_vma_rcu(mm, addr); > > + if (IS_ERR(vma)) { > > + /* failed to take per-VMA lock, fallback to mmap_lock *= / > > + if (mmap_read_lock_killable(mm)) > > + return ERR_PTR(-EINTR); > > + > > + vma =3D find_vma(mm, addr); > > + if (vma) { > > + /* > > + * We cannot use vma_start_read() as it may fai= l due to > > + * false locked (see comment in vma_start_read(= )). We > > + * can avoid that by directly locking vm_lock u= nder > > + * mmap_lock, which guarantees that nobody can = lock the > > + * vma for write (vma_start_write()) under us. > > + */ > > + down_read(&vma->vm_lock->lock); > > Hi Andrii, > The above pattern of locking VMA under mmap_lock and then dropping > mmap_lock is becoming more common. Matthew had an RFC proposal for an > API to do this here: > https://lore.kernel.org/all/ZivhG0yrbpFqORDw@casper.infradead.org/. It > might be worth reviving that discussion. Sure, it would be nice to have generic and blessed primitives to use here. But the good news is that once this is all figured out by you mm folks, it should be easy to make use of those primitives here, right? > > > + } > > + > > + mmap_read_unlock(mm); > > Later on in your code you are calling get_vma_name() which might call > anon_vma_name() to retrieve user-defined VMA name. After this patch > this operation will be done without holding mmap_lock, however per > https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types.h#L= 582 > this function has to be called with mmap_lock held for read. Indeed > with debug flags enabled you should hit this assertion: > https://elixir.bootlin.com/linux/latest/source/mm/madvise.c#L96. Sigh... Ok, what's the suggestion then? Should it be some variant of mmap_assert_locked() || vma_assert_locked() logic, or it's not so simple? Maybe I should just drop the CONFIG_PER_VMA_LOCK changes for now until all these gotchas are figured out for /proc//maps anyway, and then we can adapt both text-based and ioctl-based /proc//maps APIs on top of whatever the final approach will end up being the right one? Liam, any objections to this? The whole point of this patch set is to add a new API, not all the CONFIG_PER_VMA_LOCK gotchas. My implementation is structured in a way that should be easily amenable to CONFIG_PER_VMA_LOCK changes, but if there are a few more subtle things that need to be figured for existing text-based /proc//maps anyways, I think it would be best to use mmap_lock for now for this new API, and then adopt the same final CONFIG_PER_VMA_LOCK-aware solution. > > > + } > > + > > + return vma; > > +} > > +#else > > static int query_vma_setup(struct mm_struct *mm) > > { > > return mmap_read_lock_killable(mm); > > @@ -402,6 +445,7 @@ static struct vm_area_struct *query_vma_find_by_add= r(struct mm_struct *mm, unsig > > { > > return find_vma(mm, addr); > > } > > +#endif > > > > static struct vm_area_struct *query_matching_vma(struct mm_struct *mm, > > unsigned long addr, u3= 2 flags) > > @@ -441,8 +485,10 @@ static struct vm_area_struct *query_matching_vma(s= truct mm_struct *mm, > > skip_vma: > > /* > > * If the user needs closest matching VMA, keep iterating. > > + * But before we proceed we might need to unlock current VMA. > > */ > > addr =3D vma->vm_end; > > + vma_end_read(vma); /* no-op under !CONFIG_PER_VMA_LOCK */ > > if (flags & PROCMAP_QUERY_COVERING_OR_NEXT_VMA) > > goto next_vma; > > no_vma: > > -- > > 2.43.0 > >