From: Suren Baghdasaryan
Date: Wed, 5 Jun 2024 16:15:58 -0700
Subject: Re: [PATCH v3 4/9] fs/procfs: use per-VMA RCU-protected locking in PROCMAP_QUERY API
To: Andrii Nakryiko
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, liam.howlett@oracle.com, rppt@kernel.org
In-Reply-To: <20240605002459.4091285-5-andrii@kernel.org>
References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-5-andrii@kernel.org>

On Tue, Jun 4, 2024 at 5:25 PM Andrii Nakryiko wrote:
>
> Attempt to use RCU-protected per-VMA lock when looking up requested VMA
> as much as possible, only falling back to mmap_lock if per-VMA lock
> failed. This is done so that querying of VMAs doesn't interfere with
> other critical tasks, like page fault handling.
>
> This has been suggested by mm folks, and we make use of a newly added
> internal API that works like find_vma(), but tries to use per-VMA lock.
>
> We have two sets of setup/query/teardown helper functions with different
> implementations depending on availability of per-VMA lock (conditioned
> on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties.
>
> When per-VMA lock is available, lookup is done under RCU, attempting to
> take a per-VMA lock. If that fails, we fall back to mmap_lock, but then
> proceed to unconditionally grab per-VMA lock again, dropping mmap_lock
> immediately. In this configuration mmap_lock is never held for long,
> minimizing disruptions while querying.
>
> When per-VMA lock is compiled out, we take mmap_lock once, query VMAs
> using find_vma() API, and then unlock mmap_lock at the very end once as
> well. In this setup we avoid locking/unlocking mmap_lock on every looked
> up VMA (depending on query parameters we might need to iterate a few of
> them).
>
> Signed-off-by: Andrii Nakryiko
> ---
>  fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 614fbe5d0667..140032ffc551 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, struct file *file)
>                 PROCMAP_QUERY_VMA_FLAGS \
>  )
>
> +#ifdef CONFIG_PER_VMA_LOCK
> +static int query_vma_setup(struct mm_struct *mm)
> +{
> +        /* in the presence of per-VMA lock we don't need any setup/teardown */
> +        return 0;
> +}
> +
> +static void query_vma_teardown(struct mm_struct *mm, struct vm_area_struct *vma)
> +{
> +        /* in the presence of per-VMA lock we need to unlock vma, if present */
> +        if (vma)
> +                vma_end_read(vma);
> +}
> +
> +static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct *mm, unsigned long addr)
> +{
> +        struct vm_area_struct *vma;
> +
> +        /* try to use less disruptive per-VMA lock */
> +        vma = find_and_lock_vma_rcu(mm, addr);
> +        if (IS_ERR(vma)) {
> +                /* failed to take per-VMA lock, fallback to mmap_lock */
> +                if (mmap_read_lock_killable(mm))
> +                        return ERR_PTR(-EINTR);
> +
> +                vma = find_vma(mm, addr);
> +                if (vma) {
> +                        /*
> +                         * We cannot use vma_start_read() as it may fail due to
> +                         * false locked (see comment in vma_start_read()). We
> +                         * can avoid that by directly locking vm_lock under
> +                         * mmap_lock, which guarantees that nobody can lock the
> +                         * vma for write (vma_start_write()) under us.
> +                         */
> +                        down_read(&vma->vm_lock->lock);

Hi Andrii,
The above pattern of locking a VMA under mmap_lock and then dropping
mmap_lock is becoming more common. Matthew had an RFC proposal for an
API to do this here:
https://lore.kernel.org/all/ZivhG0yrbpFqORDw@casper.infradead.org/.
It might be worth reviving that discussion (I put a rough, untested
sketch of the kind of helper I mean at the bottom of this mail).

> +                }
> +
> +                mmap_read_unlock(mm);

Later on in your code you are calling get_vma_name(), which might call
anon_vma_name() to retrieve the user-defined VMA name. After this patch
that operation will be done without holding mmap_lock; however, per
https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types.h#L582
this function has to be called with mmap_lock held for read. Indeed,
with debug flags enabled you should hit this assertion:
https://elixir.bootlin.com/linux/latest/source/mm/madvise.c#L96
(the helper in question is quoted at the bottom of this mail for
reference).

> +        }
> +
> +        return vma;
> +}
> +#else
>  static int query_vma_setup(struct mm_struct *mm)
>  {
>          return mmap_read_lock_killable(mm);
> @@ -402,6 +445,7 @@ static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct *mm, unsig
>  {
>          return find_vma(mm, addr);
>  }
> +#endif
>
>  static struct vm_area_struct *query_matching_vma(struct mm_struct *mm,
>                                                   unsigned long addr, u32 flags)
> @@ -441,8 +485,10 @@ static struct vm_area_struct *query_matching_vma(struct mm_struct *mm,
>  skip_vma:
>          /*
>           * If the user needs closest matching VMA, keep iterating.
> +         * But before we proceed we might need to unlock current VMA.
>           */
>          addr = vma->vm_end;
> +        vma_end_read(vma);      /* no-op under !CONFIG_PER_VMA_LOCK */
>          if (flags & PROCMAP_QUERY_COVERING_OR_NEXT_VMA)
>                  goto next_vma;
>  no_vma:
> --
> 2.43.0
>
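
Regarding the anon_vma_name() point above, for reference the helper is
basically just the assertion plus the field access (quoting mm/madvise.c
from memory, so double-check the exact shape in your tree):

#ifdef CONFIG_ANON_VMA_NAME
struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma)
{
        /* fires under lockdep/CONFIG_DEBUG_VM if mmap_lock isn't held */
        mmap_assert_locked(vma->vm_mm);

        return vma->anon_name;
}
#endif

i.e. the per-VMA-lock fallback path above will reach this assertion
while holding only the VMA read lock.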
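
And regarding the mmap_lock -> per-VMA-lock handoff pattern, this is
the kind of helper I have in mind -- a completely untested sketch, the
name is made up, and it's not necessarily what Matthew's RFC proposed:

/*
 * Lock a VMA for read while the caller already holds mmap_lock for
 * read, so that mmap_lock can be dropped right afterwards.
 */
static inline void vma_start_read_locked(struct vm_area_struct *vma)
{
        mmap_assert_locked(vma->vm_mm);
        /*
         * Holding mmap_lock (even for read) means nobody can take it
         * for write, so nobody can be in vma_start_write() under us;
         * a plain down_read() therefore can't hit the "false locked"
         * case that makes vma_start_read() fail.
         */
        down_read(&vma->vm_lock->lock);
}

With something like that, the fallback in query_vma_find_by_addr()
would become mmap_read_lock_killable() + find_vma() +
vma_start_read_locked() + mmap_read_unlock(), without open-coding the
vm_lock access.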