Date: Mon, 15 Jan 2024 10:38:36 -0800
In-Reply-To: <20240115183837.205694-1-surenb@google.com>
Mime-Version: 1.0
References: <20240115183837.205694-1-surenb@google.com>
X-Mailer: git-send-email 2.43.0.381.gb435a96ce8-goog
Message-ID: <20240115183837.205694-4-surenb@google.com>
Subject: [RFC 3/3] mm/maps: read proc/pid/maps under RCU
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	dchinner@redhat.com, casey@schaufler-ca.com,
	ben.wolsieffer@hefring.com, paulmck@kernel.org, david@redhat.com,
	avagin@google.com, usama.anjum@collabora.com, peterx@redhat.com,
	hughd@google.com, ryan.roberts@arm.com, wangkefeng.wang@huawei.com,
	Liam.Howlett@Oracle.com, yuzhao@google.com, axelrasmussen@google.com,
	lstoakes@gmail.com, talumbau@google.com, willy@infradead.org,
	vbabka@suse.cz, mgorman@techsingularity.net, jhubbard@nvidia.com,
	vishal.moola@gmail.com, mathieu.desnoyers@efficios.com,
	dhowells@redhat.com, jgg@ziepe.ca, sidhartha.kumar@oracle.com,
	andriy.shevchenko@linux.intel.com, yangxingui@huawei.com,
	keescook@chromium.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@android.com, surenb@google.com
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
Precedence: bulk

With maple_tree supporting vma tree traversal under RCU and per-vma locks
making vma access RCU-safe, /proc/pid/maps can be read under RCU without
read-locking mmap_lock. However, vma content can change from under us,
therefore we need to pin the pointer fields used when generating the
output (currently only vm_file and anon_name).

In addition, we validate data before publishing it to the user using the
new seq_file validate interface. This way we keep this mechanism
consistent with the previous behavior, where data tearing is possible
only at page boundaries.

This change is designed to reduce mmap_lock contention and prevent a
process reading /proc/pid/maps files (often a low priority task, such as
monitoring/data collection services) from blocking address space updates.
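The snapshot/validate/retry idea described above can be illustrated in
userspace C. This is only a sketch under stated assumptions, not the kernel
implementation: struct fake_mm, fake_mm_update(), fake_mm_read_once() and
fake_mm_read() are hypothetical names, C11 atomics stand in for
smp_load_acquire(), and the "concurrent" writer is simulated from a single
thread, so neither RCU nor mmap_lock is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an address space: a sequence counter that the
 * writer advances on every update, plus the data readers want to print. */
struct fake_mm {
	atomic_int lock_seq;
	unsigned long start;
	unsigned long end;
};

/* Hypothetical writer: change the range, then advance the sequence. */
static void fake_mm_update(struct fake_mm *mm, unsigned long start,
			   unsigned long end)
{
	mm->start = start;
	mm->end = end;
	atomic_fetch_add_explicit(&mm->lock_seq, 1, memory_order_release);
}

/*
 * One optimistic read attempt. When 'race' is true, a writer update is
 * injected between the "show" and "validate" steps to simulate a change
 * happening under the reader.
 */
static int fake_mm_read_once(struct fake_mm *mm, bool race, char *buf,
			     size_t len)
{
	int seq;

	assert(len > 0);

	/* Snapshot the sequence before touching the data. */
	seq = atomic_load_explicit(&mm->lock_seq, memory_order_acquire);

	/* "show": format into a private buffer; nothing is published yet. */
	snprintf(buf, len, "%08lx-%08lx", mm->start, mm->end);

	if (race)
		fake_mm_update(mm, 0x3000, 0x4000);

	/* "validate": discard the output if the sequence has moved on. */
	if (atomic_load_explicit(&mm->lock_seq, memory_order_acquire) != seq) {
		buf[0] = '\0';
		return -EAGAIN;
	}
	return 0;
}

/* Retry until an attempt validates; returns the number of attempts made. */
static int fake_mm_read(struct fake_mm *mm, bool race_first, char *buf,
			size_t len)
{
	int attempts = 1;

	while (fake_mm_read_once(mm, race_first && attempts == 1,
				 buf, len) == -EAGAIN)
		attempts++;
	return attempts;
}
```

With an mm initialised to the range 0x1000-0x2000, fake_mm_read(&mm, true,
buf, sizeof(buf)) discards the first attempt and succeeds on the second,
leaving the updated range in buf. A read without the injected update
validates on the first attempt.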
Signed-off-by: Suren Baghdasaryan
---
 fs/proc/internal.h |   3 ++
 fs/proc/task_mmu.c | 130 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 120 insertions(+), 13 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index a71ac5379584..47233408550b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -290,6 +290,9 @@ struct proc_maps_private {
 	struct task_struct *task;
 	struct mm_struct *mm;
 	struct vma_iterator iter;
+	int mm_lock_seq;
+	struct anon_vma_name *anon_name;
+	struct file *vm_file;
 #ifdef CONFIG_NUMA
 	struct mempolicy *task_mempolicy;
 #endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 62b16f42d5d2..d4305cfdca58 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -141,6 +141,22 @@ static struct vm_area_struct *proc_get_vma(struct proc_maps_private *priv,
 	return vma;
 }
 
+static const struct seq_operations proc_pid_maps_op;
+
+static inline bool needs_mmap_lock(struct seq_file *m)
+{
+#ifdef CONFIG_PER_VMA_LOCK
+	/*
+	 * smaps and numa_maps perform page table walk, therefore require
+	 * mmap_lock but maps can be read under RCU.
+	 */
+	return m->op != &proc_pid_maps_op;
+#else
+	/* Without per-vma locks VMA access is not RCU-safe */
+	return true;
+#endif
+}
+
 static void *m_start(struct seq_file *m, loff_t *ppos)
 {
 	struct proc_maps_private *priv = m->private;
@@ -162,11 +178,17 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 		return NULL;
 	}
 
-	if (mmap_read_lock_killable(mm)) {
-		mmput(mm);
-		put_task_struct(priv->task);
-		priv->task = NULL;
-		return ERR_PTR(-EINTR);
+	if (needs_mmap_lock(m)) {
+		if (mmap_read_lock_killable(mm)) {
+			mmput(mm);
+			put_task_struct(priv->task);
+			priv->task = NULL;
+			return ERR_PTR(-EINTR);
+		}
+	} else {
+		/* For memory barrier see the comment for mm_lock_seq in mm_struct */
+		priv->mm_lock_seq = smp_load_acquire(&priv->mm->mm_lock_seq);
+		rcu_read_lock();
 	}
 
 	vma_iter_init(&priv->iter, mm, last_addr);
@@ -195,7 +217,10 @@ static void m_stop(struct seq_file *m, void *v)
 		return;
 
 	release_task_mempolicy(priv);
-	mmap_read_unlock(mm);
+	if (needs_mmap_lock(m))
+		mmap_read_unlock(mm);
+	else
+		rcu_read_unlock();
 	mmput(mm);
 	put_task_struct(priv->task);
 	priv->task = NULL;
@@ -283,8 +308,10 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 	start = vma->vm_start;
 	end = vma->vm_end;
 	show_vma_header_prefix(m, start, end, flags, pgoff, dev, ino);
-	if (mm)
-		anon_name = anon_vma_name(vma);
+	if (mm) {
+		anon_name = needs_mmap_lock(m) ? anon_vma_name(vma) :
+				anon_vma_name_get_rcu(vma);
+	}
 
 	/*
 	 * Print the dentry name for named mappings, and a
@@ -338,19 +365,96 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 		seq_puts(m, name);
 	}
 	seq_putc(m, '\n');
+	if (anon_name && !needs_mmap_lock(m))
+		anon_vma_name_put(anon_name);
+}
+
+/*
+ * Pin vm_area_struct fields used by show_map_vma. We also copy pinned fields
+ * into proc_maps_private because by the time put_vma_fields() is called, VMA
+ * might have changed and these fields might be pointing to different objects.
+ */
+static bool get_vma_fields(struct vm_area_struct *vma, struct proc_maps_private *priv)
+{
+	if (vma->vm_file) {
+		priv->vm_file = get_file_rcu(&vma->vm_file);
+		if (!priv->vm_file)
+			return false;
+
+	} else
+		priv->vm_file = NULL;
+
+	if (vma->anon_name) {
+		priv->anon_name = anon_vma_name_get_rcu(vma);
+		if (!priv->anon_name) {
+			/* Drop the file reference taken above, if any */
+			if (priv->vm_file)
+				fput(priv->vm_file);
+			return false;
+		}
+	} else
+		priv->anon_name = NULL;
+
+	return true;
+}
+
+static void put_vma_fields(struct proc_maps_private *priv)
+{
+	if (priv->anon_name)
+		anon_vma_name_put(priv->anon_name);
+	if (priv->vm_file)
+		fput(priv->vm_file);
 }
 
 static int show_map(struct seq_file *m, void *v)
 {
-	show_map_vma(m, v);
+	struct proc_maps_private *priv = m->private;
+
+	if (needs_mmap_lock(m))
+		show_map_vma(m, v);
+	else {
+		/*
+		 * Stop immediately if the VMA changed from under us.
+		 * Validation step will prevent publishing already cached data.
+		 */
+		if (!get_vma_fields(v, priv))
+			return -EAGAIN;
+
+		show_map_vma(m, v);
+		put_vma_fields(priv);
+	}
+
 	return 0;
 }
 
+static int validate_map(struct seq_file *m, void *v)
+{
+	if (!needs_mmap_lock(m)) {
+		struct proc_maps_private *priv = m->private;
+		int mm_lock_seq;
+
+		/* For memory barrier see the comment for mm_lock_seq in mm_struct */
+		mm_lock_seq = smp_load_acquire(&priv->mm->mm_lock_seq);
+		if (mm_lock_seq != priv->mm_lock_seq) {
+			/*
+			 * mmap_lock contention is detected. Wait for mmap_lock
+			 * write to be released, discard stale data and retry.
+			 */
+			mmap_read_lock(priv->mm);
+			mmap_read_unlock(priv->mm);
+			return -EAGAIN;
+		}
+	}
+	return 0;
+
+}
+
 static const struct seq_operations proc_pid_maps_op = {
-	.start	= m_start,
-	.next	= m_next,
-	.stop	= m_stop,
-	.show	= show_map
+	.start		= m_start,
+	.next		= m_next,
+	.stop		= m_stop,
+	.show		= show_map,
+	.validate	= validate_map,
 };
 
 static int pid_maps_open(struct inode *inode, struct file *file)
-- 
2.43.0.381.gb435a96ce8-goog