From: Vlastimil Babka
To: Suren Baghdasaryan, akpm@linux-foundation.org
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 dchinner@redhat.com, casey@schaufler-ca.com, ben.wolsieffer@hefring.com,
 paulmck@kernel.org, david@redhat.com, avagin@google.com,
 usama.anjum@collabora.com, peterx@redhat.com, hughd@google.com,
 ryan.roberts@arm.com, wangkefeng.wang@huawei.com, Liam.Howlett@Oracle.com,
 yuzhao@google.com, axelrasmussen@google.com, lstoakes@gmail.com,
 talumbau@google.com, willy@infradead.org, mgorman@techsingularity.net,
 jhubbard@nvidia.com, vishal.moola@gmail.com, mathieu.desnoyers@efficios.com,
 dhowells@redhat.com, jgg@ziepe.ca, sidhartha.kumar@oracle.com,
 andriy.shevchenko@linux.intel.com, yangxingui@huawei.com,
 keescook@chromium.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, kernel-team@android.com
Subject: Re: [RFC 0/3] reading proc/pid/maps under RCU
Date: Tue, 16 Jan 2024 15:46:13 +0100
Message-ID: <74005ee1-b6d8-4ab5-ba97-92bec302cc4b@suse.cz>
In-Reply-To: <1bc8a5df-b413-4869-8931-98f5b9e82fe5@suse.cz>
References: <20240115183837.205694-1-surenb@google.com>
 <1bc8a5df-b413-4869-8931-98f5b9e82fe5@suse.cz>

On 1/16/24 15:42, Vlastimil Babka wrote:
> On 1/15/24 19:38, Suren Baghdasaryan wrote:
>
> Hi,
>
>> The issue this patchset is trying to address is mmap_lock contention when
>> a low priority task (monitoring, data collecting, etc.) blocks a higher
>> priority task from making updates to the address space.
>> The contention is due to the mmap_lock being held for read when reading
>> proc/pid/maps.
>> With maple_tree introduction, VMA tree traversals are RCU-safe and per-vma
>> locks make VMA access RCU-safe. This provides an opportunity for lock-less
>> reading of proc/pid/maps. We still need to overcome a couple of obstacles:
>> 1. Make all VMA pointer fields used for proc/pid/maps content generation
>> RCU-safe;
>> 2. Ensure that proc/pid/maps data tearing, which is currently possible at
>> page boundaries only, does not get worse.
>
> Hm I thought we were to only choose this more complicated approach in case
> additional tearing becomes a problem, and at first assume that if software
> can deal with page boundary tearing, it can deal with sub-page tearing too?
>
>> The patchset deals with these issues but there is a downside which I would
>> like to get input on:
>> This change introduces unfairness towards the reader of proc/pid/maps,
>> which can be blocked by an overly active/malicious address space modifier.
>
> So this is a consequence of the validate() operation, right? We could avoid
> this if we allowed sub-page tearing.
>
>> A couple of ways I thought we can address this issue are:
>> 1. After several lock-less retries (or some time limit), fall back to
>> taking mmap_lock.
>> 2. Employ lock-less reading only if the reader has low priority,
>> indicating that blocking it is not critical.
>> 3. Introduce a separate procfs file which publishes the same data in
>> a lock-less manner.

Oh and if this option 3 becomes necessary, then such a new file shouldn't
validate() either, and whoever wants to avoid the reader contention and
converts their monitoring to the new file will have to account for this
possible extra tearing from the start. So I would suggest trying to change
the existing file with no validate() first, and if existing userspace gets
broken, employ option 3. This would mean no validate() in either case?

>> I imagine a combination of these approaches can also be employed.
>> I would like to get feedback on this from the Linux community.
>>
>> Note: the mmap_read_lock/mmap_read_unlock sequence inside validate_map()
>> can be replaced with the more efficient rwsem_wait() proposed by Matthew
>> in [1].
>>
>> [1] https://lore.kernel.org/all/ZZ1+ZicgN8dZ3zj3@casper.infradead.org/
>>
>> Suren Baghdasaryan (3):
>>   mm: make vm_area_struct anon_name field RCU-safe
>>   seq_file: add validate() operation to seq_operations
>>   mm/maps: read proc/pid/maps under RCU
>>
>>  fs/proc/internal.h        |   3 +
>>  fs/proc/task_mmu.c        | 130 ++++++++++++++++++++++++++++++++++----
>>  fs/seq_file.c             |  24 ++++++-
>>  include/linux/mm_inline.h |  10 ++-
>>  include/linux/mm_types.h  |   3 +-
>>  include/linux/seq_file.h  |   1 +
>>  mm/madvise.c              |  30 +++++++--
>>  7 files changed, 181 insertions(+), 20 deletions(-)
>>
>
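
For illustration only, option 1 from the quoted cover letter (retry
lock-lessly a few times, then fall back to taking the lock) can be sketched
as a userspace toy. Nothing below is taken from the patchset: AddressSpace,
read_maps(), max_retries and the on_attempt hook are hypothetical names, a
plain sequence counter stands in for whatever validation the kernel would
do, and a threading.Lock stands in for mmap_lock.

```python
import threading


class AddressSpace:
    """Toy stand-in for an mm (hypothetical, not kernel code)."""

    def __init__(self, ranges):
        self.lock = threading.Lock()  # plays the role of mmap_lock
        self.seq = 0                  # even = stable, odd = write in progress
        self.ranges = list(ranges)    # list of (start, end) tuples

    def modify(self, ranges):
        """Writer side: bump the counter around every change."""
        with self.lock:
            self.seq += 1             # odd: readers must not trust the data
            self.ranges = list(ranges)
            self.seq += 1             # even again: new stable version


def read_maps(mm, max_retries=3, on_attempt=None):
    """Return (snapshot, used_lock).

    Tries up to max_retries optimistic reads; if a writer interfered every
    time, falls back to reading under the lock so the reader cannot be
    starved indefinitely by a busy writer.
    """
    for _ in range(max_retries):
        begin = mm.seq
        snapshot = list(mm.ranges)
        if on_attempt is not None:
            on_attempt()              # hook used only to simulate a writer
        if begin % 2 == 0 and mm.seq == begin:
            return snapshot, False    # nothing changed: snapshot is valid
    with mm.lock:                     # bounded retries exhausted
        return list(mm.ranges), True
```

With no writer activity the first optimistic attempt succeeds; with a writer
that changes the ranges during every attempt, the reader gives up after
max_retries and takes the lock. This trades a bounded amount of wasted work
for a guarantee that the reader eventually makes progress, which is the
fairness concern raised above. Dropping validation altogether, as suggested
in the reply, would remove the retry loop entirely at the cost of possible
sub-page tearing.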