Date: Thu, 29 Dec 2022 23:22:28 +0900
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org, liam.howlett@oracle.com, surenb@google.com,
 ldufour@linux.ibm.com, michel@lespinasse.org, vbabka@suse.cz,
 linux-kernel@vger.kernel.org
Subject: Re: [QUESTION] about the maple tree and current status of mmap_lock scalability

On Wed, Dec 28, 2022 at 08:50:36PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 28, 2022 at 09:48:51PM +0900, Hyeonggon Yoo wrote:
> > Hello mm folks,
> >
> > I have a few questions about the current status of mmap_lock scalability.
> >
> > =============================================================
> > What is currently causing the kernel to use mmap_lock to protect the maple tree?
> > =============================================================
> >
> > I understand that the long-term goal is to remove the need for mmap_lock in readers
> > while traversing the maple tree, using techniques such as RCU or SPF.
> > What is the biggest obstacle preventing this from being achieved at this time?
>
> The long term goal is even larger than this.  Ideally, the VMA tree
> would be protected by a spinlock rather than a mutex.

You mean replacing the mmap_lock rwsem with a spinlock?
How is that possible if readers can take it for page faults?

> That turned out
> to be too large a change for the moment (and isn't all that important
> compared to enabling RCU readers)

Yeah, better to take one step at a time.

>
> > ==================================================
> > How does the maple tree provide RCU-safe manipulation of VMAs?
> > ==================================================
> >
> > Is it similar to the approach suggested in the RCUVM paper (replacing the original
> > root node with a new root node that shares most of its nodes and deferring
> > the freeing of stale nodes using RCU)?
> >
> > I'm having difficulty understanding the design of the maple tree in this regard.
> >
> > [RCUVM paper] https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf
>
> While I've read the RCUVM paper, I wouldn't say it was particularly an
> inspiration.  The Maple Tree is independent of the VM; it's a general
> purpose B-tree.

My intention was to ask how to synchronize with other VMA operations
after the tree traversal with RCU. (Because it's unreasonable to handle
a page fault in an RCU read-side critical section.)

Per-VMA locks seem to solve it by taking the VMA lock in read mode
within the RCU read-side critical section.

> As with any B-tree, when modifying a node, we don't
> touch nodes that we don't need to touch.  As with any RCU data structure,
> we defer freeing things while RCU readers might still have a reference
> to them.
>
> We don't necessarily go all the way to the root node when modifying a
> leaf node.
> For example, if we have this structure:
>
> Root: Node A, 4000, Node B
> Node A: p1, 50, p2, 100, p3, 150, p4, 200, NULL, 250, p6, 1000, p7
> Node B: p8, 4050, p9, 4100, p10, 4150, p11, 4200, NULL, 4250, p13
>
> and we replace p4 with a NULL over the whole range from 150-199,
> we construct a new Node A2 that contains:
>
> Node A2: p1, 50, p2, 100, p3, 150, NULL, 250, p6, 1000, p7
>
> and we simply write A2 over the entry in Root.  Then we mark Node A as
> dead and RCU-free Node A.  There's no need to replace Root as stores
> to a pointer are atomic.

Thank you for explaining things in an easy and intuitive way.
Okay, I get that it's not a big problem to update the value(s) in a
B-tree in an RCU-safe way.

> If we need to rebalance between Node A and
> Node B, we will need to create a new Root (as well as both A and B),
> mark all of them as dead and RCU-free them.

--
Thanks,
Hyeonggon
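
As a rough illustration of the A -> A2 replacement quoted above, here is
a toy model in Python. It is only a sketch and not maple tree code: the
Node class, lookup(), copy_with_null() and pending_free are hypothetical
names invented for this example, pivots are treated as exclusive upper
bounds to match the notation in the quoted example, and a plain list
stands in for RCU deferred freeing.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Node:
    slots: list          # child Nodes or entries; None models a NULL entry
    pivots: list         # pivots[i] is the exclusive upper bound of slots[i]
    dead: bool = False   # set once the node has been replaced


def lookup(node, key):
    """Walk down from `node`; readers never modify anything."""
    while isinstance(node, Node):
        node = node.slots[bisect_right(node.pivots, key)]
    return node


def copy_with_null(node, lo, hi, start, end):
    """Build a NEW node equal to `node` (which spans [lo, hi)) except that
    [start, end) maps to None. The original node is left untouched."""
    bounds = [lo] + node.pivots + [hi]
    out = []
    for i, value in enumerate(node.slots):
        s, e = bounds[i], bounds[i + 1]
        if e <= start or s >= end:
            pieces = [(s, e, value)]                  # no overlap with the hole
        else:
            pieces = []
            if s < start:
                pieces.append((s, start, value))      # part below the hole
            pieces.append((max(s, start), min(e, end), None))
            if e > end:
                pieces.append((end, e, value))        # part above the hole
        for piece in pieces:
            # Merge adjacent NULL ranges, as in the A -> A2 example.
            if out and out[-1][2] is None and piece[2] is None:
                out[-1] = (out[-1][0], piece[1], None)
            else:
                out.append(piece)
    return Node([v for _, _, v in out], [e for _, e, _ in out[:-1]])


# The structure from the quoted example.
node_a = Node(["p1", "p2", "p3", "p4", None, "p6", "p7"],
              [50, 100, 150, 200, 250, 1000])
node_b = Node(["p8", "p9", "p10", "p11", None, "p13"],
              [4050, 4100, 4150, 4200, 4250])
root = Node([node_a, node_b], [4000])

pending_free = []  # stand-in for nodes handed to RCU for deferred freeing

# Writer side: build A2 off to the side, publish it with one slot store,
# then retire A. Root and Node B are not copied.
node_a2 = copy_with_null(node_a, 0, 4000, 150, 200)
root.slots[0] = node_a2
node_a.dead = True
pending_free.append(node_a)  # real code would free only after a grace period
```

A new reader calling lookup(root, 175) now reaches A2 and sees NULL,
while a reader that fetched the old Node A before the store can still
walk it safely, which is why A is only queued for freeing rather than
freed immediately.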