From: Suren Baghdasaryan
Date: Tue, 17 Jan 2023 14:33:08 -0800
Subject: Re: [PATCH 12/41] mm: add per-VMA lock and helper functions to control it
To: Matthew Wilcox
Cc: Michal Hocko, akpm@linux-foundation.org, michel@lespinasse.org, jglisse@google.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, hughlynch@google.com, leewalsh@google.com, posk@google.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230109205336.3665937-1-surenb@google.com> <20230109205336.3665937-13-surenb@google.com>
On Tue, Jan 17, 2023 at 1:54 PM Matthew Wilcox wrote:
>
> On Tue, Jan 17, 2023 at 01:21:47PM -0800, Suren Baghdasaryan wrote:
> > On Tue, Jan 17, 2023 at 7:12 AM Michal Hocko wrote:
> > >
> > > On Tue 17-01-23 16:04:26, Michal Hocko wrote:
> > > > On Mon 09-01-23 12:53:07, Suren Baghdasaryan wrote:
> > > > > Introduce a per-VMA rw_semaphore to be used during page fault handling
> > > > > instead of mmap_lock. Because there are cases when multiple VMAs need
> > > > > to be exclusively locked during VMA tree modifications, instead of the
> > > > > usual lock/unlock pattern we mark a VMA as locked by taking the per-VMA
> > > > > lock exclusively and setting vma->lock_seq to the current mm->lock_seq.
> > > > > When the mmap_write_lock holder is done with all modifications and
> > > > > drops mmap_lock, it will increment mm->lock_seq, effectively unlocking
> > > > > all VMAs marked as locked.
> > > >
> > > > I have to say I was struggling a bit with the above and only understood
> > > > what you mean by reading the patch several times. I would phrase it like
> > > > this (feel free to use if you consider this to be an improvement).
> > > >
> > > > Introduce a per-VMA rw_semaphore. The lock implementation relies on
> > > > per-vma and per-mm sequence counters to note exclusive locking:
> > > >   - read lock - (implemented by vma_read_trylock) requires the
> > > >     vma (vm_lock_seq) and mm (mm_lock_seq) sequence counters to
> > > >     differ. If they match then there must be a vma exclusive lock
> > > >     held somewhere.
> > > >   - read unlock - (implemented by vma_read_unlock) is a trivial
> > > >     vma->lock unlock.
> > > >   - write lock - (vma_write_lock) requires the mmap_lock to be
> > > >     held exclusively and the current mm counter is noted to the vma
> > > >     side. This will allow multiple vmas to be locked under a single
> > > >     mmap_lock write lock (e.g. during vma merging). The vma counter
> > > >     is modified under the exclusive vma lock.
> > >
> > > Didn't realize one more thing.
> > > Unlike a standard write lock, this implementation allows it to be
> > > called multiple times under a single mmap_lock. In a sense
> > > it is more of a mark_vma_potentially_modified than a lock.
> >
> > In the RFC it was called vma_mark_locked() originally and renames were
> > discussed in the email thread ending here:
> > https://lore.kernel.org/all/621612d7-c537-3971-9520-a3dec7b43cb4@suse.cz/.
> > If other names are preferable I'm open to changing them.
>
> I don't want to bikeshed this, but rather than locking it seems to be
> more:
>
>	vma_start_read()
>	vma_end_read()
>	vma_start_write()
>	vma_end_write()
>	vma_downgrade_write()

A couple of corrections: we would have to have vma_start_tryread() and
vma_end_write_all(). Also there is no vma_downgrade_write();
mmap_write_downgrade() simply does vma_end_write_all().

>
> ... and that these are _implemented_ with locks (in part) is an
> implementation detail?
>
> Would that reduce people's confusion?
>
> > > >   - write unlock - (vma_write_unlock_mm) is a batch release of all
> > > >     vma locks held. It doesn't pair with a specific
> > > >     vma_write_lock! It is done before the exclusive mmap_lock is
> > > >     released by incrementing the mm sequence counter (mm_lock_seq).
> > > >   - write downgrade - if the mmap_lock is downgraded to the read
> > > >     lock, all vma write locks are released as well (effectively the
> > > >     same as write unlock).
> > > --
> > > Michal Hocko
> > > SUSE Labs
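[Editor's note: the sequence-counter scheme discussed above can be sketched in userspace C. This is a simplification, not the kernel code: plain integers stand in for the counters, the vma's rw_semaphore and the memory barriers are omitted, and the helper names follow the patch under discussion.]

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures. The real vma also
 * carries a rw_semaphore, and the counters are accessed with
 * READ_ONCE/WRITE_ONCE plus barriers. */
struct mm_sim  { int mm_lock_seq; };                    /* per-mm counter  */
struct vma_sim { struct mm_sim *mm; int vm_lock_seq; }; /* per-vma counter */

/* write lock: caller must hold mmap_lock exclusively. Marking the vma
 * locked just copies the current mm counter into the vma, so calling it
 * several times under one mmap_lock is harmless (idempotent) -- which is
 * why it behaves more like "mark potentially modified" than a lock. */
static void vma_write_lock(struct vma_sim *vma)
{
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/* read trylock: succeeds only while the counters differ; if they match,
 * a writer has marked this vma under a still-held mmap_lock. */
static bool vma_read_trylock(struct vma_sim *vma)
{
	return vma->vm_lock_seq != vma->mm->mm_lock_seq;
}

/* batch write unlock (vma_end_write_all in the proposed naming): done
 * once when mmap_lock is released or downgraded; bumping the mm counter
 * unlocks every vma marked with the old value in one step. */
static void vma_write_unlock_mm(struct mm_sim *mm)
{
	mm->mm_lock_seq++;
}
```

For example, write-locking two vmas under one mmap_lock makes both read trylocks fail, and a single vma_write_unlock_mm() releases both at once. This also shows why write unlock doesn't pair with a specific vma_write_lock.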