Date: Tue, 17 Jan 2023 21:54:58 +0000
From: Matthew Wilcox
To: Suren Baghdasaryan
Cc: Michal Hocko , akpm@linux-foundation.org, michel@lespinasse.org,
 jglisse@google.com, vbabka@suse.cz, hannes@cmpxchg.org,
 mgorman@techsingularity.net, dave@stgolabs.net, liam.howlett@oracle.com,
 peterz@infradead.org, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com,
 paulmck@kernel.org, luto@kernel.org, songliubraving@fb.com,
 peterx@redhat.com, david@redhat.com, dhowells@redhat.com,
 hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev,
 punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com,
 rientjes@google.com, axelrasmussen@google.com, joelaf@google.com,
 minchan@google.com, jannh@google.com, shakeelb@google.com,
 tatashin@google.com, edumazet@google.com, gthelen@google.com,
 gurua@google.com, arjunroy@google.com, soheil@google.com,
 hughlynch@google.com, leewalsh@google.com, posk@google.com,
 linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
 linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 12/41] mm: add per-VMA lock and helper functions to control it
Message-ID:
References: <20230109205336.3665937-1-surenb@google.com>
 <20230109205336.3665937-13-surenb@google.com>
In-Reply-To:

On Tue, Jan 17, 2023 at 01:21:47PM -0800, Suren Baghdasaryan wrote:
> On Tue, Jan 17, 2023 at 7:12 AM Michal Hocko wrote:
> >
> > On Tue 17-01-23 16:04:26, Michal Hocko wrote:
> > > On Mon 09-01-23 12:53:07, Suren Baghdasaryan wrote:
> > > > Introduce a per-VMA rw_semaphore to be used during page fault handling
> > > > instead of mmap_lock. Because there are cases when multiple VMAs need
> > > > to be exclusively locked during VMA tree modifications, instead of the
> > > > usual lock/unlock pattern we mark a VMA as locked by taking the per-VMA
> > > > lock exclusively and setting vma->vm_lock_seq to the current
> > > > mm->mm_lock_seq. When the mmap_write_lock holder is done with all
> > > > modifications and drops mmap_lock, it will increment mm->mm_lock_seq,
> > > > effectively unlocking all VMAs marked as locked.
> > >
> > > I have to say I was struggling a bit with the above and only understood
> > > what you mean by reading the patch several times. I would phrase it like
> > > this (feel free to use if you consider this to be an improvement).
> > >
> > > Introduce a per-VMA rw_semaphore. The lock implementation relies on
> > > per-vma and per-mm sequence counters to note exclusive locking:
> > > - read lock - (implemented by vma_read_trylock) requires the
> > >   vma (vm_lock_seq) and mm (mm_lock_seq) sequence counters to
> > >   differ. If they match then there must be a vma exclusive lock
> > >   held somewhere.
> > > - read unlock - (implemented by vma_read_unlock) is a trivial
> > >   vma->lock unlock.
> > > - write lock - (vma_write_lock) requires the mmap_lock to be
> > >   held exclusively, and the current mm counter is recorded on the
> > >   vma side. This allows multiple vmas to be locked under a single
> > >   mmap_lock write lock (e.g. during vma merging). The vma counter
> > >   is modified under the exclusive vma lock.
> >
> > Didn't realize one more thing.
> > Unlike a standard write lock, this implementation can be
> > called multiple times under a single mmap_lock. In a sense
> > it is more of mark_vma_potentially_modified than a lock.
>
> In the RFC it was originally called vma_mark_locked(), and renames were
> discussed in the email thread ending here:
> https://lore.kernel.org/all/621612d7-c537-3971-9520-a3dec7b43cb4@suse.cz/.
> If other names are preferable I'm open to changing them.

I don't want to bikeshed this, but rather than locking, these seem to be
more like:

vma_start_read()
vma_end_read()
vma_start_write()
vma_end_write()
vma_downgrade_write()

... and that these are _implemented_ with locks (in part) is an
implementation detail?

Would that reduce people's confusion?

> >
> > >
> > > - write unlock - (vma_write_unlock_mm) is a batch release of all
> > >   vma locks held. It doesn't pair with a specific
> > >   vma_write_lock! It is done before the exclusive mmap_lock is
> > >   released, by incrementing the mm sequence counter (mm_lock_seq).
> > > - write downgrade - if the mmap_lock is downgraded to the read
> > >   lock, all vma write locks are released as well (effectively
> > >   the same as write unlock).
> > --
> > Michal Hocko
> > SUSE Labs