From: Jann Horn
Date: Mon, 4 Nov 2024 22:25:51 +0100
Subject: Re: [RFC PATCH] docs/mm: add VMA locks documentation
To: Lorenzo Stoakes
Cc: Suren Baghdasaryan, Jonathan Corbet, Andrew Morton, "Liam R. Howlett", Vlastimil Babka, Alice Ryhl, Boqun Feng, Matthew Wilcox, Mike Rapoport, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20241101185033.131880-1-lorenzo.stoakes@oracle.com> <03cc76df-8814-4d0a-83a6-447212ef2370@lucifer.local>
In-Reply-To: <03cc76df-8814-4d0a-83a6-447212ef2370@lucifer.local>

On Mon, Nov 4, 2024 at 10:04 PM Lorenzo Stoakes wrote:
> On Mon, Nov 04, 2024 at 09:01:46AM -0800, Suren Baghdasaryan wrote:
> > On Fri, Nov 1, 2024 at 11:51 AM Lorenzo Stoakes wrote:
> > > +MM and VMA locks
> > > +----------------
> > > +
> > > +There are two key classes of lock utilised when reading and manipulating VMAs -
> > > +the `mmap_lock` which is a read/write semaphore maintained at the `mm_struct`
> > > +level of granularity and, if CONFIG_PER_VMA_LOCK is set, a per-VMA lock at the
> > > +VMA level of granularity.
> > > +
> > > +.. note::
> > > +
> > > +   Generally speaking, a read/write semaphore is a class of lock which permits
> > > +   concurrent readers. However a write lock can only be obtained once all
> > > +   readers have left the critical region (and pending readers made to wait).
> > > +
> > > +   This renders read locks on a read/write semaphore concurrent with other
> > > +   readers and write locks exclusive against all others holding the semaphore.
> > > +
> > > +If CONFIG_PER_VMA_LOCK is not set, then things are relatively simple - a write
> > > +mmap lock gives you exclusive write access to a VMA, and a read lock gives you
> > > +concurrent read-only access.
> > > +
> > > +In the presence of CONFIG_PER_VMA_LOCK, i.e. VMA locks, things are more
> > > +complicated. In this instance, a write semaphore is no longer enough to gain
> > > +exclusive access to a VMA, a VMA write lock is also required.
> >
> > I think "exclusive access to a VMA" should be "exclusive access to mm"
> > if you are talking about mmap_lock.
>
> Right, but in the past an mm write lock was sufficient to gain exclusive
> access to a _vma_. I will adjust to say 'write semaphore on the mm'.

We might want to introduce some explicit terminology for talking about
types of locks in MM at some point in this document. Like:

 - "high-level locks" (or "metadata locks"?) means mmap lock, VMA lock,
   address_space lock, anon_vma lock
 - "pagetable-level locks" means page_table_lock and PMD/PTE spinlocks
 - "write-locked VMA" means mmap lock is held for writing and VMA has
   been marked as write-locked
 - "rmap locks" means the address_space and anon_vma locks
 - "holding the rmap locks for writing" means holding both (if applicable)
 - "holding an rmap lock for reading" means holding one of them
 - "read-locked VMA" means either mmap lock held for reading or VMA
   lock held for reading

That might make it a bit easier to write concise descriptions of
locking requirements in the rest of this document and keep them

> > > +The VMA lock is implemented via the use of both a read/write semaphore and
> > > +per-VMA and per-mm sequence numbers. We go into detail on this in the VMA lock
> > > +internals section below, so for the time being it is important only to note that
> > > +we can obtain either a VMA read or write lock.
> > > +
> > > +.. note::
> > > +
> > > +   VMAs under VMA **read** lock are obtained by the `lock_vma_under_rcu()`
> > > +   function, and **no** existing mmap or VMA lock must be held, This function
> >
> > "no existing mmap or VMA lock must be held" did you mean to say "no
> > exclusive mmap or VMA locks must be held"? Because one can certainly
> > hold a read-lock on them.
>
> Hmm really? You can hold an mmap read lock and obtain a VMA read lock too
> irrespective of that?

I think you can call lock_vma_under_rcu() while already holding the
mmap read lock, but only because lock_vma_under_rcu() has trylock
semantics. (The other way around leads to a deadlock: You can't take
the mmap read lock while holding a VMA read lock, because the VMA read
lock may prevent another task from write-locking a VMA after it has
already taken an mmap write lock.)
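
To spell out the ordering argument above, here is a toy user-space model
in Python. It is only a sketch, not kernel code: the task names "A" and
"B" and the has_deadlock() helper are made up for illustration, and it
assumes each task sleeps on at most one lock at a time.

```python
# Toy wait-for-graph model of the lock ordering described above.
# Not kernel code: the task names and has_deadlock() are hypothetical.

def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle.

    waits_for maps a blocked task to the task that holds the lock it
    is sleeping on. Each task blocks on at most one lock at a time.
    """
    for start in waits_for:
        seen = set()
        task = start
        while task in waits_for:
            if task in seen:
                return True
            seen.add(task)
            task = waits_for[task]
    return False


# Bad order: A holds a VMA read lock and then sleeps on the mmap read lock,
# while B holds the mmap write lock and sleeps on write-locking A's VMA.
bad_order = {"A": "B", "B": "A"}

# Good order: A holds the mmap read lock and only *tries* the VMA read
# lock. A trylock fails instead of sleeping, so A never enters the graph.
good_order = {}

print(has_deadlock(bad_order))   # True
print(has_deadlock(good_order))  # False
```

The point of the model is just that a trylock can never add an edge to
the wait-for graph, which is why that direction is safe.
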
> > > +mmap write lock downgrading
> > > +---------------------------
> > > +
> > > +While it is possible to obtain an mmap write or read lock using the
> > > +`mm->mmap_lock` read/write semaphore, it is also possible to **downgrade** from
> > > +a write lock to a read lock via `mmap_write_downgrade()`.
> > > +
> > > +Similar to `mmap_write_unlock()`, this implicitly terminates all VMA write locks
> > > +via `vma_end_write_all()` (more or this behaviour in the VMA lock internals
> > > +section below), but importantly does not relinquish the mmap lock while
> > > +downgrading, therefore keeping the locked virtual address space stable.
> > > +
> > > +A subtlety here is that callers can assume, if they invoke an
> > > +mmap_write_downgrade() operation, that they still have exclusive access to the
> > > +virtual address space (excluding VMA read lock holders), as for another task to
> > > +have downgraded they would have had to have exclusive access to the semaphore
> > > +which can't be the case until the current task completes what it is doing.
> >
> > I can't decipher the above paragraph. Could you please dumb it down
> > for the likes of me?
>
> Since you're smarter than me this indicates I am not being clear here :)
> Actually reading this again I've not expressed this correctly.
>
> This is something Jann mentioned, that I hadn't thought of before.
>
> So if you have an mmap write lock, you have exclusive access to the mmap
> (with the usual caveats about racing vma locks unless you vma write lock).
>
> When you downgrade you now have a read lock - but because you were
> exclusive earlier in the function AND any new caller of the function will
> have to acquire that same write lock FIRST, they all have to wait on you
> and therefore you have exclusive access to the mmap only with a read map.
>
> So you are actually guaranteed that nobody else can be racing you _in that
> function_, and equally no other writers can arise until you're done as your
> holding the read lock prevents that.
>
> Jann - correct me if I'm wrong or missing something here.
>
> Will correct this unless Jann tells me I'm missing something on this :)

Yeah, basically you can hold an rwsem in three modes:

 - reader (R)
 - reader that results from downgrading a writer (D)
 - writer (W)

and this is the diagram of which excludes which (view it in monospace,
✔ means mutually exclusive):

  | R | D | W
==|===|===|===
R | ✘ | ✘ | ✔
--|---|---|---
D | ✘ | ✔ | ✔
--|---|---|---
W | ✔ | ✔ | ✔

So the special thing about downgraded-readers compared to normal
readers is that they exclude other downgraded-readers.
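
The same table can be written down as a small predicate. This is a toy
model in Python, not kernel code, and the names MODES and excludes() are
hypothetical; it only encodes the three modes listed above.

```python
# Toy model of the exclusion table above. Not kernel code; the names
# MODES and excludes() are hypothetical.
MODES = ("R", "D", "W")  # reader, downgraded writer, writer


def excludes(a, b):
    """Return True if modes a and b cannot hold the same rwsem at once."""
    if a not in MODES or b not in MODES:
        raise ValueError("unknown mode")
    if "W" in (a, b):
        # A writer excludes everyone, including other writers.
        return True
    # Two plain readers, or a reader and a downgraded writer, can coexist.
    # Two downgraded writers cannot: each had to be the sole writer first,
    # and the second cannot become a writer while the first still holds
    # the semaphore for reading.
    return a == "D" and b == "D"


# Print the table row by row: "x" means mutually exclusive, "." means not.
for a in MODES:
    print(a, " ".join("x" if excludes(a, b) else "." for b in MODES))
```

The only cell that differs between the R row and the D row is D against
D, which is the property a caller of mmap_write_downgrade() relies on.
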