From: Alice Ryhl
Date: Mon, 4 Nov 2024 15:47:56 +0100
Subject: Re: [RFC PATCH] docs/mm: add VMA locks documentation
To: Lorenzo Stoakes
Cc: Jonathan Corbet, Andrew Morton, "Liam R. Howlett", Vlastimil Babka,
 Jann Horn, Boqun Feng, Matthew Wilcox, Mike Rapoport, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <20241101185033.131880-1-lorenzo.stoakes@oracle.com>

On Fri, Nov 1, 2024 at 7:50 PM Lorenzo Stoakes wrote:
>
> Locking around VMAs is complicated and confusing.
> While we have a number of
> disparate comments scattered around the place, we seem to be reaching a
> level of complexity that justifies a serious effort at clearly documenting
> how locks are expected to be interacted with when it comes to interacting
> with mm_struct and vm_area_struct objects.
>
> This is especially pertinent as regards efforts to find sensible
> abstractions for these fundamental objects within the kernel rust
> abstraction whose compiler strictly requires some means of expressing these
> rules (and through this expression can help self-document these
> requirements as well as enforce them which is an exciting concept).
>
> The document limits scope to mmap and VMA locks and those that are
> immediately adjacent and relevant to them - so additionally covers page
> table locking as this is so very closely tied to VMA operations (and relies
> upon us handling these correctly).
>
> The document tries to cover some of the nastier and more confusing edge
> cases and concerns especially around lock ordering and page table teardown.
>
> The document also provides some VMA lock internals, which are up to date
> and inclusive of recent changes to recent sequence number changes.
>
> Signed-off-by: Lorenzo Stoakes

[...]

> +Page table locks
> +----------------
> +
> +When allocating a P4D, PUD or PMD and setting the relevant entry in the above
> +PGD, P4D or PUD, the `mm->page_table_lock` is acquired to do so. This is
> +acquired in `__p4d_alloc()`, `__pud_alloc()` and `__pmd_alloc()` respectively.
> +
> +.. note::
> +   `__pmd_alloc()` actually invokes `pud_lock()` and `pud_lockptr()` in turn,
> +   however at the time of writing it ultimately references the
> +   `mm->page_table_lock`.
> +
> +Allocating a PTE will either use the `mm->page_table_lock` or, if
> +`USE_SPLIT_PMD_PTLOCKS` is defined, used a lock embedded in the PMD physical
> +page metadata in the form of a `struct ptdesc`, acquired by `pmd_ptdesc()`
> +called from `pmd_lock()` and ultimately `__pte_alloc()`.
> +
> +Finally, modifying the contents of the PTE has special treatment, as this is a
> +lock that we must acquire whenever we want stable and exclusive access to
> +entries pointing to data pages within a PTE, especially when we wish to modify
> +them.
> +
> +This is performed via `pte_offset_map_lock()` which carefully checks to ensure
> +that the PTE hasn't changed from under us, ultimately invoking `pte_lockptr()`
> +to obtain a spin lock at PTE granularity contained within the `struct ptdesc`
> +associated with the physical PTE page. The lock must be released via
> +`pte_unmap_unlock()`.
> +
> +.. note::
> +   There are some variants on this, such as `pte_offset_map_rw_nolock()` when we
> +   know we hold the PTE stable but for brevity we do not explore this.
> +   See the comment for `__pte_offset_map_lock()` for more details.
> +
> +When modifying data in ranges we typically only wish to allocate higher page
> +tables as necessary, using these locks to avoid races or overwriting anything,
> +and set/clear data at the PTE level as required (for instance when page faulting
> +or zapping).

Speaking as someone who doesn't know the internals at all ... this
section doesn't really answer any questions I have about the page
table. It looks like this could use an initial section about basic
usage, and the detailed information could come after?

Concretely, if I wish to call vm_insert_page or zap some pages, what
are the locking requirements? What if I'm writing a page fault
handler?

Alice