From: Alice Ryhl
Date: Tue, 5 Nov 2024 14:56:43 +0100
Subject: Re: [RFC PATCH] docs/mm: add VMA locks documentation
To: Lorenzo Stoakes
Cc: Jonathan Corbet, Andrew Morton, "Liam R. Howlett", Vlastimil Babka, Jann Horn, Boqun Feng, Matthew Wilcox, Mike Rapoport, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Suren Baghdasaryan, linux-doc@vger.kernel.org
References: <20241101185033.131880-1-lorenzo.stoakes@oracle.com>

On Mon, Nov 4, 2024 at 5:52 PM Lorenzo Stoakes wrote:
>
> +cc Suren, linux-doc who
> I mistakenly didn't cc in first email!
>
> On Mon, Nov 04, 2024 at 03:47:56PM +0100, Alice Ryhl wrote:
> > On Fri, Nov 1, 2024 at 7:50 PM Lorenzo Stoakes wrote:
> > >
> > > Locking around VMAs is complicated and confusing. While we have a number of
> > > disparate comments scattered around the place, we seem to be reaching a
> > > level of complexity that justifies a serious effort at clearly documenting
> > > how locks are expected to be interacted with when it comes to interacting
> > > with mm_struct and vm_area_struct objects.
> > >
> > > This is especially pertinent as regards efforts to find sensible
> > > abstractions for these fundamental objects within the kernel rust
> > > abstraction whose compiler strictly requires some means of expressing these
> > > rules (and through this expression can help self-document these
> > > requirements as well as enforce them which is an exciting concept).
> > >
> > > The document limits scope to mmap and VMA locks and those that are
> > > immediately adjacent and relevant to them - so additionally covers page
> > > table locking as this is so very closely tied to VMA operations (and relies
> > > upon us handling these correctly).
> > >
> > > The document tries to cover some of the nastier and more confusing edge
> > > cases and concerns especially around lock ordering and page table teardown.
> > >
> > > The document also provides some VMA lock internals, which are up to date
> > > and inclusive of recent changes to recent sequence number changes.
> > >
> > > Signed-off-by: Lorenzo Stoakes
> >
> > [...]
> >
> > > +Page table locks
> > > +----------------
> > > +
> > > +When allocating a P4D, PUD or PMD and setting the relevant entry in the above
> > > +PGD, P4D or PUD, the `mm->page_table_lock` is acquired to do so. This is
> > > +acquired in `__p4d_alloc()`, `__pud_alloc()` and `__pmd_alloc()` respectively.
> > > +
> > > +.. note::
> > > +   `__pmd_alloc()` actually invokes `pud_lock()` and `pud_lockptr()` in turn,
> > > +   however at the time of writing it ultimately references the
> > > +   `mm->page_table_lock`.
> > > +
> > > +Allocating a PTE will either use the `mm->page_table_lock` or, if
> > > +`USE_SPLIT_PMD_PTLOCKS` is defined, use a lock embedded in the PMD physical
> > > +page metadata in the form of a `struct ptdesc`, acquired by `pmd_ptdesc()`
> > > +called from `pmd_lock()` and ultimately `__pte_alloc()`.
> > > +
> > > +Finally, modifying the contents of the PTE has special treatment, as this is a
> > > +lock that we must acquire whenever we want stable and exclusive access to
> > > +entries pointing to data pages within a PTE, especially when we wish to modify
> > > +them.
> > > +
> > > +This is performed via `pte_offset_map_lock()` which carefully checks to ensure
> > > +that the PTE hasn't changed from under us, ultimately invoking `pte_lockptr()`
> > > +to obtain a spin lock at PTE granularity contained within the `struct ptdesc`
> > > +associated with the physical PTE page. The lock must be released via
> > > +`pte_unmap_unlock()`.
> > > +
> > > +.. note::
> > > +   There are some variants on this, such as `pte_offset_map_rw_nolock()` when we
> > > +   know we hold the PTE stable but for brevity we do not explore this.
> > > +   See the comment for `__pte_offset_map_lock()` for more details.
> > > +
> > > +When modifying data in ranges we typically only wish to allocate higher page
> > > +tables as necessary, using these locks to avoid races or overwriting anything,
> > > +and set/clear data at the PTE level as required (for instance when page faulting
> > > +or zapping).
> >
> > Speaking as someone who doesn't know the internals at all ... this
> > section doesn't really answer any questions I have about the page
> > table. It looks like this could use an initial section about basic
> > usage, and the detailed information could come after? Concretely, if I
> > wish to call vm_insert_page or zap some pages, what are the locking
> > requirements? What if I'm writing a page fault handler?
>
> Ack totally agree, I think we need this document to serve two purposes -
> one is to go over, in detail, the locking requirements from an mm dev's
> point of view with internals focus, and secondly to give those outside mm
> this kind of information.
>
> It's good to get insight from an outside perspective as inevitably we mm
> devs lose sight of the wood for the trees when it comes to internals
> vs. practical needs of those who make use of mm in one respect or another.
>
> So this kind of feedback is very helpful and welcome :) TL;DR - yes I will
> explicitly state what is required for various operations on the respin.
>
> > Alice
>
> As a wordy aside, a large part of the motivation of this document, or
> certainly my prioritisation of it, is explicitly to help the rust team
> correctly abstract this aspect of mm.
>
> The other part is to help the mm team, that is especially myself, correctly
> understand and _remember_ the numerous painful ins and outs of this stuff,
> much of which has been pertinent of late for not wonderfully positive
> reasons.
>
> Hopefully we accomplish both! :>)

I do think this has revealed one issue with my Rust patch, which is that
VmAreaMut currently requires the mmap lock, but it should also require the
VMA lock, since you need both for writing.

Alice
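The requirement in the last paragraph (a mutable VMA handle needs both the mmap lock and the VMA lock) can be sketched as a small userspace Rust model. This is not the kernel's Rust API or the actual patch: `Mm`, `Vma`, `VmaFields`, `VmAreaMut::new`, `set_end` and `extend_vma` are hypothetical stand-ins, and `std::sync::RwLock` plays the part of both the mmap lock and the per-VMA lock. The only point illustrated is that the handle cannot be constructed without a borrowed mmap *write* guard, and that it takes the VMA's own write lock second.

```rust
use std::sync::{RwLock, RwLockWriteGuard};

/// Hypothetical stand-in for `struct mm_struct`: one mmap lock per address space.
pub struct Mm {
    pub mmap_lock: RwLock<()>,
}

/// Hypothetical stand-in for `struct vm_area_struct`: its mutable fields sit
/// behind a per-VMA lock (modelled here as a plain RwLock).
pub struct Vma {
    pub vm_lock: RwLock<VmaFields>,
}

pub struct VmaFields {
    pub start: usize,
    pub end: usize,
}

/// Write handle for one VMA (hypothetical model, not the kernel Rust API).
/// It owns the VMA's write guard, and its lifetime `'a` is tied to a borrow
/// of an mmap write guard, so it cannot outlive the mmap write lock.
pub struct VmAreaMut<'a> {
    fields: RwLockWriteGuard<'a, VmaFields>,
}

impl<'a> VmAreaMut<'a> {
    /// Taking `&RwLockWriteGuard` (not a read guard) encodes "mmap lock held
    /// for write" in the type; the VMA write lock is then acquired here.
    pub fn new(_mmap_write: &'a RwLockWriteGuard<'_, ()>, vma: &'a Vma) -> Self {
        VmAreaMut {
            fields: vma.vm_lock.write().unwrap(),
        }
    }

    pub fn set_end(&mut self, end: usize) {
        self.fields.end = end;
    }
}

/// Example caller: mmap write lock first, then the VMA write lock.
pub fn extend_vma(mm: &Mm, vma: &Vma, new_end: usize) {
    let mmap_guard = mm.mmap_lock.write().unwrap();
    let mut vma_mut = VmAreaMut::new(&mmap_guard, vma);
    vma_mut.set_end(new_end);
    // Locals drop in reverse declaration order: `vma_mut` (and with it the
    // VMA lock) is released before `mmap_guard`.
}
```

Two caveats on the model: it does not check that the mmap guard belongs to the same mm as the VMA, which a real abstraction would need to, and the kernel's per-VMA lock is implemented differently (the patch describes its internals in terms of sequence numbers), so only the type-level "both locks for writing" rule carries over.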