From: Linus Torvalds
Date: Mon, 31 Oct 2022 10:19:43 -0700
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
To: Peter Zijlstra
Cc: Nadav Amit, Jann Horn, John Hubbard, X86 ML, Matthew Wilcox,
 Andrew Morton, kernel list, Linux-MM, Andrea Arcangeli,
 Kirill A. Shutemov, jroedel@suse.de, ubizjak@gmail.com, Alistair Popple

On Mon, Oct 31, 2022 at 2:29 AM Peter Zijlstra wrote:
>
> On Sun, Oct 30, 2022 at 03:47:36PM -0700, Linus Torvalds wrote:
>
> > + * This is the simplified form of page_remove_rmap(), that only
> > + * deals with last-level pages, so 'compound' is always false,
> > + * and the caller does 'munlock_vma_page(page, vma, compound)'
> > + * separately.
> > + *
> > + * This allows for a much simpler calling convention and code.
> > + *
> > + * The caller holds the pte lock.
> > + */
> > +void page_zap_pte_rmap(struct page *page)
> > +{
>
> One could consider adding something like:
>
> #ifdef USE_SPLIT_PTE_PTLOCKS
>	lockdep_assert_held(ptlock_ptr(page))
> #endif

Yes, except that the page lock goes away in the next few patches and
gets replaced by just using the safe dec_lruvec_page_state() instead,
so it's not really worth it.

> > +	if (!atomic_add_negative(-1, &page->_mapcount))
> > +		return;
> > +
> > +	lock_page_memcg(page);
> > +	__dec_lruvec_page_state(page,
> > +		PageAnon(page) ? NR_ANON_MAPPED : NR_FILE_MAPPED);
> > +	unlock_page_memcg(page);
> > +}
>
> Took me a little while, but yes, .compound=false seems to reduce to
> this.

Yeah - it's why I kept that thing as three separate patches, because
even if each of the patches isn't "immediately obvious", you can at
least go back and follow along and see what I did. The *full*
simplification end result just looks like magic.

Admittedly, I think a lot of that "looks like magic" is because the
rmap code has seriously cruftified over the years.

Go back a decade, and we literally used to do pretty much exactly
what the simplified form does. The transformation to complexity hell
starts with commit 89c06bd52fb9 ("memcg: use new logic for page stat
accounting"), but just look at what it looked like before that:

    git show 89c06bd52fb9^:mm/rmap.c

gets you the state back when it was simple. And look at what it did:

    void page_remove_rmap(struct page *page)
    {
	/* page still mapped by someone else? */
	if (!atomic_add_negative(-1, &page->_mapcount))
		return;

	... statistics go here ..

so in the end the simplified form of this page_zap_pte_rmap() really
isn't *that* surprising.

In fact, that old code handled PageHuge() fairly naturally too, and
almost all the mess comes from the memcg accounting - and locking -
changes.

And I actually really wanted to go one step further, and try to then
batch up the page state accounting too. It's kind of stupid how we do
all that memcg locking for each page, and with this new setup we have
one nice array of pages that we *could* try to just batch things with.

The pages in normal situations *probably* mostly all share the same
memcg and node, so we could just optimistically try something like
"as long as it's the same memcg as last time, just keep the lock",
instead of locking and unlocking for every single page.

But just looking at it exhausted me enough that I don't think I'll go
there.

Put another way: after this series, it's not the 'rmap' code that
makes me go Ugh, it's the memcg tracking..

(But the hugepage rmap code is incredibly nasty stuff still, and I
think the whole 'compound=true' case would be better with somebody
taking a look at that case too, but that somebody won't be me).

               Linus
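
To make the "replaced by the safe dec_lruvec_page_state()" remark
above concrete, here is a minimal sketch of what page_zap_pte_rmap()
reduces to once the memcg lock is dropped. It illustrates the
direction described in the mail, not the actual patch text from the
series:

    /* Sketch only: page_zap_pte_rmap() without lock_page_memcg(),
     * using the interrupt-safe dec_lruvec_page_state() helper for
     * the per-node statistics update. */
    void page_zap_pte_rmap(struct page *page)
    {
	/* page still mapped by someone else? */
	if (!atomic_add_negative(-1, &page->_mapcount))
		return;

	dec_lruvec_page_state(page,
		PageAnon(page) ? NR_ANON_MAPPED : NR_FILE_MAPPED);
    }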
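
The memcg-batching idea mentioned above ("as long as it's the same
memcg as last time, just keep the lock") could look roughly like the
sketch below. Purely illustrative: zap_rmap_batch() and its pages/nr
calling convention are invented for this example, and it glosses over
when page_memcg() is stable; only lock_page_memcg(),
unlock_page_memcg(), page_memcg() and __dec_lruvec_page_state() are
existing helpers:

    /* Illustrative sketch: batch the memcg locking over an array of
     * pages, re-taking the lock only when the memcg changes from the
     * previous page instead of locking/unlocking per page. */
    static void zap_rmap_batch(struct page **pages, int nr)
    {
	struct page *locked = NULL;
	int i;

	for (i = 0; i < nr; i++) {
		struct page *page = pages[i];

		/* page still mapped by someone else? */
		if (!atomic_add_negative(-1, &page->_mapcount))
			continue;

		/* drop the lock only when we hit a different memcg */
		if (locked && page_memcg(locked) != page_memcg(page)) {
			unlock_page_memcg(locked);
			locked = NULL;
		}
		if (!locked) {
			lock_page_memcg(page);
			locked = page;
		}

		__dec_lruvec_page_state(page,
			PageAnon(page) ? NR_ANON_MAPPED : NR_FILE_MAPPED);
	}

	if (locked)
		unlock_page_memcg(locked);
    }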