Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
From: Nadav Amit
Date: Sun, 30 Oct 2022 12:34:51 -0700
To: Linus Torvalds
Cc: Peter Zijlstra, Jann Horn, John Hubbard, X86 ML, Matthew Wilcox, Andrew Morton, kernel list, Linux-MM, Andrea Arcangeli, "Kirill A . Shutemov", jroedel@suse.de, ubizjak@gmail.com, Alistair Popple
On Oct 30, 2022, at 11:19 AM, Linus Torvalds wrote:

> And page_remove_rmap() could *almost* be called later, but it does
> have code that also depends on the page table lock, although it looks
> like realistically that's just because it "knows" that means that
> preemption is disabled, so it uses non-atomic statistics update.
>
> I say "knows" in quotes, because that's what the comment says, but it
> turns out that __mod_node_page_state() has to deal with CONFIG_RT
> anyway and does that
>
>         preempt_disable_nested();
>         ...
>         preempt_enable_nested();
>
> thing.
>
> And then it wants to see the vma, although that's actually only to see
> if it's 'mlock'ed, so we could just squirrel that away.
>
> So we *could* move page_remove_rmap() later into the TLB flush region,
> but then we would have lost the page table lock anyway, so then
> folio_mkclean() can come in regardless.
>
> So that doesn't even help.

Well, if you combine it with the per-page-table stale TLB detection
mechanism that I proposed, I think this could work.

Reminder (feel free to skip): you would have a per-mm “completed
TLB-generation” in addition to the current one, which would be renamed
to “pending TLB-generation”. Whenever you update the page tables in a
manner that might require a TLB flush, you would increase the “pending
TLB-generation” and save the pending TLB-generation in the page-table’s
page-struct. All of that is done once under the page-table lock. When
you finish a TLB flush, you update the “completed TLB-generation”.
Then in page_vma_mkclean_one(), you would check whether the page-table’s
TLB-generation is greater than the completed TLB-generation, which would
indicate that TLB entries for PTEs in this table might be stale. In that
case you would just flush the TLB. [ Of course you can instead just flush
if mm_tlb_flush_pending(), but nobody likes this mechanism: it has a very
coarse granularity, and therefore can lead to many unnecessary TLB
flushes. ]

Indeed, there could potentially be some overhead in extreme cases, since
the cache line holding the mm’s TLB-generation is already highly
contended in such cases. But I think it is worth it to have simple logic
that makes it possible to reason about correctness.

My intuition is that although you appear to be right that we can just
mark this case as an “extreme case nobody cares about”, it might have,
now or in the future, other implications that are hard to predict and
prevent.