From: Linus Torvalds
Date: Fri, 28 Oct 2022 17:42:39 -0700
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
To: Nadav Amit
Cc: Peter Zijlstra, Jann Horn, John Hubbard, X86 ML, Matthew Wilcox,
    Andrew Morton, kernel list, Linux-MM, Andrea Arcangeli,
    "Kirill A. Shutemov", jroedel@suse.de, ubizjak@gmail.com
References: <20221022111403.531902164@infradead.org>
    <20221022114424.515572025@infradead.org>
    <2c800ed1-d17a-def4-39e1-09281ee78d05@nvidia.com>
    <6C548A9A-3AF3-4EC1-B1E5-47A7FFBEB761@gmail.com>

On Fri, Oct 28, 2022 at 4:57 PM Nadav Amit wrote:
>
> The problem is in the following code of
> zap_pte_range():
>
>         if (!PageAnon(page)) {
>                 if (pte_dirty(ptent)) {
>                         force_flush = 1;
>                         set_page_dirty(page);
>                 }
>                 …
>         }
>         page_remove_rmap(page, vma, false);
>
> Once we remove the rmap, rmap_walk() would not acquire the page-table lock
> anymore. As a result, nothing prevents the kernel from performing writeback
> and cleaning the page-struct dirty-bit before the TLB is actually flushed.

Hah.

The original reason for force_flush there was similar: a race wrt
page_mkclean(), because this code doesn't take the page lock that
would normally serialize things. The page lock is so painful, and
ends up having nasty nasty interactions with slow IO operations.

So all the page-dirty handling really *wants* to use the page lock,
and for the IO side (writeback) that ends up being acceptable and
works well, but from that "serialize VM state" angle it's horrendous.

So instead the code intentionally serialized on the rmap data
structures which page_mkclean() also walks, and as you point out,
that's broken.

It's not broken at the point where we do set_page_dirty(), but it
*becomes* broken when we drop the rmap, and the problem is exactly
that "we still have the dirty bit hidden in the TLB state" issue that
you pinpoint.

I think the proper fix (or at least _a_ proper fix) would be to
actually carry the dirty bit along to the __tlb_remove_page() point,
and actually treat it exactly the same way as the page pointer itself
- set the page dirty after the TLB flush, the same way we can free the
page after the TLB flush.

We could easily hide said dirty bit in the low bits of the
"batch->pages[]" array or something like that. We'd just have to add
the 'dirty' argument to __tlb_remove_page_size() and friends.

Hmm?

Your idea of "do the page_remove_rmap() late instead" would also work,
but the reason I think just squirrelling away the dirty bit is the
"proper" fix is that it would get rid of the whole need for
'force_flush' in this area entirely.
So we'd not only fix that race you noticed, we'd actually do so and
reduce the number of TLB flushes too.

I don't know. Maybe I'm missing something fundamental, and my idea is
just stupid.

              Linus
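
For illustration only, the "hide the dirty bit in the low bits of
batch->pages[]" idea above might look roughly like the userspace sketch
below. Everything in it is an assumption made for the example: the
struct page stand-in, the tagged_page_t type, and the helpers tag_page(),
tagged_page_ptr(), tagged_page_dirty() and flush_batch() are hypothetical
names, not existing kernel API. The only thing it relies on is that page
structures are aligned, so bit 0 of their address is free to carry a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct page. The unsigned long
 * member guarantees an alignment of more than one byte, so bit 0 of any
 * valid pointer to it is zero. */
struct page {
	unsigned long dirty;
};

/* A page pointer with a "dirty" flag stored in bit 0 (assumed encoding). */
typedef uintptr_t tagged_page_t;

#define TAGGED_PAGE_DIRTY ((uintptr_t)1)

/* Pack the pointer and the dirty flag into one word. */
static tagged_page_t tag_page(struct page *page, bool dirty)
{
	uintptr_t addr = (uintptr_t)page;

	assert((addr & TAGGED_PAGE_DIRTY) == 0); /* relies on alignment */
	return addr | (dirty ? TAGGED_PAGE_DIRTY : 0);
}

/* Recover the original pointer by masking off the flag bit. */
static struct page *tagged_page_ptr(tagged_page_t t)
{
	return (struct page *)(t & ~TAGGED_PAGE_DIRTY);
}

/* Recover the deferred dirty flag. */
static bool tagged_page_dirty(tagged_page_t t)
{
	return (t & TAGGED_PAGE_DIRTY) != 0;
}

/* Process a batch once the TLB flush has (conceptually) completed:
 * only now is the deferred dirty state transferred to the page. */
static void flush_batch(tagged_page_t *pages, size_t nr)
{
	/* A real implementation would flush the TLB here, first. */
	for (size_t i = 0; i < nr; i++) {
		struct page *page = tagged_page_ptr(pages[i]);

		if (tagged_page_dirty(pages[i]))
			page->dirty = 1; /* stands in for set_page_dirty() */
		/* The page would be released here, after the flush. */
	}
}
```

In this sketch a caller would store tag_page(page, pte_was_dirty) into the
batch array instead of the bare pointer, and the dirty state would only
reach the page inside flush_batch(), which is the ordering the mail argues
for.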