From: Linus Torvalds
Date: Sun, 6 Nov 2022 14:34:51 -0800
Subject: Re: mm: delay rmap removal until after TLB flush
To: Hugh Dickins
Cc: Johannes Weiner, Stephen Rothwell, Alexander Gordeev, Peter Zijlstra,
    Will Deacon, Aneesh Kumar, Nick Piggin, Heiko Carstens, Vasily Gorbik,
    Christian Borntraeger, Sven Schnelle, Nadav Amit, Jann Horn,
    John Hubbard, X86 ML, Matthew Wilcox, Andrew Morton, kernel list,
    Linux-MM, Andrea Arcangeli, "Kirill A. Shutemov", Joerg Roedel,
    Uros Bizjak, Alistair Popple, linux-arch
References: <140B437E-B994-45B7-8DAC-E9B66885BEEF@gmail.com> <8a1e97c9-bd5-7473-6da8-2aa75198fbe8@google.com>
In-Reply-To: <8a1e97c9-bd5-7473-6da8-2aa75198fbe8@google.com>

[ Editing down to just the bare-bones problem cases ]

On Sun, Nov 6, 2022 at 1:06 PM Hugh Dickins wrote:
>
> anon_vma (bad)
> --------------
>
> See folio_lock_anon_vma_read(): folio_mapped() plays a key role in
> establishing the continued validity of an anon_vma. See comments
> above folio_get_anon_vma(), some by me but most by PeterZ IIRC.
>
> I believe what has happened is that your patchset has, very intentionally,
> kept the page as "folio_mapped" until after free_pgtables() does its
> unlink_anon_vmas(); but that is telling folio_lock_anon_vma_read()
> that the anon_vma is safe to use when actually it has been freed.
> (It looked like a page table when I peeped at it.)
>
> I'm not certain, but I think that you made page_zap_pte_rmap() handle
> anon as well as file, just for the righteous additional simplification;
> but I'm afraid that (without opening a huge anon_vma refcounting can of
> worms) that unification has to be reverted, and anon left to go the
> same old way it did before.

Indeed.

I made them separate initially, just because the only case that
mattered for the dirty bit was the file-mapped case.

But then the two functions ended up being basically the identical
function, so I unified them again.

But the anon_vma lifetime issue looks very real, and so doing the
"delay rmap only for file mappings" seems sane.

In fact, I wonder if we should delay it only for *dirty* file
mappings, since it doesn't matter for the clean case.

Hmm.

I already threw away my branch (since Andrew picked the patches up),
so a question for Andrew: do you want me to re-do the branch entirely,
or do you want me to just send you an incremental patch?

To make for minimal changes, I'd drop the 're-unification' patch, and
then make small updates to the zap_pte_range() code to keep the anon
(and possibly non-dirty) case synchronous.
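
[ Illustration: a rough userspace model of the policy being discussed,
  not kernel code. The struct and both function names are hypothetical
  and exist only for this sketch. It assumes the teardown order from the
  quoted report (zap the PTE, then free_pgtables() -> unlink_anon_vmas(),
  then the final TLB flush) and shows why delaying rmap removal only for
  dirty file-mapped pages would avoid the anon_vma window. ]

```c
#include <stdbool.h>

/* Hypothetical description of a page being zapped; not a kernel struct. */
struct zap_page {
    bool is_anon;   /* anonymous page, so an anon_vma lifetime is involved */
    bool is_dirty;  /* the PTE was dirty when it was cleared */
};

/*
 * Assumed policy from the discussion: delay rmap removal until after the
 * TLB flush only for dirty file-mapped pages. Anon pages and clean file
 * pages keep the old synchronous behaviour.
 */
static bool delay_rmap_until_flush(const struct zap_page *page)
{
    return !page->is_anon && page->is_dirty;
}

/*
 * Model of the teardown order:
 *   1. zap the PTE (the rmap is dropped here unless it is delayed)
 *   2. free_pgtables() -> unlink_anon_vmas(): the anon_vma may be freed
 *   3. final TLB flush: any delayed rmap removal has happened by now
 *
 * Returns true if an anonymous page would still look mapped at step 2,
 * which is the case where a folio_mapped()-style check would wrongly
 * vouch for an anon_vma that is already gone.
 */
static bool stale_anon_vma_window(const struct zap_page *page, bool delay_rmap)
{
    bool looks_mapped = true;

    /* Step 1: synchronous rmap removal clears the "mapped" state now. */
    if (!delay_rmap)
        looks_mapped = false;

    /* Step 2: only anon pages depend on the anon_vma staying alive. */
    return page->is_anon && looks_mapped;
}
```

[ With this model, stale_anon_vma_window(&page, delay_rmap_until_flush(&page))
  is false for every combination of is_anon and is_dirty, while forcing
  delay_rmap to true for an anon page reproduces the reported problem. ]
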

And btw, this one is interesting: for anonymous (and non-dirty
file-mapped) pages, we actually can end up delaying the final page
free (and the rmap zapping) all the way to "tlb_finish_mmu()".
Normally we still have the vmas all available, but yes,
free_pgtables() can and does happen before the final TLB flush.

The file-mapped dirty case doesn't have that issue - not just because
it doesn't have an anon_vma at all, but because it also does that
"force_flush" thing that just means that the page freeing never gets
delayed that far in the first place.

> mm-unstable (bad)
> -----------------
> Aside from that PageAnon issue, mm-unstable is in an understandably bad
> state because you could not have foreseen my subpages_mapcount addition
> to page_remove_rmap(). page_zap_pte_rmap() now needs to handle the
> PageCompound (but not the "compound") case too. I rushed you and akpm
> an emergency patch for that on Friday night, but you, let's say, had
> reservations about it. So I haven't posted it, and while the PageAnon
> issue remains, I think your patchset has to be removed from mm-unstable
> and linux-next anyway.

So I think I'm fine with your patch, I just want to move the memcg
accounting to outside of it.

I can re-do my series on top of mm-unstable, I guess. That's probably
the easiest way to handle this all.

Andrew - can you remove those patches again, and I'll create a new
series for you?

              Linus