From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90266C433FE for ; Tue, 22 Nov 2022 01:32:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BA9446B0071; Mon, 21 Nov 2022 20:32:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B57EE6B0073; Mon, 21 Nov 2022 20:32:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9FA496B0074; Mon, 21 Nov 2022 20:32:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 89CF06B0071 for ; Mon, 21 Nov 2022 20:32:51 -0500 (EST) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 52888140889 for ; Tue, 22 Nov 2022 01:32:51 +0000 (UTC) X-FDA: 80159354142.09.A2874C2 Received: from mail-qt1-f177.google.com (mail-qt1-f177.google.com [209.85.160.177]) by imf07.hostedemail.com (Postfix) with ESMTP id F27824000C for ; Tue, 22 Nov 2022 01:32:50 +0000 (UTC) Received: by mail-qt1-f177.google.com with SMTP id z6so8459066qtv.5 for ; Mon, 21 Nov 2022 17:32:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=W1WA2y8g5fhQuZnwMdhaIJwo0CEHJPMSKTSLaNEoLOM=; b=h8qYltXZdrvwN5p1AA9AAY9YC7IjoIPByGxTfBvZd7L1Cnio697HypHyozFeHG4sUj 05blWC79g0AC1gIuZhplB56hjCYx0gqfq8mNMgulNtc6QsXEwFmmROqjqz4RJ/fMxOIV 5KGAC8QEOOqFLRkQmwBFqsexDLlgxo8StA6E+q9mO9dE/LndqKX9x6cQVmJI5c7xcfNR 1DeZoH72UQo3iP4ex1wYSb9Vm0bgBqMbDDZr63OKbS1nvojupZ7xHueGfEMpLVeCcCFW YQ0I7xuDdCh3hijs/pjvTTnfugtsC/883gqWnG83L4ZhfAOvykF9usszbQgR8pQC8LbD ar/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W1WA2y8g5fhQuZnwMdhaIJwo0CEHJPMSKTSLaNEoLOM=; b=w9rAJ/RTMchOBzKDtb9jpzT0f4lQjxWg4cJtjI4g8Zuj7R/gr6kwrkHHOwxeZ7UQwB Fye86dO3RII1/9yfYR9ltazAq9XsMmKNadVP6OnnwQPCPPXt0sSW6K71zuXZiwD77oZr jjAxuS0M4nFgZfTwuJfDXoFDIYa/56W1i5TaV5D744Ih52qlHLenfnqhWQNkUwJL3DMF mE7Q9p1WQZ1aKZzMy0rXUJDPngeMsyB+bKBuFmJ/rBwzWzJuJZU6CnXS1VN5EP3yKUUN cbOffhQjoB66g7VjnpMb4HE2QXdtmll5zXCgWZzdq6cP4BYi617aDyI40HgVHNsKYr17 M91w== X-Gm-Message-State: ANoB5plhtQqsPTT+D3Yr/tSgaMk8ZF+n1oT73aAzQ4N0r9wx74LNSiA1 Gd3pTn5OGcVVJif0zmf199fgmA== X-Google-Smtp-Source: AA0mqf54wtZs225FHFwy4u2vNw7pAUL37boEkl/v7zKM5DRyM++jr9L4yfp17P5dR13GSwjWzHut/g== X-Received: by 2002:ac8:1003:0:b0:398:27cc:8c31 with SMTP id z3-20020ac81003000000b0039827cc8c31mr6935732qti.416.1669080770069; Mon, 21 Nov 2022 17:32:50 -0800 (PST) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id bw6-20020a05622a098600b003988b3d5280sm7557076qtb.70.2022.11.21.17.32.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Nov 2022 17:32:49 -0800 (PST) Date: Mon, 21 Nov 2022 17:32:38 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Johannes Weiner cc: Shakeel Butt , Hugh Dickins , Andrew Morton , Linus Torvalds , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Vlastimil Babka , Peter Xu , Yang Shi , John Hubbard , Mike Kravetz , Sidhartha Kumar , Muchun Song , Miaohe Lin , Naoya Horiguchi , Mina Almasry , James Houghton , Zach O'Keefe , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 0/3] mm,thp,rmap: rework the use of subpages_mapcount In-Reply-To: Message-ID: References: <5f52de70-975-e94f-f141-543765736181@google.com> <20221121165938.oid3pemsfkaeq3ws@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1669080771; a=rsa-sha256; cv=none; b=0lnwQHypSOsy19ajwcUk5E20QTPDW0i+mMRcojXzSA/7FGl8kdZbxmnkRSMUQdMDSmJhX0 nk53bI0BqW+0Q5XZCcWOvtRAlyqS3S4uCXQU/zP9vOSLlgJui3KmqBedYPxQz/YcI1Jlak c9zLoMkKHwYrb2r0Mx1ru1BW5Zd4tWU= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=h8qYltXZ; spf=pass (imf07.hostedemail.com: domain of hughd@google.com designates 209.85.160.177 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1669080771; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=W1WA2y8g5fhQuZnwMdhaIJwo0CEHJPMSKTSLaNEoLOM=; b=BlWqvHzTb+Y9AbrKVMxzXapHLSAFbQogMXqymfViyKEGh8EbmL24xYx+yK4s5vtGPswtAc 34gvXZvJOmAQdW1NFeMD+X0RTKSBpQ36V6hWCg1Y5yHb8yy1aytPvDKhJGp+afpPkeLI5s lwmSNKr9ZoupQgRQNey7t/SThluN2XY= X-Rspam-User: X-Rspamd-Queue-Id: F27824000C Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=h8qYltXZ; spf=pass (imf07.hostedemail.com: domain of hughd@google.com designates 209.85.160.177 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: twwyokrbf4wmfty7jydoq9hrjb1gj5y1 X-Rspamd-Server: rspam10 X-HE-Tag: 1669080770-355515 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, 21 Nov 2022, Johannes Weiner wrote: > On Mon, Nov 21, 2022 at 04:59:38PM +0000, Shakeel Butt wrote: > > On Fri, Nov 18, 2022 at 01:08:13AM -0800, Hugh Dickins wrote: > > > Linus was underwhelmed by the earlier compound mapcounts series: > > > this series builds on top of it (as in next-20221117) to follow > > > up on his suggestions - except rmap.c still using lock_page_memcg(), > > > since I hesitate to steal the pleasure of deletion from Johannes. > > > > Is there a plan to remove lock_page_memcg() altogether which I missed? I > > am planning to make lock_page_memcg() a nop for cgroup-v2 (as it shows > > up in the perf profile on exit path) but if we are removing it then I > > should just wait. > > We can remove it for rmap at least, but we might be able to do more. I hope the calls from mm/rmap.c can be deleted before deciding the bigger picture for lock_page_memcg() itself; getting rid of it would be very nice, but it has always had a difficult job to do (and you've devoted lots of good effort to minimizing it). > > Besides rmap, we're left with the dirty and writeback page transitions > that wrt cgroups need to be atomic with NR_FILE_DIRTY and NR_WRITEBACK. > > Looking through the various callsites, I think we can delete it from > setting and clearing dirty state, as we always hold the page lock (or > the pte lock in some instances of folio_mark_dirty). Both of these are > taken from the cgroup side, so we're good there. > > I think we can also remove it when setting writeback, because those > sites have the page locked as well. > > That leaves clearing writeback. This can't hold the page lock due to > the atomic context, so currently we need to take lock_page_memcg() as > the lock of last resort. > > I wonder if we can have cgroup take the xalock instead: writeback > ending on file pages always acquires the xarray lock. Swap writeback > currently doesn't, but we could make it so (swap_address_space). It's a little bit of a regression to have to take that lock when ending writeback on swap (compared with the rcu_read_lock() of almost every lock_page_memcg()); but I suppose if swap had been doing that all along, like the normal page cache case, I would not be complaining. > > The only thing that gives me pause is the !mapping check in > __folio_end_writeback. File and swapcache pages usually have mappings, > and truncation waits for writeback to finish before axing > page->mapping. So AFAICS this can only happen if we call end_writeback > on something that isn't under writeback - in which case the test_clear > will fail and we don't update the stats anyway. But I want to be sure. > > Does anybody know from the top of their heads if a page under > writeback could be without a mapping in some weird cornercase? End of writeback has been a persistent troublemaker, in several ways; I forget whether we are content with it now or not. I would not trust whatever I think OTOH of that !mapping case, but I was deeper into it two years ago, and find myself saying "Can mapping be NULL? I don't see how, but allow for that with a WARN_ON_ONCE()" in a patch I posted then (but it didn't go in, we went in another direction). I'm pretty sure it never warned once for me, but I probably wasn't doing enough to test it. And IIRC I did also think that the !mapping check had perhaps been copied from a related function, one where it made more sense. It's also worth noting that the two stats which get decremented there, NR_WRITEBACK and NR_ZONE_WRITE_PENDING, are two of the three which we have commented "Skip checking stats known to go negative occasionally" in mm/vmstat.c: I never did come up with a convincing explanation for that (Roman had his explanation, but I wasn't quite convinced). Maybe it would just be wrong to touch them if mapping were NULL. > > If we could ensure that the NR_WRITEBACK decs are always protected by > the xalock, we could grab it from mem_cgroup_move_account(), and then > kill lock_page_memcg() altogether. I suppose so (but I still feel grudging about the xalock for swap). Hugh