Date: Fri, 18 Mar 2022 03:49:27 +0000
From: Matthew Wilcox
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC] memcg: Convert mc_target.page to mc_target.folio

On Thu, Mar 17, 2022 at 12:58:46PM +0100, Michal Hocko wrote:
> On Wed 16-03-22 20:31:30, Matthew Wilcox wrote:
> > This is a fairly mechanical change to convert mc_target.page to
> > mc_target.folio.  This is a prerequisite for converting
> > find_get_incore_page() to find_get_incore_folio().  But I'm not
> > convinced it's right, and I'm not convinced the existing code is
> > quite right either.
> >
> > In particular, the code in hunk @@ -6036,28 +6041,26 @@ needs
> > careful review.
> > There are also assumptions in here that a memory
> > allocation is never larger than a PMD, which is true today, but I've
> > been asked about larger allocations.
>
> Could you be more specific about those use cases?  Are they really
> interested in supporting larger pages for the memcg migration, which
> is a v1-only feature?  Or are you merely interested in making the
> code more generic?

Ah!  I didn't realise memcg migration was a v1-only feature.  I think
that makes all of the questions much less interesting.  I've done some
more reading, and it seems like all of this is "best effort", so it
doesn't really matter if some folios get skipped.

I'm not entirely sure what the use cases are for >PMD-sized folios.
I think the people who are asking for them probably overestimate how
useful or practical they'll turn out to be.  I sense it's a case of
"our hardware supports a range of sizes, and we'd like to be able to
support them all", rather than any sensible evaluation of the pros
and cons.

> [...]
> > @@ -6036,28 +6041,26 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
> >  		case MC_TARGET_DEVICE:
> >  			device = true;
> >  			fallthrough;
> > -		case MC_TARGET_PAGE:
> > -			page = target.page;
> > +		case MC_TARGET_FOLIO:
> > +			folio = target.folio;
> >  			/*
> > -			 * We can have a part of the split pmd here. Moving it
> > -			 * can be done but it would be too convoluted so simply
> > -			 * ignore such a partial THP and keep it in original
> > -			 * memcg. There should be somebody mapping the head.
> > +			 * Is bailing out here with a large folio still the
> > +			 * right thing to do?  Unclear.
> >  			 */
> > -			if (PageTransCompound(page))
> > +			if (folio_test_large(folio))
> >  				goto put;
> > -			if (!device && isolate_lru_page(page))
> > +			if (!device && folio_isolate_lru(folio))
> >  				goto put;
> > -			if (!mem_cgroup_move_account(page, false,
> > +			if (!mem_cgroup_move_account(folio, false,
> >  						mc.from, mc.to)) {
> >  				mc.precharge--;
> >  				/* we uncharge from mc.from later. */
> >  				mc.moved_charge++;
> >  			}
> >  			if (!device)
> > -				putback_lru_page(page);
> > -put:			/* get_mctgt_type() gets the page */
> > -			put_page(page);
> > +				folio_putback_lru(folio);
> > +put:			/* get_mctgt_type() gets the folio */
> > +			folio_put(folio);
> >  			break;
> >  		case MC_TARGET_SWAP:
> >  			ent = target.ent;
>
> It's been some time since I've looked at this particular code, but my
> recollection and current understanding is that we are skipping over
> pte-mapped huge pages for simplicity, so that we do not have to
> recharge all the other ptes from the same huge page.  What kind of
> concern do you see there?

That makes sense.  I think the case that's currently mishandled is a
THP in tmpfs which is misaligned when mapped to userspace.  It's
skipped, even if the entire THP is mapped.  But maybe that simply
doesn't matter.

I suppose the question is: do we care if mappings of files are not
migrated to the new memcg?  I'm getting a sense that the answer is
"no", and if we actually ended up skipping all file mappings, it
wouldn't matter.

Thanks for taking a look!
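P.S. To make the misaligned-THP case concrete, here is a minimal
userspace model.  It is not the kernel code: PMD_SIZE, the example
addresses and the pmd_mappable() helper are all invented for
illustration, assuming x86-64 with 4K pages and 2MB PMDs.  The point
it sketches is that a large folio can only be PMD-mapped when its
virtual address is PMD-aligned; a misaligned tmpfs THP is necessarily
PTE-mapped, so every constituent page trips the PageTransCompound() /
folio_test_large() check in the hunk above, and the whole folio is
skipped even though all of it is mapped.

    /* Userspace sketch, not kernel code.  Build: cc -o align align.c */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PMD_SIZE (2UL << 20)    /* assumed: 2MB PMD coverage */
    #define PMD_MASK (~(PMD_SIZE - 1))

    /* Hypothetical helper: a single PMD can map a 2MB folio only if
     * the folio starts at a PMD-aligned virtual address. */
    static bool pmd_mappable(uintptr_t vaddr)
    {
            return (vaddr & ~PMD_MASK) == 0;
    }

    int main(void)
    {
            uintptr_t aligned   = 0x7f0000200000UL; /* 2MB-aligned */
            uintptr_t unaligned = 0x7f0000201000UL; /* off by 4K   */

            /* The aligned mapping can use one PMD, so the move-charge
             * walk sees a PMD-mapped THP.  The unaligned mapping falls
             * back to 512 PTEs; each PTE's page belongs to a large
             * folio, so the skip fires for every one of them. */
            printf("aligned:   %s\n", pmd_mappable(aligned) ?
                   "PMD-mapped, movable as a unit" :
                   "PTE-mapped, skipped per PTE");
            printf("unaligned: %s\n", pmd_mappable(unaligned) ?
                   "PMD-mapped, movable as a unit" :
                   "PTE-mapped, skipped per PTE");
            return 0;
    }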