From: Michael Ellerman
To: Mike Kravetz
Cc: Andrew Morton, "Wang, Haiyue", linux-mm@kvack.org, linux-kernel@vger.kernel.org, david@redhat.com, apopple@nvidia.com, linmiaohe@huawei.com, "Huang, Ying", songmuchun@bytedance.com, naoya.horiguchi@linux.dev, alex.sierra@amd.com, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Sven Schnelle, linuxppc-dev@lists.ozlabs.org, "Aneesh Kumar K.V"
Subject: Re: [PATCH v6 1/2] mm: migration: fix the FOLL_GET failure on following huge page
References: <20220812084921.409142-1-haiyue.wang@intel.com> <20220816022102.582865-1-haiyue.wang@intel.com> <20220816022102.582865-2-haiyue.wang@intel.com> <20220816175838.211a1b1e85bc68c439101995@linux-foundation.org> <20220816224322.33e0dfbcbf522fcdc2026f0e@linux-foundation.org> <875yiomq9z.fsf@mpe.ellerman.id.au>
Date: Fri, 26 Aug 2022 23:07:12 +1000
Message-ID: <87r113jgqn.fsf@mpe.ellerman.id.au>

Mike Kravetz writes:
> On 08/19/22 21:22, Michael Ellerman wrote:
>> Mike Kravetz writes:
>> > On 08/16/22 22:43, Andrew Morton wrote:
>> >> On Wed, 17 Aug 2022 03:31:37 +0000 "Wang, Haiyue" wrote:
>> >>
>> >> > > >  }
>> >> > >
>> >> > > It would be better to fix this for real at those three client code sites?
>> >> >
>> >> > Then 5.19 will break for a while to wait for the final BIG patch ?
>> >>
>> >> If that's the proposal then your [1/2] should have had a cc:stable and
>> >> changelog words describing the plan for 6.0.
>> >>
>> >> But before we do that I'd like to see at least a prototype of the final
>> >> fixes to s390 and hugetlb, so we can assess those as preferable for
>> >> backporting.  I don't think they'll be terribly intrusive or risky?
>> >
>> > I will start on adding follow_huge_pgd() support.  Although, I may need
>> > some help with verification from the powerpc folks, as that is the only
>> > architecture which supports hugetlb pages at that level.
>> >
>> > mpe any suggestions?
>>
>> I'm happy to test.
>>
>> I have a system where I can allocate 1GB huge pages.
>>
>> I'm not sure how to actually test this path though. I hacked up the
>> vm/migration.c test to allocate 1GB hugepages, but I can't see it going
>> through follow_huge_pgd() (using ftrace).
>
> I think you needed to use 16GB to trigger this code path.  Anshuman introduced
> support for page offline (and migration) at this level in commit 94310cbcaa3c
> ("mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level").
> When asked about the use case, he mentioned:
>
> "Yes, it's in the context of 16GB pages on POWER8 system where all the
> gigantic pages are pre allocated from the platform and passed on to
> the kernel through the device tree. We don't allocate these gigantic
> pages on runtime."

That was true, but isn't anymore.

I must have been insufficiently caffeinated the other day. On our newer
machines 1GB is the largest huge page size, but it's obviously way too
small to sit at the PGD level. So that was a waste of my time :)

We used to support 16GB at the PGD level, but we reworked the page table
geometry a few years ago, and now they sit at the PUD level on machines
that support 16GB pages:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba95b5d0359609b4ec8010f77c40ab3c595a6ac6

Note the author :}

So the good news is we no longer have any configuration where a huge
page entry is expected in the PGD. So we can drop our pgd_huge()
definitions, and ours are the last non-zero definitions, so it can all
go away I think.

I'll send a patch to remove the powerpc pgd_huge() definitions after
I've run it through some tests.

cheers
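
To make the "which level does a huge page sit at" arithmetic above
concrete, here is a minimal standalone sketch. It is not kernel code:
the struct, enum and function names are hypothetical, and the index
sizes in old_geometry and new_geometry are assumptions chosen to
illustrate a 64K base page layout before and after the rework. They are
not taken from this thread and should be checked against the commit
linked above.

```c
#include <stdint.h>

/* Index bits per page table level. Hypothetical helper type. */
struct pt_geometry {
	unsigned page_shift;	/* log2 of the base page size */
	unsigned pte_bits;	/* log2 of entries in a PTE table */
	unsigned pmd_bits;	/* log2 of entries in a PMD table */
	unsigned pud_bits;	/* log2 of entries in a PUD table */
};

enum pt_level { LEVEL_NONE, LEVEL_PMD, LEVEL_PUD, LEVEL_PGD };

/* ASSUMED values, for illustration only. */
static const struct pt_geometry old_geometry = { 16, 8, 5, 5 };
static const struct pt_geometry new_geometry = { 16, 8, 10, 7 };

/* Bytes mapped by a single entry at the given level. */
static uint64_t entry_size(const struct pt_geometry *g, enum pt_level level)
{
	unsigned shift = g->page_shift + g->pte_bits;	/* one PMD entry */

	if (level == LEVEL_NONE)
		return 0;
	if (level == LEVEL_PUD || level == LEVEL_PGD)
		shift += g->pmd_bits;			/* one PUD entry */
	if (level == LEVEL_PGD)
		shift += g->pud_bits;			/* one PGD entry */
	return (uint64_t)1 << shift;
}

/* Level whose single entry maps exactly `size` bytes, else LEVEL_NONE. */
static enum pt_level huge_page_level(const struct pt_geometry *g, uint64_t size)
{
	for (int l = LEVEL_PMD; l <= LEVEL_PGD; l++)
		if (entry_size(g, (enum pt_level)l) == size)
			return (enum pt_level)l;
	return LEVEL_NONE;
}
```

Under these assumed numbers, huge_page_level(&old_geometry, 16ULL << 30)
gives LEVEL_PGD and huge_page_level(&new_geometry, 16ULL << 30) gives
LEVEL_PUD. That matches the move described above, and with no huge page
size left that equals one PGD entry, pgd_huge() has nothing to report.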