Message-ID: <7628e9a7-8e2d-dcfb-09e5-27de36da5af7@linux.alibaba.com>
Date: Sun, 10 Jul 2022 19:19:56 +0800
Subject: Re: [PATCH 0/3] Add PUD and kernel PTE level pagetable account
To: Dave Hansen, akpm@linux-foundation.org
Cc: rppt@linux.ibm.com, willy@infradead.org, will@kernel.org,
 aneesh.kumar@linux.ibm.com, npiggin@gmail.com, peterz@infradead.org,
 catalin.marinas@arm.com, chenhuacai@kernel.org, kernel@xen0n.name,
 tsbogend@alpha.franken.de, dave.hansen@linux.intel.com, luto@kernel.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
 arnd@arndb.de, guoren@kernel.org, monstr@monstr.eu, jonas@southpole.se,
 stefan.kristiansson@saunalahti.fi, shorne@gmail.com, x86@kernel.org,
 linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
 linux-csky@vger.kernel.org, openrisc@lists.librecores.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <8821beda-4d60-4d01-b5c8-1629a19c7f0d@intel.com>
From: Baolin Wang
In-Reply-To: <8821beda-4d60-4d01-b5c8-1629a19c7f0d@intel.com>

On 7/7/2022 10:44 PM, Dave Hansen wrote:
> On 7/7/22 04:32, Baolin Wang wrote:
>> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>>> On 7/6/22 01:59, Baolin Wang wrote:
>>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>>> pagetable, as well as missing to set the PG_table flags for these
>>>> pagetable pages, which will get an inaccurate pagetable accounting, and
>>>> miss PageTable() validation in some cases. So this patch set introduces
>>>> new helpers to help to account PUD and kernel PTE pagetable pages.
>>>
>>> Could you explain the motivation for this series a bit more?  Is there a
>>> real-world problem that this fixes?
>>
>> Not fix real problem. The motivation is that making the pagetable
>> accounting more accurate, which helps us to analyse the consumption of
>> the pagetable pages in some cases, and maybe help to do some empty
>> pagetable reclaiming in future.
>
> This accounting isn't free.  It costs storage (and also parts of
> cachelines) in each mm and CPU time to maintain it, plus maintainer
> eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
> least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
> pages unless someone is using gigantic hugetlbfs mappings.

Yes, agreed.
However, I think the performance impact of this patchset is small, based
on some testing I did (for example, mysql showed no obvious performance
change). Moreover, the pagetable accounting gap is about 1%, according to
the testing data below.

Without this patchset, the pagetable consumption is about 110M with the
mysql test:

             flags  page-count   MB  symbolic-flags                                 long-symbolic-flags
0x0000000004000000       28232  110  __________________________g__________________  pgtable

With this patchset, the consumption is about 111M:

             flags  page-count   MB  symbolic-flags                                 long-symbolic-flags
0x0000000004000000       28459  111  __________________________g__________________  pgtable

> Even with 1G gigantic pages, you would need a quarter of a million
> (well, 262144 or 512*512) mappings of one 1G page to consume 1G of
> memory on PUD pages.
>
> That just doesn't seem like something anyone is likely to actually do in
> practice.  That makes the benefits of the PUD portion of this series
> rather unclear in the real world.
>
> As for the kernel page tables, I'm not really aware of them causing any
> problems.  We have a pretty good idea how much space they consume from
> the DirectMap* entries in meminfo:
>
> 	DirectMap4k:     2262720 kB
> 	DirectMap2M:    40507392 kB
> 	DirectMap1G:    24117248 kB

However, these statistics are arch-specific and are only available on
x86, s390 and powerpc.

> as well as our page table debugging infrastructure.  I haven't found
> myself dying for more specific info on them.
>
> So, nothing in this series seems like a *BAD* idea, but I'm not sure in
> the end it solves more problems than it creates.

Thanks for your input.
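
For reference, the figures quoted in this thread can be re-derived with a
short sketch. It assumes 4 KiB base pages and 512 entries per page-table
page (x86-64 with 4-level paging); the helper names are illustrative only
and do not come from the patchset or from any kernel tool.

```python
# Assumptions: 4 KiB base pages, 512 entries per page-table page.
PAGE_SIZE = 4096
ENTRIES_PER_TABLE = 512


def pages_to_mib(pages):
    """Convert a count of 4 KiB pages to MiB."""
    return pages * PAGE_SIZE / (1024 * 1024)


def accounting_gap(pages_before, pages_after):
    """Return (extra pages, gap in percent of the 'before' count)."""
    extra = pages_after - pages_before
    return extra, 100.0 * extra / pages_before


# pgtable page counts from the two mysql runs quoted above.
extra_pages, gap_pct = accounting_gap(28232, 28459)

# One PUD page covers 512 PMD pages, and 512 * 512 PTE pages.
pud_vs_pmd_pct = 100.0 / ENTRIES_PER_TABLE
pud_vs_pte_pct = 100.0 / ENTRIES_PER_TABLE ** 2

# Number of 4 KiB PUD pages needed to add up to 1 GiB.
pud_pages_per_gib = (1 << 30) // PAGE_SIZE

print(f"extra pgtable pages: {extra_pages} "
      f"({pages_to_mib(extra_pages):.2f} MiB, {gap_pct:.2f}%)")
print(f"PUD vs PMD: {pud_vs_pmd_pct:.4f}%  PUD vs PTE: {pud_vs_pte_pct:.6f}%")
print(f"PUD pages per GiB: {pud_pages_per_gib}")
```

Under these assumptions the gap between the two runs is 227 pages, a bit
under 1 MiB, which is consistent with the "about 1%" estimate and with
the 110M and 111M totals in the page-types output.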