Date: Wed, 16 Apr 2025 15:34:52 +0800
From: Hao Li
To: Mateusz Guzik
Cc: x86@kernel.org, linux-mm
Subject: Re: idea to ponder: handling extra pages on faults of anonymous areas

On Tue, Apr 15, 2025 at 06:58:43PM +0200, Mateusz Guzik wrote:
> If you have an area not backed by a huge page and fault on it, you
> only get the 4K page zeroed even if the given mmapped area is bigger.
> I have a promising result from zeroing more pages than that, but don't
> have time to evaluate more workloads or code up a proper patch.
> Hopefully someone(tm) will be interested enough to pick it up.
>
> Rationale:
> 4K pages predate the fall of the Soviet Union and RAM sizes went up
> orders of magnitude since then even on what's considered low end
> systems today. Similarly, memory usage of programs went up
> significantly. It is not a stretch to suspect a bigger size would
> serve real workloads better. 2MB pages of course are applicable in
> some capacity, but my testing shows there are still tons of faults on
> areas where they are not used.
>
> In particular when running everyone's favourite workload of compiling
> stuff, kernel time is quite big (e.g., > 15%), where a large chunk is
> spent handling page faults.
>
> While the hardware does not provide good granularity (the immediate
> 4KB -> 2MB jump) and will still need to use 4KB pages, fault handling
> can go down by speculatively sorting out more than just the page which
> got faulted on.
>
> I suspect rolling with 8KB would provide a good enough improvement
> while suffering negligible waste in practice.
>
> While testing 8KB would require patching the kernel, I was pointed at
> knobs in /sys/kernel/mm/transparent_hugepage which facilitate early
> experiments. The smallest available size is 16K, so that's what I used
> below for benchmarking.
>
> I conducted a simple experiment building will-it-scale like so:
> taskset --cpu-list 1 hyperfine "gmake -s -j 1 clean all"
>
> stock:
> Time (mean ± σ): 20.707 s ± 0.080 s [User: 17.222 s, System: 3.376 s]
> 16K pages:
> Time (mean ± σ): 19.471 s ± 0.046 s [User: 16.836 s, System: 2.608 s]
>
> Or to put it differently a reliable 5% reduction in real time. Page
> fault count dropped to less than half, which suggests the majority of
> the improvement would show up with a mere 8K instead of 16.
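
For reference, the relative reductions implied by the hyperfine means
quoted above can be worked out with a few lines of Python. This is only
arithmetic on the quoted numbers; the names reduction_pct, stock and
mthp_16k are made up for illustration.

```python
def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Mean times in seconds, copied from the hyperfine output quoted above.
stock = {"real": 20.707, "user": 17.222, "system": 3.376}
mthp_16k = {"real": 19.471, "user": 16.836, "system": 2.608}

for key in stock:
    pct = reduction_pct(stock[key], mthp_16k[key])
    print(f"{key}: {pct:.1f}% lower with 16K pages")
```

On these means the wall-clock time drops by roughly 6%, and most of the
saving is in system time, which falls by roughly 23%.
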
>
> the 16K thing was tested with:
> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
>
> I stress the proposal is not necessarily to use mTHPs here (or
> whatever the name), the above was merely employed because it was
> readily available. I'm told the use of these might prevent other
> optimizations by the kernel -- these are artifacts of the
> implementation and are not inherent to the idea.
>
> The proposal is to fill in more than one page on faults on anonymous
> areas, regardless of how it is specifically handled. I speculate
> handling two pages (aka 8KB of size) will be an overall win and should
> not be affecting anything else (huge page promotions, whatever TLB
> fuckery and what have you). Worst case you got a page you are not
> going to use.

Hi,

This sounds like the anonymous-memory counterpart of file readahead.
For example, when userspace mmaps a range of anonymous memory, the
kernel could speculate that the application will trigger
do_anonymous_page faults sequentially. So instead of handling one page
fault per PTE, it might be better to populate several PTEs in one go,
ahead of the accesses.

>
> I think a good quality proposal is quite time consuming to produce and
> I don't have the cycles. I also can't guarantee the mm overlords will
> accept something like that. I can however point out that Google
> experimented with 16KB pages for arm64 and got very promising results
> (I have no idea if they switched to use them) -- I would start with
> prodding those folk.
>
> cheers
> --
> Mateusz Guzik
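
To make the readahead analogy a bit more concrete, here is a toy model
of the trade-off. It is a sketch in Python, not kernel code:
count_faults and batch_pages are hypothetical names, and the model
assumes each fault populates one aligned group of batch_pages pages,
which is only one possible policy.

```python
def count_faults(touched_pages, batch_pages):
    """Toy model: a fault on a page populates its whole aligned group.

    Returns (faults, wasted), where `wasted` is the number of pages
    that were populated but never touched.
    """
    populated = set()
    faults = 0
    for page in touched_pages:
        if page in populated:
            continue  # already mapped by an earlier fault
        faults += 1
        start = page - page % batch_pages
        populated.update(range(start, start + batch_pages))
    wasted = len(populated - set(touched_pages))
    return faults, wasted


sequential = list(range(1024))     # touch 1024 consecutive pages
sparse = list(range(0, 1024, 8))   # touch only every 8th page

for batch in (1, 2, 4):
    print(batch, count_faults(sequential, batch), count_faults(sparse, batch))
```

In this model a sequential toucher sees its fault count halve with each
doubling of the batch and wastes nothing, while the sparse toucher
takes the same 128 faults regardless and ends up with 128 unused pages
at a batch of 2 and 384 at a batch of 4. Real workloads presumably sit
somewhere in between, which is why measuring them matters.
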