From: Barry Song <21cnbao@gmail.com>
To: yuzhao@google.com
Cc: corbet@lwn.net, linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org
Subject: Re: [Chapter One] THP zones: the use cases of policy zones
Date: Tue, 5 Mar 2024 21:41:16 +1300
Message-Id: <20240305084116.25103-1-21cnbao@gmail.com>
In-Reply-To: <20240229183436.4110845-2-yuzhao@google.com>
References: <20240229183436.4110845-2-yuzhao@google.com>

> There are three types of zones:
> 1. The first four zones partition the physical address space of CPU
>    memory.
> 2. The device zone provides interoperability between CPU and device
>    memory.
> 3. The movable zone commonly represents a memory allocation policy.
>
> Though originally designed for memory hot removal, the movable zone is
> instead widely used for other purposes, e.g., CMA and kdump kernel, on
> platforms that do not support hot removal, e.g., Android and ChromeOS.
> Nowadays, it is legitimately a zone independent of any physical
> characteristics. In spite of being somewhat regarded as a hack,
> largely due to the lack of a generic design concept for its true major
> use cases (on billions of client devices), the movable zone naturally
> resembles a policy (virtual) zone overlayed on the first four
> (physical) zones.
>
> This proposal formally generalizes this concept as policy zones so
> that additional policies can be implemented and enforced by subsequent
> zones after the movable zone. An inherited requirement of policy zones
> (and the first four zones) is that subsequent zones must be able to
> fall back to previous zones and therefore must add new properties to
> the previous zones rather than remove existing ones from them. Also,
> all properties must be known at the allocation time, rather than the
> runtime, e.g., memory object size and mobility are valid properties
> but hotness and lifetime are not.
>
> ZONE_MOVABLE becomes the first policy zone, followed by two new policy
> zones:
> 1. ZONE_NOSPLIT, which contains pages that are movable (inherited from
>    ZONE_MOVABLE) and restricted to a minimum order to be
>    anti-fragmentation. The latter means that they cannot be split down
>    below that order, while they are free or in use.
> 2. ZONE_NOMERGE, which contains pages that are movable and restricted
>    to an exact order. The latter means that not only is split
>    prohibited (inherited from ZONE_NOSPLIT) but also merge (see the
>    reason in Chapter Three), while they are free or in use.
>
> Since these two zones only can serve THP allocations (__GFP_MOVABLE |
> __GFP_COMP), they are called THP zones. Reclaim works seamlessly and
> compaction is not needed for these two zones.
>
> Compared with the hugeTLB pool approach, THP zones tap into core MM
> features including:
> 1. THP allocations can fall back to the lower zones, which can have
>    higher latency but still succeed.
> 2. THPs can be either shattered (see Chapter Two) if partially
>    unmapped or reclaimed if becoming cold.
> 3. THP orders can be much smaller than the PMD/PUD orders, e.g., 64KB
>    contiguous PTEs on arm64 [1], which are more suitable for client
>    workloads.
>
> Policy zones can be dynamically resized by offlining pages in one of
> them and onlining those pages in another of them. Note that this is
> only done among policy zones, not between a policy zone and a physical
> zone, since resizing is a (software) policy, not a physical
> characteristic.
>
> Implementing the same idea in the pageblock granularity has also been
> explored but rejected at Google. Pageblocks have a finer granularity
> and therefore can be more flexible than zones. The tradeoff is that
> this alternative implementation was more complex and failed to bring a
> better ROI. However, the rejection was mainly due to its inability to
> be smoothly extended to 1GB THPs [2], which is a planned use case of
> TAO.

We did implement a similar idea at pageblock granularity on OPPO's
phones by adding two special migratetypes [1]:

* QUAD_TO_TRIP - mainly for order-4 mTHP allocations, which can use
  ARM64's CONT-PTE. It can, rarely, be split into order-3 blocks to ease
  the pain of order-3 allocations, if and only if the order-3 allocation
  has failed in both the normal buddy lists and TRIP_TO_QUAD below.

* TRIP_TO_QUAD - mainly for order-4 mTHP allocations, which can use
  ARM64's CONT-PTE. It can sometimes be split into order-3 blocks to
  ease the pain of order-3 allocations, if and only if the order-3
  allocation has failed in the normal buddy lists.

Neither of the above is ever merged into order 5 or higher, and neither
is ever split into order 2 or lower.

Compaction skips both of the above.

One disadvantage I see in this approach is that I have to add a separate
LRU list in each zone to hold those mTHP folios. If mTHP and small
folios are put on the same LRU list, reclamation efficiency is extremely
bad.

A separate zone, on the other hand, avoids the need for a separate mTHP
LRU list, as the new zone has its own LRU list.

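To make the order-3 fallback order above concrete, here is a minimal
user-space sketch in Python. It is only a model under my own
assumptions, not the code in [1]: the names Pool, FreeCounts and
alloc_order3 are hypothetical, each pool is reduced to two counters, and
only orders 3 and 4 exist, which is how the "no merge to order 5, no
split to order 2" rule is expressed here.

```python
from enum import Enum
from typing import Dict, Optional


class Pool(Enum):
    """Hypothetical labels for the three sources of order-3 blocks."""
    NORMAL = "normal buddy"
    TRIP_TO_QUAD = "TRIP_TO_QUAD"
    QUAD_TO_TRIP = "QUAD_TO_TRIP"


# Order-3 requests try the pools in this order: normal buddy first,
# then TRIP_TO_QUAD ("sometimes" split), then QUAD_TO_TRIP ("rarely" split).
ORDER3_FALLBACK = (Pool.NORMAL, Pool.TRIP_TO_QUAD, Pool.QUAD_TO_TRIP)


class FreeCounts:
    """Free blocks in one pool, tracked only for orders 3 and 4 (assumption)."""

    def __init__(self, order3: int = 0, order4: int = 0) -> None:
        self.order3 = order3
        self.order4 = order4

    def take_order3(self) -> bool:
        """Hand out one order-3 block, splitting an order-4 block if needed."""
        if self.order3 == 0 and self.order4 > 0:
            # Split one order-4 block into two order-3 halves.
            self.order4 -= 1
            self.order3 += 2
        if self.order3 == 0:
            return False
        self.order3 -= 1
        return True


def alloc_order3(pools: Dict[Pool, FreeCounts]) -> Optional[Pool]:
    """Return the pool that served an order-3 request, or None on failure."""
    for pool in ORDER3_FALLBACK:
        if pools[pool].take_order3():
            return pool
    return None
```

With one free order-3 block in the normal pool and one free order-4
block in each special pool, successive calls to alloc_order3() drain the
normal pool first, then split the TRIP_TO_QUAD block, and touch
QUAD_TO_TRIP only after both of the others are empty.
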
[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/page_alloc.c

>
> [1] https://lore.kernel.org/20240215103205.2607016-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/20200928175428.4110504-1-zi.yan@sent.com/
>
> Signed-off-by: Yu Zhao

Thanks
Barry