From: Barry Song <21cnbao@gmail.com>
To: yuzhao@google.com
Cc: corbet@lwn.net, linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] TAO: THP Allocator Optimizations
Date: Tue, 5 Mar 2024 21:37:43 +1300
Message-Id: <20240305083743.24950-1-21cnbao@gmail.com>
In-Reply-To: <20240229183436.4110845-1-yuzhao@google.com>
References: <20240229183436.4110845-1-yuzhao@google.com>

> TAO is an umbrella project aiming at a better economy of physical
> contiguity viewed as a valuable resource. A few examples are:
> 1. A multi-tenant system can have guaranteed THP coverage while
>    hosting abusers/misusers of the resource.
> 2. Abusers/misusers, e.g., workloads excessively requesting and then
>    splitting THPs, should be punished if necessary.
> 3. Good citizens should be awarded with, e.g., lower allocation
>    latency and less cost of metadata (struct page).

I think TAO, or a similar optimization in the buddy allocator, is
essential to the success of mTHP. Ryan's recent mTHP work can bring
multi-size large folios to a wide range of products for which THP might
be too large.
But one pain point is that the buddy allocator on a real device with
limited memory can become seriously fragmented after the device has run
for some time. We (OPPO) have actually brought up mTHP-like features on
millions of phones, even on the 5.4, 5.10, 5.15 and 6.1 kernels, with
64KiB large folios that leverage ARM64's CONT-PTE. The open source code
for kernel 6.1 is available here [1].

We found that the success rate of 64KiB allocations could be very low
after running monkey [2] on phones for one hour. The data below was
collected during the second hour (from 60 to 120 minutes), after the
phone had already been running for one hour. Without a TAO-like
optimization to the existing buddy allocator, 64KiB large folio
allocations fall back to small folios at a rate of 92.35% in
do_anonymous_page():

  thp_do_anon_pages_fallback / (thp_do_anon_pages + thp_do_anon_pages_fallback)
  = 25807330 / 27944523 = 0.9235

In do_anonymous_page(), thp_do_anon_pages_fallback is the number of
times we tried to allocate 64KiB and failed, and thus used small folios
instead; thp_do_anon_pages is the number of times we tried to allocate
64KiB and succeeded.

So this number means that mTHP loses the vast majority of its value on
a fragmented system, and a phone is always fragmented.

This has actually pushed us to implement a similar optimization to
avoid splitting 64KiB blocks and to reward 64KiB allocations with lower
latency. Our implementation differs from TAO: rather than adding new
zones, we are adding migration_types to mark some pageblocks as
dedicated to mTHP allocation, and we avoid splitting them into lower
orders except in some corner cases.

This has significantly improved the success rate of our 64KiB large
folio allocations and decreased their latency, which helped large
folios finally be applied in real products.

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/
[2] https://developer.android.com/studio/test/other-testing-tools/monkey

> 4. Better interoperability with userspace memory allocators when
>    transacting the resource.
>
> This project puts the same emphasis on the established use case for
> servers and the emerging use case for clients so that client workloads
> like Android and ChromeOS can leverage the recent multi-sized THPs
> [1][2].
>
> Chapter One introduces the cornerstone of TAO: an abstraction called
> policy (virtual) zones, which are overlayed on the physical zones.
> This is in line with item 1 above.
>
> A new door is open after Chapter One. The following two chapters
> discuss the reverse of THP collapsing, called THP shattering, and THP
> HVO, which brings the hugeTLB feature [3] to THP. They are in line
> with items 2 & 3 above.
>
> Advanced use cases are discussed in Epilogue, since they require the
> cooperation of userspace memory allocators. This is in line with item
> 4 above.
>
> [1] https://lwn.net/Articles/932386/
> [2] https://lwn.net/Articles/937239/
> [3] https://www.kernel.org/doc/html/next/mm/vmemmap_dedup.html
>
> Yu Zhao (4):
>   THP zones: the use cases of policy zones
>   THP shattering: the reverse of collapsing
>   THP HVO: bring the hugeTLB feature to THP
>   Profile-Guided Heap Optimization and THP fungibility

Thanks
Barry
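
For anyone who wants to re-check the 92.35% fallback figure from the raw
counters, here is a minimal sketch in Python. The helper name
fallback_rate() is hypothetical and not part of any kernel interface.
The success count is an assumption derived from the two numbers in the
message (total attempts minus fallbacks), since it is not reported
separately.

```python
def fallback_rate(succeeded: int, fell_back: int) -> float:
    """Fraction of 64KiB allocation attempts that fell back to small folios."""
    attempts = succeeded + fell_back
    if attempts == 0:
        # No attempts recorded yet; report 0 rather than dividing by zero.
        return 0.0
    return fell_back / attempts


# Figures quoted in the message for the second hour of the monkey run.
THP_DO_ANON_PAGES_FALLBACK = 25807330
TOTAL_ATTEMPTS = 27944523

# Derived, not reported directly: successful 64KiB allocations.
THP_DO_ANON_PAGES = TOTAL_ATTEMPTS - THP_DO_ANON_PAGES_FALLBACK

rate = fallback_rate(THP_DO_ANON_PAGES, THP_DO_ANON_PAGES_FALLBACK)
print(f"fallback rate: {rate:.4f}")  # prints "fallback rate: 0.9235"
```

Taking the two counters as separate arguments mirrors how they are
described in the message, so the same helper could be reused for
before/after comparisons once an optimization is in place.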