References: <20241008223748.555845-1-ziy@nvidia.com> <7ec81ff8-5645-42a1-a048-c8700aff07fa@redhat.com> <9A314663-43F1-49B5-9225-0E326A4DB315@nvidia.com>
In-Reply-To: <9A314663-43F1-49B5-9225-0E326A4DB315@nvidia.com>
From: Yang Shi
Date: Fri, 18 Oct 2024 12:44:13 -0700
Subject: Re: [RFC PATCH 0/1] Buddy allocator like folio split
To: Zi Yan
Cc: David Hildenbrand , linux-mm@kvack.org, "Matthew Wilcox (Oracle)" , Ryan Roberts , Hugh Dickins , "Kirill A .
 Shutemov" , Yang Shi , Miaohe Lin , Kefeng Wang , Yu Zhao , John Hubbard , linux-kernel@vger.kernel.org

On Fri, Oct 18, 2024 at 12:11 PM Zi Yan wrote:
>
> On 18 Oct 2024,
> at 14:42, David Hildenbrand wrote:
>
> > On 09.10.24 00:37, Zi Yan wrote:
> >> Hi all,
> >
> > Hi!
> >
> >>
> >> Matthew and I have discussed a different way of splitting large
> >> folios. Instead of splitting one folio uniformly into smaller folios of
> >> the same order, doing a buddy-allocator-like split can reduce the total
> >> number of resulting folios and the amount of memory needed for the
> >> multi-index xarray split, and keep more large folios after a split. In
> >> addition, both Hugh[1] and Ryan[2] had similar suggestions before.
> >>
> >> The patch is an initial implementation. It passes simple order-9 to
> >> lower-order split tests for anonymous folios and pagecache folios.
> >> There are still a lot of TODOs before it can go upstream, but I would
> >> like to gather feedback before that.
> >
> > Interesting, but I don't see any actual users besides the debug/test
> > interface wired up.
>
> Right. I am working on it now, since two potential users, anon large folios
> and truncate, might need a more sophisticated implementation to fully take
> advantage of this new split.
>
> For anon large folios, this might be open to debate: if only a subset of
> orders is enabled, I assume folio_split() can only split to smaller
> folios of the enabled orders. For example, to get one order-0 from
> an order-9 when only order-4 (64KB on x86) is enabled, folio_split()
> can only split the order-9 into 16 order-0s and 31 order-4s, unless we are
> OK with anon large folios of non-enabled orders appearing in the system.

For anon large folios, deferred split may be a problem too. Deferred split
is typically used to free subpages that were unmapped by, for example,
MADV_DONTNEED. But we don't know which subpages are unmapped without
iterating over every subpage and reading its _mapcount.

>
> For truncate, the example you give below is an easy one.
> For cases like punching the 3rd to 5th order-0 of an order-3,
> [O0, O0, __, __, __, O0, O0, O0], I am thinking about which approach is better:
>
> 1. two folio_split()s:
> 1) split the second order-1 from the order-3, 2) split an order-0 from the second order-2;
>
> 2. one folio_split(), by making folio_split() support arbitrary range split,
> so the two steps in 1 can be done in one shot, which saves unmapping and
> remapping cost.
>
> Maybe I should go for 1 first as an easy route, but I still need an algorithm
> in truncate to figure out how to call folio_split().
>
> >
> > I assume ftruncate() / fallocate(PUNCH_HOLE) might be good use cases? For
> > example, when punching 1M of a 2M folio, we can just leave a 1M folio in
> > the pagecache.
>
> Yes, I am trying to make this work.
>
> >
> > Any other obvious users you have in mind?
>
> Presumably, folio_split() should replace all split_huge*() calls to reduce the
> total number of folios after a split. But for swapcache folios, I need to figure
> out if the swap system works well with buddy-allocator-like splits.
>
> Best Regards,
> Yan, Zi
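The order-4 arithmetic in the anon large folio example can be checked with a small userspace model. This is a hypothetical sketch, not the folio_split() code from the RFC: buddy_split() and _pieces() are made-up names, pages are 0-indexed, and it assumes that a buddy half whose order is not enabled is broken into the largest enabled order below it, with order-0 always possible.

```python
from collections import Counter


def _pieces(start, order, allowed):
    """Break an order-`order` block at page `start` into the largest allowed order."""
    best = max(o for o in allowed if o <= order)
    step = 1 << best
    return [(start + i * step, best) for i in range(1 << (order - best))]


def buddy_split(order, index, new_order, allowed=None):
    """Hypothetical model of a buddy-allocator-like split.

    Split an order-`order` folio so that page `index` lands in an
    order-`new_order` folio. Only the half containing `index` is split
    again; the other half (its buddy) stays one folio when its order is
    allowed, otherwise it is broken into smaller allowed orders.
    Returns a sorted list of (first_page, order) pairs.
    """
    if allowed is None:
        allowed = range(order + 1)        # assumption: every order enabled
    allowed = set(allowed) | {0}          # order-0 is always possible
    folios = []
    start, cur = 0, order
    while cur > new_order:
        cur -= 1
        half = 1 << cur
        if index < start + half:          # target in the lower half
            folios += _pieces(start + half, cur, allowed)
        else:                             # target in the upper half
            folios += _pieces(start, cur, allowed)
            start += half
    folios += _pieces(start, new_order, allowed)
    return sorted(folios)


if __name__ == "__main__":
    # All orders enabled: buddies of order 8..1 plus two order-0s.
    print(len(buddy_split(9, 0, 0)))      # 10 folios instead of 512
    # Only order-4 enabled: 16 order-0s and 31 order-4s.
    print(Counter(o for _, o in buddy_split(9, 0, 0, {4})))
```

With every order enabled, isolating one page of an order-9 leaves 10 folios, where a uniform split to order-0 leaves 512. Restricting the buddies to order-4 turns the order-8..5 buddies into 30 order-4s and the order-3..1 buddies into 14 order-0s, which is where the 16 + 31 figure comes from.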
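For the truncate case, the layout that approach 2 (one split over an arbitrary range) would leave can be modelled by recursively halving only the blocks that straddle an edge of the hole. This is again a hypothetical sketch rather than the kernel implementation: punch_hole() is a made-up helper, pages are 0-indexed and the hole is inclusive (so "3rd to 5th" becomes pages 2..4), and the 2M/1M example assumes 4KB pages.

```python
def punch_hole(order, first, last, start=0):
    """Hypothetical model: folios left after punching pages first..last
    (inclusive, 0-indexed) out of an order-`order` folio starting at `start`.

    A block entirely outside the hole stays one folio, a block entirely
    inside it is dropped, and a block straddling an edge of the hole is
    split into its two buddy halves, which are examined in turn.
    Returns (first_page, order) pairs in page order.
    """
    end = start + (1 << order) - 1
    if last < start or first > end:
        return [(start, order)]           # untouched: keep as one folio
    if first <= start and end <= last:
        return []                         # fully punched: nothing left
    half = order - 1                      # straddles an edge, so order >= 1
    return (punch_hole(half, first, last, start)
            + punch_hole(half, first, last, start + (1 << half)))


if __name__ == "__main__":
    # Order-3 folio, punch pages 2..4: [O1, __, __, __, O0, O1]
    print(punch_hole(3, 2, 4))            # [(0, 1), (5, 0), (6, 1)]
    # 2M folio (order-9 with 4KB pages), punch its second 1M half.
    print(punch_hole(9, 256, 511))        # [(0, 8)]: one 1M folio remains
```

Under this model, approach 1 ends at the same layout: splitting the second order-1 out of the order-3 gives O1, O1, O2, and splitting an order-0 out of that O2 gives O1, O1, O0, O0, O1, so once pages 2..4 are removed, O1 at pages 0-1, O0 at page 5 and O1 at pages 6-7 remain. The difference is that it gets there in two split passes.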