From: James Houghton
Date: Thu, 29 Feb 2024 17:37:23 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Hugetlb Unifications
To: Peter Xu
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Muchun Song

On Thu, Feb 22, 2024 at 12:50 AM Peter Xu wrote:
>
> I want to propose a session to discuss how we should unify hugetlb into
> core mm.
>
> Due to legacy reasons, hugetlb has plenty of its own code paths that are
> plugged into core mm, causing itself even more special than shmem.
> While it is a pretty decent and useful file system, efficient on
> supporting large & statically allocated chunks of memory, it also added
> maintenance burden due to having its own specific code paths spread all
> over the place.

Thank you for proposing this topic. HugeTLB is very useful (1G
mappings, guaranteed hugepages, saving struct page overhead, shared
page tables), but it is special in ways that make it a headache to
modify (and make it harder to work on other mm features).

I haven't been able to spend much time with HugeTLB since the LSFMM
talk last year, so I'm not much of an expert anymore. But I'll give my
two cents anyway.

> It went into a bit of a mess, and it is messed up enough to become a reason
> to not accept new major features like what used to be proposed last year to
> map hugetlb pages in smaller sizes [1].
>
> We all seem to agree something needs to be done to hugetlb, but it seems
> still not as clear on what exactly, then people forgot about it and move
> on, until hit it again.  The problem didn't yet go away itself even if
> nobody asks.
>
> Is it worthwhile to spend time do such work?  Do we really need a fresh new
> hugetlb-v2 just to accept new features?  What exactly need to be
> generalized for hugetlb?  Is huge_pte_offset() the culprit, or what else?
> To what extent hugetlb is free to accept new features?

I think the smaller unification that has been done so far is great
(thank you!!), but at some point additional unification will require a
pretty heavy lift. Trying to enumerate some possible challenges:

What does HugeTLB do differently than main mm?
- Page table walking, huge_pte_offset/etc., of course.
- "huge_pte" as a concept (type-erased p?d_t), though it shares its
  type with pte_t.
- Completely different page fault path (hugetlbfs doesn't implement
  vm_ops->{huge_,}fault).
- mapcount
- Reservation/MAP_NORESERVE
- HWPoison handling
- Synchronization (hugetlb_fault_mutex_table, VMA lock for PMD sharing)
- more...

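
To make the "type-erased p?d_t" item above concrete, here is a toy
user-space sketch. It is not kernel code: every toy_* name and constant
is made up for illustration, and the sizes assume x86-64 with 4K base
pages. It only shows the shape of the problem: once a PMD or PUD entry
is handed around as a pte, the type no longer says which level it came
from, so the mapping size has to travel separately (much as
huge_pte_offset() takes a size argument).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only. All toy_* names are hypothetical, not kernel APIs. */

/* Core mm style: one distinct entry type per page table level. */
typedef struct { uint64_t val; } toy_pte_t;
typedef struct { uint64_t val; } toy_pmd_t;
typedef struct { uint64_t val; } toy_pud_t;

/* Assumed sizes for x86-64 with 4K base pages. */
#define TOY_PMD_SIZE (1ULL << 21) /* 2M */
#define TOY_PUD_SIZE (1ULL << 30) /* 1G */

/* Typed helpers: the level is in the type, so the size is implied. */
static inline uint64_t toy_pmd_span(toy_pmd_t pmd)
{
	(void)pmd;
	return TOY_PMD_SIZE;
}

static inline uint64_t toy_pud_span(toy_pud_t pud)
{
	(void)pud;
	return TOY_PUD_SIZE;
}

/* HugeTLB style: both levels are erased into the same pte type. */
static inline toy_pte_t toy_huge_pte_from_pmd(toy_pmd_t pmd)
{
	toy_pte_t pte;

	pte.val = pmd.val;
	return pte;
}

static inline toy_pte_t toy_huge_pte_from_pud(toy_pud_t pud)
{
	toy_pte_t pte;

	pte.val = pud.val;
	return pte;
}

/*
 * After erasure the entry cannot tell us its level, so the caller
 * has to supply the size out of band.
 */
static inline uint64_t toy_huge_pte_span(toy_pte_t huge_pte, uint64_t sz)
{
	(void)huge_pte;
	assert(sz == TOY_PMD_SIZE || sz == TOY_PUD_SIZE);
	return sz;
}

/* Start address of the huge mapping that covers addr. */
static inline uint64_t toy_huge_pte_base(uint64_t addr, uint64_t sz)
{
	assert(sz == TOY_PMD_SIZE || sz == TOY_PUD_SIZE);
	return addr & ~(sz - 1);
}
```

With the typed helpers, passing a PUD entry where a PMD entry is
expected is a compile error; with the erased form, a wrong size is only
caught at run time, if at all.
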
What does HugeTLB do that main mm doesn't do?
- It keeps pools of hugepages that cannot be used for anything else.
- It has PMD sharing (which can hopefully be replaced with mshare())
- It has HVO (which can hopefully be dropped in a memdesc world)
- more...?

Page table sharing and HVO are both important, but they're not
fundamental to HugeTLB, so it's not impossible to make progress on
drastic cleanup without them.

No matter what, we'll need to add (more) PUD support into the main mm,
so we could start with that, though it won't be easy. Then we would
need at least...

  (1) ...a filesystem that implements huge_fault for PUDs

It's not inconceivable to add support for this in shmem (where 1G
pages are allocated -- perhaps ahead of time -- with CMA, maybe?).
This could be done in hugetlbfs, but then you'd have to make sure that
the huge_fault implementation stays compatible with everything else in
hugetlb/hugetlbfs, perhaps making incremental progress difficult. Or
you could create hugetlbfs-v2. I'm honestly not sure which of these is
the least difficult -- probably the shmem route?

  (2) ...a mapcount (+refcount) system that works for PUD mappings.

This discussion has progressed a lot since I last thought about it;
I'll let the experts figure this one out[1].

Anyway, I'm oversimplifying things, and it's been a while since I've
thought hard about this, so please take this all with a grain of salt.
The main motivating use-case for HGM (to allow for post-copy live
migration of HugeTLB-1G-backed VMs with userfaultfd) can be solved in
other ways[2].

> The goal of such a session is trying to make it clearer on answering above
> questions.

I hope we can land on a clear answer this year. :)

- James

[1]: https://lore.kernel.org/linux-mm/049e4674-44b6-4675-b53b-62e11481a7ce@redhat.com/
[2]: https://lore.kernel.org/kvm/CALzav=d23P5uE=oYqMpjFohvn0CASMJxXB_XEOEi-jtqWcFTDA@mail.gmail.com/