From: Pasha Tatashin
Date: Sun, 21 Jan 2024 18:31:48 -0500
Subject: Re: [LSF/MM/BPF TOPIC] State Of The Page
To: Matthew Wilcox
Cc: David Rientjes, Pasha Tatashin, Sourav Panda,
    lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-block@vger.kernel.org,
    linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
    linux-nvme@lists.infradead.org, bpf@vger.kernel.org

Hi Matthew,

Thank you for proposing this topic. I would also like to be part of
this discussion at LSF/MM, specifically because of the memory
efficiency opportunities coming from memdescs.

On Sun, Jan 21, 2024 at 6:14 PM Matthew Wilcox wrote:
>
> On Sun, Jan 21, 2024 at 01:00:40PM -0800, David Rientjes wrote:
> > On Fri, 19 Jan 2024, Matthew Wilcox wrote:
> > > It's probably worth doing another roundup of where we are on our journey
> > > to separating folios, slabs, pages, etc.
> > > Something suitable for people
> > > who aren't MM experts, and don't care about the details of how page
> > > allocation works. I can talk for hours about whatever people want to
> > > hear about but some ideas from me:
> > >
> > > - Overview of how the conversion is going
> > > - Convenience functions for filesystem writers
> > > - What's next?
> > > - What's the difference between &folio->page and page_folio(folio, 0)?
> > > - What are we going to do about bio_vecs?
> > > - How does all of this work with kmap()?
> > >
> > > I'm sure people would like to suggest other questions they have that
> > > aren't adequately answered already and might be of interest to a wider
> > > audience.
> > >
> >
> > Thanks for proposing this again, Matthew, I'd definitely like to be
> > involved in the discussion as I think a couple of my colleagues, cc'd,
> > would as well. Memory efficiency is a top priority for 2024 and, thus,
> > getting on a pathway toward reducing the overhead of struct page is very
> > important for our hosts that are not using large amounts of 1GB hugetlb.
> >
> > I've seen your other thread regarding how the page allocator can be
> > enlightened for memdesc, so I'm hoping that can either be covered in this
> > topic or a separate topic.
>
> I'd like to keep this topic relevant to as many people as possible.
> I can add a proposal for a topic on both the PCP and Buddy allocators
> (I have a series of Thoughts on how the PCP allocator works in a memdesc
> world that I haven't written down & sent out yet).

Interesting. Given that pcp are mostly allocated by kmalloc and use
vmalloc for large allocations, how can memdesc be different for them
compared to regular kmalloc allocations, given that they are sub-page?

> Or we can cover the page allocators in your biweekly meetings. Maybe both
> since not everybody can attend either the phone call or the conference.
>
> > Especially important for us would be the division of work so that we can
> > parallelize development as much as possible for things like memdesc. If
> > there are any areas that just haven't been investigated yet but we *know*
> > we'll need to address to get to the new world of memdesc, I think we'd
> > love to discuss that.
>
> There's so much work to be done! And it's mostly parallelisable and almost
> trivial. It's just largely on the filesystem-page cache interaction, so
> it's not terribly interesting. See, for example, the ext2, ext4, gfs2,
> nilfs2, ufs and ubifs patchsets I've done over the past few releases.
> I have about half of an ntfs3 patchset ready to send.
> There's a bunch of work to be done in DRM to switch from pages to folios
> due to their use of shmem. You can also grep for 'page->mapping' (because
> fortunately we aren't too imaginative when it comes to naming variables)
> and find 270 places that need to be changed. Some are comments, but
> those still need to be updated!
>
> Anything using lock_page(), get_page(), set_page_dirty(), using
> &folio->page, any of the functions in mm/folio-compat.c needs auditing.
> We can make the first three of those work, but they're good indicators
> that the code needs to be looked at.
>
> There is some interesting work to be done, and one of the things I'm
> thinking hard about right now is how we're doing folio conversions
> that make sense with today's code, and stop making sense when we get
> to memdescs. That doesn't apply to anything interacting with the page
> cache (because those are folios now and in the future), but it does apply
> to one spot in ext4 where it allocates memory from slab and attaches a
> buffer_head to it ...

There are many more drivers that would need the conversion. For
example, IOMMU page tables can occupy gigabytes of space and have
different implementations for AMD, x86, and several ARM variants.
Converting them to memdesc and unifying the IO page table management
implementation across these platforms would be beneficial.
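
As a rough aid for the audit quoted above, here is a minimal Python
sketch (an illustration only, not an existing kernel script) that counts
the indicators Matthew lists: 'page->mapping', lock_page(), get_page(),
set_page_dirty() and &folio->page. The function name
count_legacy_page_uses, the regexes, and the choice to scan only .c and
.h files are all assumptions made for this example. The word-boundary
matching is stricter than a plain grep, so the totals need not agree
with the 270 figure.

```python
import os
import re

# Indicators named in the quoted text. The regexes are illustrative
# assumptions; \b keeps unlock_page() or try_get_page() from matching.
PATTERNS = {
    "page->mapping": re.compile(r"\bpage->mapping\b"),
    "lock_page()": re.compile(r"\block_page\s*\("),
    "get_page()": re.compile(r"\bget_page\s*\("),
    "set_page_dirty()": re.compile(r"\bset_page_dirty\s*\("),
    "&folio->page": re.compile(r"&\s*folio->page\b"),
}


def count_legacy_page_uses(root, suffixes=(".c", ".h")):
    """Walk `root` and count matches of each pattern in C sources.

    Hypothetical helper: comments are counted too, which fits the
    remark that comments also need updating.
    """
    counts = {name: 0 for name in PATTERNS}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" avoids crashing on odd bytes in old files.
            with open(path, encoding="utf-8", errors="replace") as src:
                text = src.read()
            for name, pattern in PATTERNS.items():
                counts[name] += len(pattern.findall(text))
    return counts
```

Calling count_legacy_page_uses("fs/ext4") from the top of a kernel tree
would give a per-pattern count for that directory. Every hit still needs
to be read by hand, since the counts say nothing about whether a given
use makes sense in a memdesc world.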