From: Barry Song <21cnbao@gmail.com>
Date: Sat, 31 Aug 2024 22:21:41 +1200
Subject: Re: [PATCH RFC] mm: entirely reuse the whole anon mTHP in do_wp_page
To: David Hildenbrand
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Chuanhua Han, Baolin Wang, Ryan Roberts, Zi Yan, Chris Li, Kairui Song, Kalesh Singh, Suren Baghdasaryan
References: <20240831092339.66085-1-21cnbao@gmail.com>

On Sat, Aug 31, 2024 at 10:07 PM David Hildenbrand wrote:
>
> On 31.08.24 11:55, Barry Song wrote:
> > On Sat, Aug 31, 2024 at 9:44 PM David Hildenbrand wrote:
> >>
> >> On 31.08.24 11:23, Barry Song wrote:
> >>> From: Barry Song
> >>>
> >>> On a physical phone, it's sometimes observed that deferred_split
> >>> mTHPs account for over 15% of the total
> >>> mTHPs. Profiling by Chuanhua
> >>> indicates that the majority of these originate from the typical fork
> >>> scenario.
> >>> When the child process either execs or exits, the parent process should
> >>> ideally be able to reuse the entire mTHP. However, the current kernel
> >>> lacks this capability and instead places the mTHP into split_deferred,
> >>> performing a CoW (Copy-on-Write) on just a single subpage of the mTHP.
> >>>
> >>> main()
> >>> {
> >>> #define SIZE 1024 * 1024UL
> >>>         void *p = malloc(SIZE);
> >>>         memset(p, 0x11, SIZE);
> >>>         if (fork() == 0)
> >>>                 exec(....);
> >>>         /*
> >>>          * this will trigger cow one subpage from
> >>>          * mTHP and put mTHP into split_deferred
> >>>          * list
> >>>          */
> >>>         *(int *)(p + 10) = 10;
> >>>         printf("done\n");
> >>>         while(1);
> >>> }
> >>>
> >>> This leads to two significant issues:
> >>>
> >>> * Memory Waste: Before the mTHP is fully split by the shrinker,
> >>>   it wastes memory. In extreme cases, such as with a 64KB mTHP,
> >>>   the memory usage could be 64KB + 60KB until the last subpage
> >>>   is written, at which point the mTHP is freed.
> >>>
> >>> * Fragmentation and Performance Loss: It destroys large folios
> >>>   (negating the performance benefits of CONT-PTE) and fragments memory.
> >>>
> >>> To address this, we should aim to reuse the entire mTHP in such cases.
> >>>
> >>> Hi David,
> >>>
> >>> I've renamed wp_page_reuse() to wp_folio_reuse() and added an
> >>> entirely_reuse argument because I'm not sure if there are still cases
> >>> where we reuse a subpage within an mTHP. For now, I'm setting
> >>> entirely_reuse to true only for the newly supported case, while all
> >>> other cases still get false. Please let me know if this is incorrect;
> >>> if we don't reuse subpages at all, we could remove the argument.
> >>
> >> See [1], which I sent out this week; it is able to reuse even without
> >> scanning page tables.
> >> If we find that the folio is exclusive, we could try
> >> processing surrounding PTEs that map the same folio.
> >>
> >> [1] https://lkml.kernel.org/r/20240829165627.2256514-1-david@redhat.com
> >
> > Great! It looks like I missed your patch again. Since you've implemented this
> > in a better way, I'd prefer to use your patchset.
>
> I wouldn't say better, just more universally. And while taking care of
> properly sync'ing the mapcount vs. refcount :P
>
> >
> > I'm curious about how you're handling ptep_set_access_flags_nr() or similar
> > things because I couldn't find the related code in your patch 10/17:
> >
> > [PATCH v1 10/17] mm: COW reuse support for PTE-mapped THP with CONFIG_MM_ID
> >
> > Am I missing something?
>
> The idea is to keep individual write faults as fast as possible. So the
> patch set keeps it simple and only reuses a single PTE at a time,
> setting that one PAE and mapping it writable.

I see your point, thanks! Since the mTHP is already exclusive, the
following nr - 1 minor page faults will each set their own PTE
writable, one by one.

>
> As the patch states, it might be reasonable to optimize some cases,
> maybe also only on some architectures. For example to fault-around and
> map the other ones writable as well. It might not always be desirable
> though, especially not for larger folios.

Since the mTHP is already entirely exclusive, setting all of its PTEs
writable at once should save those nr - 1 minor page faults and, ideally,
reduce CONTPTE unfold and fold operations. What is the downside of doing
that? I also don't think mapping them all writable together would waste
memory, would it?

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
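The toy C model below compares the three behaviours discussed in this thread. It may help readers weigh per-PTE reuse in David's series against mapping the whole exclusive mTHP writable on the first fault. It is a sketch under stated assumptions, not kernel code:

- simulate(), enum wp_policy, PAGE_SZ and MAX_NR are hypothetical names.
- Pages are assumed to be 4KB.
- The folio is assumed to be already exclusive to the parent, because the child has exec'd or exited.
- The parent is assumed to write each subpage exactly once, in order.

Resident memory (folio plus copies) is sampled initially and after each fault completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code; all names are hypothetical. One anon mTHP
 * of nr subpages is mapped read-only by nr PTEs in the parent, and the
 * folio is already exclusive (the child has exec'd or exited).
 */
#define PAGE_SZ 4096u
#define MAX_NR  512u

enum wp_policy {
	WP_COPY_SUBPAGE,   /* copy one subpage per write fault */
	WP_REUSE_ONE_PTE,  /* reuse the folio, make one PTE writable per fault */
	WP_REUSE_ALL_PTES, /* reuse the folio, make all PTEs writable at once */
};

struct wp_stats {
	unsigned int faults; /* write faults taken */
	size_t peak_bytes;   /* max of folio + copies, sampled after each fault */
};

/* The parent writes each subpage once, in order. */
static struct wp_stats simulate(unsigned int nr, enum wp_policy policy)
{
	bool writable[MAX_NR] = { false };
	unsigned int folio_ptes = nr; /* PTEs still mapping the original folio */
	size_t folio_bytes = (size_t)nr * PAGE_SZ;
	size_t copy_bytes = 0;
	struct wp_stats s = { 0, folio_bytes };

	assert(nr > 0 && nr <= MAX_NR);
	for (unsigned int i = 0; i < nr; i++) {
		if (writable[i])
			continue; /* PTE already writable: no fault */
		s.faults++;
		if (policy == WP_COPY_SUBPAGE) {
			/* New page for subpage i; its PTE stops mapping the folio. */
			copy_bytes += PAGE_SZ;
			writable[i] = true;
			if (--folio_ptes == 0)
				folio_bytes = 0; /* last mapping gone: folio freed */
		} else if (policy == WP_REUSE_ONE_PTE) {
			writable[i] = true; /* reuse in place, this PTE only */
		} else {
			for (unsigned int j = 0; j < nr; j++)
				writable[j] = true; /* reuse in place, every PTE */
		}
		if (folio_bytes + copy_bytes > s.peak_bytes)
			s.peak_bytes = folio_bytes + copy_bytes;
	}
	return s;
}
```

With nr = 16 (a 64KB mTHP), the model gives these results:

- The copy policy peaks at 64KB + 60KB, matching the figure in the patch description.
- Per-PTE reuse takes 16 write faults with no extra memory.
- Mapping all PTEs writable at once takes a single fault, saving the nr - 1 = 15 minor faults mentioned above.

The model ignores the costs David raises: a slower first fault, larger folios, and architecture-specific behaviour such as CONTPTE. So it does not settle whether fault-around is worth doing.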
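The reproducer quoted from the patch description is pseudocode: exec(....) is elided, and main() has no return type. Below is a minimal runnable userspace sketch of the same scenario. It assumes Linux/POSIX and differs from the quoted version as follows:

- The child calls _exit() instead of exec; the description says either case applies.
- It writes a single byte instead of the misaligned int store.
- It drops the final while(1) loop.

run_fork_cow_repro() and REPRO_SIZE are hypothetical names. Whether the buffer is actually backed by mTHPs depends on the kernel's anon mTHP configuration, so the function only reports whether its calls succeeded.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical name; parenthesized, unlike the quoted SIZE macro. */
#define REPRO_SIZE (1024 * 1024UL)

/*
 * Sketch of the fork-then-write scenario from the patch description.
 * Returns 0 if every call succeeded, -1 otherwise.
 */
static int run_fork_cow_repro(void)
{
	char *p = malloc(REPRO_SIZE);
	pid_t pid;
	int status;

	if (!p)
		return -1;

	/* Touch every page so the anonymous memory is actually allocated. */
	memset(p, 0x11, REPRO_SIZE);

	pid = fork();
	if (pid < 0) {
		free(p);
		return -1;
	}
	if (pid == 0)
		_exit(0); /* child exits instead of exec(....) */

	/* Once the child is reaped, the parent is the only mapper again. */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
		free(p);
		return -1;
	}

	/*
	 * The parent's PTEs are still write-protected from fork(), so this
	 * store takes a CoW write fault on one subpage of the folio.
	 */
	p[10] = 10;

	free(p);
	return 0;
}
```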