From: Barry Song <21cnbao@gmail.com>
Date: Tue, 27 Feb 2024 20:21:47 +1300
Subject: Re: [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
To: Yin Fengwei
Cc: Ryan Roberts, Lance Yang, David Hildenbrand, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com, minchan@kernel.org, peterx@redhat.com, shy828301@gmail.com, songmuchun@bytedance.com, wangkefeng.wang@huawei.com, zokeefe@google.com

On Tue, Feb 27, 2024 at 8:11 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Feb 27, 2024 at 8:02 PM Yin Fengwei wrote:
> >
> > On 2/27/24 14:40, Barry Song wrote:
> > > On Tue, Feb 27, 2024 at 7:14 PM Yin Fengwei wrote:
> > >>
> > >> On 2/27/24 10:17, Barry Song wrote:
> > >>>> Like if we hit folio which is partially mapped to the range, don't split it but
> > >>>> just unmap the mapping part from the range. Let page reclaim decide whether
> > >>>> split the large folio or not (If it's not mapped to any other range, it will be
> > >>>> freed as whole large folio.
> > >>>> If part of it still mapped to other range, page reclaim
> > >>>> can decide whether to split it or ignore it for current reclaim cycle).
> > >>> Yes, we can. but we still have to play the ptes check game to avoid adding
> > >>> folios multiple times to reclaim the list.
> > >>>
> > >>> I don't see too much difference between splitting in madvise and splitting
> > >>> in vmscan. as our real purpose is avoiding splitting entirely mapped
> > >>> large folios. for partial mapped large folios, if we split in madvise, then
> > >>> we don't need to play the game of skipping folios while iterating PTEs.
> > >>> if we don't split in madvise, we have to make sure the large folio is only
> > >>> added in reclaimed list one time by checking if PTEs belong to the
> > >>> previous added folio.
> > >>
> > >> If the partial mapped large folio is unmapped from the range, the related PTE
> > >> become none. How could the folio be added to reclaimed list multiple times?
> > >
> > > in case we have 16 PTEs in a large folio.
> > > PTE0 present
> > > PTE1 present
> > > PTE2 present
> > > PTE3 none
> > > PTE4 present
> > > PTE5 none
> > > PTE6 present
> > > ....
> > > the current code is scanning PTE one by one.
> > > while scanning PTE0, we have added the folio. then PTE1, PTE2, PTE4, PTE6...
> > No. Before detect the folio is fully mapped to the range, we can't add folio
> > to reclaim list because the partial mapped folio shouldn't be added. We can
> > only scan PTE15 and know it's fully mapped.
>
> you never know PTE15 is the last one mapping to the large folio, PTE15 can
> be mapping to a completely different folio with PTE0.
>
> >
> > So, when scanning PTE0, we will not add folio. Then when hit PTE3, we know
> > this is a partial mapped large folio. We will unmap it. Then all 16 PTEs
> > become none.
>
> I don't understand why all 16PTEs become none as we set PTEs to none.
> we set PTEs to swap entries till try_to_unmap_one called by vmscan.
>
> >
> > If the large folio is fully mapped, the folio will be added to reclaim list
> > after scan PTE15 and know it's fully mapped.
>
> our approach is calling pte_batch_pte while meeting the first pte, if
> pte_batch_pte = 16,
> then we add this folio to reclaim_list and skip the left 15 PTEs.

Let's compare the two implementations for a partially mapped large folio,
using 8 PTEs as below:

PTE0 present for large folio1
PTE1 present for large folio1
PTE2 present for another folio2
PTE3 present for another folio3
PTE4 present for large folio1
PTE5 present for large folio1
PTE6 present for another folio4
PTE7 present for another folio5

If we don't split in madvise (and depend on vmscan to split after adding
folio1), we have to make sure folio1, folio2, folio3, folio4 and folio5 are
each added to reclaim_list exactly once, which means playing a complex game
while scanning these 8 PTEs.

If we split in madvise, they become:

PTE0 present for large folioA - split from folio1
PTE1 present for large folioB - split from folio1
PTE2 present for another folio2
PTE3 present for another folio3
PTE4 present for large folioC - split from folio1
PTE5 present for large folioD - split from folio1
PTE6 present for another folio4
PTE7 present for another folio5

Then we simply add the above 8 folios to reclaim_list one by one.

I would vote for splitting partially mapped large folios in madvise.

Thanks
Barry
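To make the comparison above concrete, here is a minimal userspace sketch, not kernel code. It models each PTE as just the id of the folio it maps. The names struct reclaim_list, collect_without_split() and collect_with_split() are hypothetical and exist only for this illustration; the real madvise and vmscan paths are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PTES 8

/* Hypothetical model: pte_folio[i] is the id of the folio that PTE i maps. */
struct reclaim_list {
    int folio[NR_PTES];
    size_t nr;
};

static bool list_contains(const struct reclaim_list *l, int id)
{
    for (size_t i = 0; i < l->nr; i++)
        if (l->folio[i] == id)
            return true;
    return false;
}

/*
 * No split in madvise: several PTEs may map the same large folio, so every
 * PTE has to be checked against the folios that are already queued.
 */
size_t collect_without_split(const int pte_folio[NR_PTES],
                             struct reclaim_list *l)
{
    l->nr = 0;
    for (size_t i = 0; i < NR_PTES; i++) {
        if (list_contains(l, pte_folio[i]))
            continue;               /* folio already queued, skip this PTE */
        assert(l->nr < NR_PTES);
        l->folio[l->nr++] = pte_folio[i];
    }
    return l->nr;
}

/*
 * Split in madvise: each PTE of the large folio gets its own folio id first
 * (ids start at next_id, chosen by the caller), so the walk can queue one
 * folio per PTE without any lookup.
 */
size_t collect_with_split(int pte_folio[NR_PTES], int large_id, int next_id,
                          struct reclaim_list *l)
{
    for (size_t i = 0; i < NR_PTES; i++)
        if (pte_folio[i] == large_id)
            pte_folio[i] = next_id++;

    l->nr = 0;
    for (size_t i = 0; i < NR_PTES; i++) {
        assert(l->nr < NR_PTES);
        l->folio[l->nr++] = pte_folio[i];
    }
    return l->nr;
}
```

With the 8-PTE layout above, i.e. {1, 1, 2, 3, 1, 1, 4, 5}, the first walk queues 5 folios and needs a lookup per PTE, while the second queues 8 folios with no lookups at all.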