From: Barry Song <21cnbao@gmail.com>
Date: Tue, 27 Feb 2024 20:11:07 +1300
Subject: Re: [PATCH 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
To: Yin Fengwei
Cc: Ryan Roberts, Lance Yang, David Hildenbrand, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com, minchan@kernel.org, peterx@redhat.com, shy828301@gmail.com, songmuchun@bytedance.com, wangkefeng.wang@huawei.com, zokeefe@google.com
In-Reply-To: <6ea0020a-8f4b-44d1-a3b2-7c2905d32772@intel.com>
References: <20240226083714.26187-1-ioworker0@gmail.com> <9bcf5141-7376-441e-bbe3-779956ef28b9@redhat.com> <318be511-06de-423e-8216-af869f27f849@arm.com> <19758162-be5f-4dc4-b316-77b0115d12ce@intel.com> <3c56d7b8-b76d-4210-b431-ee6431775ba7@intel.com> <6ea0020a-8f4b-44d1-a3b2-7c2905d32772@intel.com>

On Tue, Feb 27, 2024 at 8:02 PM Yin Fengwei wrote:
>
> On 2/27/24 14:40, Barry Song wrote:
> > On Tue, Feb 27, 2024 at 7:14 PM Yin Fengwei wrote:
> >>
> >> On 2/27/24 10:17, Barry Song wrote:
> >>>> Like if we hit folio which is partially mapped to the range, don't split it but
> >>>> just unmap the mapping part from the range. Let page reclaim decide whether
> >>>> split the large folio or not (If it's not mapped to any other range, it will be
> >>>> freed as whole large folio. If part of it still mapped to other range,page reclaim
> >>>> can decide whether to split it or ignore it for current reclaim cycle).
> >>> Yes, we can. but we still have to play the ptes check game to avoid adding
> >>> folios multiple times to reclaim the list.
> >>>
> >>> I don't see too much difference between splitting in madvise and splitting
> >>> in vmscan. as our real purpose is avoiding splitting entirely mapped
> >>> large folios. for partial mapped large folios, if we split in madvise, then
> >>> we don't need to play the game of skipping folios while iterating PTEs.
> >>> if we don't split in madvise, we have to make sure the large folio is only
> >>> added in reclaimed list one time by checking if PTEs belong to the
> >>> previous added folio.
> >>
> >> If the partial mapped large folio is unmapped from the range, the related PTE
> >> become none. How could the folio be added to reclaimed list multiple times?
> >
> > in case we have 16 PTEs in a large folio.
> > PTE0 present
> > PTE1 present
> > PTE2 present
> > PTE3 none
> > PTE4 present
> > PTE5 none
> > PTE6 present
> > ....
> > the current code is scanning PTE one by one.
> > while scanning PTE0, we have added the folio. then PTE1, PTE2, PTE4, PTE6...
> No. Before detect the folio is fully mapped to the range, we can't add folio
> to reclaim list because the partial mapped folio shouldn't be added. We can
> only scan PTE15 and know it's fully mapped.

You never know whether PTE15 is the last PTE mapping the large folio; PTE15
can map a completely different folio from the one PTE0 maps.

>
> So, when scanning PTE0, we will not add folio. Then when hit PTE3, we know
> this is a partial mapped large folio. We will unmap it. Then all 16 PTEs
> become none.

I don't understand why all 16 PTEs would become none; madvise does not set
the PTEs to none. They are only changed, to swap entries, once vmscan calls
try_to_unmap_one.

>
> If the large folio is fully mapped, the folio will be added to reclaim list
> after scan PTE15 and know it's fully mapped.

Our approach is to call pte_batch_pte when we meet the first PTE. If it
returns 16, we add this folio to reclaim_list and skip the remaining 15 PTEs.

>
> Regards
> Yin, Fengwei
>
> >
> > there are all kinds of possibilities for unmapping.
> >
> > so what we can do is recording we have added the folio while scanning PTE0,
> > then skipping this folios for all other PTEs.
> >
> > otherwise, we can split it while scanning PTE0, then we will meet
> > different folios
> > afterwards.
> >
> >>
> >> Regards
> >> Yin, Fengwei

Thanks
Barry