Subject: Re: [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup
From: Muchun Song <muchun.song@linux.dev>
Date: Tue, 19 Sep 2023 16:57:00 +0800
To: Joao Martins
Cc: Mike Kravetz, Muchun Song, Oscar Salvador, David Hildenbrand, Miaohe Lin,
 David Rientjes, Anshuman Khandual, Naoya Horiguchi, Barry Song <21cnbao@gmail.com>,
 Michal Hocko, Matthew Wilcox, Xiongchun Duan, Linux-MM, Andrew Morton, LKML
Message-Id: <83B874B6-FF22-4588-90A9-31644D598032@linux.dev>
References: <20230918230202.254631-1-mike.kravetz@oracle.com>
 <20230918230202.254631-7-mike.kravetz@oracle.com>
 <9c627733-e6a2-833b-b0f9-d59552f6ab0d@linux.dev>
 <07192BE2-C66E-4F74-8F76-05F57777C6B7@linux.dev>
> On Sep 19, 2023, at 16:55, Joao Martins wrote:
> 
> On 19/09/2023 09:41, Muchun Song wrote:
>>> On Sep 19, 2023, at 16:26, Joao Martins wrote:
>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>> From: Joao Martins
>>>>> 
>>>>> In an effort to minimize the number of TLB flushes, batch all PMD splits
>>>>> belonging to a range of pages in order to perform only one (global) TLB
>>>>> flush.
>>>>> 
>>>>> Add a flags field to the walker and pass whether it's a bulk allocation
>>>>> or just a single page to decide to remap. The first value
>>>>> (VMEMMAP_SPLIT_NO_TLB_FLUSH) designates the request to not do the TLB
>>>>> flush when we split the PMD.
>>>>> 
>>>>> Rebased and updated by Mike Kravetz
>>>>> 
>>>>> Signed-off-by: Joao Martins
>>>>> Signed-off-by: Mike Kravetz
>>>>> ---
>>>>>  mm/hugetlb_vmemmap.c | 79 +++++++++++++++++++++++++++++++++++++++++---
>>>>>  1 file changed, 75 insertions(+), 4 deletions(-)
>>>>> 
>>>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>>>> index 147ed15bcae4..e8bc2f7567db 100644
>>>>> --- a/mm/hugetlb_vmemmap.c
>>>>> +++ b/mm/hugetlb_vmemmap.c
>>>>> @@ -27,6 +27,7 @@
>>>>>   * @reuse_addr:		the virtual address of the @reuse_page page.
>>>>>   * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
>>>>>   *			or is mapped from.
>>>>> + * @flags:		used to modify behavior in bulk operations
>>>> 
>>>> Better to describe it as "used to modify behavior in vmemmap page table
>>>> walking operations".
>>>> 
>>> OK
>>> 
>>>>> void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>>>>> {
>>>>> 	struct folio *folio;
>>>>> 	LIST_HEAD(vmemmap_pages);
>>>>> 
>>>>> +	list_for_each_entry(folio, folio_list, lru)
>>>>> +		hugetlb_vmemmap_split(h, &folio->page);
>>>>> +
>>>>> +	flush_tlb_all();
>>>>> +
>>>>> 	list_for_each_entry(folio, folio_list, lru) {
>>>>> 		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>> 						     &vmemmap_pages);
>>>> 
>>>> This is unlikely to fail, since the page table allocation has been moved
>>>> to the splitting stage above
>>>> 
>>>> (note that the head vmemmap page allocation is not mandatory).
>>> 
>>> Good point that I almost forgot.
>>> 
>>>> So we should handle the error case in the above splitting operation.
>>> 
>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>> got split, which could allow some PTE remapping to occur and free some pages
>>> back (each freed page allows 6 more splits in the worst case). Then the next
>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>> hugepages that failed the batch PMD split (as we only defer the PTE remap TLB
>>> flush in this stage).
>> 
>> Oh, yes. Maybe we could break out of the above traversal as early as
>> possible once we hit an ENOMEM?
>> 
> 
> Sounds good -- no point in continuing to split if we are failing with OOM.
> 
> Perhaps a comment in both of these clauses (the early break on split and the
> OOM handling in batch optimize) could help make this clear.

Makes sense. Thanks.

> 
>>> 
>>> Unless this isn't something worth handling
>>> 
>>> Joao