From: James Houghton <jthoughton@google.com>
Date: Tue, 18 Jul 2023 09:31:11 -0700
Subject: Re: [PATCH v2 2/2] hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan,
 Naoya Horiguchi, Muchun Song, Miaohe Lin, Axel Rasmussen,
 Michal Hocko, Andrew Morton
In-Reply-To: <20230718004942.113174-3-mike.kravetz@oracle.com>
References: <20230718004942.113174-1-mike.kravetz@oracle.com>
 <20230718004942.113174-3-mike.kravetz@oracle.com>

On Mon, Jul 17, 2023 at 5:50 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> update_and_free_pages_bulk is designed to free a list of hugetlb pages
> back to their associated lower level allocators. This may require
> allocating vmemmmap pages associated with each hugetlb page. The
> hugetlb page destructor must be changed before pages are freed to lower
> level allocators. However, the destructor must be changed under the
> hugetlb lock. This means there is potentially one lock cycle per page.
>
> Minimize the number of lock cycles in update_and_free_pages_bulk by:
> 1) allocating necessary vmemmap for all hugetlb pages on the list
> 2) take hugetlb lock and clear destructor for all pages on the list
> 3) free all pages on list back to low level allocators
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c | 38 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a910121a647..e6b780291539 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1856,13 +1856,43 @@ static void update_and_free_hugetlb_folio(struct hstate *h, struct folio *folio,
>  static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list)
>  {
>         struct page *page, *t_page;
> -       struct folio *folio;
> +       bool clear_dtor = false;
>
> +       /*
> +        * First allocate required vmemmmap for all pages on list. If vmemmap
> +        * can not be allocated, we can not free page to lower level allocator,
> +        * so add back as hugetlb surplus page.
> +        */
>         list_for_each_entry_safe(page, t_page, list, lru) {
> -               folio = page_folio(page);
> -               update_and_free_hugetlb_folio(h, folio, false);
> -               cond_resched();
> +               if (HPageVmemmapOptimized(page)) {
> +                       if (hugetlb_vmemmap_restore(h, page)) {
> +                               spin_lock_irq(&hugetlb_lock);
> +                               add_hugetlb_folio(h, page_folio(page), true);
> +                               spin_unlock_irq(&hugetlb_lock);
> +                       } else
> +                               clear_dtor = true;
> +                       cond_resched();
> +               }
> +       }
> +
> +       /*
> +        * If vmemmmap allocation performed above, then take lock to clear

s/vmemmmap/vmemmap. The comment is also a little hard to understand;
something like "If vmemmap allocation was performed above for any
folios, then..." seems clearer to me.

> +        * destructor of all pages on list.
> +        */
> +       if (clear_dtor) {
> +               spin_lock_irq(&hugetlb_lock);
> +               list_for_each_entry(page, list, lru)
> +                       __clear_hugetlb_destructor(h, page_folio(page));
> +               spin_unlock_irq(&hugetlb_lock);
>         }

I'm not too familiar with this code, but the above block seems weird to
me. If we successfully allocated the vmemmap for *any* folio, we clear
the hugetlb destructor for all the folios? I feel like we should only be
clearing the hugetlb destructor for all folios if the vmemmap allocation
succeeded for *all* folios.

If the code is functionally correct as is, I'm a little bit confused why
we need `clear_dtor`; it seems like this function doesn't really need
it. (I could have some huge misunderstanding here.)

> +
> +       /*
> +        * Free pages back to low level allocators. vmemmap and destructors
> +        * were taken care of above, so update_and_free_hugetlb_folio will
> +        * not need to take hugetlb lock.
> +        */
> +       list_for_each_entry_safe(page, t_page, list, lru)
> +               update_and_free_hugetlb_folio(h, page_folio(page), false);
>  }
>
>  struct hstate *size_to_hstate(unsigned long size)
> --
> 2.41.0
>
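To make the lock-cycle pattern under discussion concrete, below is a
minimal userspace sketch of the before/after shape of the change (plain
C with pthreads; the `item` type, `state` field, and function names are
illustrative assumptions, not the kernel's actual types or API):

    #include <pthread.h>

    struct item {
            struct item *next;
            int state;
    };

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Before: the state change must happen under the lock, so freeing
     * N items costs N lock/unlock cycles. */
    static void free_list_per_item(struct item *list)
    {
            for (struct item *i = list; i; i = i->next) {
                    pthread_mutex_lock(&lock);
                    i->state = 0;   /* analogous to changing the destructor */
                    pthread_mutex_unlock(&lock);
                    /* ... hand i back to the lower level allocator ... */
            }
    }

    /* After: do all lock-free preparation first (vmemmap allocation in
     * the patch), then flip the state of every item inside one critical
     * section, then free everything without taking the lock again:
     * a single lock cycle instead of N. */
    static void free_list_bulk(struct item *list)
    {
            /* phase 1: per-item preparation, no lock held */

            /* phase 2: one critical section for the whole list */
            pthread_mutex_lock(&lock);
            for (struct item *i = list; i; i = i->next)
                    i->state = 0;
            pthread_mutex_unlock(&lock);

            /* phase 3: free items back to the allocator, lock-free */
    }

    int main(void)
    {
            struct item c = { 0, 1 }, b = { &c, 1 }, a = { &b, 1 };
            free_list_bulk(&a);
            return 0;
    }

The question raised above maps onto phase 2's gating: the patch enters
the single critical section only when at least one vmemmap restore
succeeded (`clear_dtor`), yet once entered it clears the destructor for
every page on the list.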