From: David Hildenbrand
Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages
Date: Thu, 25 Feb 2021 23:15:15 +0100
Message-Id: <1C808F10-158D-4DB8-A393-01829A398B17@redhat.com>
To: Mike Kravetz
Cc: David Hildenbrand, Oscar Salvador, Muchun Song, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org

> On 25.02.2021 at 22:43, Mike Kravetz wrote:
>
> On 2/10/21 12:23 AM, David Hildenbrand wrote:
>>> On 08.02.21 11:38, Oscar Salvador wrote:
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>> low_pfn += compound_nr(page) - 1;
>>> goto isolate_success_no_list;
>>> }
>>> + } else {
>>
>> } else if (alloc_and_dissolve_huge_page(page)) {
>>
>> ...
>>
>>> + /*
>>> + * Free hugetlb page. Allocate a new one and
>>> + * dissolve this is if succeed.
>>> + */
>>> + if (alloc_and_dissolve_huge_page(page)) {
>>> + unsigned long order = buddy_order_unsafe(page);
>>> +
>>> + low_pfn += (1UL << order) - 1;
>>> + continue;
>>> + }
>>
>> Note that there is a very ugly corner case we will have to handle
>> gracefully (I think also in patch #1):
>>
>> Assume you allocated a gigantic page (and assume that we are not using
>> CMA for gigantic pages for simplicity). Assume you want to allocate
>> another one. alloc_pool_huge_page()->...->alloc_contig_pages() will
>> stumble over the first allocated page. It will try to
>> alloc_and_dissolve_huge_page() the existing gigantic page. To do that,
>> it will alloc_pool_huge_page()->...->alloc_contig_pages() ... and so
>> on. Bad.
>>
>
> Sorry for resurrecting an old thread.
> While looking at V3 of these patches, I was exploring all the calling
> sequences looking for races and other issues. It 'may' be that the
> issue about infinitely allocating and freeing gigantic pages may not be
> an issue. Of course, I could be mistaken. Here is my reasoning:
>
> alloc_and_dissolve_huge_page (now isolate_or_dissolve_huge_page) will be
> called from __alloc_contig_migrate_range() within alloc_contig_range().
> Before calling __alloc_contig_migrate_range, we call start_isolate_page_range
> to isolate all page blocks in the range. Because all the page blocks in
> the range are isolated, another invocation of alloc_contig_range will
> not operate on any part of that range. See the comments for
> start_isolate_page_range or commit 2c7452a075d4. So, when
> start_isolate_page_range goes to allocate another gigantic page it will
> never notice/operate on the existing gigantic page.
>
> Again, this is confusing and I might be missing something.

I think you are right that the endless loop is blocked.
But I think the whole thing could cascade once we have multiple gigantic
pages allocated.

Try allocating a new gpage. We find an existing gpage, isolate it and
try to migrate it. To do that, we try allocating a new gpage. We find
yet another existing gpage, isolate it and try to migrate it ... until
we have isolated all gpages on our way to an actual usable area. Then
we have to actually migrate all of these in reverse order ...

Of course this only works if we can actually isolate a gigantic page,
which should be the case I think (they are migratable and should be
marked as movable).

>
> In any case, I agree that gigantic pages are tricky and we should leave
> them out of the discussion for now. We can rethink this later if
> necessary.

Yes, it's tricky and not strictly required right now because we never
place them on ZONE_MOVABLE. And as I said, actual use cases might be
rare.
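
To make the cascade above a bit more concrete, here is a rough userspace
toy model in C. It is only a sketch and not kernel code: the zone is an
array of gigantic-page-sized ranges, and all names (toy_alloc_gpage,
toy_cascade_depth, NR_RANGES, the state/isolated arrays) are made up for
illustration. It assumes that an isolated range is simply skipped by
nested allocation attempts, which is how I read Mike's explanation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_RANGES 8	/* gigantic-page-sized ranges in the toy zone */

enum range_state { RANGE_FREE, RANGE_GPAGE };

static enum range_state state[NR_RANGES];
static bool isolated[NR_RANGES];
static int max_depth;	/* deepest nesting of allocation attempts */

/*
 * Hand out one range for a new gigantic page, or -1 on failure.
 * A range holding a free gigantic page can only be used after a
 * replacement has been allocated for it; that is the nested call.
 */
static int toy_alloc_gpage(int depth)
{
	if (depth > max_depth)
		max_depth = depth;

	for (int i = 0; i < NR_RANGES; i++) {
		if (isolated[i])
			continue;	/* held by a caller further up the chain */

		isolated[i] = true;
		if (state[i] == RANGE_GPAGE) {
			int repl = toy_alloc_gpage(depth + 1);

			if (repl < 0) {
				isolated[i] = false;
				continue;	/* no replacement, try the next range */
			}
			state[repl] = RANGE_GPAGE;	/* replacement joins the pool */
			state[i] = RANGE_FREE;		/* old page is dissolved */
		}
		isolated[i] = false;
		return i;
	}
	return -1;
}

/*
 * Place nr_gpages gigantic pages at the start of the zone, allocate one
 * more, and report how deeply the allocation attempts nested.
 */
int toy_cascade_depth(int nr_gpages)
{
	assert(nr_gpages >= 0 && nr_gpages <= NR_RANGES);

	max_depth = 0;
	for (int i = 0; i < NR_RANGES; i++) {
		state[i] = i < nr_gpages ? RANGE_GPAGE : RANGE_FREE;
		isolated[i] = false;
	}

	int got = toy_alloc_gpage(0);

	if (got >= 0)
		state[got] = RANGE_GPAGE;
	return max_depth;
}
```

In this model the recursion always terminates, because each level keeps
its own range isolated while the nested attempt runs. The nesting depth
still grows with the number of gigantic pages sitting in front of the
first usable range, and the replacements are completed in reverse order
as the nested calls return.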