From: David Hildenbrand
Subject: Re: [PATCH v3 1/2] mm: Make alloc_contig_range handle free hugetlb pages
Date: Fri, 26 Feb 2021 10:25:46 +0100
Message-Id: <1F1B32C0-10EA-4A7F-A062-1B8CE8D47C3F@redhat.com>
To: Michal Hocko
Cc: Oscar Salvador, Andrew Morton, Mike Kravetz, David Hildenbrand, Muchun Song, linux-mm@kvack.org, linux-kernel@vger.kernel.org

> On 26.02.2021 at 09:38, Michal Hocko wrote:
>
> On Fri 26-02-21 09:35:10, Michal Hocko wrote:
>>> On Mon 22-02-21 14:51:36, Oscar Salvador wrote:
>>> alloc_contig_range will fail if it ever sees a HugeTLB page within the
>>> range we are trying to allocate, even when that page is free and can be
>>> easily reallocated.
>>> This has proved to be problematic for some users of alloc_contig_range,
>>> e.g. CMA and virtio-mem, where those would fail the call even when those
>>> pages lie in ZONE_MOVABLE and are free.
>>>
>>> We can do better by trying to replace such a page.
>>>
>>> Free hugepages are tricky to handle: so that no userspace application
>>> notices disruption, we need to replace the current free hugepage with
>>> a new one.
>>>
>>> In order to do that, a new function called alloc_and_dissolve_huge_page
>>> is introduced.
>>> This function will first try to get a new fresh hugepage, and if it
>>> succeeds, it will replace the old one in the free hugepage pool.
>>>
>>> All operations are being handled under hugetlb_lock, so no races are
>>> possible. The only exception is when page's refcount is 0, but it still
>>> has not been flagged as PageHugeFreed.
>>
>> I think it would be helpful to call out that specific case explicitly
>> here. I can see only one scenario (are there more?)
>> __free_huge_page()               isolate_or_dissolve_huge_page
>>                                    PageHuge() == T
>>                                    alloc_and_dissolve_huge_page
>>                                      alloc_fresh_huge_page()
>>                                      spin_lock(hugetlb_lock)
>>                                      // PageHuge() && !PageHugeFreed &&
>>                                      // !PageCount()
>>                                      spin_unlock(hugetlb_lock)
>>   spin_lock(hugetlb_lock)
>>   1) update_and_free_page
>>        PageHuge() == F
>>        __free_pages()
>>   2) enqueue_huge_page
>>        SetPageHugeFreed()
>>   spin_unlock(&hugetlb_lock)
>>
>>> In this case we retry as the race window is quite small and we have a
>>> high chance to succeed next time.
>>>
>>> With regard to the allocation, we restrict it to the node the page belongs
>>> to with __GFP_THISNODE, meaning we do not fall back on other nodes' zones.
>>>
>>> Note that gigantic hugetlb pages are fenced off since there is a cyclic
>>> dependency between them and alloc_contig_range.
>>>
>>> Signed-off-by: Oscar Salvador
>>
>> Thanks, this looks much better than the initial version. One nit below.
>> Acked-by: Michal Hocko
>
> Btw. if David has some numbers it would be great to add them to the
> changelog.

I'm planning on giving both patches a churn early next week, with

a) free huge pages
b) idle allocated huge pages
c) heavily read huge pages

(Then I'm also planning on having another brief look at the patches :) )

Thanks!
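
For anyone following the race spelled out in the quoted changelog discussion, the decision that has to be made under hugetlb_lock can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the code from the patch: struct model_page, the MODEL_* results and all model_* helpers are hypothetical names made up for illustration, the lock is only indicated by comments because the model is single-threaded, and the bounded retry count is a property of the model rather than something the changelog specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the page state; not the kernel's struct page. */
struct model_page {
	bool huge;       /* stands in for PageHuge() */
	bool huge_freed; /* stands in for PageHugeFreed(): page is in the free pool */
	int refcount;    /* stands in for the page's refcount */
};

enum model_result {
	MODEL_REPLACED, /* old page dissolved, fresh page took its place */
	MODEL_NOT_HUGE, /* page was handed back to the buddy allocator under us */
	MODEL_IN_USE,   /* refcount > 0: not a free hugepage */
	MODEL_RETRY,    /* refcount 0 but not yet flagged freed: transient window */
};

/*
 * One attempt at replacing 'old' with 'fresh'. In the kernel these checks
 * would run with hugetlb_lock held; here the lock is only marked by comments.
 */
static enum model_result model_try_replace(struct model_page *old,
					   struct model_page *fresh)
{
	assert(old != NULL && fresh != NULL);

	/* spin_lock(&hugetlb_lock) would be taken here */
	if (!old->huge)
		return MODEL_NOT_HUGE;
	if (old->refcount > 0)
		return MODEL_IN_USE;
	if (!old->huge_freed)
		return MODEL_RETRY;

	/* Free hugepage: the fresh page enters the pool, the old one is dissolved. */
	fresh->huge = true;
	fresh->huge_freed = true;
	fresh->refcount = 0;
	old->huge = false;
	old->huge_freed = false;
	/* spin_unlock(&hugetlb_lock) would be dropped here */
	return MODEL_REPLACED;
}

/*
 * Retry loop around the transient state. 'finish_free' is a hypothetical hook
 * that models the concurrent __free_huge_page() making progress between two
 * attempts; 'max_tries' bounds the loop for the model only.
 */
static enum model_result model_replace(struct model_page *old,
				       struct model_page *fresh,
				       void (*finish_free)(struct model_page *),
				       int max_tries)
{
	enum model_result res = MODEL_RETRY;

	for (int i = 0; i < max_tries && res == MODEL_RETRY; i++) {
		res = model_try_replace(old, fresh);
		if (res == MODEL_RETRY && finish_free)
			finish_free(old);
	}
	return res;
}

/* Models outcome 2) of the race: enqueue_huge_page() flags the page as freed. */
static void model_enqueue(struct model_page *page)
{
	page->huge_freed = true;
}

/* Models outcome 1) of the race: update_and_free_page() returns the page to buddy. */
static void model_update_and_free(struct model_page *page)
{
	page->huge = false;
}
```

Calling model_replace() on a page with refcount 0 and huge_freed unset, with model_enqueue as the hook, walks through the retry path once and then replaces the page; with model_update_and_free as the hook the second attempt finds that the page is no longer a hugepage and leaves the fresh page untouched.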