Date: Thu, 19 Sep 2019 13:04:27 -0700
In-Reply-To: <20190919200428.188797-1-almasrymina@google.com>
Message-Id: <20190919200428.188797-2-almasrymina@google.com>
Mime-Version: 1.0
References: <20190919200428.188797-1-almasrymina@google.com>
X-Mailer: git-send-email 2.23.0.351.gc4317032e6-goog
Subject: [PATCH 1/2] hugetlb: region_chg provides only cache entry
From: Mina Almasry <almasrymina@google.com>
To: mike.kravetz@oracle.com
Cc: almasrymina@google.com, rientjes@google.com, shakeelb@google.com,
 gthelen@google.com, akpm@linux-foundation.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"

Current behavior is that region_chg provides both a cache entry in
resv->region_cache, AND a placeholder entry in resv->regions.
region_add first tries to use the placeholder, and if it finds that the
placeholder has been deleted by a racing region_del call, it uses the
cache entry.

This behavior is completely unnecessary and is removed in this patch
for a couple of reasons:

1. region_add needs to either find a cached file_region entry in
   resv->region_cache, or find an entry in resv->regions to expand. It
   does not need both.

2. region_chg adding a placeholder entry in resv->regions opens up a
   possible race with region_del: the placeholder region can be deleted
   by a racing call to region_del while region_chg is still executing,
   or before region_add is called. Removing the race makes the code
   easier to reason about and maintain.

In addition, a follow-up patch in another series disables region
coalescing, which would be further complicated if the race with
region_del still existed.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 63 +++++++++-------------------------------------------
 1 file changed, 11 insertions(+), 52 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6d7296dd11b8..a14f6047fc7e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -246,14 +246,10 @@ struct file_region {
 
 /*
  * Add the huge page range represented by [f, t) to the reserve
- * map. In the normal case, existing regions will be expanded
- * to accommodate the specified range. Sufficient regions should
- * exist for expansion due to the previous call to region_chg
- * with the same range. However, it is possible that region_del
- * could have been called after region_chg and modifed the map
- * in such a way that no region exists to be expanded. In this
- * case, pull a region descriptor from the cache associated with
- * the map and use that for the new range.
+ * map. Existing regions will be expanded to accommodate the specified
+ * range, or a region will be taken from the cache. Sufficient regions
+ * must exist in the cache due to the previous call to region_chg with
+ * the same range.
  *
  * Return the number of new huge pages added to the map. This
  * number is greater than or equal to zero.
@@ -272,9 +268,8 @@ static long region_add(struct resv_map *resv, long f, long t)
 
 	/*
 	 * If no region exists which can be expanded to include the
-	 * specified range, the list must have been modified by an
-	 * interleving call to region_del(). Pull a region descriptor
-	 * from the cache and use it for this range.
+	 * specified range, pull a region descriptor from the cache
+	 * and use it for this range.
 	 */
 	if (&rg->link == head || t < rg->from) {
 		VM_BUG_ON(resv->region_cache_count <= 0);
@@ -339,15 +334,9 @@ static long region_add(struct resv_map *resv, long f, long t)
  * call to region_add that will actually modify the reserve
  * map to add the specified range [f, t). region_chg does
  * not change the number of huge pages represented by the
- * map. However, if the existing regions in the map can not
- * be expanded to represent the new range, a new file_region
- * structure is added to the map as a placeholder. This is
- * so that the subsequent region_add call will have all the
- * regions it needs and will not fail.
- *
- * Upon entry, region_chg will also examine the cache of region descriptors
- * associated with the map. If there are not enough descriptors cached, one
- * will be allocated for the in progress add operation.
+ * map. A new file_region structure is added to the cache
+ * as a placeholder, so that the subsequent region_add
+ * call will have all the regions it needs and will not fail.
  *
  * Returns the number of huge pages that need to be added to the existing
  * reservation map for the range [f, t). This number is greater or equal to
@@ -357,10 +346,9 @@ static long region_add(struct resv_map *resv, long f, long t)
 static long region_chg(struct resv_map *resv, long f, long t)
 {
 	struct list_head *head = &resv->regions;
-	struct file_region *rg, *nrg = NULL;
+	struct file_region *rg;
 	long chg = 0;
 
-retry:
 	spin_lock(&resv->lock);
 retry_locked:
 	resv->adds_in_progress++;
@@ -378,10 +366,8 @@ static long region_chg(struct resv_map *resv, long f, long t)
 		spin_unlock(&resv->lock);
 
 		trg = kmalloc(sizeof(*trg), GFP_KERNEL);
-		if (!trg) {
-			kfree(nrg);
+		if (!trg)
 			return -ENOMEM;
-		}
 
 		spin_lock(&resv->lock);
 		list_add(&trg->link, &resv->region_cache);
@@ -394,28 +380,6 @@ static long region_chg(struct resv_map *resv, long f, long t)
 		if (f <= rg->to)
 			break;
 
-	/* If we are below the current region then a new region is required.
-	 * Subtle, allocate a new region at the position but make it zero
-	 * size such that we can guarantee to record the reservation. */
-	if (&rg->link == head || t < rg->from) {
-		if (!nrg) {
-			resv->adds_in_progress--;
-			spin_unlock(&resv->lock);
-			nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
-			if (!nrg)
-				return -ENOMEM;
-
-			nrg->from = f;
-			nrg->to = f;
-			INIT_LIST_HEAD(&nrg->link);
-			goto retry;
-		}
-
-		list_add(&nrg->link, rg->link.prev);
-		chg = t - f;
-		goto out_nrg;
-	}
-
 	/* Round our left edge to the current segment if it encloses us. */
 	if (f > rg->from)
 		f = rg->from;
@@ -439,11 +403,6 @@ static long region_chg(struct resv_map *resv, long f, long t)
 	}
 
 out:
-	spin_unlock(&resv->lock);
-	/* We already know we raced and no longer need the new region */
-	kfree(nrg);
-	return chg;
-out_nrg:
 	spin_unlock(&resv->lock);
 	return chg;
 }
-- 
2.23.0.351.gc4317032e6-goog
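
For readers less familiar with the reserve map, below is a minimal
userspace sketch of the contract this patch leaves in place:
region_chg() only guarantees that a spare file_region descriptor sits
in resv->region_cache, and region_add() either expands an existing
region or consumes that cached descriptor. The struct layouts, the
singly linked lists, the malloc()-based allocation and the main()
driver are illustrative simplifications, not the hugetlb.c
implementation; locking, adds_in_progress accounting and region
coalescing are deliberately left out.

#include <stdio.h>
#include <stdlib.h>

struct file_region {
	long from;
	long to;
	struct file_region *next;
};

struct resv_map {
	struct file_region *regions;      /* committed reservations */
	struct file_region *region_cache; /* spare descriptors */
};

/*
 * Sketch of region_chg(): report how many pages [f, t) would add and
 * make sure one spare descriptor is cached for the later region_add().
 * No placeholder is inserted into resv->regions any more.
 */
static long region_chg(struct resv_map *resv, long f, long t)
{
	struct file_region *spare = malloc(sizeof(*spare));

	if (!spare)
		return -1;
	spare->next = resv->region_cache;
	resv->region_cache = spare;

	/* Simplified: assume [f, t) does not overlap an existing region. */
	return t - f;
}

/*
 * Sketch of region_add(): commit [f, t). If no existing region can be
 * expanded (the only case modelled here), consume the cached descriptor
 * that region_chg() guaranteed is present.
 */
static long region_add(struct resv_map *resv, long f, long t)
{
	struct file_region *nrg = resv->region_cache;

	resv->region_cache = nrg->next;
	nrg->from = f;
	nrg->to = t;
	nrg->next = resv->regions;
	resv->regions = nrg;
	return t - f;
}

int main(void)
{
	struct resv_map resv = { NULL, NULL };
	long chg = region_chg(&resv, 0, 4);

	if (chg < 0)
		return 1;
	/* hugetlb would charge the subpool/quota for 'chg' pages here */
	printf("committed %ld pages\n", region_add(&resv, 0, 4));
	return 0;
}

The point of the two-phase split is unchanged by the patch: region_chg()
does the part that can fail (allocation and accounting) before anything
is charged, so that the later region_add() has every descriptor it
needs and cannot fail.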