From: Mina Almasry
Date: Mon, 24 May 2021 17:11:34 -0700
Subject: Re: [PATCH v3] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY
To: Mike Kravetz
Cc: Axel Rasmussen, Peter Xu, Linux-MM, Andrew Morton, open list
References: <20210521074433.931380-1-almasrymina@google.com> <2a983662-ab90-0cdb-850c-eb50b0845b49@oracle.com>
In-Reply-To: <2a983662-ab90-0cdb-850c-eb50b0845b49@oracle.com>

> > +             if (!HPageRestoreReserve(page)) {
> > +                     if (unlikely(hugetlb_unreserve_pages(
> > +                                 mapping->host, idx, idx + 1, 1)))
> > +                             hugetlb_fix_reserve_counts(
> > +                                     mapping->host);
> > +             }
>
> I do not understand the need to call hugetlb_unreserve_pages(). The
> call to restore_reserve_on_error 'should' fix up the reserve map to
> align with restoring the reserve count in put_page/free_huge_page.
> Can you explain why that is there?
>

AFAICT here is what happens for a given index *without* the call to
hugetlb_unreserve_pages():

1. hugetlb_no_page() allocates a page consuming the reservation,
resv_huge_pages decrements.

2. remove_inode_hugepages() does remove_huge_page() and
hugetlb_unreserve_pages(). This removes the entry from the resv_map,
but does NOT increment back the resv_huge_pages. Because we removed the
entry, it looks like we have no reservation for this index.
free_huge_page() gets called on this page, and resv_huge_pages is not
incremented, I'm not sure why. This page should have come from the
reserves.

3. hugetlb_mcopy_pte_atomic() gets called for this index.
Because of the prior call to hugetlb_unreserve_pages(), there is no
entry in the resv_map for this index, which means it looks like we
don't have a reservation for this index. We allocate a page outside the
reserves (deferred_reserve=1, HPageRestoreReserve=0), add an entry into
resv_map, and don't modify resv_huge_pages.

4. The copy fails and we deallocate the page. Since
HPageRestoreReserve==0 for this page, restore_reserve_on_error() does
nothing.

5. hugetlb_mcopy_pte_atomic() gets called again with the temporary
page, and we allocate another page. Now, since we added an entry in the
resv_map in the previous allocation, it looks like we have a
reservation for this allocation. We allocate a page with
deferred_reserve=0 && HPageRestoreReserve=1, and we decrement
resv_huge_pages. Boom, we decremented resv_huge_pages twice for this
index and never incremented it.

To fix this, in step 4, when I deallocate a page, I check
HPageRestoreReserve(page). If HPageRestoreReserve=0, then this
reservation was consumed and deallocated before, and so I need to
remove the entry from the resv_map.
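
To make the sequence above easier to follow, here is a rough userspace
sketch of the accounting for a single index. It is a toy model, not
kernel code: struct toy_state, struct toy_page, toy_alloc(),
toy_truncate(), toy_free_on_error() and toy_run() are all made-up names,
the counter is a signed long so the underflow shows up as -1, and the
rule "entry present means the allocation consumes a reserve" is my
simplification of what the steps describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the accounting for ONE file index. Hypothetical names. */
struct toy_state {
	long resv_huge_pages;	/* stands in for the reserve counter */
	bool resv_map_entry;	/* is there a resv_map entry for idx? */
};

struct toy_page {
	bool restore_reserve;	/* stands in for HPageRestoreReserve */
};

/* Allocation: an existing entry looks like a reservation and consumes it. */
static struct toy_page toy_alloc(struct toy_state *s)
{
	struct toy_page p = { .restore_reserve = false };

	if (s->resv_map_entry) {
		s->resv_huge_pages--;		/* deferred_reserve=0 */
		p.restore_reserve = true;
	} else {
		s->resv_map_entry = true;	/* deferred_reserve=1 */
	}
	return p;
}

/* Step 2: the entry is removed, the counter is not incremented back. */
static void toy_truncate(struct toy_state *s)
{
	s->resv_map_entry = false;
}

/* Step 4: error-path free, with or without the proposed check. */
static void toy_free_on_error(struct toy_state *s, struct toy_page p,
			      bool with_fix)
{
	if (p.restore_reserve)
		s->resv_huge_pages++;		/* reserve is given back */
	else if (with_fix)
		s->resv_map_entry = false;	/* drop the stale entry */
}

/* Runs steps 1-5 and returns the final value of the counter. */
static long toy_run(bool with_fix)
{
	struct toy_state s = { .resv_huge_pages = 1, .resv_map_entry = true };
	struct toy_page first, tmp;

	first = toy_alloc(&s);			/* step 1 */
	assert(first.restore_reserve);
	toy_truncate(&s);			/* step 2 */
	tmp = toy_alloc(&s);			/* step 3 */
	toy_free_on_error(&s, tmp, with_fix);	/* step 4 */
	(void)toy_alloc(&s);			/* step 5 */

	printf("with_fix=%d resv_huge_pages=%ld\n", with_fix,
	       s.resv_huge_pages);
	return s.resv_huge_pages;
}
```

Calling toy_run(false) walks the unfixed path and ends one below zero,
while toy_run(true) removes the entry in step 4 and leaves the counter at
zero. In the kernel the counter is unsigned, so the first case would wrap
around rather than go negative.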