From: Miaohe Lin
To: Andrew Morton, Muchun Song, Linux-MM, linux-kernel
Subject: [bug report] mm/hugetlb: various bugs with avoid_reserve case in alloc_huge_page()
Date: Wed, 17 Aug 2022 16:31:28 +0800

Hi all:

When I
investigated the mm/hugetlb.c code again, I found a few possible issues
with the avoid_reserve case. (The relevant code is really hard for me to
follow.) Please take a look at the analysis below:

1. avoid_reserve issue with h->resv_huge_pages in alloc_huge_page()

Assume:

  h->free_huge_pages  60
  h->resv_huge_pages  30
  spool->rsv_hpages   30

When avoid_reserve is true, after alloc_huge_page() we will have:

  spool->rsv_hpages   29  /* hugepage_subpool_get_pages() decreases it. */
  h->free_huge_pages  59
  h->resv_huge_pages  30  /* rsv_hpages is used, but *h->resv_huge_pages is not modified accordingly*. */

If the hugetlb page is freed later, we will have:

  spool->rsv_hpages   30  /* hugepage_subpool_put_pages() increases it. */
  h->free_huge_pages  60
  h->resv_huge_pages  31  /* *increased wrongly* due to hugepage_subpool_put_pages(spool, 1) == 0. */
                      ^^

2. avoid_reserve issue with hugetlb rsvd cgroup charge for private
   mappings in alloc_huge_page()

In general, if hugetlb pages are reserved, the corresponding rsvd
counters are charged in resv_maps for private mappings. Otherwise they
are charged in the individual hugetlb pages. When alloc_huge_page() is
called with avoid_reserve == true, hugetlb_cgroup_charge_cgroup_rsvd()
is called to charge the newly allocated hugetlb page even if there is a
reservation for this page in resv_maps. Then vma_commit_reservation() is
called to indicate that the reservation is consumed. So the reservation
*can not be used, thus leaking* from now on, because
vma_needs_reservation() always returns 1 for it.

3. avoid_reserve issue with restore_reserve_on_error()

There is an assumption in restore_reserve_on_error(): if
HPageRestoreReserve is not set, this indicates there is an entry in the
reserve map added by alloc_huge_page(), or HPageRestoreReserve would be
set on the page. But this assumption *does not hold for avoid_reserve*.
HPageRestoreReserve won't be set even if there is already an entry in
the reserve map in the avoid_reserve case. So avoid_reserve should be
considered in this function, i.e. we need *a reliable way* to determine
whether the entry was added by alloc_huge_page().

Are the above issues possible? Or am I missing something? These issues
do not look easy to fix to me. Any thoughts? Any response would be
appreciated!

Thanks!
Miaohe Lin
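
The counter walk-through in issue 1 can be replayed with a small
user-space model. This is only a sketch of the arithmetic described
above, not kernel code: struct toy_state, the toy_* functions and the
subpool minimum of 30 pages are all hypothetical, and the two subpool
helpers are heavily simplified stand-ins for
hugepage_subpool_get_pages() and hugepage_subpool_put_pages().

```c
#include <assert.h>

/* Toy model of the three counters; NOT the kernel's data structures. */
struct toy_state {
	long free_huge_pages;	/* stands in for h->free_huge_pages */
	long resv_huge_pages;	/* stands in for h->resv_huge_pages */
	long rsv_hpages;	/* stands in for spool->rsv_hpages */
	long min_hpages;	/* assumed subpool minimum size */
};

/* Simplified "get": returns 0 when the subpool reserve covers the page. */
static long toy_subpool_get(struct toy_state *s)
{
	if (s->rsv_hpages > 0) {
		s->rsv_hpages--;
		return 0;
	}
	return 1;
}

/* Simplified "put": returns 0 when the page refills the subpool reserve. */
static long toy_subpool_put(struct toy_state *s)
{
	if (s->rsv_hpages < s->min_hpages) {
		s->rsv_hpages++;
		return 0;
	}
	return 1;
}

/* Allocation with avoid_reserve == true, as described in issue 1. */
static void toy_alloc_avoid_reserve(struct toy_state *s)
{
	assert(s->free_huge_pages > 0);
	(void)toy_subpool_get(s);	/* subpool reserve is consumed ... */
	s->free_huge_pages--;
	/* ... but resv_huge_pages is left untouched on this path. */
}

/* Free path: a "put" that returns 0 restores the global reserve. */
static void toy_free(struct toy_state *s)
{
	if (toy_subpool_put(s) == 0)
		s->resv_huge_pages++;
	s->free_huge_pages++;
}

/* Replays one alloc/free cycle from the starting values in issue 1. */
struct toy_state toy_replay(void)
{
	struct toy_state s = {
		.free_huge_pages = 60,
		.resv_huge_pages = 30,
		.rsv_hpages = 30,
		.min_hpages = 30,	/* assumption for this sketch */
	};

	toy_alloc_avoid_reserve(&s);
	toy_free(&s);
	return s;
}
```

After toy_replay(), free_huge_pages and rsv_hpages are back at their
starting values of 60 and 30, while resv_huge_pages ends at 31, which is
the imbalance described in issue 1.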