From: Mina Almasry
Date: Wed, 28 Oct 2020 12:37:35 -0700
Subject: Re: [PATCH] hugetlb_cgroup: fix reservation accounting
To: "Michael S. Tsirkin"
Cc: Mike Kravetz, Linux-MM, open list, David Hildenbrand, Michal Privoznik,
 Michal Hocko, Muchun Song, "Aneesh Kumar K . V", Tejun Heo, Andrew Morton,
 stable@vger.kernel.org
References: <20201021204426.36069-1-mike.kravetz@oracle.com>
 <20201022081538-mutt-send-email-mst@kernel.org>
In-Reply-To: <20201022081538-mutt-send-email-mst@kernel.org>

On Thu, Oct 22, 2020 at 5:21 AM Michael S. Tsirkin wrote:
>
> On Wed, Oct 21, 2020 at 01:44:26PM -0700, Mike Kravetz wrote:
> > Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
> > with hugetlbfs and hit the warning below. QEMU with free page hinting
> > uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
> > as free by a VM. The reporting granularity is in pageblock granularity.
> > So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
> > one huge page in QEMU.
> >
> > [ 315.251417] ------------[ cut here ]------------
> > [ 315.251424] WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
> > [ 315.251425] Modules linked in: ...
> > [ 315.251466] CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
> > [ 315.251467] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
> > [ 315.251469] RIP: 0010:page_counter_uncharge+0x4b/0x50
> > ...
> > [ 315.251479] Call Trace:
> > [ 315.251485] hugetlb_cgroup_uncharge_file_region+0x4b/0x80
> > [ 315.251487] region_del+0x1d3/0x300
> > [ 315.251489] hugetlb_unreserve_pages+0x39/0xb0
> > [ 315.251492] remove_inode_hugepages+0x1a8/0x3d0
> > [ 315.251495] ? tlb_finish_mmu+0x7a/0x1d0
> > [ 315.251497] hugetlbfs_fallocate+0x3c4/0x5c0
> > [ 315.251519] ? kvm_arch_vcpu_ioctl_run+0x614/0x1700 [kvm]
> > [ 315.251522] ? file_has_perm+0xa2/0xb0
> > [ 315.251524] ? inode_security+0xc/0x60
> > [ 315.251525] ? selinux_file_permission+0x4e/0x120
> > [ 315.251527] vfs_fallocate+0x146/0x290
> > [ 315.251529] __x64_sys_fallocate+0x3e/0x70
> > [ 315.251531] do_syscall_64+0x33/0x40
> > [ 315.251533] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > ...
> > [ 315.251542] ---[ end trace 4c88c62ccb1349c9 ]---
> >
> > Investigation of the issue uncovered bugs in hugetlb cgroup reservation
> > accounting. This patch addresses the found issues.
> >
> > Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
> > Cc:
> > Reported-by: Michal Privoznik
> > Co-developed-by: David Hildenbrand
> > Signed-off-by: David Hildenbrand
> > Signed-off-by: Mike Kravetz
>
> Acked-by: Michael S. Tsirkin
>
> > ---
> >  mm/hugetlb.c | 20 +++++++++++---------
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 67fc6383995b..b853a11de14f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -655,6 +655,8 @@ static long region_del(struct resv_map *resv, long f, long t)
> >  			}
> >
> >  			del += t - f;
> > +			hugetlb_cgroup_uncharge_file_region(
> > +				resv, rg, t - f);
> >
> >  			/* New entry for end of split region */
> >  			nrg->from = t;
> > @@ -667,9 +669,6 @@ static long region_del(struct resv_map *resv, long f, long t)
> >  			/* Original entry is trimmed */
> >  			rg->to = f;
> >
> > -			hugetlb_cgroup_uncharge_file_region(
> > -				resv, rg, nrg->to - nrg->from);
> > -
> >  			list_add(&nrg->link, &rg->link);
> >  			nrg = NULL;
> >  			break;
> > @@ -685,17 +684,17 @@ static long region_del(struct resv_map *resv, long f, long t)
> >  		}
> >
> >  		if (f <= rg->from) {	/* Trim beginning of region */
> > -			del += t - rg->from;
> > -			rg->from = t;
> > -
> >  			hugetlb_cgroup_uncharge_file_region(resv, rg,
> >  							    t - rg->from);
> > -		} else {		/* Trim end of region */
> > -			del += rg->to - f;
> > -			rg->to = f;
> >
> > +			del += t - rg->from;
> > +			rg->from = t;
> > +		} else {		/* Trim end of region */
> >  			hugetlb_cgroup_uncharge_file_region(resv, rg,
> >  							    rg->to - f);
> > +
> > +			del += rg->to - f;
> > +			rg->to = f;
> >  		}
> >  	}
> >
> > @@ -2454,6 +2453,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> >
> >  		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
> >  		hugetlb_acct_memory(h, -rsv_adjust);
> > +		if (deferred_reserve)
> > +			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> > +					pages_per_huge_page(h), page);
> >  	}
> >  	return page;
> >
> > --
> > 2.25.4
>

Sorry for the late review. Looks good to me.

Reviewed-by: Mina Almasry
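
For anyone reading the thread later, here is a minimal userspace sketch (not kernel code) of the ordering problem that the region_del() hunks appear to address. Everything in it is a hypothetical stand-in assumed for illustration only: `struct region` plays the role of `struct file_region`, `struct counter` and `uncharge()` play the role of the cgroup reservation counter and `hugetlb_cgroup_uncharge_file_region()`, and the `*_buggy` / `*_fixed` helpers mirror the pre-patch and post-patch statement order as it reads in the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for struct file_region: half-open range [from, to). */
struct region {
	long from;
	long to;
};

/* Hypothetical stand-in for the cgroup reservation counter. */
struct counter {
	long charged;
};

static void uncharge(struct counter *c, long nr_pages)
{
	c->charged -= nr_pages;
}

/* Pre-patch order: the region is trimmed first, so the size passed is 0. */
static long trim_beginning_buggy(struct counter *c, struct region *rg, long t)
{
	long del = t - rg->from;

	rg->from = t;
	uncharge(c, t - rg->from);	/* t - t == 0, nothing is uncharged */
	return del;
}

/* Post-patch order: uncharge while rg->from still holds the old start. */
static long trim_beginning_fixed(struct counter *c, struct region *rg, long t)
{
	long del;

	assert(rg->from <= t && t <= rg->to);
	uncharge(c, t - rg->from);
	del = t - rg->from;
	rg->from = t;
	return del;
}

/* Split case, pre-patch: uncharges the surviving tail [t, old to). */
static void split_buggy(struct counter *c, struct region *rg,
			struct region *nrg, long f, long t)
{
	nrg->from = t;
	nrg->to = rg->to;
	rg->to = f;
	uncharge(c, nrg->to - nrg->from);	/* tail size, not t - f */
}

/* Split case, post-patch: uncharges exactly the deleted range [f, t). */
static void split_fixed(struct counter *c, struct region *rg,
			struct region *nrg, long f, long t)
{
	assert(rg->from < f && f < t && t < rg->to);
	uncharge(c, t - f);
	nrg->from = t;
	nrg->to = rg->to;
	rg->to = f;
}
```

With a region [0, 10) charged for 10 pages, trimming the first 4 pages with the buggy ordering leaves all 10 pages charged, while the fixed ordering leaves 6. In the split case, punching [3, 5) out of the same region uncharges 5 pages with the old code instead of 2. Mismatches of this kind would be consistent with the page_counter_uncharge() warning in the report, although the sketch does not model the real counter hierarchy.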