References: <20201222074656.GA30035@open-light-1.localdomain> <63318bf1-21ea-7202-e060-b4b2517c684e@oracle.com>
From: Liang Li
Date:
 Thu, 24 Dec 2020 09:31:21 +0800
Subject: Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting
To: Mike Kravetz
Cc: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
 Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang,
 Dave Hansen, Michal Hocko, Liang Li, linux-mm, LKML,
 virtualization@lists.linux-foundation.org, qemu-devel@nongnu.org

> >>> +static int
> >>> +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
> >>> +			 struct hstate *h, unsigned int nid,
> >>> +			 struct scatterlist *sgl, unsigned int *offset)
> >>> +{
> >>> +	struct list_head *list = &h->hugepage_freelists[nid];
> >>> +	unsigned int page_len = PAGE_SIZE << h->order;
> >>> +	struct page *page, *next;
> >>> +	long budget;
> >>> +	int ret = 0, scan_cnt = 0;
> >>> +
> >>> +	/*
> >>> +	 * Perform early check, if free area is empty there is
> >>> +	 * nothing to process so we can skip this free_list.
> >>> +	 */
> >>> +	if (list_empty(list))
> >>> +		return ret;
> >>
> >> Do note that not all entries on the hugetlb free lists are free.  Reserved
> >> entries are also on the free list.  The actual number of free entries is
> >> 'h->free_huge_pages - h->resv_huge_pages'.
> >> Is the intention to process reserved pages as well as free pages?
> >
> > Yes, Reserved pages was treated as 'free pages'
>
> If that is true, then this code breaks hugetlb.  hugetlb code assumes that
> h->free_huge_pages is ALWAYS >= h->resv_huge_pages.  This code would break
> that assumption.  If you really want to add support for hugetlb pages, then
> you will need to take reserved pages into account.

I didn't know that. Thanks!

> P.S. There might be some confusion about 'reservations' based on the
> commit message.
> My comments are directed at hugetlb reservations described
> in Documentation/vm/hugetlbfs_reserv.rst.
>
> >>> +		/* Attempt to pull page from list and place in scatterlist */
> >>> +		if (*offset) {
> >>> +			isolate_free_huge_page(page, h, nid);
> >>
> >> Once a hugetlb page is isolated, it can not be used and applications that
> >> depend on hugetlb pages can start to fail.
> >> I assume that is acceptable/expected behavior.  Correct?
> >> On some systems, hugetlb pages are a precious resource and the sysadmin
> >> carefully configures the number needed by applications.  Removing a hugetlb
> >> page (even for a very short period of time) could cause serious application
> >> failure.
> >
> > That' true, especially for 1G pages. Any suggestions?
> > Let the hugepage allocator be aware of this situation and retry ?
>
> I would hate to add that complexity to the allocator.
>
> This question is likely based on my lack of understanding of virtio-balloon
> usage and this reporting mechanism.  But, why do the hugetlb pages have to
> be 'temporarily' allocated for reporting purposes?

The link here gives more detail about how page reporting works:
https://www.kernel.org/doc/html/latest//vm/free_page_reporting.html

The virtio-balloon driver is based on this framework and reports free
page information to QEMU & KVM. The host can then unmap the memory
region corresponding to the reported free pages and reclaim that memory
for other use, which is useful for memory overcommit.

Allocating the pages 'temporarily' before reporting is necessary: it
makes sure the guest will not use a page while the host side is
unmapping the region. Otherwise the guest would break.

Now I realize we should solve this issue first; it seems adding a lock
will help.

Thanks
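
To make the reserved-page point above concrete, here is a minimal
userspace sketch. It is not kernel code and not part of the patch. The
struct and helper names are hypothetical; only the two counters,
free_huge_pages and resv_huge_pages, come from the discussion. It
assumes a page may be taken for reporting only if that still leaves
free_huge_pages >= resv_huge_pages, and it leaves out locking entirely
(in the kernel these counters are protected by hugetlb_lock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two hstate counters discussed above. */
struct hstate_model {
	unsigned long free_huge_pages;	/* entries on the free lists, reserved ones included */
	unsigned long resv_huge_pages;	/* entries promised to existing reservations */
};

/* Pages that are free and not needed to back a reservation. */
static unsigned long reportable_huge_pages(const struct hstate_model *h)
{
	if (h->free_huge_pages <= h->resv_huge_pages)
		return 0;
	return h->free_huge_pages - h->resv_huge_pages;
}

/*
 * Take one page off the free count for reporting, but only if
 * free_huge_pages >= resv_huge_pages still holds afterwards.
 */
static bool try_isolate_for_report(struct hstate_model *h)
{
	if (reportable_huge_pages(h) == 0)
		return false;
	h->free_huge_pages--;
	assert(h->free_huge_pages >= h->resv_huge_pages);
	return true;
}

/* Give the page back once the report has completed. */
static void putback_after_report(struct hstate_model *h)
{
	h->free_huge_pages++;
}
```

With 4 free pages and 3 reserved, only one page is reportable: the first
try_isolate_for_report() call succeeds and the second is refused until
putback_after_report() returns the page. A page that is isolated is
still unavailable to applications for that window, so this only
addresses the accounting problem, not the allocation-failure concern.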