Subject: Re: [PATCH v2] mm/hugetlb: Defer freeing of huge pages if in non-task context
To: Michal Hocko, Waiman Long
Cc: Mike Kravetz, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox, Davidlohr Bueso, Andi Kleen, "Aneesh Kumar K.V"
References: <20191217012508.31495-1-longman@redhat.com> <20191217093143.GC31063@dhcp22.suse.cz>
From: Kirill Tkhai
Message-ID: <87c2ff49-999e-3196-791f-36e3d42ad79c@virtuozzo.com>
Date: Tue, 17 Dec 2019 13:50:15 +0300
In-Reply-To: <20191217093143.GC31063@dhcp22.suse.cz>

On 17.12.2019 12:31, Michal Hocko wrote:
> On Mon 16-12-19 20:25:08, Waiman Long wrote:
> [...]
>> Both the hugetlb_lock and the subpool lock can be acquired in
>> free_huge_page(). One way to solve the problem is to make both locks
>> irq-safe.
>
> Please document why we do not take this quite natural path and instead
> have to come up with an elaborate scheme. I believe the primary
> motivation is that some operations under those locks are quite
> expensive. Please add that to the changelog, and ideally to the code as
> well. We probably want to fix those anyway, and then this would be a
> temporary workaround.
>
>> Another alternative is to defer the freeing to a workqueue job.
>>
>> This patch implements the deferred freeing by adding a
>> free_hpage_workfn() work function to do the actual freeing. The
>> free_huge_page() call in a non-task context saves the page to be freed
>> in the hpage_freelist linked list in a lockless manner.
>
> Do we need to overcomplicate this (presumably) rare event with a
> lockless algorithm? Why can't we use a dedicated spinlock for the
> linked list manipulation? This should really be trivial code, without
> the additional burden of all the lockless subtleties.

Why not llist_add()/llist_del_all()?