Date: Fri, 14 Aug 2020 08:39:24 +0200
From: Michal Hocko
To: Charan Teja Kalla
Cc: akpm@linux-foundation.org, vbabka@suse.cz, david@redhat.com,
    rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    vinmenon@codeaurora.org
Subject: Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()
Message-ID: <20200814063924.GX9477@dhcp22.suse.cz>
In-Reply-To: <099b1a12-7fcd-f665-3f9d-e20d4e1396d3@codeaurora.org>

On Thu 13-08-20 22:57:32, Charan Teja Kalla wrote:
> Thanks Michal.
>
> On 8/13/2020 10:00 PM, Michal Hocko wrote:
> > On Thu 13-08-20 21:51:29, Charan Teja Kalla wrote:
> >> Thanks Michal for comments.
> >>
> >> On 8/13/2020 5:11 PM, Michal Hocko wrote:
> >>> On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
> >>> [...]
> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>> index e4896e6..839039f 100644
> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >>>>  	struct page *page, *tmp;
> >>>>  	LIST_HEAD(head);
> >>>>
> >>>> +	/*
> >>>> +	 * Ensure proper count is passed which otherwise would stuck in the
> >>>> +	 * below while (list_empty(list)) loop.
> >>>> +	 */
> >>>> +	count = min(pcp->count, count);
> >>>>  	while (count) {
> >>>>  		struct list_head *list;
> >>>
> >>>
> >>> How does this prevent the race actually?
> >>
> >> This doesn't prevent the race. This only fixes the core hung (as this is
> >> called with spin_lock_irq()) caused by the race condition. This core
> >> hung is because an incorrect count value is passed to the
> >> free_pcppages_bulk() function.
> >
> > Let me ask differently. What does enforce that the count and lists do
> > not get out of sync in the loop?
>
> The count value is updated whenever an order-0 page is added to the
> pcp lists through free_unref_page_commit(), which is called with
> both interrupts and preemption disabled.
> static void free_unref_page_commit(struct page *page, ...)
> {
> 	....
> 	list_add(&page->lru, &pcp->lists[migratetype]);
> 	pcp->count++;
> }
>
> As these are pcp lists, they only get touched by another process when
> this process is context switched, which happens only after enabling
> preemption or interrupts. So, as long as the process is operating on these
> pcp lists in the free_unref_page_commit() function, the count and lists are
> always in sync.
>
> However, the problem here is not that the count and lists are out
> of sync. They are always in sync, as explained above. The problem is
> asking free_pcppages_bulk() to free more pages than are present
> in the pcp lists, which ends up spinning in while (list_empty()).

You are right. I managed to confuse myself. The thing is that the batch
count is out of sync.
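
For illustration only, here is a minimal userspace sketch of the hang
described above. It is not the kernel code: struct pcp_model, pcp_add(),
free_bulk_model() and NR_LISTS are hypothetical names, each pcp list is
reduced to a length counter, and the batching logic of the real
free_pcppages_bulk() is left out. It only shows why a count larger than
pcp->count would spin in the list_empty() loop, and how the clamp from the
patch avoids that.

```c
#include <assert.h>

#define NR_LISTS 3	/* hypothetical stand-in for MIGRATE_PCPTYPES */

/* Hypothetical model of a per-cpu page set. Only list emptiness matters
 * for the loop, so each list is modelled as a length. */
struct pcp_model {
	int count;			/* total pages across all lists */
	int list_len[NR_LISTS];		/* pages on each list */
};

/* Models the bookkeeping quoted from free_unref_page_commit(): the list
 * and the count are updated together, so they stay in sync. */
static void pcp_add(struct pcp_model *pcp, int migratetype)
{
	pcp->list_len[migratetype]++;
	pcp->count++;
}

/* Simplified model of the bulk-free loop. Returns the number of pages
 * actually freed. */
static int free_bulk_model(struct pcp_model *pcp, int count)
{
	int migratetype = 0;
	int freed = 0;

	/* The clamp from the patch: never ask for more than is present. */
	if (count > pcp->count)
		count = pcp->count;

	while (count) {
		/* Round-robin to the next non-empty list. Without the clamp,
		 * this inner loop never exits once every list is empty. */
		do {
			if (++migratetype == NR_LISTS)
				migratetype = 0;
		} while (pcp->list_len[migratetype] == 0);

		pcp->list_len[migratetype]--;
		pcp->count--;
		count--;
		freed++;
	}

	assert(pcp->count >= 0);
	return freed;
}
```

In this model, asking for 10 pages when only 3 are queued frees 3 and
returns; with the clamp removed, the same call would loop forever in the
inner do/while once the lists are drained.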
> > Your changelog says that the fix is to
> > use the proper value without any specifics.
> >
> Will change this to: Ensure the count value passed is not greater than
> the pcp lists count. Any better you suggest?

Yes, this makes it more clear. Feel free to add
Acked-by: Michal Hocko
-- 
Michal Hocko
SUSE Labs