Date: Wed, 8 Jan 2020 10:40:41 +0100
From: Michal Hocko
To: Wei Yang
Cc: hannes@cmpxchg.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, kirill.shutemov@linux.intel.com, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, yang.shi@linux.alibaba.com
Subject: Re: [RFC PATCH] mm: thp: grab the lock before manipulation defer list
Message-ID: <20200108094041.GQ32178@dhcp22.suse.cz>
References: <20200103143407.1089-1-richardw.yang@linux.intel.com> <20200106102345.GE12699@dhcp22.suse.cz> <20200107012241.GA15341@richard> <20200107083808.GC32178@dhcp22.suse.cz> <20200108003543.GA13943@richard>
In-Reply-To: <20200108003543.GA13943@richard>

On Wed 08-01-20 08:35:43, Wei Yang wrote:
> On Tue, Jan 07, 2020 at 09:38:08AM +0100, Michal Hocko wrote:
> >On Tue 07-01-20 09:22:41, Wei Yang wrote:
> >> On Mon, Jan 06, 2020 at 11:23:45AM +0100, Michal Hocko wrote:
> >> >On Fri 03-01-20 22:34:07, Wei Yang wrote:
> >> >> As all the other places, we grab the lock before manipulate the defer list.
> >> >> Current implementation may face a race condition.
> >> >
> >> >Please always make sure to describe the effect of the change. Why a racy
> >> >list_empty check matters?
> >> >
> >>
> >> Hmm... access the list without proper lock leads to many bad behaviors.
> >
> >My point is that the changelog should describe that bad behavior.
> >
> >> For example, if we grab the lock after checking list_empty, the page may
> >> already be removed from list in split_huge_page_list.
> >> And then list_del_init would trigger bug.
> >
> >And how does list_empty check under the lock guarantee that the page is
> >on the deferred list?
>
> Just one confusion, is this kind of description basic concept of concurrent
> programming? How detail level we need to describe the effect?

When I write changelogs for patches like this I usually describe what
the potential race is - e.g.

	CPU1			CPU2
	path1			path2
	check			lock
				operation2
				unlock
	lock
	# check might not hold anymore
	operation1
	unlock

and what the effect of the race is - e.g. a crash, data corruption, a
pointless attempt at operation1 which fails with a user visible effect,
etc. This helps reviewers and everybody reading the code in the future
to understand the locking scheme.

> To me, grab the lock before accessing the critical section is obvious.

It might be obvious, but in many cases it is useful to minimize the
locking and do a potentially racy check before the lock is taken, if the
resulting operation can handle that.

> list_empty and list_del should be the critical section. And the
> lock should protect the whole critical section instead of part of it.

I am not disputing that. What I am trying to say is that the changelog
should describe the problem in the first place. Moreover, look at the
code you are trying to fix. Sure, extending the locking seems
straightforward, but does it result in correct code? See my question
in the previous email. How do we know that the page is actually enqueued
in a non-empty list?
-- 
Michal Hocko
SUSE Labs
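
For illustration only, the interleaving in the diagram above can be replayed in a small userspace C sketch. All names here (struct node, struct queue, queue_add, queue_remove) are hypothetical and are not the kernel's deferred split queue code; a pthread mutex stands in for the spinlock. It assumes the operation re-checks list membership after taking the lock, so that a stale unlocked check is harmless.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular list node, loosely modeled on a list_head. */
struct node {
	struct node *prev, *next;
};

/* A node that points at itself is not on any list. */
static void node_init(struct node *n) { n->prev = n; n->next = n; }
static bool node_unlinked(const struct node *n) { return n->next == n; }

/* Hypothetical queue: the lock protects head, len and node linkage. */
struct queue {
	pthread_mutex_t lock;
	struct node head;
	size_t len;
};

static void queue_init(struct queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	node_init(&q->head);
	q->len = 0;
}

/* Add n at the tail unless it is already queued. */
static void queue_add(struct queue *q, struct node *n)
{
	pthread_mutex_lock(&q->lock);
	if (node_unlinked(n)) {
		n->prev = q->head.prev;
		n->next = &q->head;
		q->head.prev->next = n;
		q->head.prev = n;
		q->len++;
	}
	pthread_mutex_unlock(&q->lock);
}

/*
 * "operation1" from the diagram. Any check done before the lock was
 * taken might not hold anymore, so membership is re-checked here.
 * Returns true only if this call unlinked the node.
 */
static bool queue_remove(struct queue *q, struct node *n)
{
	bool removed = false;

	pthread_mutex_lock(&q->lock);
	if (!node_unlinked(n)) {
		assert(q->len > 0);
		n->prev->next = n->next;
		n->next->prev = n->prev;
		node_init(n);	/* so a later node_unlinked(n) is true */
		q->len--;
		removed = true;
	}
	pthread_mutex_unlock(&q->lock);
	return removed;
}
```

Replaying the diagram on a single thread: path1 reads !node_unlinked(&n) without the lock and sees the node queued, path2 then calls queue_remove() and unlinks it, and path1's later queue_remove() returns false without touching the list or the counter. In genuinely concurrent code the unlocked read would additionally need an atomic or READ_ONCE-style access; the sketch leaves that out.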