From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B668C00140 for ; Tue, 2 Aug 2022 17:20:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B07FF6B0071; Tue, 2 Aug 2022 13:20:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A911A6B0072; Tue, 2 Aug 2022 13:20:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8E8FD8E0001; Tue, 2 Aug 2022 13:20:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 788756B0071 for ; Tue, 2 Aug 2022 13:20:23 -0400 (EDT) Received: from smtpin31.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 505961C3D0F for ; Tue, 2 Aug 2022 17:20:23 +0000 (UTC) X-FDA: 79755316326.31.DBD43BF Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) by imf19.hostedemail.com (Postfix) with ESMTP id C531E1A00E1 for ; Tue, 2 Aug 2022 17:20:21 +0000 (UTC) Received: from pps.filterd (m0167075.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 272HH6wg012981 for ; Tue, 2 Aug 2022 13:20:21 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h=mime-version : references : in-reply-to : from : date : message-id : subject : to : cc : content-type; s=pps01; bh=Gv04c7xsUvJvy6rqqeE5EHNxOJ+7qXD39mqMwdZmk/w=; b=LJ4zrYKw0l091OPQ6TYumzXgORsrL5kJacxyELfje7jEC6virZT01Iu/e5RHNZPeIfGH hg7et6pq2bF0ljGE+PORuJrsyRzzPtZJ6XsDWEjIWWwcHS+JTyMS/SzKk1D2FUBtIyoo pifGs4ZcXEeGPs/ROD+aEyceT0XntxgjFy7YuSWZD7N4WJob5htNlsBRn5ptzA1jXRlA kbdEPzoEmOc5lWWp8q3PusHFi9HUD057+juh5k0G8Z6/oqFFuV/3sFsWPuFCeNGuuXbC A0C1De0Vr6fzQmp1dTrTc6RB3hko+K6DcxlC3cxT6hdH1nhnO66v/YfW59nIf54Xg5qI zw== Received: from sendprdmail22.cc.columbia.edu (sendprdmail22.cc.columbia.edu [128.59.72.24]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 3hnhsnh3gw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 02 Aug 2022 13:20:20 -0400 Received: from mail-ua1-f71.google.com (mail-ua1-f71.google.com [209.85.222.71]) by sendprdmail22.cc.columbia.edu (8.14.7/8.14.4) with ESMTP id 272HKJWq081305 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 2 Aug 2022 13:20:19 -0400 Received: by mail-ua1-f71.google.com with SMTP id o5-20020ab01505000000b00382efb8efabso3739489uae.12 for ; Tue, 02 Aug 2022 10:20:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=Gv04c7xsUvJvy6rqqeE5EHNxOJ+7qXD39mqMwdZmk/w=; b=JN9vEGJ6W95ojZOIuCv3+azAklQCDdPR2WGTpLfUnwI8xPIsONJTlsE62mSr9h+Sgn raSm5yYwvk293ZEV3SZcYI1do+R/xUsbDviSsglWDVlQJMqkqLghR4qmKfNTBmnZBnMu CKzaR6VpXvwl3yipYPnVIDePMn7R0/H4fx233OmHeCzxsNTPI945p+4HqGwCRTV4BvT2 uyzIDr7eRYCyu10sCygWiIZTqNwf9u24ZqOBCf8ba3Fayua9IquxxiLqj0CC7Z/1If9X KMHC8X1bZ3kzVlNSxxE7hsw1Pfxz8gIgv+q84t0yi6ob+NBNafHiGFGls9X0CqRxh+Rf EuTg== X-Gm-Message-State: ACgBeo2PeQQopK3PKhv0KSK9SvSHH/MNFcY97RPkicqrMnY7U2zKHvl7 Ts1n7E/vym0laoueXYDw13X3LRBoi7zVFy/bptlxu+LCr7oft9oB2DzDs9KajPsUNnOrcYiov2q UPCiSYlPh4pA7S0ATlhHCo3CVRcoi6bkMeu4= X-Received: by 2002:ab0:a85:0:b0:384:cd61:1df5 with SMTP id d5-20020ab00a85000000b00384cd611df5mr8657637uak.41.1659460818687; Tue, 02 Aug 2022 10:20:18 -0700 (PDT) X-Google-Smtp-Source: AA6agR6rEV1Q9Z6Bm2+RpqmXrvBVpmQY6YHPtaoQKIVSMlehulAdSMp5JOeASXrETLjFYZ7qTJFEWwpkfHnadYQid1s= X-Received: by 2002:ab0:a85:0:b0:384:cd61:1df5 with SMTP id d5-20020ab00a85000000b00384cd611df5mr8657626uak.41.1659460818372; Tue, 02 Aug 2022 10:20:18 -0700 (PDT) MIME-Version: 1.0 References: <20220802151550.159076-1-wangkefeng.wang@huawei.com> In-Reply-To: From: Gabriel Ryan Date: Tue, 2 Aug 2022 13:20:07 -0400 Message-ID: Subject: Re: [PATCH] mm: ksm: fix data-race in __ksm_enter / run_store To: Matthew Wilcox Cc: Kefeng Wang , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Abhishek Shah Content-Type: multipart/alternative; boundary="000000000000991ea405e545544c" X-Proofpoint-GUID: E3x9eacvudsvYGRCpARZRKHAXYYSRxei X-Proofpoint-ORIG-GUID: E3x9eacvudsvYGRCpARZRKHAXYYSRxei X-CU-OB: Yes X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-02_11,2022-08-02_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 impostorscore=10 priorityscore=1501 mlxlogscore=763 phishscore=0 lowpriorityscore=10 bulkscore=10 clxscore=1015 suspectscore=0 malwarescore=0 mlxscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2208020081 ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=columbia.edu header.s=pps01 header.b=LJ4zrYKw; spf=pass (imf19.hostedemail.com: domain of gr2547@columbia.edu designates 148.163.139.74 as permitted sender) smtp.mailfrom=gr2547@columbia.edu; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659460822; a=rsa-sha256; cv=none; b=ic+RWfmIG0zVjFpk9+kOq364GBpSjNeqkZ2ccIi09wYwT2vngabiuRonFENIQIS9noPt37 v7b3GEM2aneVN51LOs3SbxAL9snLDzsfU0JrO+75gcAFF5sdNWyXUSILu8Qg8wmG/3aJOn tZdNSLVLcRkENPqdmpQBeKyDhDDt3do= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659460822; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Gv04c7xsUvJvy6rqqeE5EHNxOJ+7qXD39mqMwdZmk/w=; b=xpXuYDc4JBITB2Y9LRctJVhsbZ5Jihp3jhxhKxe1PmIovlu49jRXNCOrsuTCKj19YhSsgF YF5agCpKbgMBlIs8wOoDuNtgJMbx1kk4cH4LjQ031FSt8UBRGa2MDYyuYIAAt+xwxZ71eb HUo5rpTeGQlSdC2gvkyZqatrgXN+dEk= X-Stat-Signature: y3m9e4wucidkchoec64jdjqqyu454gao X-Rspamd-Queue-Id: C531E1A00E1 Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=columbia.edu header.s=pps01 header.b=LJ4zrYKw; spf=pass (imf19.hostedemail.com: domain of gr2547@columbia.edu designates 148.163.139.74 as permitted sender) smtp.mailfrom=gr2547@columbia.edu; dmarc=none X-Rspam-User: X-Rspamd-Server: rspam12 X-HE-Tag: 1659460821-435764 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --000000000000991ea405e545544c Content-Type: text/plain; charset="UTF-8" Hi Matthew, I don't believe execution of unmerge_and_remove_all_rmap_items() after an mm is misplaced is guaranteed. Consider the following interleaving: Thread A executes *__ksm_enter* with KSM_RUN_MERGE set through the check on https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2501 Thread B executes *run_store* and sets KSM_RUN_UNMERGE and then also executes unmerge_and_remove_all_rmap_items() on https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2900 Thread A completes *__ksm_enter *and misplaces the mm behind the scanning cursor since it is still on the KSM_RUN_MERGE path on https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2504 I also noticed through manual inspection another check that appears racy of the KSM_RUN_UNMERGE flag on https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2563 Best, Gabe On Tue, Aug 2, 2022 at 11:45 AM Matthew Wilcox wrote: > On Tue, Aug 02, 2022 at 11:15:50PM +0800, Kefeng Wang wrote: > > The ksm_run is alread protected by ksm_thread_mutex in run_store, we > > could add this lock in __ksm_enter() to avoid the above issue. > > I don't think this is a great fix. Why not protect the store with > ksm_mmlist_lock? ie: > > mutex_lock(&ksm_thread_mutex); > wait_while_offlining(); > if (ksm_run != flags) { > + spin_lock(&ksm_mmlist_lock); > ksm_run = flags; > + spin_unlock(&ksm_mmlist_lock); > if (flags & KSM_RUN_UNMERGE) { > set_current_oom_origin(); > err = unmerge_and_remove_all_rmap_items(); > clear_current_oom_origin(); > if (err) { > + spin_lock(&ksm_mmlist_lock); > ksm_run = KSM_RUN_STOP; > + spin_unlock(&ksm_mmlist_lock); > ... > > (I also don't think this is a real bug, because the call to > unmerge_and_remove_all_rmap_items() will "cure" the misplacement of > items in the list, but there's value in shutting up the tools, I suppose) > --000000000000991ea405e545544c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Matthew,

I don't beli= eve execution of=20 unmerge_and_remove_all_rmap_items() after an mm is misplaced is guaranteed.= =C2=A0

Consider the following interleaving:
<= div>Thread A executes __ksm_enter with KSM_RUN_MERGE set through the= check on https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L250= 1
Thread B executes run_store and sets KSM_RUN_UNMERGE= and then also executes=20 unmerge_and_remove_all_rmap_items() on https://elixir.bootlin.com/linux/v= 5.18-rc5/source/mm/ksm.c#L2900
Thread A completes __ksm_en= ter and misplaces the mm behind the scanning cursor since it is still o= n the KSM_RUN_MERGE path on https://elixir.bootlin.com/linux/v5.18-rc5/so= urce/mm/ksm.c#L2504

I also noticed through man= ual inspection another check that appears racy of the KSM_RUN_UNMERGE flag = on https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2563

Best,

Gabe
<= br>


On Tue, Aug 2, 2022 at 11:45 AM Matthew Wilcox <= willy@infradead.org> wrote:
On Tue, Aug 02, 2= 022 at 11:15:50PM +0800, Kefeng Wang wrote:
> The ksm_run is alread protected by ksm_thread_mutex in run_store, we > could add this lock in __ksm_enter() to avoid the above issue.

I don't think this is a great fix.=C2=A0 Why not protect the store with=
ksm_mmlist_lock?=C2=A0 ie:

=C2=A0 =C2=A0 =C2=A0 =C2=A0 mutex_lock(&ksm_thread_mutex);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wait_while_offlining();
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (ksm_run !=3D flags) {
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0spin_lock(&ksm_= mmlist_lock);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ksm_run =3D flags;<= br> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0spin_unlock(&ks= m_mmlist_lock);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (flags & KSM= _RUN_UNMERGE) {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 set_current_oom_origin();
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 err =3D unmerge_and_remove_all_rmap_items();
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 clear_current_oom_origin();
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 if (err) {
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0spin_lock(&ksm_mmlist_lock);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ksm_run =3D KSM_RUN_STOP;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0spin_unlock(&ksm_mmlist_lock); ...

(I also don't think this is a real bug, because the call to
unmerge_and_remove_all_rmap_items() will "cure" the misplacement = of
items in the list, but there's value in shutting up the tools, I suppos= e)
--000000000000991ea405e545544c--