From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CABB0C3065C for ; Tue, 2 Jul 2024 16:16:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 672846B009A; Tue, 2 Jul 2024 12:16:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6230A6B009F; Tue, 2 Jul 2024 12:16:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4EAEF6B00A1; Tue, 2 Jul 2024 12:16:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 2D97E6B009A for ; Tue, 2 Jul 2024 12:16:14 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id CC5E2403B6 for ; Tue, 2 Jul 2024 16:16:13 +0000 (UTC) X-FDA: 82295314626.29.10EC033 Received: from mail-yw1-f171.google.com (mail-yw1-f171.google.com [209.85.128.171]) by imf23.hostedemail.com (Postfix) with ESMTP id 0923E140023 for ; Tue, 2 Jul 2024 16:16:11 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=4R2qCwg6; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf23.hostedemail.com: domain of hughd@google.com designates 209.85.128.171 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719936953; a=rsa-sha256; cv=none; b=xJV8wi8DHww2XF7yPh6IOJycu4AEYUWH0LTMctHyt5kII9HQxRiBFXH8DfKF2pvy47wRJL VU/LAQfbT4E14kKQNvzNweLIOjd1JZF975qavQBCE0Al+iRs7QbfTFG0v8QCKz6cWzAQ5/ adWRhbHrT6RKkrOtRw8Qrhf13yUIlL4= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=4R2qCwg6; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf23.hostedemail.com: domain of hughd@google.com designates 209.85.128.171 as permitted sender) smtp.mailfrom=hughd@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719936953; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8z2/XiGVaBn+uz521Bdia43A7AchOvsj677ZoQYBVYQ=; b=7hNd50ms5Poq6GFkfzxagaq9E8thIxWzRelX6VbSdEMHI3rPoORBmETC56/edXwLxr/Now 1dJGRsVCI3Z8L1Iy+lHqLtUBpN28DELcHtXRoY0TXypIwKvtH89bbbUSqQypBgLeY3XoJo /w7paIqy29qvfNXRmv4r8jhmuspkb2Q= Received: by mail-yw1-f171.google.com with SMTP id 00721157ae682-630640c1e14so43146217b3.1 for ; Tue, 02 Jul 2024 09:16:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1719936971; x=1720541771; darn=kvack.org; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=8z2/XiGVaBn+uz521Bdia43A7AchOvsj677ZoQYBVYQ=; b=4R2qCwg6IdgmYD1ZCuv4ceova4afU17iMnGeSuEjPMtp6dGswWWoRpCue+Cx0P44j3 aNPAdW6KeSvFchya4s5aIPtktptN0eDKFThPfN8DlBUCYGNpWFq3v/AxzJHy/aGd5gMs T6jHZQoqbXZQurOoeu/fN413TPIA9hcURSwsIA63nDKHH162kSkBYgt9DJTCDUbT3uEa PBQzs4DdzT+Thmq+J/4hn6aaC9NGi5G9HRylmeSwGBEwCOr7K9/uxzPb5mx9q5IWMZs5 Fl8/tHagSy2m5eX7T1dYg2frd/JJOnR9IQa7erMPgRGPOOw3jyTAexLHo4QQnfl9pPH4 E3IA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719936971; x=1720541771; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8z2/XiGVaBn+uz521Bdia43A7AchOvsj677ZoQYBVYQ=; b=UqrOsX0eb3CvO4MYwjjlVgdGtoc1knfAvL4d9iJi+Ja3D1kze+1EA4Gmcax3NtQ3oX cvuHKrKoJuTjlB0KFMen17LL/GcCtAHwMmldZVaisl8yYyNPjFz2wTHk1QQsLewHI5eb erYniq98Oc6cyFp5KgSfZKWQHcXTvz+Ueg/vgqWHBxbUB4SncpPsNZTQvQvTBxfyYeXP a7OyXeQb/J2PaOsg8mhL7o2gn5Di2htxpfeYCOMOXOeR6wuGv2+ANoA4wvlcoo++eEom LZOLWCPolZNHqgrclH0AdSGkrV9wEf06xgwCd+yx3xKLGR1+VVuHZHAhlBoelMlZlSgW cW/A== X-Forwarded-Encrypted: i=1; AJvYcCVB3RyQxYXBnApzB4uqIpmfy+JWROlxrDug/NUXVpxue4CfBXAWoVUwWQMxoD4n9JgcTdZc7xfHMrnX37GBhQ22Tr8= X-Gm-Message-State: AOJu0Yzk8irZNO+FQMYxfdtNZ0i44uqZpleHBRwyBQwhvaCRsRxHxlZk ehgg/Zm6Dr6mcduI7awHtrdDth68bN9RnA07SMvLUSEBYcYHbDQ66nO10FFMCg== X-Google-Smtp-Source: AGHT+IGMTd9vVYJ9rW41TxRQS96zY1DUsAKyRb/H1VxRHrMUJIA6BK6QEvH+l4zS1pZX4JRx60I/LQ== X-Received: by 2002:a05:690c:a16:b0:61b:1f0e:10 with SMTP id 00721157ae682-64c70c76528mr87847567b3.4.1719936970721; Tue, 02 Jul 2024 09:16:10 -0700 (PDT) Received: from darker.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id 00721157ae682-64a9ba5a19fsm18211047b3.96.2024.07.02.09.16.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 Jul 2024 09:16:10 -0700 (PDT) Date: Tue, 2 Jul 2024 09:15:54 -0700 (PDT) From: Hugh Dickins To: Baolin Wang cc: Hugh Dickins , Andrew Morton , Nhat Pham , Yang Shi , Zi Yan , Barry Song , Kefeng Wang , David Hildenbrand , Matthew Wilcox , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH hotfix] mm: fix crashes from deferred split racing folio migration In-Reply-To: Message-ID: References: <29c83d1a-11ca-b6c9-f92e-6ccb322af510@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 0923E140023 X-Stat-Signature: yuejpen7q6671chsqsisur7dwrkierrk X-Rspam-User: X-HE-Tag: 1719936971-919422 X-HE-Meta: U2FsdGVkX1+Im02AM7FBKMFeDCV2ZOLxngavOU1YbAnKbFR0zw6wb9QHRBnE/h92SeEkRs8RnrU9CFgUPOCHR/89cKEjQkXpLywqIC+pBlJjdp8K1j1XXUvoMq0yrS5BcH2l/ZtcAVgInT8AxbpnUVwJMeLXc015qhF6ESnjoGMwiTNen1DMejBNCKot+zNtZy3FcBYgKyVdBsKkUVlSsc4GgKKKNYlaD71yTADRrWmoUWd9evh9CgiczHWDac3jiYOrLPuAoSjRNmYN84J27G7DnhM8vU+CBvjIegAFcIncK7FYHy0KM/LrB5c81l8PDadym4E4MYB6m8lCC520SyEC2OCD5eMFMluTmPxbNppQmIiIpzIjQ/mpme/6deyIIkDVtROoIwq/RwrxvhFx4i6e3zMecl+lutOM5M6MUzbYj/Pntdrj0UofFoqY1Bm+zArdCR6B/sa2lNxXFH3ozzHjsqk6yNQ8+W9akFfIXVz6AhrPJBq/PBkfablnqxDvUZ5xuQbDYJgs4bWjPWl9GKNdaJzjPtPIdg8KEf0/61Aha1Dh4+YdWQ1+0VewQaVStqnaOFCDp4r4bUzoVfC1r7xADfh4wB7+yzG/Lru1OOpi3UC1Rb4gZD6IdfRkigGH2HyXZig60FQKpw44uup2A6LJr1IYvF8vSOnb2uMGf+Jhpr+Kn6WxiYE37Odt2TUm9T7EukK6DqbSvPBglYxRsfBIsUlUKSRzVsIEYjy5Jr7Oq3WrchEw6sq1WI+Hg0jVXpjlyvQKm+8HfMPv9EUrCGV+paF1hs1QyxLfclA1Efqczy1c3Rx7GT3L01ij7DWLpNiKj25O4hm7PvdCKlhzJEJ+LIIXTHi/ylzoR8bmny4FM10eZJTBnbCykWtMRUG92b1OCvIJ6n8gGLd99oKmuy0EsCkyfI3zV23wiuMRMjvpSxJT4neF+J8dWWGFYwsKIO9pytVugcJXvfBE2de y0ioirH0 02yWQEbRVwndCeVfIg/ggv3hGNwjEIDKoQ+nQFvV4g3eAAwBPaqd08NR7i7ZUESlFuKTPyoOLHAAAeGeufo4wLE/pYLvXOp62Ibh0teDOpXEBN/4SI81A+QOekIBe3oR/XBHxPS+qKw7/EUuV1U/qa36JdIUwXX5cupzezDiQTygEZ3Lc1knEf26axXUVQAEbqzzAiwYPLzefdF1Gjbsep+KlL33bR646k4riNBOV//FZCjeKeFmrK4BHeOz1oDZnr7MA51OlOa3V4BKC9MCssZJXtIHl6P59wuewekMMIqjod6V/BG/Ptdcstu5GjXV5R0XRcy5ez+4ipxw9hK+r6vv/SixUBr85yqqipR97D5684qQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000039, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, 2 Jul 2024, Baolin Wang wrote: > On 2024/7/2 15:40, Hugh Dickins wrote: > > Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on > > flags when freeing, yet the flags shown are not bad: PG_locked had been > > set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from > > deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN > > symptoms implying double free by deferred split and large folio migration. > > > > 6.7 commit 9bcef5973e31 ("mm: memcg: fix split queue list crash when large > > folio migration") was right to fix the memcg-dependent locking broken in > > 85ce2c517ade ("memcontrol: only transfer the memcg data for migration"), > > but missed a subtlety of deferred_split_scan(): it moves folios to its own > > local list to work on them without split_queue_lock, during which time > > folio->_deferred_list is not empty, but even the "right" lock does nothing > > to secure the folio and the list it is on. > > > > Fortunately, deferred_split_scan() is careful to use folio_try_get(): so > > folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable() > > while the old folio's reference count is temporarily frozen to 0 - adding > > such a freeze in the !mapping case too (originally, folio lock and (I should have added "isolation and" into that list of conditions.) > > unmapping and no swap cache left an anon folio unreachable, so no freezing > > was needed there: but the deferred split queue offers a way to reach it). > > Thanks Hugh. > > But after reading your analysis, I am concerned that the > folio_undo_large_rmappable() and deferred_split_scan() may still encounter a > race condition with the local list, even with your patch. > > Suppose folio A has already been queued into the local list in > deferred_split_scan() by thread A, but fails to 'folio_trylock' and then > releases the reference count. At the same time, folio A can be frozen by > another thread B in folio_migrate_mapping(). In such a case, > folio_undo_large_rmappable() would remove folio A from the local list without > *any* lock protection, creating a race condition with the local list iteration > in deferred_split_scan(). It's a good doubt to raise, but I think we're okay: because Kirill designed the deferred split list (and its temporary local list) to be safe in that way. You're right that if thread B's freeze to 0 wins the race, thread B will be messing with a list on thread A's stack while thread A is quite possibly running; but thread A will not leave that stack frame without taking again the split_queue_lock which thread B holds while removing from the list. We would certainly not want to introduce such a subtlety right now! But never mind page migration, this is how it has always worked when racing with the folio being freed at the same time - maybe deferred_split_scan() wins the freeing race and is the one to remove folio from deferred split list, or maybe the other freer does that. I forget whether there was an initial flurry of races to be fixed when it came in, but it has been stable against concurrent freeing for years. Please think this over again: do not trust my honeyed words! > > Anyway, I think this patch can still fix some possible races. Feel free to > add: > Reviewed-by: Baolin Wang Thanks, but I certainly don't want this to go into the tree if it's still flawed as you suggest. Hugh