From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yu Zhao <yuzhao@google.com>
Date: Thu, 27 Jun 2024 16:29:28 -0600
Subject: Re: [PATCH mm-unstable v1] mm/hugetlb_vmemmap: fix race with speculative PFN walkers
To: Muchun Song
Cc: Andrew Morton, David Hildenbrand, Frank van der Linden, "Matthew Wilcox (Oracle)", Peter Xu, Yang Shi, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <379a225a-3e26-4adc-9add-b4d931c55a9a@linux.dev>
References: <20240627044335.2698578-1-yuzhao@google.com> <379a225a-3e26-4adc-9add-b4d931c55a9a@linux.dev>
On Thu, Jun 27, 2024 at 1:25 AM Muchun Song wrote:
>
>
> On 2024/6/27 12:43, Yu Zhao wrote:
> > While investigating HVO for THPs [1], it turns out that speculative
> > PFN walkers like compaction can race with vmemmap modifications, e.g.,
> >
> >   CPU 1 (vmemmap modifier)         CPU 2 (speculative PFN walker)
> >   -------------------------------  ------------------------------
> >   Allocates an LRU folio page1
> >                                    Sees page1
> >   Frees page1
> >
> >   Allocates a hugeTLB folio page2
> >   (page1 being a tail of page2)
> >
> >   Updates vmemmap mapping page1
> >                                    get_page_unless_zero(page1)
> >
> > Even though page1->_refcount is zero after HVO, get_page_unless_zero()
> > can still try to modify this read-only field, resulting in a crash.
> >
> > An independent report [2] confirmed this race.
> >
> > There are two discussed approaches to fix this race:
> > 1. Make RO vmemmap RW so that get_page_unless_zero() can fail without
> >    triggering a PF.
> > 2. Use RCU to make sure get_page_unless_zero() either sees zero
> >    page->_refcount through the old vmemmap or non-zero page->_refcount
> >    through the new one.
> >
> > The second approach is preferred here because:
> > 1. It can prevent illegal modifications to struct page[] that has been
> >    HVO'ed;
> > 2. It can be generalized, in a way similar to ZERO_PAGE(), to fix
> >    similar races in other places, e.g., arch_remove_memory() on x86
> >    [3], which frees vmemmap mapping offlined struct page[].
> >
> > While adding synchronize_rcu(), the goal is to be surgical, rather
> > than optimized. Specifically, calls to synchronize_rcu() on the error
> > handling paths can be coalesced, but it is not done for the sake of
> > simplicity: noticeably, this fix removes ~50% more lines than it adds.
>
> I suggest adding some user-visible effect here, e.g., for the use
> case of nr_overcommit_hugepages, synchronize_rcu() will make
> this use case worse.
>
> >
> > [1] https://lore.kernel.org/20240229183436.4110845-4-yuzhao@google.com/
> > [2] https://lore.kernel.org/917FFC7F-0615-44DD-90EE-9F85F8EA9974@linux.dev/
> > [3] https://lore.kernel.org/be130a96-a27e-4240-ad78-776802f57cad@redhat.com/
> >
> > Signed-off-by: Yu Zhao
> > Acked-by: Muchun Song
>
> A nit below.

Thanks for reviewing! I've addressed all your suggestions in v2.