To: Mike Rapoport
Cc: linux-kernel@vger.kernel.org, Andrew Morton, "Michael S. Tsirkin",
 Jason Wang, Alexey Dobriyan, "Matthew Wilcox (Oracle)", Oscar Salvador,
 Michal Hocko, Roman Gushchin, Alex Shi, Steven Price, Mike Kravetz,
 Aili Yao, Jiri Bohac, "K. Y. Srinivasan", Haiyang Zhang,
 Stephen Hemminger, Wei Liu, Naoya Horiguchi,
 linux-hyperv@vger.kernel.org,
 virtualization@lists.linux-foundation.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
References: <20210429122519.15183-1-david@redhat.com>
 <20210429122519.15183-8-david@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v1 7/7] fs/proc/kcore: use page_offline_(freeze|unfreeze)
Message-ID: <5a5a7552-4f0a-75bc-582f-73d24afcf57b@redhat.com>
Date: Mon, 3 May 2021 10:28:36 +0200

On 02.05.21 08:34, Mike Rapoport wrote:
> On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
>> Let's properly synchronize with drivers that set PageOffline(). Unfreeze
>> every now and then, so drivers that want to set PageOffline() can make
>> progress.
>>
>> Signed-off-by: David Hildenbrand
>> ---
>>  fs/proc/kcore.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
>> index 92ff1e4436cb..3d7531f47389 100644
>> --- a/fs/proc/kcore.c
>> +++ b/fs/proc/kcore.c
>> @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
>>  static ssize_t
>>  read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>  {
>> +	size_t page_offline_frozen = 0;
>>  	char *buf = file->private_data;
>>  	size_t phdrs_offset, notes_offset, data_offset;
>>  	size_t phdrs_len, notes_len;
>> @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>  			pfn = __pa(start) >> PAGE_SHIFT;
>>  			page = pfn_to_online_page(pfn);
>
> Can't this race with page offlining for the first time we get here?

To clarify, we have three types of offline pages in the kernel ...

a) Pages that are part of an offline memory section; the memmap is stale
and not trustworthy. pfn_to_online_page() checks that. We *can* protect
against memory offlining using get_online_mems()/put_online_mems(), but
usually avoid doing so as the race window is very small (and a problem
all over the kernel that we basically never hit) and locking is rather
expensive. In the future, we might switch to RCU to handle that more
efficiently and avoid these possible races.

b) PageOffline(): logically offline pages contained in an online memory
section with a sane memmap. virtio-mem calls these pages "fake offline";
something like a "temporary" memory hole. The new mechanism I propose
will be used to handle synchronization here, as races can be more severe,
e.g., when reading actual page content.

c) Soft offline pages: hwpoisoned pages that are not actually harmful
yet, but could become harmful in the future. So we had better try to
remove the page from the page allocator and try to migrate away existing
users.

So page_offline_* handle "b) PageOffline()" only. There is a tiny race
between pfn_to_online_page(pfn) and looking at the memmap, as we already
have in many cases throughout the kernel, to be tackled in the future.

(A better name for PageOffline() might make sense; PageSoftOffline()
would be catchy but interferes with c). PageLogicallyOffline() is ugly;
PageFakeOffline() might do.)

>
>> +			/*
>> +			 * Don't race against drivers that set PageOffline()
>> +			 * and expect no further page access.
>> +			 */
>> +			if (page_offline_frozen == MAX_ORDER_NR_PAGES) {
>> +				page_offline_unfreeze();
>> +				page_offline_frozen = 0;
>> +				cond_resched();
>> +			}
>> +			if (!page_offline_frozen++)
>> +				page_offline_freeze();
>> +
>
> Don't we need to freeze before doing pfn_to_online_page()?

See my explanation above. Thanks!

-- 
Thanks,

David / dhildenb
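
For illustration only, the batching in the quoted hunk (freeze on the
first page of a batch, unfreeze once MAX_ORDER_NR_PAGES pages have been
visited) can be sketched in userspace roughly as below. This is not the
kernel code: BATCH_PAGES, struct freeze_stats, stub_freeze(),
stub_unfreeze() and walk_pages() are hypothetical stand-ins, and the
final unfreeze after the loop is an assumption, since the quoted hunk
does not show the exit path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MAX_ORDER_NR_PAGES. */
#define BATCH_PAGES 1024

/* Counts how often the (stubbed) freeze/unfreeze calls were made. */
struct freeze_stats {
	size_t freezes;
	size_t unfreezes;
	int held;		/* 1 while "frozen", 0 otherwise */
};

/* Stub for page_offline_freeze(): must not be called while frozen. */
static void stub_freeze(struct freeze_stats *s)
{
	assert(!s->held);
	s->held = 1;
	s->freezes++;
}

/* Stub for page_offline_unfreeze(): must only be called while frozen. */
static void stub_unfreeze(struct freeze_stats *s)
{
	assert(s->held);
	s->held = 0;
	s->unfreezes++;
}

/* Visit nr_pages pages, holding the freeze for at most BATCH_PAGES each. */
static struct freeze_stats walk_pages(size_t nr_pages)
{
	struct freeze_stats s = { 0, 0, 0 };
	size_t frozen = 0;	/* pages visited under the current freeze */
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		/* Batch full: let waiting drivers make progress. */
		if (frozen == BATCH_PAGES) {
			stub_unfreeze(&s);
			frozen = 0;
		}
		/* First page of a batch: take the freeze. */
		if (!frozen++)
			stub_freeze(&s);

		/* ... the page would be inspected and copied here ... */
	}

	/* Assumed exit path: drop the freeze if it is still held. */
	if (frozen)
		stub_unfreeze(&s);
	return s;
}
```

Under these assumptions, a walk over n pages takes and drops the freeze
once per started batch of BATCH_PAGES pages, so a driver waiting to set
PageOffline() is blocked for at most one batch at a time.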