Date: Tue, 26 Sep 2023 16:38:57 +0900
From: Naoya Horiguchi
To: ankita@nvidia.com
Cc: jgg@nvidia.com, alex.williamson@redhat.com, akpm@linux-foundation.org,
	tony.luck@intel.com, bp@alien8.de, naoya.horiguchi@nec.com,
	linmiaohe@huawei.com, aniketa@nvidia.com, cjia@nvidia.com,
	kwankhede@nvidia.com, targupta@nvidia.com, vsethi@nvidia.com,
	acurrid@nvidia.com, anuaggarwal@nvidia.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-edac@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v1 4/4] vfio/nvgpu: register device memory for poison handling
Message-ID: <20230926073857.GB1344149@ik1-406-35019.vs.sakura.ne.jp>
References: <20230920140210.12663-1-ankita@nvidia.com>
	<20230920140210.12663-5-ankita@nvidia.com>
In-Reply-To: <20230920140210.12663-5-ankita@nvidia.com>

On Wed, Sep 20, 2023 at 07:32:10PM +0530, ankita@nvidia.com wrote:
> From: Ankit Agrawal
>
> The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
> (Qemu) using remap_pfn_range() without adding the memory to the kernel.
> The device memory pages are not backed by struct page. Patches 1-3
> implements the mechanism to handle ECC/poison on memory page without
> struct page and expose a registration function. This new mechanism is
> leveraged here.
>
> The module registers its memory region with the kernel MM for ECC handling
> using the register_pfn_address_space() registration API exposed by the
> kernel. It also defines a failure callback function pfn_memory_failure()
> to get the poisoned PFN from the MM.
>
> The module track poisoned PFN as a bitmap with a bit per PFN. The PFN is
> communicated by the kernel MM to the module through the failure function,
> which sets the appropriate bit in the bitmap.
>
> The module also defines a VMA fault ops for the module. It returns
> VM_FAULT_HWPOISON in case the bit for the PFN is set in the bitmap.
>
> [1] https://lore.kernel.org/all/20230915025415.6762-1-ankita@nvidia.com/
>
> Signed-off-by: Ankit Agrawal
> ---
...
> @@ -406,6 +494,19 @@ nvgrace_gpu_vfio_pci_fetch_memory_property(struct pci_dev *pdev,
>
>  	nvdev->memlength = memlength;
>
> +#ifdef CONFIG_MEMORY_FAILURE
> +	/*
> +	 * A bitmap is maintained to track the pages that are poisoned. Each
> +	 * page is represented by a bit. Allocation size in bytes is
> +	 * determined by shifting the device memory size by PAGE_SHIFT to
> +	 * determine the number of pages; and further shifted by 3 as each
> +	 * byte could track 8 pages.
> +	 */
> +	nvdev->pfn_bitmap
> +		= vzalloc((nvdev->memlength >> PAGE_SHIFT)/BITS_PER_TYPE(char));
> +	if (!nvdev->pfn_bitmap)
> +		ret = -ENOMEM;
> +#endif
>  	return ret;
>  }
>

I assume that memory failure is a relatively rare event (otherwise the
device is simply broken and it's better to stop using it), so the bitmap
is mostly full of zeros. I think the size of the device memory is on the
order of 100GB, so the bitmap size is about 3.2MB. That might not be too
large on modern systems, but wouldn't another data structure with a
smaller memory footprint, such as a hash table, be more beneficial?

Thanks,
Naoya Horiguchi
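
To make the trade-off concrete, here is a minimal userspace sketch in C. It
is not kernel code and not part of the patch: bitmap_bytes(), struct pfn_set,
and the pfn_set_*() helpers are hypothetical names made up for illustration.
The first helper reproduces the bitmap sizing from the quoted hunk; the rest
is a small fixed-capacity open-addressing hash set that stores only the
poisoned PFNs, assuming poison events are rare enough that it never fills up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed by a flat bitmap with one bit per page (as in the patch). */
static uint64_t bitmap_bytes(uint64_t mem_bytes, unsigned int page_shift)
{
	return (mem_bytes >> page_shift) / 8;
}

/* Hypothetical sparse set: stores only the PFNs that were poisoned. */
#define PFN_EMPTY UINT64_MAX	/* assumed never to be a valid PFN */

struct pfn_set {
	uint64_t *slots;
	size_t cap;	/* number of slots, must be a power of two */
	size_t used;	/* number of PFNs stored */
};

static int pfn_set_init(struct pfn_set *s, size_t cap)
{
	assert(cap && (cap & (cap - 1)) == 0);
	s->slots = malloc(cap * sizeof(*s->slots));
	if (!s->slots)
		return -1;
	for (size_t i = 0; i < cap; i++)
		s->slots[i] = PFN_EMPTY;
	s->cap = cap;
	s->used = 0;
	return 0;
}

/* Multiplicative hash, reduced to a slot index. */
static size_t pfn_hash(uint64_t pfn, size_t cap)
{
	return (size_t)((pfn * 0x9E3779B97F4A7C15ULL) >> 32) & (cap - 1);
}

/* Record a poisoned PFN. Returns 0 on success, -1 if the table is full. */
static int pfn_set_add(struct pfn_set *s, uint64_t pfn)
{
	size_t i = pfn_hash(pfn, s->cap);

	for (size_t n = 0; n < s->cap; n++, i = (i + 1) & (s->cap - 1)) {
		if (s->slots[i] == pfn)
			return 0;	/* already recorded */
		if (s->slots[i] == PFN_EMPTY) {
			s->slots[i] = pfn;
			s->used++;
			return 0;
		}
	}
	return -1;
}

/* Returns 1 if the PFN was recorded as poisoned, 0 otherwise. */
static int pfn_set_contains(const struct pfn_set *s, uint64_t pfn)
{
	size_t i = pfn_hash(pfn, s->cap);

	for (size_t n = 0; n < s->cap; n++, i = (i + 1) & (s->cap - 1)) {
		if (s->slots[i] == pfn)
			return 1;
		if (s->slots[i] == PFN_EMPTY)
			return 0;	/* linear probe ended without a match */
	}
	return 0;
}
```

For 100 GiB of device memory, bitmap_bytes() gives 3,276,800 bytes with 4 KiB
pages (page_shift 12) and 204,800 bytes with 64 KiB pages (page_shift 16),
whereas a 64-slot set like the one above occupies 512 bytes of slots. The
sketch has no resizing, deletion, or locking; a real driver would need to
handle those, most likely with an existing kernel data structure rather than
a hand-rolled table.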