Date: Thu, 19 Mar 2020 15:17:16 -0300
From: Jason Gunthorpe
To: Ralph Campbell
Cc: Christoph Hellwig, Dan Williams, Bharata B Rao, Christian König,
	Ben Skeggs, Jerome Glisse, kvm-ppc@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	nouveau@lists.freedesktop.org, linux-mm@kvack.org
Subject: Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault
Message-ID: <20200319181716.GK20941@ziepe.ca>
References: <20200316193216.920734-1-hch@lst.de>
	<20200316193216.920734-4-hch@lst.de>
	<7256f88d-809e-4aba-3c46-a223bd8cc521@nvidia.com>
	<20200317121536.GQ20941@ziepe.ca>
	<20200317122445.GA11662@lst.de>
	<20200317122813.GA11866@lst.de>
	<20200317124755.GR20941@ziepe.ca>
	<20200317125955.GA12847@lst.de>
	<24fca825-3b0f-188f-bcf2-fadcf3a9f05a@nvidia.com>
In-Reply-To: <24fca825-3b0f-188f-bcf2-fadcf3a9f05a@nvidia.com>

On Tue, Mar 17, 2020 at 04:14:31PM -0700, Ralph Campbell wrote:
>
> On 3/17/20 5:59 AM, Christoph Hellwig wrote:
> > On Tue, Mar 17, 2020 at 09:47:55AM -0300, Jason Gunthorpe wrote:
> > > I've been using v7 of Ralph's tester and it is working well - it has
> > > DEVICE_PRIVATE support so I think it can test this flow too. Ralph, are
> > > you able?
> > >
> > > This hunk seems trivial enough to me, can we include it now?
> >
> > I can send a separate patch for it once the tester covers it. I don't
> > want to add it to the original patch as it is a significant behavior
> > change compared to the existing code.
>
> Attached is an updated version of my HMM tests based on linux-5.6.0-rc6.
> I ran this OK with Jason's 8+1 HMM patches, Christoph's 1-5 misc HMM clean-ups,
> and Christoph's 1-4 device private page changes applied.

I'd like to get this to a mergeable state. It looks pretty good now, but I have
no idea about selftests - and I'm struggling to even compile the tools
dir.

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 69def4a9df00..4d22ce7879a7 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2162,6 +2162,18 @@ config TEST_MEMINIT
>
>           If unsure, say N.
>
> +config TEST_HMM
> +	tristate "Test HMM (Heterogeneous Memory Management)"
> +	depends on DEVICE_PRIVATE
> +	select HMM_MIRROR
> +	select MMU_NOTIFIER

Extra spaces here.

In general I wonder if it even makes sense for DEVICE_PRIVATE to be
user selectable?
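Roughly what I have in mind is the untested sketch below: make DEVICE_PRIVATE
a hidden bool that TEST_HMM (and the drivers) select, so the user only ever
answers the driver/test question. The DEVICE_PRIVATE dependencies shown here
are from memory and may not match current mm/Kconfig exactly:

    config DEVICE_PRIVATE
            bool
            depends on ZONE_DEVICE
            select DEV_PAGEMAP_OPS

    config TEST_HMM
            tristate "Test HMM (Heterogeneous Memory Management)"
            depends on ZONE_DEVICE
            select DEVICE_PRIVATE
            select HMM_MIRROR
            select MMU_NOTIFIER

Note TEST_HMM still needs the ZONE_DEVICE dependency since select does not
follow the dependencies of the selected symbol.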
> +static int dmirror_fops_open(struct inode *inode, struct file *filp)
> +{
> +	struct cdev *cdev = inode->i_cdev;
> +	struct dmirror *dmirror;
> +	int ret;
> +
> +	/* Mirror this process address space */
> +	dmirror = kzalloc(sizeof(*dmirror), GFP_KERNEL);
> +	if (dmirror == NULL)
> +		return -ENOMEM;
> +
> +	dmirror->mdevice = container_of(cdev, struct dmirror_device, cdevice);
> +	mutex_init(&dmirror->mutex);
> +	xa_init(&dmirror->pt);
> +
> +	ret = mmu_interval_notifier_insert(&dmirror->notifier, current->mm,
> +				0, ULONG_MAX & PAGE_MASK, &dmirror_min_ops);
> +	if (ret) {
> +		kfree(dmirror);
> +		return ret;
> +	}
> +
> +	/* Pairs with the mmdrop() in dmirror_fops_release(). */
> +	mmgrab(current->mm);
> +	dmirror->mm = current->mm;

The notifier holds a mmgrab, no need for another one.

> +	/* Only the first open registers the address space. */
> +	filp->private_data = dmirror;

Not sure what this comment means.

> +static inline struct dmirror_device *dmirror_page_to_device(struct page *page)
> +{
> +	struct dmirror_chunk *devmem;
> +
> +	devmem = container_of(page->pgmap, struct dmirror_chunk, pagemap);
> +	return devmem->mdevice;
> +}

The extra devmem var is not really needed.

> +static bool dmirror_device_is_mine(struct dmirror_device *mdevice,
> +				   struct page *page)
> +{
> +	if (!is_zone_device_page(page))
> +		return false;
> +	return page->pgmap->ops == &dmirror_devmem_ops &&
> +		dmirror_page_to_device(page) == mdevice;
> +}

Use the new owner stuff, right? Actually this is redundant now, the check
should be just WARN_ON(page owner != self owner).

> +static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range)
> +{
> +	uint64_t *pfns = range->pfns;
> +	unsigned long pfn;
> +
> +	for (pfn = (range->start >> PAGE_SHIFT);
> +	     pfn < (range->end >> PAGE_SHIFT);
> +	     pfn++, pfns++) {
> +		struct page *page;
> +		void *entry;
> +
> +		/*
> +		 * HMM_PFN_ERROR is returned if it is accessing invalid memory
> +		 * either because of memory error (hardware detected memory
> +		 * corruption) or more likely because of truncate on mmap
> +		 * file.
> +		 */
> +		if (*pfns == range->values[HMM_PFN_ERROR])
> +			return -EFAULT;

Unless snapshot is in use, hmm_range_fault() never returns success with
PFN_ERROR set, so this should be a WARN_ON.

> +		if (!(*pfns & range->flags[HMM_PFN_VALID]))
> +			return -EFAULT;

Same with valid.

> +		page = hmm_device_entry_to_page(range, *pfns);
> +		/* We asked for pages to be populated but check anyway. */
> +		if (!page)
> +			return -EFAULT;

WARN_ON

> +		if (is_zone_device_page(page)) {
> +			/*
> +			 * TODO: need a way to ask HMM to fault foreign zone
> +			 * device private pages.
> +			 */
> +			if (!dmirror_device_is_mine(dmirror->mdevice, page))
> +				continue;

Actually re

> +static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
> +					const struct mmu_notifier_range *range,
> +					unsigned long cur_seq)
> +{
> +	struct dmirror *dmirror = container_of(mni, struct dmirror, notifier);
> +	struct mm_struct *mm = dmirror->mm;
> +
> +	/*
> +	 * If the process doesn't exist, we don't need to invalidate the
> +	 * device page table since the address space will be torn down.
> +	 */
> +	if (!mmget_not_zero(mm))
> +		return true;

Why?
Don't the notifiers provide for this already?

mmget_not_zero() is required before calling hmm_range_fault() though.

> +static int dmirror_fault(struct dmirror *dmirror, unsigned long start,
> +			 unsigned long end, bool write)
> +{
> +	struct mm_struct *mm = dmirror->mm;
> +	unsigned long addr;
> +	uint64_t pfns[64];
> +	struct hmm_range range = {
> +		.notifier = &dmirror->notifier,
> +		.pfns = pfns,
> +		.flags = dmirror_hmm_flags,
> +		.values = dmirror_hmm_values,
> +		.pfn_shift = DPT_SHIFT,
> +		.pfn_flags_mask = ~(dmirror_hmm_flags[HMM_PFN_VALID] |
> +				    dmirror_hmm_flags[HMM_PFN_WRITE]),
> +		.default_flags = dmirror_hmm_flags[HMM_PFN_VALID] |
> +				(write ? dmirror_hmm_flags[HMM_PFN_WRITE] : 0),
> +		.dev_private_owner = dmirror->mdevice,
> +	};
> +	int ret = 0;
> +
> +	/* Since the mm is for the mirrored process, get a reference first. */
> +	if (!mmget_not_zero(mm))
> +		return 0;

Right

> +	for (addr = start; addr < end; addr = range.end) {
> +		range.start = addr;
> +		range.end = min(addr + (ARRAY_SIZE(pfns) << PAGE_SHIFT), end);
> +
> +		ret = dmirror_range_fault(dmirror, &range);
> +		if (ret)
> +			break;
> +	}
> +
> +	mmput(mm);
> +	return ret;
> +}
> +
> +static int dmirror_do_read(struct dmirror *dmirror, unsigned long start,
> +			   unsigned long end, struct dmirror_bounce *bounce)
> +{
> +	unsigned long pfn;
> +	void *ptr;
> +
> +	ptr = bounce->ptr + ((start - bounce->addr) & PAGE_MASK);
> +
> +	for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++) {
> +		void *entry;
> +		struct page *page;
> +		void *tmp;
> +
> +		entry = xa_load(&dmirror->pt, pfn);
> +		page = xa_untag_pointer(entry);
> +		if (!page)
> +			return -ENOENT;
> +
> +		tmp = kmap(page);
> +		memcpy(ptr, tmp, PAGE_SIZE);
> +		kunmap(page);
> +
> +		ptr += PAGE_SIZE;
> +		bounce->cpages++;
> +	}
> +
> +	return 0;
> +}
> +
> +static int dmirror_read(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
> +{
> +	struct dmirror_bounce bounce;
> +	unsigned long start, end;
> +	unsigned long size = cmd->npages << PAGE_SHIFT;
> +	int ret;
> +
> +	start = cmd->addr;
> +	end = start + size;
> +	if (end < start)
> +		return -EINVAL;
> +
> +	ret = dmirror_bounce_init(&bounce, start, size);
> +	if (ret)
> +		return ret;
> +
> +again:
> +	mutex_lock(&dmirror->mutex);
> +	ret = dmirror_do_read(dmirror, start, end, &bounce);
> +	mutex_unlock(&dmirror->mutex);
> +	if (ret == 0)
> +		ret = copy_to_user((void __user *)cmd->ptr, bounce.ptr,
> +				   bounce.size);

Use u64_to_user_ptr() instead of the cast.

> +static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
> +{
> +	struct dmirror_bounce bounce;
> +	unsigned long start, end;
> +	unsigned long size = cmd->npages << PAGE_SHIFT;
> +	int ret;
> +
> +	start = cmd->addr;
> +	end = start + size;
> +	if (end < start)
> +		return -EINVAL;
> +
> +	ret = dmirror_bounce_init(&bounce, start, size);
> +	if (ret)
> +		return ret;
> +	ret = copy_from_user(bounce.ptr, (void __user *)cmd->ptr,
> +			     bounce.size);

Ditto.

> +	if (ret)
> +		return ret;
> +
> +again:
> +	mutex_lock(&dmirror->mutex);
> +	ret = dmirror_do_write(dmirror, start, end, &bounce);
> +	mutex_unlock(&dmirror->mutex);
> +	if (ret == -ENOENT) {
> +		start = cmd->addr + (bounce.cpages << PAGE_SHIFT);
> +		ret = dmirror_fault(dmirror, start, end, true);
> +		if (ret == 0) {
> +			cmd->faults++;
> +			goto again;

Use a loop instead of the goto?
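Untested, but something with the shape below, slotted into dmirror_write()
using the same locals as the patch, is what I mean. dmirror_read() could be
restructured the same way:

	/*
	 * Retry the copy, faulting in whatever dmirror_do_write() could
	 * not find, until the copy succeeds or the fault fails.
	 */
	while (1) {
		mutex_lock(&dmirror->mutex);
		ret = dmirror_do_write(dmirror, start, end, &bounce);
		mutex_unlock(&dmirror->mutex);
		if (ret != -ENOENT)
			break;

		/* Fault in the pages the copy did not find, then retry. */
		start = cmd->addr + (bounce.cpages << PAGE_SHIFT);
		ret = dmirror_fault(dmirror, start, end, true);
		if (ret)
			break;
		cmd->faults++;
	}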
Also I get this:

lib/test_hmm.c: In function ‘dmirror_devmem_fault_alloc_and_copy’:
lib/test_hmm.c:1041:25: warning: unused variable ‘vma’ [-Wunused-variable]
 1041 |         struct vm_area_struct *vma = args->vma;

But this is a kernel bug, due to alloc_page_vma being a #define not a
static inline and me having CONFIG_NUMA off in this .config.

Jason