Date: Mon, 17 Jun 2024 11:37:27 -0700
Subject: Re: [PATCH v5 4/9] mm: Add test_clear_young_fast_only MMU notifier
From: Sean Christopherson
To: James Houghton
Cc: Yu Zhao, Andrew Morton, Paolo Bonzini, Ankit Agrawal, Axel Rasmussen,
    Catalin Marinas, David Matlack, David Rientjes, James Morse,
    Jonathan Corbet, Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta,
    Ryan Roberts, Shaoqin Huang, Suzuki K Poulose, Wei Xu, Will Deacon,
    Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Mon, Jun 17, 2024, James Houghton wrote:
> On Fri, Jun 14, 2024 at 4:17 PM Sean Christopherson wrote:
> >
> > On Fri, Jun 14, 2024, James Houghton wrote:
> > > On Fri, Jun 14, 2024 at 9:13 AM Sean Christopherson wrote:
> > > >
> > > > On Thu, Jun 13, 2024, James Houghton wrote:
> > > > > I wonder if this still makes sense if whether or not an MMU is "fast"
> > > > > is determined by how contended some lock(s) are at the time.
> > > >
> > > > No. Just because a lock wasn't contended on the initial aging doesn't mean it
> > > > won't be contended on the next round. E.g. when using KVM x86's shadow MMU, which
> > > > takes mmu_lock for write for all operations, an aging operation could get lucky
> > > > and sneak in while mmu_lock happened to be free, but then get stuck behind a large
> > > > queue of operations.
> > > >
> > > > The fast-ness needs to be predictable and all but guaranteed, i.e. lockless or in
> > > > an MMU that takes mmu_lock for read in all but the most rare paths.
> > >
> > > Aging and look-around themselves only use the fast-only notifiers, so
> > > they won't ever wait on a lock (well... provided KVM is written like
> > > that, which I think is a given).
> >
> > Regarding aging, is that actually the behavior that we want? I thought the plan
> > is to have the initial test look at all MMUs, i.e. be potentially slow, but only
> > do the lookaround if it can be fast. IIUC, that was Yu's intent (and peeking back
> > at v2, that is indeed the case, unless I'm misreading the code).
>
> I believe what I said is correct. There are three separate things going on here:
>
> 1. Aging (when we hit the low watermark, scan PTEs to find young pages)
> 2. Eviction (pick a page to evict; if it is definitely not young, evict it)
> 3. Look-around (upon finding a page is young upon attempted eviction,
> check adjacent pages if they are young too)

Ah, I now see the difference between #1 and #2, and your responses make a lot
more sense. Thanks!
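
As a rough illustration of the three cases listed above, here is a toy
userspace model in C. It is not kernel code: struct toy_mmu, enum toy_phase
and both helpers are hypothetical names invented for this sketch. It assumes
the behavior James describes in this thread, i.e. aging and look-around
consult only "fast" MMUs, while eviction consults every MMU, including slow
ones such as the shadow MMU.

```c
#include <stdbool.h>

/* Hypothetical model of one secondary MMU; not a kernel structure. */
struct toy_mmu {
	const char *name;
	bool fast;	/* can test young without waiting on a contended lock */
	bool young;	/* page was accessed through this MMU */
};

/* The three cases from the list above. */
enum toy_phase { TOY_AGING, TOY_EVICTION, TOY_LOOK_AROUND };

/* Assumption: only eviction is allowed to take the slow path. */
static bool toy_phase_is_fast_only(enum toy_phase phase)
{
	return phase != TOY_EVICTION;
}

/* Returns true if any MMU consulted for this phase reports the page young. */
static bool toy_page_is_young(const struct toy_mmu *mmus, int nr,
			      enum toy_phase phase)
{
	bool fast_only = toy_phase_is_fast_only(phase);

	for (int i = 0; i < nr; i++) {
		if (fast_only && !mmus[i].fast)
			continue;	/* skip slow MMUs, e.g. a shadow MMU */
		if (mmus[i].young)
			return true;
	}
	return false;
}
```

With one fast MMU that has not touched the page and one slow MMU that has,
this model reports "not young" for aging and look-around but "young" for
eviction, which is the distinction between #1 and #2.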

> > If KVM _never_ consults shadow (nested TDP) MMUs, then a VM running an L2 will
> > end up with hot pages (used by L2) swapped out.
>
> The shadow MMU is consulted at eviction time -- only at eviction time.
> So pages used by L2 won't be swapped out unless they're still cold at
> eviction time.
>
> In my (and Yu's) head, not being able to do aging for nested TDP is ok
> because running nested VMs is much more rare than running non-nested
> VMs. And in the non-nested case, being able to do aging is a strict
> improvement over what we have now.

Yes and no. Running nested VMs is indeed rare when viewing them as a percentage
of all VMs in the fleet, but for many use cases, the primary workload of a VM is
to run nested VMs. E.g. say x% of VMs in the fleet run nested VMs, where 'x' is
likely very small, but for those x% VMs, they run nested VMs 99% of the time
(completely made up number).

So yes, I completely agree that aging for non-nested VMs is a strict improvement,
but I also don't think we should completely dismiss nested VMs as a problem
not worth solving.

> We could look into being able to do aging with the shadow MMU, but I
> don't think that should necessarily block this series.

...

> > Ooh! Actually, after fiddling a bit to see how feasible fast-aging in the shadow
> > MMU would be, I'm pretty sure we can go straight there for nested TDP. Or rather,
> > I suspect/hope we can get close enough for an initial merge, which would allow
> > aging_is_fast to be a property of the mmu_notifier, i.e. would simplify things
> > because KVM wouldn't need to communicate MMU_NOTIFY_WAS_FAST for each notification.
> >
> > Walking KVM's rmaps requires mmu_lock because adding/removing rmap entries is done
> > in such a way that a lockless walk would be painfully complex. But if there is
> > exactly _one_ rmap entry for a gfn, then slot->arch.rmap[...] points directly at
> > that one SPTE. And with nested TDP, unless L1 is doing something uncommon, e.g.
> > mapping the same page into multiple L2s, the overwhelming vast majority of rmaps
> > have only one entry. That's not the case for legacy shadow paging because kernels
> > almost always map a pfn using multiple virtual addresses, e.g. Linux's direct map
> > along with any userspace mappings.

...

> Hmm, interesting. I need to spend a little bit more time digesting this.
>
> Would you like to see this included in v6? (It'd be nice to avoid the
> WAS_FAST stuff....) Should we leave it for a later series? I haven't
> formed my own opinion yet.

I would say it depends on the viability and complexity of my idea. E.g. if it
pans out more or less like my rough sketch, then it's probably worth taking on
the extra code+complexity in KVM to avoid the whole WAS_FAST goo.

Note, if we do go this route, the implementation would need to be tweaked to
handle the difference in behavior between aging and last-minute checks for eviction,
which I obviously didn't understand when I threw together that hack-a-patch.

I need to think more about how best to handle that though, e.g. skipping GFNs with
multiple mappings is probably the worst possible behavior, as we'd risk evicting
hot pages. But falling back to taking mmu_lock for write isn't all that desirable
either.
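
To illustrate the trade-off, below is a toy userspace model in C of the
"exactly one rmap entry" fast path. It is not the hack-a-patch mentioned above
and not KVM code: struct toy_rmap_head, TOY_RMAP_MANY, TOY_SPTE_ACCESSED (and
its bit position) and toy_fast_test_clear_young() are all hypothetical names
made up for this sketch. It assumes the low bit of the rmap value
distinguishes "points directly at the only SPTE" from "more than one mapping",
and it ignores everything a real lockless walk would need (e.g. protecting the
SPTE's lifetime against concurrent teardown).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the accessed bit; illustrative only. */
#define TOY_SPTE_ACCESSED	(UINT64_C(1) << 5)
/* Assumed tag: low bit set means the gfn has more than one mapping. */
#define TOY_RMAP_MANY		((uintptr_t)1)

struct toy_rmap_head {
	/* 0: no mapping; low bit clear: pointer to the only SPTE; low bit set: many */
	uintptr_t val;
};

enum toy_age_result {
	TOY_NOT_YOUNG,
	TOY_YOUNG,
	TOY_NEEDS_SLOW_PATH,	/* multiple mappings, cannot answer locklessly */
};

/* Test and clear the accessed bit without a lock iff there is one mapping. */
static enum toy_age_result toy_fast_test_clear_young(struct toy_rmap_head *head)
{
	uintptr_t val = head->val;
	_Atomic uint64_t *sptep;
	uint64_t old;

	if (!val)
		return TOY_NOT_YOUNG;
	if (val & TOY_RMAP_MANY)
		return TOY_NEEDS_SLOW_PATH;

	sptep = (_Atomic uint64_t *)val;
	/* Atomically clear the accessed bit and look at the previous value. */
	old = atomic_fetch_and(sptep, ~TOY_SPTE_ACCESSED);
	return (old & TOY_SPTE_ACCESSED) ? TOY_YOUNG : TOY_NOT_YOUNG;
}
```

The third return value is the open question: for aging, a caller could
plausibly just skip TOY_NEEDS_SLOW_PATH, but treating it as "not young" at
eviction time is what would risk evicting hot pages, and the only other option
in this model is the slow path, i.e. taking mmu_lock for write.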