From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: users@kernel.org, ksummit@lists.linux.dev
Subject: Re: kernel.org tooling update
Date: Wed, 10 Dec 2025 09:11:53 +0100	[thread overview]
Message-ID: <20251210091153.014a5618@foz.lan> (raw)
In-Reply-To: <20251209-roaring-hidden-alligator-068eea@lemur>

Hi Konstantin,

On Tue, 9 Dec 2025 23:48:24 -0500
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:

> I spent a lot of time on trying to integrate AI into b4 workflows, but with
> little to show for it in the end due to lackluster results.
> 
> - Used local ollama as opposed to proprietary services, with the goal to avoid
>   introducing hard dependencies on third-party commercial tooling. This is
>   probably the main reason why my results were not so exciting as what others
>   see with much more powerful models.
> 
> - Focused on thread/series summarization features as opposed to code analysis:
> 
>     - Summarize follow-ups (trailers, acks/nacks received), though this is
>       already fairly well-handled with non-AI tooling.
> 
>     - Gauge "temperature" of the discussion to highlight controversial series.
> 
>     - Gauge quality of the submission; help decide "is this series worth
>       looking at" before maintainers spend their effort looking at it, using
>       maintainer-tailored prompts. This may be better done via CI/patchwork
>       integration, than with b4.
> 
>     - Use LLM to prepare a merge commit message using the cover letter and
>       summarizing the patches.
> 
> I did not end up releasing any features based on that work, because:
> 
>     - LLM was not fantastic at following discussions and keeping a clear
>       picture of who said what, which is kind of crucial for maintainer
>       decision making.
> 
>     - Very large series and huge threads run out of context window, which
>       causes the LLM to get even worse at "who said what" (and it's
>       already not that great at it).
> 
>     - Thread analysis requires lots of VRAM and a modern graphics card, and is
>       still fairly slow there (I used a fairly powerful GeForce RTX).
> 
>     - Actual code review is best if it happens post-apply in a temporary
>       workdir or a temporary branch, so the agent can see the change in the
>       context of the git tree and the entire codebase, not just the context
>       lines of the patch itself.
> 
> I did have much better success when I worked to represent a thread not as
> multiple messages, but as a single document with all interleaved follow-up
> conversations collated together. However, this was done manually --
> representing emails from arbitrary threads as such collated documents is a
> separate challenge.

I would love to see what you got there. I tried an experiment similar
to that, also with ollama, writing some Python code from scratch, aiming
to run locally on my GPU (which has only 16GB of VRAM, but is a brand
new RDNA4 GPU), using a prompt similar to this:

    You are an expert at summarizing email threads and discussion forums.
    Your task is to analyze the following text, which is a chunk of an
    email thread with nested replies, and provide a concise, structured
    summary.

    **Instructions:**
    1.  **Reconstruct the Chronology:** Carefully analyze the indentation
        levels (e.g., `>>>`, `>`, `>>`) and timestamps to determine the
        correct order of messages. The oldest message is likely the most
        indented.
    2.  **Identify Speakers:** For each message, extract the first name
        from the "From:" field (e.g., "From: John Doe" becomes "John").
    3.  **Consolidate by Topic and Speaker:** Group the main discussion
        points by topic. For each topic, summarize what each person
        contributed, consolidating their points even if they appear in
        multiple messages.
    4.  **Focus on New Information:** Ignore salutations (e.g., "Hi Mike,")
        and email signature blocks. Focus on the substantive content of
        each message.
    5.  **Output Format:** Provide the summary in the following structure:
        -   **Main Topic(s) of Discussion:** [List 1-3 main topics]
        -   **Summary by Participant:**
            -   **[First Name 1]:** [Concise summary of their stance,
                questions, or information provided, in chronological order
                if important.]
            -   **[First Name 2]:** [Concise summary of their stance,
                questions, or information provided.]
        -   **Outcome/Next Steps:** [Note any conclusions, decisions, or
            action items agreed upon.]

    **Text to Summarize:**
    {chunk}
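
The glue code around that prompt can stay pretty small; here is a
minimal sketch of the idea using the ollama Python client (the model
name and the naive size-based chunking are just illustrative):

    import ollama

    PROMPT = """..."""  # the prompt above, with its {chunk} placeholder

    def split_thread(text: str, max_chars: int = 12000) -> list[str]:
        # Naive size-based chunking; splitting on message boundaries
        # would preserve the "who said what" context much better.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    def summarize_chunk(chunk: str, model: str = "mistral-small3.2") -> str:
        # One blocking call per chunk; ollama serves the model locally.
        resp = ollama.generate(model=model, prompt=PROMPT.format(chunk=chunk))
        return resp["response"]

    if __name__ == "__main__":
        import sys
        for chunk in split_thread(sys.stdin.read()):
            print(summarize_chunk(chunk))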

Yet, grouping e-mails per thread is a challenge, especially since I was
planning to run the summaries at short time intervals, picking up only
the newer e-mails and re-using the already-parsed data.
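
The grouping itself doesn't need any AI: it is mostly a matter of
following the References/In-Reply-To headers back to the thread root.
A rough sketch, using only the Python standard library (the mbox file
name is hypothetical):

    import mailbox
    from collections import defaultdict

    def group_threads(mbox_path: str):
        # Follow the References header back to the oldest ancestor;
        # fall back to the message's own Message-ID for thread starters.
        threads = defaultdict(list)
        for msg in mailbox.mbox(mbox_path):
            refs = (msg.get("References") or "").split()
            root = refs[0] if refs else msg.get("Message-ID", "")
            threads[root].append(msg)
        return threads

    # "lkml.mbox" is a placeholder; lore exposes threads as mbox.gz files.
    for root, msgs in group_threads("lkml.mbox").items():
        print(root, len(msgs))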

My goal is not to handle patches, as I doubt this would give anything
relevant. Instead, I wanted to keep track of LKML and other high-traffic
mailing lists and pick out the most relevant threads.

Btw, I got some success summarizing patch series from a given Kernel
author over an entire month using just the e-mail subjects, with the
mistral-small3.2 LLM model and a somewhat complex prompt. The goal was
to summarize how many patches were submitted, grouping them by thread
and by open source project. The output was far from perfect, and, if
the number of patches is too big, the model starts forgetting about the
context - which is one of the challenges with current LLM technology,
even on proprietary models.
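
Much of that counting doesn't actually need the LLM, as the subject
prefix already encodes the series structure. A sketch of the
deterministic side (the regex is simplified and ignores RESEND/RFC and
other prefix variants):

    import re

    # Simplified: real prefixes also carry RFC, RESEND, tree names, etc.
    PATCH_RE = re.compile(
        r"\[PATCH(?:\s+v(?P<ver>\d+))?"
        r"(?:\s+(?P<num>\d+)/(?P<total>\d+))?\]\s*(?P<title>.*)"
    )

    def parse_subject(subject: str):
        """Return (version, patch number, series size, title), or None."""
        m = PATCH_RE.match(subject)
        if not m:
            return None  # not a patch submission at all
        return (int(m.group("ver") or 1), int(m.group("num") or 1),
                int(m.group("total") or 1), m.group("title"))

    # A v2 series of 15 patches, patch 3 (made-up subject):
    print(parse_subject("[PATCH v2 03/15] media: fix NULL deref in probe"))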

It sounds to me that, with the current technology, the best approach
would be to ask the AI to summarize each e-mail individually, then group
the results using a non-AI approach (or a mix of AI and conventional
programming).
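
Roughly, gluing the two sketches above together:

    # AI per message, plain code for the grouping.
    for root, msgs in group_threads("lkml.mbox").items():
        print(f"Thread {root}: {len(msgs)} messages")
        for m in msgs:
            body = m.get_payload()  # assumes single-part, plain-text mail
            if isinstance(body, str):
                summary = summarize_chunk(body)
                print("  *", summary.splitlines()[0] if summary else "(empty)")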

> Using proprietary models and remote services will probably show better
> results, but I did not have the funds or the inkling to do it (plus see the
> concern for third-party commercial tooling). I may need to collaborate more
> closely with the maintainers already doing it on their own instead of
> continuing my separate work on it.

Yeah, the best is to not depend on proprietary models or on external
GPU farms. I wonder if a DGX Spark would be reasonably good, with its
128GB of unified RAM, for something like that. Its price is still too
high, but maybe next year we'll end up having similar machines that
allow local tests with bigger models.

Thanks,
Mauro

Thread overview: 11+ messages
2025-12-10  4:48 Konstantin Ryabitsev
2025-12-10  8:11 ` Mauro Carvalho Chehab [this message]
2025-12-10 13:30 ` Thorsten Leemhuis
2025-12-11  3:04   ` Theodore Tso
2025-12-12 23:48   ` Stephen Hemminger
2025-12-12 23:54     ` Randy Dunlap
2025-12-16 16:21 ` Lukas Wunner
2025-12-16 20:33   ` Jeff Johnson
2025-12-17  0:47     ` Mario Limonciello
2025-12-18 13:37       ` Jani Nikula
2025-12-18 14:09         ` Mario Limonciello
