YouTube summaries are low quality
under review
Endre
I've always wondered why the summaries (based on the captions) are so low quality, like old Google Translate output: wrong word choices, grammatical defects, misaligned grammatical cases, and inappropriate articles (in languages that have more than one article).
May we know which model is used for this? Could it be improved? Maybe even a model selector for this feature? :)
Siddhartha
It's gpt-4o-mini, which should handle all these cases quite well. Any particular video where the summaries are bad?
Endre
Siddhartha Sorry for the delay, I only just noticed your question. It happens on almost every video. I can't show you the issues directly because, you know, it's not your language, but it mangles objects and plurals, uses the wrong article (English only has "the", but Hungarian and German have several), and sometimes leaves a term in the original language instead of translating it.
If it really is GPT-4o-mini (which shouldn't be that clumsy), then maybe the problem is that SRT is fragmented text. (So I'm not even surprised the LLM struggles with SRT as input.)
In that case, my idea is the following. I ran an experiment with this video: https://www.youtube.com/watch?v=UKcWu1l_UNw
- First I used the TXT format of the transcript, with a prompt like the one in this experimental step: https://www.getmerlin.in/hu/share/chat/Duuo0tZqNcS
- Then I prepended that output to the top of the SRT file and used a prompt like the one in this chat: https://chatgpt.com/share/67efa45b-63b4-8000-97b1-0d40c2eae03d (I couldn't make it work in Merlin, by the way; it always left the timestamps out of the output. It did work in ChatGPT, though I don't know which model it used, nor could I choose one.) Now the summary is perfect.
I bet this workflow is a completely different concept from what you apply under the hood, but I hope you can adapt it into your own technique. 🤘
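The core of the idea above is getting continuous text out of the fragmented SRT before summarizing. A minimal sketch of that preprocessing step, assuming the standard SRT layout of cue index, timestamp line, and caption text (the function name `srt_to_text` is my own, not part of any product):

```python
def srt_to_text(srt: str) -> str:
    """Strip cue indices and timestamp lines from an SRT file,
    joining the caption fragments into continuous plain text."""
    fragments = []
    for line in srt.splitlines():
        line = line.strip()
        # Skip blank separators, bare cue numbers, and timestamp lines
        if not line or line.isdigit() or "-->" in line:
            continue
        fragments.append(line)
    return " ".join(fragments)

sample = """1
00:00:01,000 --> 00:00:03,000
Hello and welcome

2
00:00:03,000 --> 00:00:05,000
to this video."""

print(srt_to_text(sample))  # Hello and welcome to this video.
```

Feeding the model a single joined string like this, rather than the raw SRT, avoids the mid-sentence breaks that likely confuse the summarizer.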
Siddhartha
Endre: The present workflow is this:
Take the transcript and use JSON mode in OpenAI gpt-4o-mini to convert it into a fixed format (key points and subpoints).
Then take each key point and subpoint and give it to GPT-4o to translate into the user's language.
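For readers following along, the two-stage pipeline described above could be sketched roughly like this. `summarize_to_outline` and `translate` are hypothetical stand-ins for the actual gpt-4o-mini (JSON mode) and GPT-4o calls; only the data flow between the stages is shown:

```python
def summarize_to_outline(transcript: str) -> dict:
    """Stand-in for the gpt-4o-mini call that would use JSON mode
    (response_format={"type": "json_object"}) to extract an outline."""
    return {"key_points": [{"point": "Intro", "subpoints": ["Welcome"]}]}

def translate(text: str, lang: str) -> str:
    """Stand-in for the GPT-4o call that translates one item."""
    return f"[{lang}] {text}"

def summarize(transcript: str, user_lang: str) -> dict:
    """Stage 1: transcript -> outline; stage 2: translate each
    key point and subpoint individually into the user's language."""
    outline = summarize_to_outline(transcript)
    return {
        "key_points": [
            {
                "point": translate(kp["point"], user_lang),
                "subpoints": [translate(s, user_lang) for s in kp["subpoints"]],
            }
            for kp in outline["key_points"]
        ]
    }
```

Because each point is translated in isolation, the translator never sees the full context, which may explain some of the case and article errors Endre reports.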
Endre
Siddhartha I see. Well, it has now proved to be an inappropriate process, because it generates a garbage summary. No surprise, since SRT is garbage input. 😁
Could you please improve it? My method should be a good starting point.