YouTube summaries are low quality
under review
Endre
I've always wondered why the summaries (based on the captions) are so low quality, like old Google Translate output: wrong word choices, grammatical defects, misaligned grammatical cases, and inappropriate articles (in languages that have more than one article).
May we know which model is used for this? Could it be improved? Maybe even a model selector for this feature? :)
Siddhartha
It's gpt-4o-mini, which should handle all these cases quite well. Any particular video where the summaries are bad?
Endre
Siddhartha Sorry for the delay, I only just noticed your question. It happens on almost every video. I can't show you the issues directly because, you know, it's not your language, but it mangles objects and plurals, uses the wrong article (English only has "the", but Hungarian and German have several), and sometimes leaves a term in the original language instead of translating it.
If it really is GPT-4o-mini (which shouldn't be that clumsy), then maybe the problem is that SRT is fragmented text. (So I'm not even surprised the LLM struggles with SRT as input.)
In that case, my idea is the following. I ran an experiment with this video: https://www.youtube.com/watch?v=UKcWu1l_UNw
- First I used the TXT format of the transcript, with a prompt like the one in this experimental step: https://www.getmerlin.in/hu/share/chat/Duuo0tZqNcS
- Then I prepended that output to the top of the SRT file and used a prompt like the one in this chat: https://chatgpt.com/share/67efa45b-63b4-8000-97b1-0d40c2eae03d (I couldn't make it work in Merlin, by the way; it always left the timestamps out of the output. It did work in ChatGPT, though I don't know which model it used, nor could I choose one.) Now the summary is perfect.
I bet this workflow is a completely different concept from what you apply under the hood, but I hope you can adapt it into your own technique. 🤘
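The core of the idea above is getting continuous text out of the fragmented SRT before summarizing. A minimal sketch of that preprocessing step, assuming the standard SRT layout of cue index, timestamp line, and caption text (the function name `srt_to_text` is my own, not part of any product):

```python
def srt_to_text(srt: str) -> str:
    """Strip cue indices and timestamp lines from an SRT file,
    joining the caption fragments into continuous plain text."""
    fragments = []
    for line in srt.splitlines():
        line = line.strip()
        # Skip blank separators, bare cue numbers, and timestamp lines
        if not line or line.isdigit() or "-->" in line:
            continue
        fragments.append(line)
    return " ".join(fragments)

sample = """1
00:00:01,000 --> 00:00:03,000
Hello and welcome

2
00:00:03,000 --> 00:00:05,000
to this video."""

print(srt_to_text(sample))  # Hello and welcome to this video.
```

Feeding the model a single joined string like this, rather than the raw SRT, avoids the mid-sentence breaks that likely confuse the summarizer.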
Siddhartha
Endre: The present workflow is this:
Take the transcript and use JSON mode in OpenAI gpt-4o-mini to convert it into a fixed format (key points and subpoints).
Then take each key point and subpoint and give it to GPT-4o to translate into the user's language.
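For readers following along, the two-stage pipeline described above could be sketched roughly like this. `summarize_to_outline` and `translate` are hypothetical stand-ins for the actual gpt-4o-mini (JSON mode) and GPT-4o calls; only the data flow between the stages is shown:

```python
def summarize_to_outline(transcript: str) -> dict:
    """Stand-in for the gpt-4o-mini call that would use JSON mode
    (response_format={"type": "json_object"}) to extract an outline."""
    return {"key_points": [{"point": "Intro", "subpoints": ["Welcome"]}]}

def translate(text: str, lang: str) -> str:
    """Stand-in for the GPT-4o call that translates one item."""
    return f"[{lang}] {text}"

def summarize(transcript: str, user_lang: str) -> dict:
    """Stage 1: transcript -> outline; stage 2: translate each
    key point and subpoint individually into the user's language."""
    outline = summarize_to_outline(transcript)
    return {
        "key_points": [
            {
                "point": translate(kp["point"], user_lang),
                "subpoints": [translate(s, user_lang) for s in kp["subpoints"]],
            }
            for kp in outline["key_points"]
        ]
    }
```

Because each point is translated in isolation, the translator never sees the full context, which may explain some of the case and article errors Endre reports.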
Endre
Siddhartha I see. Well, it has now proved to be an inappropriate process, because it generates a garbage summary. No surprise, since SRT is garbage input. 😁
Could you please improve it? My method should be a good starting point.