I absolutely love ffmpeg, but for the life of me I cannot understand how its pip...

xbmcuser · on Feb 28, 2023

I know a lot of people look down on ChatGpt. But I have been using it for creating scripts to use with ffmpeg and I was able to get most of the things that I needed with very little massaging required. You can then ask ChatGpt about what it did and why so it will explain it as well so you can get some basic understanding on how things work. People don't realise what kind of tool Chat Gpt is and how to properly utilise it but it can be very useful for stuff like this.

amadvance · on Feb 28, 2023

It really works! With few interactions, ChatGPT was able to create a not so obvious -filter_complex pipeline, like:

  ffmpeg -ss 00:01:00 -t 00:02:00 -i input1.mp4 -ss 00:03:00 -t 00:02:30 -i input2.mp4 -filter_complex "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a];[v]eq=brightness=0.3[outv]" -map "[outv]" output.mp4

m3kw9 · on Feb 28, 2023

I asked gpt3 and here is what it said:

This command uses FFmpeg to create a single output video file from two input video files. It starts by specifying a start and duration for each of the two files (input1.mp4 and input2.mp4). It then applies a filter complex to the two files, which combines the two videos and audio into one stream, and adds a brightness filter with a value of 0.3. Finally, it maps the output video stream to the output file (output.mp4).

pjc50 · on Feb 28, 2023

And what does all that filter_complex do?

suzumer · on Feb 28, 2023

-filter_complex specifies a series of filters that accept inputs and return outputs. Any value contained in brackets ([]), is a value that can be input or output by a filter. [0:v],[0:a],[1:v],[1:a] are values supplied by ffmpeg representing the video and audio streams of the 1st and 2nd inputs, in this case input1.mp4 and input2.mp4.

The first filter, concat, takes in a set of synchronized audio and video segments, concatenates them, and returns the resulting audio and video clips. n specifies the number of segments, v specifies the number of output video clips, and a specifies the number of output audio clips. The results are saved to the values of [v] and [a] for video and audio respectively.

The eq filter then takes the [v] video returned by concat, and adjusts the value to a brightness of 0.3. For reference, 0 represent no change to the brightness.

This [v] value is then mapped to the output video using -map.

That being said, this filter isn't correct, as the [a] value is never used or mapped, so the filter would fail. The correct way to write the filter, if the intended use is to discard the audio, would be:

  -filter_complex "[0:v][1:v]concat[v];[v]eq=brightness=0.3[outv]"

I omitted the n,v, and a values in the concat filter, as they are by default 2,1, and 0 respectively.

Another way to visualize this filter in an imperative style would look like this:

  def filter(input0, input1) {
    v = concat(input0.v,input1.v);
    outv = eq(v,brightness=0.3);
    return outv;
  }

pancrufty · on Feb 28, 2023

I think you can ask that to ChatGPT in a follow up question ;)

nolok · on Feb 28, 2023

I understand what you mean.

But I also understand my sister doesn't need to know how her phone does any of what it does to play candy crush or read her emails.

Just like she doesn't need to know how a microwave works to reheat her meal.

If you want to know how things are done, of course get yourself involved in the details, but for most things in life you just want to use it without bothering with the details, so you can focus of the parts that are of interests to you.

(I know some people like to know the details of everything, and maybe you are one of them, and that's great, but the vast majority of people do not)

nicky0 · on Feb 28, 2023

Yes, chat GPT excels at comprehending and explaining things that have a consistent structure, restructuring, and and synthesising variations. If you keep it in its lane, it’s an excellent tool.

It’s really really bad at counting though. For example, try asking it to produce a line of 40 asterisks.

gamegoblin · on Feb 28, 2023

It’s bad at counting because counting relies on a stateful O(N) algorithm you run in your brain.

GPT is trained to reproduce human text, which tends to simply have the output of this O(N) counting process, but not the process itself. So GPT “thinks” it should be able to just spit out the number just like human text implies we do. It doesn’t know we are relying on an offline O(N) algorithm.

If you have it emit a numbered list of 40 elements, it will succeed, because producing a numbered list embeds the O(N) process and state into the text, which is the only thing it can see and reason about.

nicky0 · on Feb 28, 2023

That’s very interesting. I assumed it was something about the fact that it is a language model rather than a calculating machine. So printing 44 asterisks instead of 40 is kind of close.

I wonder if would it be possible to teach the machine to recognise situations it’s better at and be less confident other answers? Or does it need to be confident about everything in order to produce good answers where it does.

It’s kind of funny how confident chatgpt is about giving out bullshit, and then even when you correct it, it says oh I’m terribly sorry, here is definitely the correct answer this time and then it gives you another wrong answer. Just an observation, I realise it is just a tool that you have to understand the limitations of.

OmarAssadi · on Feb 28, 2023

> here is definitely the correct answer this time and then it gives you another wrong answer.

My favorite is when it gets into some weird context loop, apologizes and claims to have corrected an issue, but gives you literally, character-for-character, the same answer it gave before.

Fortunately, it mostly happens to me when I am asking particularly ambiguous or weird questions -- e.g., asking for any assembly in AT&T/GAS syntax seems to always go wrong, not necessarily in terms of the logic itself, but rather that it ends up mixing Intel and AT&T, or asking explicitly for POSIX-compliant shell often gives weird Bash/GNUisms, presumably since so many StackOverflow posts seem to conflate all shells with Bash and always expect GNU coreutils.

jack_pp · on Feb 28, 2023

We can check our answers, we can spit out bullshit like it does but then take the time to check them. It has no process for checking the answers or analyzing them and I'd rather not ask it how confident it is because that's just not what I care about.

I find it amazing that it can actually sort of run code "in its head", all the code output it does is not actually run through an interpreter but it's still pretty close if not perfect each time. But trying to run code with it is mostly for kicks, rather I asked it to produce a simple API for me and then produce a python script that tests it. it had no bugs and I could check it myself fairly fast; certainly faster than it would've taken me to write all that code without any bugs. I'd have had to check my own code for bugs anyway.

So if you accept that chatGPT is sort of like a guy that looked over millions of programmers shoulders but never actually communicated with any of them to understand the code, it has a perfect memory while not being able to compute much in its head then it can still be a great tool. Just understand its limitations and its advantages. Just because it can't reverse a string in its head doesn't mean it's "dumb" or not useful for everyday tasks.

nicky0 · on Feb 28, 2023

I code with GitHub Copilot. I liken it to pair programming with an brilliant, insigntful & more experienced colleague who is always slightly drunk.

robin_reala · on Feb 28, 2023

So basically a chat routine that’s been designed to hit the Ballmer peak.

cubefox · on March 1, 2023

Note that language models get much better at pretty much any reasoning task when they are prompted to use chain-of-thaught (Cot) reasoning. The difference between "Solve x" and "Solve x, let's think step by step" comes from the language model using the context window as short term memory in some sense. Perhaps your explanation in terms of complexity is better, but I'm not sure whether it explains the effectiveness of CoT in general.

Wowfunhappy · on Feb 28, 2023

Shouldn't RHLF help with this? So it learns that when people specify a number, they mean something very specific.

gamegoblin · on Feb 28, 2023

You cannot RL learn an O(N) algorithm in an O(1) feed forward neural network.

You could RL learn that when someone specifies a number, the appropriate thing to say is "Ok, 40 asterisks, let's count them, 1, *, 2, *, 3 , *, ..." and then it would indeed produce 40 asterisks. But not as a single string. Because producing them as a single contiguous string requires some offline state/memory/processing, and all the neural network has access to is the last ~page of text.

Embedding the counting process into the text itself kind of embeds the state of the O(N) algorithm in the O(N) text itself, that is, "unrolling the loop" externally.

brookst · on Feb 28, 2023

It doesn’t have any logic; it just tries to complete strings in the most plausible way. It’s training material probably did not have a lot of “write five at signs: @@@@@“. RLHF might help steer it in the right direction, but probably wouldn’t product the concept of counting or loops.

Wowfunhappy · on Feb 28, 2023

So, this is where I guess I just don't understand. I've had ChatGPT produce code for me that there is absolutely no way it already had in its training set. I realize it can't actually "think", but then I also don't know how to describe what I'm seeing.

Sunspark · on Feb 28, 2023

It gave me 40 short Gaulish warriors..

beepbooptheory · on Feb 28, 2023

Just be sure to thank all those stackoverflow repliers for this!

quijoteuniv · on Feb 28, 2023

Agree! I used ChatGPT to explain some (uncommented)ffmpeg scripts I wrote few years ago. Scripts where created by going trough many websites and adapting to my needs. Explanation from chatGPT was spot on.

ly3xqhl8g9 · on Feb 28, 2023

Use ChatGPT, prompt: ffmpeg pipeline capture webcam to mp4

ChatGPT: >>>

ffmpeg -f v4l2 -framerate 30 -video_size 640x480 -i /dev/video0 -c:v libx264 -preset veryfast -crf 28 -pix_fmt yuv420p output.mp4

Here's a brief explanation of what each part of the command does:

  - `f v4l2`: specifies that the input device is a V4L2 (Video for Linux 2) device, which is typically what webcams use.
  - `framerate 30`: sets the framerate to 30 frames per second.
  - `video_size 640x480`: sets the video resolution to 640x480 pixels. You can adjust this to your desired resolution.
  - `i /dev/video0``: specifies the input device to use, which is typically /dev/video0 for the first connected webcam. If you have multiple webcams, you may need to adjust this to the appropriate device.
  - `c:v libx264``: sets the video codec to use, which is H.264 in this case.
  - `preset veryfast``: sets the encoding speed preset to "veryfast", which is a good balance between speed and quality.
  - `crf 28`: sets the video quality using the Constant Rate Factor (CRF) method, with a value of 28. Lower values mean higher quality and larger file sizes, while higher values mean lower quality and smaller file sizes. You can adjust this to your desired quality/file size.
  - `pix_fmt yuv420p``: sets the pixel format to YUV 4:2:0, which is a commonly used format for H.264 video.
  - `output.mp4`: specifies the output file name and format.

<<<

Never used the `crf` or `pix_fmt` flags personally, but seems correct (the issue with ChatGPT).

thrdbndndn · on Feb 28, 2023

crf: "Constant Rate Factor" for x264 among other encoders. Think it like a output quality (higher = lower quality). I'd say 28 is way too high especially when you're using preset=veryfast (too bad quality). Since it's only 640-480, most of computers can handle fast or even medium in real-time. If not sure, I'd say just use default (by not giving anything).

pix_fmt: pixel format. 99% videos use this format, and it has best compatibility. FFMPEG will try to keep the original format when do transcoding, so in most of time you don't need to specify. However, in this very case, since the input is from a webcam, the chance it uses some weird format is high. And you don't want to keep that in your final result. So it's good to specify it out here.

My 2c: ChatGPT is great, but I recommend to read the comments it gives about each parameter, try to understand their purposes, and adjust accordingly if needed.

Also having a rough idea about how FFMPEG pipeline works (mainly the order of input, output and their associated switches in arguments) helps a lot.

Video process is a very complex thing and lots of time it relies on experience. Just be prepared sometimes your "typically works" command would break.

_ugfj · on Feb 28, 2023

> Never used the `crf` or `pix_fmt` flags personally, but seems correct (

Let me translate: the AI might or might not be bullshitting, I have no way to know but I decide to believe in it. Why ?

ly3xqhl8g9 · on Feb 28, 2023

Why do you trust the descriptions of `ffmpeg -help`? What if some evil daemon [1] went into the binary and changed completely the behaviour of the flags? Do you read the source code and verify the checksums for every program you run? In the real world, good or bad, very few people care beyond a first level of trust: does it work for my current issue? great, no?, try something else.

Also, you disingenuously left out the parentheses from the quote: using a tool requires undertaking its downsides, if the downsides can be mitigated accordingly then the tool is useful. Millions of users put to good use imperfect tools daily.

[1] https://en.wikipedia.org/wiki/Evil_demon

_ugfj · on March 1, 2023

ChatGPT is a stochastic parrot it doesn't even pass the sniff test, or first level of trust. Does it work? That's meaningless pretty much as everything else that ChatGPT spits out. The information contained in them is zero. If you read something ChatGPT wrote, you know exactly as much as you did before you read it. It sounds plausible and it's basically a zero day on human cognition. Be vigilant.

You linked Wikipedia: I don't trust that either. If I want to actually know something I will follow the sources and then evaluate the trustworthiness of the sources. Indeed, most of the Scottish Wikipedia being written by an American teenager who didn't speak Scottish at all very strongly parallels ChatGPT.

ttflee · on Feb 28, 2023

You need `-pix_fmt yuv420p` for an iPhone to recognize a ffmpeg-encoded video file.

jeroenhd · on Feb 28, 2023

Do you happen to know why? Is this a hardware decode limitation perhaps, or a bug in the standard decoder?

izacus · on Feb 28, 2023

YUV420p is the only pixel format that all H.264 compatible decoders MUST support. Other formats are optional.

In practice that means that most hardware decoders (not just iPhones, also other phones, TVs, older PC GPUs) won't be able to decode YUV 4:2:2 or YUV 4:4:4 videos.

It's kinda annoying, since YUV420 really messes up red text in screen captures for example.

You need this parameter to force ffmpeg to convert the color space if your input isn't YUV420P (it defaults to keeping the input pixel format to avoid quality loss).

thrdbndndn · on Feb 28, 2023

>since YUV420 really messes up red text in screen captures for example

Is there an detailed explanation why red is especially bad after chroma subsampling? Have been wondering about this for awhile.

I assume it's because red has low brightness/luminance (no idea if correct or not)?

klodolph · on Feb 28, 2023

Yes, it’s because red is dim. Dark blue (like, RGB 0, 0, 255) also suffers, perhaps worse, but that shade of blue is hard to reaf even when it is reproduced perfectly.

brigade · on Feb 28, 2023

The capture pixel format depends on device - I think 4:2:2 is pretty common for webcams, which in H.264 requires profiles that aren’t widely implemented outside of x264 and ffmpeg.

867-5309 · on Feb 28, 2023

licensing for a non-free closed-source platform

djkoolaide · on Feb 28, 2023

and if you're encoding an x265 file, it needs

-tag:v hvc1

for iphone/quicktime to recognize it.

onion2k · on Feb 28, 2023

You also need that same parameter for transparent videos.

izacus · on Feb 28, 2023

Yeah, crf sets the quality and 28 will make your video look like a blurry over-compressed mess. Congrats :)

asicsp · on Feb 28, 2023

Check out https://ffmpeg.guide/ (https://news.ycombinator.com/item?id=33383023)

See also https://img.ly/blog/ultimate-guide-to-ffmpeg/ (https://news.ycombinator.com/item?id=33771445)

pjc50 · on Feb 28, 2023

I was looking for ffmpeg.guide the other day and couldn't remember what it was called, thanks!

geerlingguy · on Feb 28, 2023

I wouldn't call myself a power user by any means, but for many parts, the documentation is quite thorough, and if you're wrestling with some specific filter long enough, you might even begin to understand some of the magic incantation required to get it to work.

It's helpful to have some background in media container formats, compression algorithms, sound formats, and all the jargon and acronyms associated with the above. Easy!

rollcat · on Feb 28, 2023

I know this doesn't answer your question, but have a look at GStreamer pipelines. They take the basic idea of shell pipelines, but add typing, format negotiation, M:N connections, etc all while giving you optional control over low-level details (which demuxer?), but also high-level abstractions (just show me a window with a preview). Once prototyped on the CLI (gst-launch-1.0[1]), they're also very easy to start using within an application through the framework (gst_parse_launch[2]), where you can e.g. iteratively add things like a volume slider. You can also access most of FFmpeg's codecs through libav.

[1]: https://gstreamer.freedesktop.org/documentation/tutorials/ba...

[2]: https://gstreamer.freedesktop.org/documentation/gstreamer/gs...

oldstrangers · on Feb 28, 2023

Have you tried https://handbrake.fr/?

Aperocky · on Feb 28, 2023

The ordering of the arguments changes behavior, I don't think that's a good thing or desired outcome, at most it's a bug_feature.

It tend to happen when big project rolls out batteries itself, in this case command line parsing.

gyan · on Feb 28, 2023

The ffmpeg cli supports multiple inputs and multiple outputs, so there needs to be a way to unambiguously map an option to its target. Position is looked at, to group options for the same target together.

dylan604 · on Feb 28, 2023

argument order is definitely a feature. with -ss being one that behaves differently depending on it's location in reference to -i. it's not an accidental thing and the desired outcome dictates where you place it. not understanding that just means you're not using it enough to grok it.

Aperocky · on Feb 28, 2023

Oh I used it enough to understand.

But I'm not convinced, maybe it is the local optima though.

orbisvicis · on Feb 28, 2023

Even given an option it can be difficult to find the corresponding documentation, if only because of the many different submodules and encoders and decoders and filters that have oh-so-slightly different options. That said, I've just switched from pydub to ffmpeg-python (due to memory issues of the former[1]) and judging from the Jupyter notebook[2] it seems a much more intuitive method of constructing ffmpeg pipelines.

[1] https://github.com/jiaaro/pydub/issues/135

[2] https://github.com/kkroening/ffmpeg-python/tree/master/examp...

zimmund · on Feb 28, 2023

The manpages are good! I don't use ffmpeg too frequently so, to help myself, I made a cheatsheet (which I update whenever I do something new)

https://github.com/elamperti/dotfiles/blob/master/docs/ffmpe...

mmsimanga · on Feb 28, 2023

This. And imagemagick.

Macha · on Feb 28, 2023

Though I have used ffmpeg on images, too.

dylan604 · on Feb 28, 2023

If you don't use it every day, then this will be the typical result. But that can be said about anything, not specific to ffmpeg.

Practice, practice, practice. Eventually, you'll start thinking like ffmpeg. Knowing how ffmpeg labels the various streams inside a file is a great place to start. For example [0:a:1] means the second audio stream inside the first input. This is key for stringing together complex filter chains in the appropriately named -filter_complex.

There are some filters that require you to merge streams together so the processing is done evenly, followed by a split to get back to the original stream layout. amerge/channelsplit is a common combo in most of my commands.

ovi256 · on Feb 28, 2023

I think the only hope a mortal brain has is using a graphic tool to build ffmpeg command lines like https://ffmpeg.guide/

devwastaken · on Feb 28, 2023

I've been trying to get color space and bit conversion to work with png's and bt2020 video. Apparently any time I use a png with ffmpeg's avif encoder it comes out too bright.

reidjs · on Feb 28, 2023

I’ve never had to directly use ffmpeg. What do you do that makes it so useful to you? Something in video production?

pmontra · on Feb 28, 2023

Not parent but in my case yes, video production: ffmpeg to extract audio from captured video, process audio with Audacity, ffmpeg to cut video and merge it with the audio from Audacity.

smcnally · on Feb 28, 2023

Also to quickly convert video formats and containers, scale and crop renders, alter speed, make gifs