><i>The --multiline option means the search spans multiple lines - I only want to match entire files that begin with my search term, so this means that ^ will match the start of the file, not the start of individual lines.</i><p>That's not correct: the multiline option also enables the `m` flag, so `^` still matches at the start of each line, not just at the start of the file.<p><pre><code> $ printf 'a\nbaz\nabc\n' | rg -U '^b'
baz
</code></pre>
You need to use `\A` to match the start of the file, or disable the `m` flag with `(?-m)`. There seems to be some sort of bug, though (will file an issue soon): `\Ab` should match none of the inputs below, yet the first two print `baz`.<p><pre><code> $ printf 'a\nbaz\nabc\n' | rg -U '\Ab'
baz
$ printf 'a1\nbaz\nabc\n' | rg -U '\Ab'
baz
$ printf 'a12\nbaz\nabc\n' | rg -U '\Ab'
$</code></pre>
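In the meantime, one way to sidestep regex anchors entirely is to compare the first bytes of the file against the search term. A minimal sketch, not an rg feature: `starts_with` is a hypothetical helper name, and it assumes the term is ASCII and does not end in a newline.

```shell
# starts_with FILE TERM: succeed if FILE begins with the literal string TERM.
# Assumptions: TERM is ASCII, so ${#2} equals its length in bytes, and TERM
# does not end in a newline (command substitution strips trailing newlines).
starts_with() {
  [ "$(head -c "${#2}" "$1")" = "$2" ]
}

# Usage: starts_with some-file.txt 'b' && echo "begins with b"
```

Because this is a plain string comparison, regex metacharacters in the term are taken literally.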
"BOM" == UTF-8 Byte Order Mark I guess.<p>I initially thought it was searching for "Bill of Materials" for electronics projects or similar.
Here's a coreutils (two-liner) version:<p><pre><code> printf '\xEF\xBB\xBF' >bom.dat
find . -name '*.csv' \
-exec sh -c 'head --bytes 3 "$1" | cmp --quiet - bom.dat' sh {} \; \
-print
</code></pre>
The -exec option for find can be used as a filter (though -exec disables the default action, -print, so it must be reenabled after).<p>Could be made into a one-liner by replacing the 'bom.dat' argument to cmp with '<(printf ...)', though process substitution needs bash rather than plain sh.
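A sketch of that single-command variant, assuming bash is installed (the inner shell becomes `bash -c` because `<(...)` is not available in plain sh). The `find_bom_csv` wrapper is a hypothetical name added only to make the command reusable; the BOM is written in octal escapes so that printf behaves the same across shells.

```shell
# find_bom_csv DIR: print every *.csv file under DIR that starts with a UTF-8 BOM.
# Hypothetical wrapper name; assumes bash is available for process substitution.
find_bom_csv() {
  # \357\273\277 is EF BB BF, the UTF-8 byte order mark, in octal.
  # The file name is passed as "$1" instead of being spliced into the script,
  # so names containing spaces or quotes are handled safely.
  find "$1" -name '*.csv' \
    -exec bash -c 'head -c 3 "$1" | cmp -s - <(printf "\357\273\277")' bash {} \; \
    -print
}

# Usage: find_bom_csv .
```

This drops the temporary bom.dat file at the cost of starting one bash process per candidate file.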
One large source of byte order marks in UTF-8 is Windows. In MS-DOS and later Windows, 8-bit encoded files are assumed to be in the system code page, which, to cover all the world's writing systems, varies from country to country. When UTF-8 came along, Microsoft tools disambiguated UTF-8 files from those in the local code page by prefixing them with a byte order mark. They also do this by default in, for instance, the .NET Framework XML libraries. I don’t know what .NET Core does. I suppose it made sense at the time, but I’m sure they regret it by now.