UTF-8 issue: find doesn't find all your files
Public bug announcement: Beware that GNU find in findutils 4.4.2 (as shipped on Ubuntu Lucid) will not find all your files if it's run in a UTF-8 locale: even if the file is there, find may just skip printing its name. Solution: If you have non-ASCII characters in your file names, use LC_CTYPE=C find instead of find.
Example:
$ echo $LC_CTYPE
en_US.UTF-8
$ ls foo*
ls: cannot access foo*: No such file or directory
$ perl -e 'die if !open F, ">", "foo\x80bar"'
$ ls foo*
foo?bar
$ find -type f
...
./foo?bar
...
$ find -name 'foo*'
$ LC_CTYPE=C find -name 'foo*'
./foo?bar
Possible explanation: The file name matcher won't match a file if its name cannot be parsed properly in the current locale (LC_CTYPE). That is, since foo\x80bar is not valid UTF-8 (the byte 0x80 is a continuation byte with no lead byte before it), GNU find 4.4.2 will not find it.
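If the explanation above is right, the files at risk are exactly those whose names are not valid UTF-8. Here is a rough sketch for listing them, assuming an iconv that exits non-zero on invalid input (as glibc's does) and file names without newlines. The function name list_non_utf8_names is made up for this example.

```shell
#!/bin/sh
# list_non_utf8_names DIR
# Hypothetical helper (not part of findutils): print every path under DIR
# that is not valid UTF-8. Assumes no newlines in file names.
list_non_utf8_names() {
    # Run find in the C locale so it treats names as plain bytes.
    LC_ALL=C find "$1" -print | while IFS= read -r path; do
        # iconv fails when its input is not valid UTF-8.
        if ! printf '%s' "$path" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$path"
        fi
    done
}
```

Running list_non_utf8_names . in the directory from the example should report the foo\x80bar file and nothing else.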
This strange behavior can be very surprising and possibly dangerous, especially in automated shell scripts.
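For scripts, one way to apply the workaround everywhere is a small wrapper. This is only a sketch, and the name find_bytes is made up. It sets LC_ALL rather than LC_CTYPE because LC_ALL, when set in the environment, overrides LC_CTYPE, so LC_CTYPE=C alone would have no effect there. The side effect is that find's messages and any locale-dependent ordering also switch to the C locale.

```shell
#!/bin/sh
# find_bytes ARGS...
# Hypothetical wrapper: run find in the C locale so that name matching
# works on raw bytes instead of UTF-8 characters.
find_bytes() {
    # LC_ALL overrides LC_CTYPE, so set LC_ALL for this one command only.
    LC_ALL=C find "$@"
}
```

Usage is the same as plain find, for example find_bytes . -name 'foo*'.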