Hi all. I'm trying to compare and uniq two .txt files and I'm getting this error:
sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘sorient\342t\r’ and ‘sorient\350rent\r’.
I would be so grateful if you could advise me how to avoid it.
Deleting the strings manually is not an option for me, as the .txt is really large.
I have found that it is caused by strings containing unprintable characters. How could I remove all of them from the txt file?
I mean, remove all the strings containing unprintable characters, or something like that.
(07-13-2023, 11:05 PM)ataman4uk Wrote: Hi all. trying to compare and uniq 2 .txt files. Getting this error:
sort: string comparison failed: Invalid or incomplete multibyte or wide character
From the CPU point of view, this could probably make sense. Apart from that, I didn't write the backup "script" myself, and I believe it was originally meant for backing up to a networked drive. In any case, I believe rsync is still a good option because of its versatility (local use is documented in the man page) and its performance with big folders.
cp would mean playing with find and timestamp files, and tar would create archives, which doesn't help when you need to get a file back quickly.
That said, does anyone feel like discussing this "multibyte character" problem?
This will strip unprintable chars from the input. Nevertheless, it seems your input files are malformed or have been through some serious misconversion between different character encodings, which is what usually causes the problems you mentioned.
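For what it's worth, here is a minimal Python sketch of the two options discussed above. It is only an illustration under some assumptions, not a tested fix for your files. The escapes in the error message, \342 and \350, are octal for the bytes 0xE2 and 0xE8, which are "â" and "è" in ISO-8859-1, so my guess (and it is only a guess) is that the file holds Latin-1 French words rather than garbage. The function names and the `keep_latin1` switch are made up for this example. It compares lines byte by byte, which is roughly what sort does under LC_ALL=C.

```python
from typing import List


def has_control(line: bytes) -> bool:
    # ASCII control bytes other than tab, plus DEL, count as "unprintable".
    return any((b < 0x20 and b != 0x09) or b == 0x7F for b in line)


def has_high(line: bytes) -> bool:
    # Any byte >= 0x80 is outside plain ASCII.
    return any(b >= 0x80 for b in line)


def unique_sorted(raw: bytes, keep_latin1: bool = False) -> List[bytes]:
    """Return the unique lines of raw, sorted by byte value."""
    out = set()
    for line in raw.splitlines():  # splits on \n, \r and \r\n, so CRLF is handled
        if has_control(line):
            continue  # drop lines with control characters
        if has_high(line):
            if not keep_latin1:
                continue  # drop lines with non-ASCII bytes
            # Assumption: the input is ISO-8859-1; re-encode it as UTF-8.
            line = line.decode("latin-1").encode("utf-8")
        out.add(line)
    return sorted(out)  # bytes objects compare by byte value


if __name__ == "__main__":
    sample = b"sorient\xe2t\r\nsorient\xe8rent\r\nplain\r\nplain\r\n"
    print(unique_sorted(sample))                    # only the pure-ASCII line survives
    print(unique_sorted(sample, keep_latin1=True))  # accented lines kept as UTF-8
```

For real files you would read them in binary mode, e.g. `open(path, "rb").read()`, so that Python does not try to decode anything before the filter runs. Dropping the lines loses the accented words entirely, so if the Latin-1 guess is right, converting the file is probably the better route.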