Working with tab/comma/pipe delimited files (more sed). - Printable Version

+- hashcat Forum (https://hashcat.net/forum)
+-- Forum: Misc (https://hashcat.net/forum/forum-15.html)
+--- Forum: General Talk (https://hashcat.net/forum/forum-33.html)
+--- Thread: Working with tab/comma/pipe delimited files (more sed). (/thread-1304.html)
Working with tab/comma/pipe delimited files (more sed). - Socapex - 06-18-2012

So I've been working on a worldwide country, city, town, and geographical feature (lakes, ponds, reserves, parks, hotels, etc.) wordlist. I wasn't sure if I should start this new thread, and please, Atom, if you're tired of command-line threads, tell me and I will stop.

Anyway, I'm learning some sed right now, and here are some commands I used. I'd like to thank epixoip and M@lik for teaching me some sed fundamentals. sed is really powerful. My first question is: is there any way to make these prettier?

I'll dive right into the most complex/ugly command. I had a file delimited by pipes, which looked like this:

Code:
399|Agua Sal Creek|Stream|AZ|04|Apache|001|362740N|1092842W|36.4611122|-109.4784394|362053N|1090915W|36.3480582|-109.1542662|1645|5397|Fire Dance Mesa|02/08/1980|

So I used:

Code:
cat NationalFile_20120602.txt | sed -n 's/^[^|]*[|]\([^|]*\)[|][^|]*[|][^|]*[|][^|]*[|]\([^|]*\)[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|][^|]*[|]\([^|]*\).*/\1,\2,\3/p' | tr -s ',' '\n' | sort | uniq > NationalFile-Names.txt

Which outputs something like this:

Code:
Agua Sal Creek

I know the command is ugly, but it worked. What I was doing: match the stuff before the first pipe, then a pipe, then more stuff before a pipe, then a pipe, and so on, until I got to the fields I wanted. I used a \( \) group to isolate each field I wanted and continued. At the end, I call back my three backreferences (thank you so much, epixoip) that I had extracted from each line. I separate those with commas and then use tr to change the commas into line breaks. Then I sort and remove duplicates. Any ideas on how to make this humanly readable?

Finally, some stuff I learned using CSV files.
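For the "make it readable" question: the same three fields can be pulled out with cut, which does the field counting for you. This is a sketch, not the official NationalFile schema; the field numbers (2: feature name, 6: county, 18: variant name) are inferred from the sample record above, and the filename sample.txt is just for illustration.

```shell
# Recreate the sample record from the post so the pipeline is self-contained.
printf '399|Agua Sal Creek|Stream|AZ|04|Apache|001|362740N|1092842W|36.4611122|-109.4784394|362053N|1090915W|36.3480582|-109.1542662|1645|5397|Fire Dance Mesa|02/08/1980|\n' > sample.txt

# cut selects fields 2, 6 and 18 of each pipe-delimited line; tr turns the
# remaining pipes into newlines, and sort -u replaces sort | uniq.
cut -d'|' -f2,6,18 sample.txt | tr '|' '\n' | sort -u
```

On the real file you would point cut at NationalFile_20120602.txt and redirect into NationalFile-Names.txt as in the sed version; the output is the same three names per record, deduplicated.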
If you want to replace all commas with an EOL (newline):

Code:
tr -s ',' '\n'

Replace a certain character in a file (here, underscores with spaces):

Code:
tr -s '_' ' '

If you want to work with something other than a pipe, let's say a comma, use [^,]*[,] and it will work. Same for tabs.

Finally, I have a question about my cities, countries, counties, towns and whatever file. I'd like to ask what you'd prefer (if you're interested). It's quite a big list, 170MB last time I checked, and there are many spaces in the file. Take for example "Dawson Creek"... Would you prefer that I share a list without spaces, or that I leave them in (you can easily do the work yourself)? Would you prefer a no-caps list? I'm asking because that might make the file smaller...

Thank you,
Socapex

RE: Working with tab/comma/pipe delimited files (more sed). - M@LIK - 06-18-2012

Since this is not about hash-cracking, I believe this is not the right place to discuss it. However, I don't mind helping you: awk is the best tool for this job. On Windows, using gawk:

Code:
type 1.txt

RE: Working with tab/comma/pipe delimited files (more sed). - Hash-IT - 06-18-2012

(06-18-2012, 07:41 PM)M@LIK Wrote: Since this is not about hash-cracking, I believe this is not the right place to discuss it.

This is what the new section is for!! Atom has already said that it is OK to talk about things in this section as long as they are loosely related to hash-cracking. The normal forum rules apply, however: no posting hashes, no warez, and no advertising. This conversation is very interesting, please carry on!

RE: Working with tab/comma/pipe delimited files (more sed). - forumhero - 06-19-2012

I really appreciate this type of discussion, as I'm new to awk and sed as well.

RE: Working with tab/comma/pipe delimited files (more sed). - Socapex - 06-19-2012

Thanks for the help M@lik, yet another tool to learn!

RE: Working with tab/comma/pipe delimited files (more sed).
- tony - 06-20-2012

Thanks a lot to everyone, much appreciated! I'm also following this thread closely!
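M@LIK's gawk one-liner did not survive the archive (only the `type 1.txt` feeding it remains), so here is a hypothetical reconstruction of what an awk version of the extraction could look like. It assumes the same column layout as the sample record in the first post (fields 2, 6 and 18); the filename sample.txt is illustrative, not M@LIK's 1.txt.

```shell
# Recreate the sample record so the one-liner can be run as-is.
printf '399|Agua Sal Creek|Stream|AZ|04|Apache|001|362740N|1092842W|36.4611122|-109.4784394|362053N|1090915W|36.3480582|-109.1542662|1645|5397|Fire Dance Mesa|02/08/1980|\n' > sample.txt

# -F'|' makes awk split on the pipe; each wanted field is printed on its
# own line, then sort -u deduplicates, matching the sed pipeline's output.
awk -F'|' '{ print $2; print $6; print $18 }' sample.txt | sort -u
```

On Windows the same program runs under gawk with the input piped in via `type`, which is presumably what M@LIK's lost command did.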