Posts: 4
Threads: 1
Joined: Jan 2015
I have a large text file, over 1 GB in size, containing data line by line. This is text file A.txt.
I then have a second file, B.txt, that contains 30,000 unique words. I want to extract every line of A.txt in which one of those words is found, keeping the whole line.
An example of this is:
--Text File A--
dog in house
cat at school
kid in playground
tom at oaks
so much stuff
inhouse cool stuff
--Text File B--
house
oaks
--Result File Output--
dog in house
tom at oaks
inhouse cool stuff
What would be the fastest way to do this? Is there any software on the market that specializes in this type of task?
I don't know any programming languages whatsoever, so if anyone knows a solution that involves writing code, I would need newbie instructions on how to carry it out.
I've searched for hours on Google in hopes of finding a solution, but have come up with absolutely nothing meaningful.
Thanks in advance
Posts: 10
Threads: 1
Joined: Dec 2014
I think you might be able to achieve this by playing around with some of the hashcat utilities, but here is a short Python snippet you can use as well:
Code:
#!/usr/bin/env python
fileA = 'fileA.txt'  # Your main input file
fileB = 'fileB.txt'  # Your file full of stuff to match against
fileC = 'fileC.txt'  # Output file we will save matching lines to

# Read all non-empty lines from fileB into a list of tokens
with open(fileB) as fb:
    tokens_to_match = [t.strip() for t in fb if t.strip()]

with open(fileA) as fr, open(fileC, 'w') as fw:
    # Iterate line by line in fileA
    for line in fr:
        line = line.strip()
        # Plain substring test: tokens are literal words, not regex patterns
        if any(token in line for token in tokens_to_match):
            # Write the matching line to fileC once, even if several tokens match
            fw.write(line + "\n")
Just edit the filenames and paths for fileA, fileB (optionally fileC) and then run:
python scriptname.py
When it is done you should find fileC.txt in the same directory as the script with your matched lines.
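If the script above turns out to be slow, one variation is to combine all the words from fileB into a single compiled pattern, so each line is scanned once instead of being tested against every word in a Python loop. This is only a sketch: the function name filter_lines_regex is made up for illustration, and whether it is actually faster on a 1 GB file with 30,000 words has not been measured here.

```python
import re


def filter_lines_regex(lines, words):
    """Return the lines that contain at least one of the words as a substring."""
    # Escape each word so characters like '.' or '+' are matched literally.
    escaped = [re.escape(w) for w in words if w]
    if not escaped:
        # No usable words means nothing can match.
        return []
    # One alternation pattern covering every word.
    matcher = re.compile("|".join(escaped))
    return [line for line in lines if matcher.search(line)]


if __name__ == "__main__":
    sample_lines = [
        "dog in house",
        "cat at school",
        "kid in playground",
        "tom at oaks",
        "so much stuff",
        "inhouse cool stuff",
    ]
    for hit in filter_lines_regex(sample_lines, ["house", "oaks"]):
        print(hit)
```

For real files you would pass the open fileA handle as lines and write each hit to fileC instead of printing it.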
Hope that helps a bit....
Posts: 4
Threads: 1
Joined: Jan 2015
Hi,
Thanks for your reply bro. I actually am paying blazer to make a program for me that will do what I described above. Once he is finished he will also release it to the public for anyone else who needs this type of task done.
Posts: 10
Threads: 1
Joined: Dec 2014
Cool that he is going to share it, but kind of lame you have to pay him to make it. If you just tell me what you want it to do that it isn't doing above, I will gladly modify it for you for free....
Posts: 143
Threads: 9
Joined: Dec 2012
01-05-2015, 02:53 AM
(This post was last modified: 01-05-2015, 02:58 AM by magnum.)
The program you need was written like 60 years ago and it's free.
grep -Ff B.txt A.txt > C.txt
EDIT: This is better:
grep -wFf B.txt A.txt > C.txt
(And to make it case insensitive, use -iwFf)
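To see what the flags change, here is a rough Python sketch of the two matching rules, run on the example from the first post. It only approximates grep's behavior, and the function names are made up for illustration: -F treats each line of B.txt as a literal string that may occur anywhere in a line, and -w additionally requires that the match is not directly next to a letter, digit, or underscore.

```python
import re


def matches_substring(line, words):
    """Approximates grep -F: a word may appear anywhere in the line."""
    return any(word in line for word in words)


def matches_whole_word(line, words):
    """Approximates grep -wF: the match must not touch another word character."""
    return any(
        re.search(r"(?<!\w)" + re.escape(word) + r"(?!\w)", line)
        for word in words
    )


lines_a = [
    "dog in house",
    "cat at school",
    "kid in playground",
    "tom at oaks",
    "so much stuff",
    "inhouse cool stuff",
]
words_b = ["house", "oaks"]

substring_hits = [l for l in lines_a if matches_substring(l, words_b)]
whole_word_hits = [l for l in lines_a if matches_whole_word(l, words_b)]

print(substring_hits)
print(whole_word_hits)
```

With the sample data, the substring rule keeps "inhouse cool stuff" and the whole-word rule drops it. So the command without -w is the one that reproduces the result file shown in the first post, while -w is the choice if only standalone words should count.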
Posts: 10
Threads: 1
Joined: Dec 2014
one liner, even better indeed +1!
Posts: 2,936
Threads: 12
Joined: May 2012
always helps to know your tools.
Posts: 4
Threads: 1
Joined: Jan 2015
(01-04-2015, 05:57 PM)iRuser Wrote: Cool that he is going to share it, but kind of lame you have to pay him to make it. If you just tell me what you want it to do that it isn't doing above, I will gladly modify it for you for free....
Hi, I have no problem at all paying a software developer to create a program. It was my idea to pay blazer for the project.
I've been using ULM and ULM CCR every single day for months. It's my most used software and it has saved me hundreds of hours in productivity. Giving a payment to blazer is the very least I can do for all he's already given to me and everyone else who benefits from ULM.
I appreciate you offering to help out, iRuser. If there's anything I can help you with as well, just hit me up.
Posts: 4
Threads: 1
Joined: Jan 2015
(01-05-2015, 02:53 AM)magnum Wrote: The program you need was written like 60 years ago and it's free.
grep -Ff B.txt A.txt > C.txt
EDIT: This is better:
grep -wFf B.txt A.txt > C.txt
(And to make it case insensitive, use -iwFf)
You, my friend, are amazing! Thank you so much for that great info. grep works like it's literally some sort of magic; it boggles my mind how it finds all the matches and how it does it so fast. True genius.
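For anyone curious how searching for thousands of words at once can be fast: one classic technique is the Aho-Corasick automaton, which reads each line a single time no matter how many words are in the list. The sketch below is a minimal pure-Python version for illustration only. The thread does not say which algorithm any particular grep build uses, and the function names here are made up.

```python
from collections import deque


def build_automaton(words):
    """Build an Aho-Corasick automaton: a trie plus failure links."""
    goto = [{}]          # goto[state][char] -> next state
    fail = [0]           # fail[state] -> state to fall back to on a mismatch
    accepting = [False]  # accepting[state] -> True if some word ends here

    # Phase 1: insert every non-empty word into the trie.
    for word in words:
        if not word:
            continue
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                accepting.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        accepting[state] = True

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            # A state also accepts if a shorter word ends at its fallback state.
            accepting[nxt] = accepting[nxt] or accepting[fail[nxt]]
    return goto, fail, accepting


def line_matches(line, automaton):
    """Return True if any word occurs in the line, reading each character once."""
    goto, fail, accepting = automaton
    state = 0
    for ch in line:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if accepting[state]:
            return True
    return False


def filter_lines(lines, words):
    """Keep the lines that contain at least one of the words."""
    automaton = build_automaton(words)
    return [line for line in lines if line_matches(line, automaton)]


if __name__ == "__main__":
    sample = ["dog in house", "cat at school", "tom at oaks", "inhouse cool stuff"]
    print(filter_lines(sample, ["house", "oaks"]))
```

The automaton is built once from the word list, after which the work per line depends on the line's length rather than on the number of words.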