Email Scraping From Text File

"""
A. Input
Get 50% off on every purchase. contact marketing team at market@qq.com. Find all your linkedin
contacts for free, jeff.peterson@b2bsearch.com. qq.com partnership program apply at
market@qq.com
B. Expected Output
{ "market@qq.com" : {"Occurance":2, "EmailType": "Non-Human"} ,
"jeff.peterson@b2bsearch.com" : {"Occurance":1, "EmailType": "Human"}
}
C. Explanation:
The output must be in nested JSON format.
"Occurance": the number of times the email appears in the text.
"EmailType": the type of the email. More complex logic could be used to tell human and
non-human emails apart, but for this exercise just use the rules below:
Finding human emails: if the email has the format firstname.lastname@email.com, assume
that the email is human.
Finding non-human emails: if the email has the format text@email.com, where text is fewer
than 8 characters, assume that the email is likely to be non-human.
Note: Get text file from here ' '
"""
import json
import re

# Punctuation that may trail an email at the end of a sentence, e.g. "market@qq.com."
TRAILING_PUNCTUATION = ".,;:!?"


# Email filter using split and loop
def filterEmail(data):
    # str.split() with no argument splits on any whitespace, including newlines
    strings = [string.rstrip(TRAILING_PUNCTUATION) for string in data.split()]
    emailList = [string for string in strings if "@" in string and len(string) > 7]
    jsonifyOutput(emailList)


# Email filter using regex
def filterEmailWithRegex(data):
    # \S matches any non-whitespace character; the raw string keeps the backslashes intact
    matches = re.findall(r'\S+@\S+', data)
    emailList = [match.rstrip(TRAILING_PUNCTUATION) for match in matches]
    jsonifyOutput(emailList)


# Create the nested dictionary for the output, as per the description
def jsonifyOutput(emailList):
    mainDict = {}
    for email in set(emailList):
        subDict = {}
        subDict["Occurance"] = emailList.count(email)
        localPart = email.split('@')[0]  # text before the "@"
        if '.' in localPart:
            subDict["EmailType"] = "Human"
        elif len(localPart) < 8:
            subDict["EmailType"] = "Non-Human"
        else:
            # Neither rule from the description applies
            subDict["EmailType"] = "Null"
        mainDict[email] = subDict
    exportJsonResult(mainDict)


# Export the nested dictionary to a JSON file
def exportJsonResult(outputResult):
    with open("result.json", "w") as resultFile:
        json.dump(outputResult, resultFile)


# Read the test file "websiteData.txt"
def readFile():
    with open("websiteData.txt", "r") as file:
        text_data = file.read()
    filterEmail(text_data)
    # Uncomment the line below to filter emails using regex instead
    # filterEmailWithRegex(text_data)


if __name__ == "__main__":
    readFile()
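
A variation worth trying: the script above writes its result straight to result.json, which makes it hard to check against the expected output from the exercise. The sketch below is one possible alternative, not the original solution. It returns the nested dictionary instead of writing a file, counts with collections.Counter, and uses a stricter regex so that the full stop at the end of a sentence is not captured as part of the email. The names EMAIL_PATTERN, classify_email and summarize_emails are made up for this illustration, and the pattern is a simplified email shape, not a complete validator.

```python
import json
import re
from collections import Counter

# Assumed pattern: a simplified email shape, not a full RFC 5322 matcher.
# The domain must end in a label, so a sentence-ending "." is left out of the match.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def classify_email(email):
    """Apply the exercise's two rules to the part before the '@'."""
    local_part = email.split("@")[0]
    if "." in local_part:
        return "Human"
    if len(local_part) < 8:
        return "Non-Human"
    return "Null"  # neither rule applies


def summarize_emails(text):
    """Return {email: {"Occurance": count, "EmailType": type}} for the given text."""
    counts = Counter(EMAIL_PATTERN.findall(text))
    return {
        email: {"Occurance": count, "EmailType": classify_email(email)}
        for email, count in counts.items()
    }


# Sample text taken from section "A. Input" of the exercise
sample = (
    "Get 50% off on every purchase. contact marketing team at market@qq.com. "
    "Find all your linkedin contacts for free, jeff.peterson@b2bsearch.com. "
    "qq.com partnership program apply at market@qq.com"
)
result = summarize_emails(sample)
print(json.dumps(result, indent=2))
```

Because summarize_emails returns a plain dictionary, the result can be compared directly with the expected output, or passed to json.dump when a file is needed.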
