webpage-scraper

A Python web scraping and information-gathering utility, created for an Applications Development module.

Enter the target link in the getwebinfo.py script, then run it to call all the other scripts.

Each script can be run individually by uncommenting the 'sys.argv.append' line in each script.
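The pattern described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from the scripts themselves: `DEFAULT_URL` and `get_target_url` are hypothetical names, and Python 3 is assumed.

```python
import sys

# Hypothetical placeholder target; the URL used by the real scripts is not shown here.
DEFAULT_URL = "http://example.com"


def get_target_url(argv):
    """Return the URL given as the first argument, or the placeholder if none was given."""
    return argv[1] if len(argv) > 1 else DEFAULT_URL


if __name__ == "__main__":
    # Uncommenting a line like the one below supplies the URL from inside the script,
    # so it can be run on its own without a command-line argument.
    # sys.argv.append("http://example.com")
    print(get_target_url(sys.argv))
```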

####webpage_get : Retrieves the HTML structure of a website via its URL.
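A minimal sketch of this step, assuming Python 3 and only the standard library. The function name `wget` and the `timeout` parameter are assumptions for illustration; the actual script may fetch pages differently.

```python
import urllib.request


def wget(url, timeout=10):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server when there is one, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `wget("http://example.com")` would return the page's HTML as a string that the other scripts could then parse.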

####webpage_getfiles : Retrieves page content via URL and parses out files. Downloads all files and produces a formatted list of matched files (the download path must be specified in the script). Compares the downloaded files against a set of bad hashes.
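The hash comparison could look something like the sketch below. It assumes MD5 digests and Python 3; `BAD_HASHES`, `md5_of_file` and `find_bad_files` are hypothetical names, and the single entry in the set is just the MD5 of empty input, used here as a stand-in for a real list of bad hashes.

```python
import hashlib

# Hypothetical set of known-bad MD5 digests (this entry is the MD5 of empty input).
BAD_HASHES = {"d41d8cd98f00b204e9800998ecf8427e"}


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_files(paths, bad_hashes=BAD_HASHES):
    """Return the paths whose MD5 digest appears in the set of bad hashes."""
    return [path for path in paths if md5_of_file(path) in bad_hashes]
```

Reading in chunks keeps memory use flat even when a downloaded file is large.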

####webpage_getlinks : Parses links out of page content and produces a formatted list of matched links.
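One way to do this kind of link extraction is sketched below, assuming Python 3 and the standard-library `html.parser`. The script itself may use regular expressions or another approach; `LinkCollector`, `get_links` and `format_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def get_links(html, base_url=""):
    """Return the unique links in the page, in order of first appearance."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return list(dict.fromkeys(collector.links))


def format_links(links):
    """Return the links as a numbered, one-per-line listing."""
    return "\n".join(f"[{i}] {link}" for i, link in enumerate(links, 1))
```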

####webpage_getuniqueinfo_crackhash : Retrieves page content, parses MD5 hash hex strings, email addresses and phone numbers, then produces a formatted list of matched information. Cracks hashed passwords using a dictionary of common passwords.
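The matching and dictionary lookup might be sketched as below, assuming Python 3 and unsalted MD5 hashes. The regular expressions and the function names are illustrative assumptions rather than the script's own; phone number matching is left out because the number format the script targets is not stated.

```python
import hashlib
import re

# Assumed patterns: 32 hex characters for an MD5 digest, and a simple email shape.
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_md5_hashes(text):
    """Return the unique MD5-looking hex strings in the text, lowercased and sorted."""
    return sorted({match.lower() for match in MD5_RE.findall(text)})


def find_emails(text):
    """Return the unique email addresses in the text, sorted."""
    return sorted(set(EMAIL_RE.findall(text)))


def crack_md5(target_hash, wordlist):
    """Return the first word whose MD5 digest equals target_hash, or None."""
    target = target_hash.lower()
    for word in wordlist:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target:
            return word
    return None
```

A dictionary attack like this only succeeds when the password is in the word list and the hash is unsalted.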

