Humanoid Web Crawler: Map site structures like humans
We need to see the unseen before making structural design decisions. Present-day crawlers rely heavily on the coded (programmed) link structure. They do not experience the digital place as a human does.
The humanoid web crawler is designed to ignore hidden, duplicate, and irrelevant navigational links. It traverses a site as a human would, starting from the main navigation and then moving to the side navigation. The result is a map of the site's structure rather than a link farm.
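The traversal rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the in-memory `SITE` data, the `region` and `hidden` fields, and the function names are all assumptions made for the example.

```python
from collections import deque

# Hypothetical in-memory site: each page maps to the links found on it.
# "region" and "hidden" are assumed annotations, not fields from the project.
SITE = {
    "/": [
        {"href": "/products", "region": "main-nav", "hidden": False},
        {"href": "/about", "region": "main-nav", "hidden": False},
        {"href": "/promo", "region": "footer", "hidden": False},
        {"href": "/admin", "region": "main-nav", "hidden": True},
    ],
    "/products": [
        {"href": "/", "region": "main-nav", "hidden": False},
        {"href": "/products/widgets", "region": "side-nav", "hidden": False},
        {"href": "/products/widgets", "region": "side-nav", "hidden": False},
    ],
    "/about": [],
    "/products/widgets": [],
}

# Regions a human would use to navigate, in the order they are followed.
REGION_ORDER = ["main-nav", "side-nav"]


def visible_nav_links(links):
    """Return hrefs of visible links: main navigation first, then side navigation."""
    ordered = []
    for region in REGION_ORDER:
        for link in links:
            if link["region"] == region and not link["hidden"]:
                ordered.append(link["href"])
    return ordered


def build_structure(site, root="/"):
    """Breadth-first walk that records each page once, under the first parent reaching it."""
    tree = {root: []}
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for href in visible_nav_links(site.get(page, [])):
            if href in seen:
                continue  # duplicate link or a link back to a known page
            seen.add(href)
            tree[page].append(href)
            tree[href] = []
            queue.append(href)
    return tree


structure = build_structure(SITE)
```

Here the footer link and the hidden link never enter the structure, and the repeated side-navigation link is recorded once, so each page appears under a single parent.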
Project Management, Scripting Crawler, Generating Sitemaps for Clients and Internal Teams
Used the Scrapy framework (Python). Employed techniques such as XPath traversal, recursion, and class inheritance.
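As a rough illustration of how these three techniques can fit together, the sketch below uses the limited XPath subset in Python's standard `xml.etree.ElementTree` in place of Scrapy selectors, so that it runs without dependencies. The markup, the class names, and the XPath expressions are hypothetical and not taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; real pages would need a tolerant HTML parser.
PAGE = """
<html><body>
  <nav id="main">
    <ul>
      <li><a href="/products">Products</a>
        <ul>
          <li><a href="/products/widgets">Widgets</a></li>
        </ul>
      </li>
      <li><a href="/about">About</a></li>
    </ul>
  </nav>
  <div id="sidebar">
    <ul>
      <li><a href="/help">Help</a></li>
    </ul>
  </div>
</body></html>
"""


class MenuExtractor:
    """Base class: subclasses only supply the XPath that locates their menu."""

    menu_xpath = None

    def extract(self, root):
        menu = root.find(self.menu_xpath)
        return self.walk(menu) if menu is not None else []

    def walk(self, ul):
        """Recursively turn a nested <ul> into a list of label/href/children dicts."""
        items = []
        for li in ul.findall("./li"):
            anchor = li.find("./a")
            if anchor is None:
                continue
            submenu = li.find("./ul")
            items.append({
                "label": (anchor.text or "").strip(),
                "href": anchor.get("href"),
                "children": self.walk(submenu) if submenu is not None else [],
            })
        return items


class MainNavExtractor(MenuExtractor):
    menu_xpath = ".//nav[@id='main']/ul"


class SideNavExtractor(MenuExtractor):
    menu_xpath = ".//div[@id='sidebar']/ul"


root = ET.fromstring(PAGE.strip())
main_nav = MainNavExtractor().extract(root)
side_nav = SideNavExtractor().extract(root)
```

Keeping the recursive walk in the base class means a new navigation region only needs a subclass with its own XPath.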
Generating Sitemaps for Clients and Internal Teams
Used D3.js to render the sitemap as an SVG visualization. Created the same visualization in OmniGraffle using AppleScript.
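One plausible way to hand crawl output to D3.js is to nest it as JSON, since `d3.hierarchy` reads a `children` array on each node by default. The sketch below assumes the crawl result is an acyclic parent-to-children mapping; the `TREE` data, the function name, and the `name` key are illustrative choices, not the project's actual format.

```python
import json


def to_d3_hierarchy(tree, root, labels=None):
    """Convert a parent -> children mapping into nested {"name", "children"} dicts.

    Assumes the mapping is acyclic; a cycle would recurse without end.
    """
    labels = labels or {}
    node = {"name": labels.get(root, root)}
    children = [to_d3_hierarchy(tree, child, labels) for child in tree.get(root, [])]
    if children:
        node["children"] = children  # leaf nodes carry no "children" key
    return node


# Hypothetical crawl result: page URL -> pages reached from its navigation.
TREE = {"/": ["/products", "/about"], "/products": ["/products/widgets"]}

sitemap = to_d3_hierarchy(TREE, "/")
sitemap_json = json.dumps(sitemap, indent=2)
```

The resulting JSON string can be saved to a file and loaded by a D3.js tree or cluster layout on the client side.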