Humanoid Web Crawler: Map site structures like humans


The Problem:

We need to see the unseen before making structural design decisions. Present-day crawlers rely heavily on the coded (programmed) link structure. They do not experience the digital place as a human does.


The Solution:

The humanoid web crawler is designed to ignore hidden, duplicate, and irrelevant navigational links. It traverses a site the way a human would, starting from the main navigation and then moving to the side navigation. It generates the site structure instead of a link farm.
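The link filtering described above can be sketched roughly as follows. This is an illustrative sketch using only the Python standard library, not the project's actual code. The container ids `main-nav` and `side-nav`, the list of "irrelevant" URL schemes, and the rule for detecting hidden elements are all assumptions made for the example; a real site would need its own rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Hypothetical markers, chosen only for this example.
MAIN_NAV_IDS = {"main-nav"}
SIDE_NAV_IDS = {"side-nav"}
IRRELEVANT_PREFIXES = ("mailto:", "javascript:", "tel:")
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NavLinkParser(HTMLParser):
    """Collect visible links found inside main and side navigation."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.stack = []  # one (tag, region, hidden) entry per open element
        self.links = {"main": [], "side": []}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        attrs = dict(attrs)
        region, hidden = (self.stack[-1][1:] if self.stack else (None, False))
        if attrs.get("id") in MAIN_NAV_IDS:
            region = "main"
        elif attrs.get("id") in SIDE_NAV_IDS:
            region = "side"
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = hidden or "hidden" in attrs or "display:none" in style
        self.stack.append((tag, region, hidden))
        if tag == "a" and region and not hidden:
            self._record(attrs.get("href"), region)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def _record(self, href, region):
        if not href or href.startswith("#"):
            return
        if href.lower().startswith(IRRELEVANT_PREFIXES):
            return
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        self.links[region].append(url)


def navigation_links(html, base_url):
    """Return de-duplicated links: main navigation first, then side navigation."""
    parser = NavLinkParser(base_url)
    parser.feed(html)
    seen, ordered = set(), []
    for url in parser.links["main"] + parser.links["side"]:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```

Links outside the two navigation containers (a footer, for instance) are simply never collected, which is one way to keep the result a site structure rather than a full link dump.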

Role:

Project Management, Scripting Crawler, Generating Sitemaps for Clients and Internal Teams


Team:

Nick


Scripting Crawler

Used the Scrapy framework (Python). Employed techniques such as XPath traversal, recursion, and class inheritance.
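To show how these three techniques might fit together, here is a minimal sketch. It deliberately swaps Scrapy for the standard library's `xml.etree.ElementTree` (which supports only a limited XPath subset) so that it runs without dependencies, and it crawls a hypothetical in-memory site instead of fetching pages. The class names, the `main-nav` id, and the `pages` dictionary are assumptions for illustration, not the project's code. In Scrapy, the equivalent steps would typically go through `response.xpath(...)` and `response.follow(...)` inside a spider's callback.

```python
import xml.etree.ElementTree as ET


class BaseCrawler:
    """Recursive traversal shared by all crawlers; subclasses choose the links."""

    link_xpath = ".//a"  # every link on the page

    def __init__(self, pages):
        self.pages = pages    # hypothetical site: URL -> well-formed XHTML string
        self.visited = set()

    def extract_links(self, markup):
        # XPath traversal: select anchor elements, keep those with an href.
        root = ET.fromstring(markup)
        return [a.get("href") for a in root.findall(self.link_xpath) if a.get("href")]

    def crawl(self, url):
        """Recursion: return a nested {"url", "children"} tree rooted at url."""
        self.visited.add(url)  # marking before recursing prevents cycles
        node = {"url": url, "children": []}
        for href in self.extract_links(self.pages.get(url, "<html/>")):
            if href not in self.visited:
                node["children"].append(self.crawl(href))
        return node


class NavigationCrawler(BaseCrawler):
    """Class inheritance: reuse crawl(), but follow only main-navigation links."""

    link_xpath = ".//*[@id='main-nav']//a"
```

Because only `link_xpath` differs between the two classes, a site-specific crawler can be added by subclassing and overriding that one attribute.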

Generating Sitemaps for Clients and Internal Teams

Used D3.js for SVG visualization. Recreated the same visualization in OmniGraffle using AppleScript.
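As a rough illustration of how crawl output could feed a D3.js tree layout, the sketch below nests parent/child URL pairs into objects with a `children` array, which is the shape `d3.hierarchy` reads by default. The function name, the `name` key, and the edge-list input are assumptions for the example; it also assumes the edges form a tree with no cycles.

```python
import json


def to_d3_hierarchy(root, edges):
    """Nest (parent, child) URL pairs into {"name", "children"} objects."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    def build(url):
        node = {"name": url}
        kids = [build(child) for child in children.get(url, [])]
        if kids:  # leaf nodes carry no "children" key
            node["children"] = kids
        return node

    return build(root)


if __name__ == "__main__":
    # Hypothetical crawl result for a three-page site.
    edges = [("/", "/about"), ("/", "/work"), ("/about", "/team")]
    print(json.dumps(to_d3_hierarchy("/", edges), indent=2))
```

The printed JSON can be saved to a file and loaded by the visualization page.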


Contact:
somrahul@umich.edu | @SomeshRahul