About the project
Logo-API - Automatic loading of logos.
This is a must-have tool for any website with lists of
references to other websites.
Used properly, it makes lists of links look better and easier to comprehend.
For most people, scanning a logo is much easier than reading the name of a company.
I use this service on my 'Technologies I use' page to display icons for all the links in one move.
Web Scraping
Extracting information from HTML+CSS
It was a challenge of the kind I like :-)
The easiest part was extracting favicons and apple-touch-icons,
obeying all the rules a browser would obey.
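
The browser rules mentioned above can be sketched roughly as follows. This is not the service's actual code (which is written in PHP); it is a minimal Python sketch using only the standard library, and the names IconLinkParser and find_icons are made up for illustration. It assumes three rules: icon links are found through the rel keywords of <link> tags, relative URLs are resolved against <base href> when present, and /favicon.ico is tried as a fallback when no icon link is declared.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# rel keywords treated as icon declarations in this sketch (an assumption,
# not the full list a real browser or the service may honour).
ICON_RELS = {"icon", "apple-touch-icon", "apple-touch-icon-precomposed"}


class IconLinkParser(HTMLParser):
    """Collects icon URLs declared with <link rel="..."> tags."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url  # replaced if the page has <base href>
        self.icons = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            self.base_url = urljoin(self.base_url, attrs["href"])
        elif tag == "link" and attrs.get("href"):
            # rel is a space-separated keyword list, e.g. "shortcut icon"
            rels = set((attrs.get("rel") or "").lower().split())
            kinds = rels & ICON_RELS
            if kinds:
                self.icons.append({
                    "url": urljoin(self.base_url, attrs["href"]),
                    "rel": sorted(kinds)[0],
                    "sizes": attrs.get("sizes"),
                })


def find_icons(html, page_url):
    """Returns declared icons, plus /favicon.ico when no rel=icon exists."""
    parser = IconLinkParser(page_url)
    parser.feed(html)
    icons = parser.icons
    if not any(icon["rel"] == "icon" for icon in icons):
        icons.append({
            "url": urljoin(page_url, "/favicon.ico"),
            "rel": "icon",
            "sizes": None,
        })
    return icons
```

Calling find_icons(html, "https://example.com/about/") returns a list of dictionaries with absolute URLs. Picking the best size and actually downloading the files are left out of the sketch.
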
The fun started later on, with logos!
I can't explain the whole algorithm here.
In short, it involves parsing HTML and CSS, combining the two,
and applying some sophisticated probabilistic algorithms.
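
To give a flavour of what "probabilistic" can mean in this setting, here is a toy Python sketch. It is explicitly not the algorithm used by the service: the hints, their weights, and the names HINTS, noisy_or and rank_logo_candidates are all assumptions made up for illustration, and CSS (for example background images) is ignored entirely. Each <img> collects independent hints that it might be the site logo, and the hints are combined with a noisy-OR rule, p = 1 - (1 - p1)(1 - p2)...

```python
from html.parser import HTMLParser

# Hypothetical weights: how strongly each hint alone suggests "this is the logo".
HINTS = {
    "name_in_attrs": 0.6,  # "logo" appears in class or id
    "name_in_src": 0.5,    # "logo" appears in the image URL
    "name_in_alt": 0.4,    # "logo" appears in the alt text
    "inside_header": 0.3,  # the image sits inside a <header> element
}


def noisy_or(probabilities):
    """Combines independent pieces of evidence: 1 - product of (1 - p)."""
    miss = 1.0
    for p in probabilities:
        miss *= 1.0 - p
    return 1.0 - miss


class LogoCandidateParser(HTMLParser):
    """Scores every <img> with a src attribute as a logo candidate."""

    def __init__(self):
        super().__init__()
        self.header_depth = 0
        self.candidates = []  # list of (score, src)

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "header":
            self.header_depth += 1
        elif tag == "img" and attrs.get("src"):
            hints = []
            names = attrs.get("class", "") + " " + attrs.get("id", "")
            if "logo" in names.lower():
                hints.append("name_in_attrs")
            if "logo" in attrs["src"].lower():
                hints.append("name_in_src")
            if "logo" in attrs.get("alt", "").lower():
                hints.append("name_in_alt")
            if self.header_depth > 0:
                hints.append("inside_header")
            score = noisy_or(HINTS[h] for h in hints)
            self.candidates.append((score, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "header" and self.header_depth > 0:
            self.header_depth -= 1


def rank_logo_candidates(html):
    """Returns (score, src) pairs, most likely logo first."""
    parser = LogoCandidateParser()
    parser.feed(html)
    return sorted(parser.candidates, key=lambda c: c[0], reverse=True)
```

An image with no hints scores 0, and every extra hint pushes the score closer to 1 without ever exceeding it, which is why noisy-OR is a convenient way to merge weak signals.
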
Parsing is just half of the story. You still need to manipulate
the parsed document.
I used a web scraping tool
that is very fast at parsing big pieces of HTML and allows traversing the parsed document
the same way jQuery does, only on the server side, using PHP.
It's called hQuery.php.