Insight Behind our Launch

December 6, 2011

Today marks the launch of, a new DomainTools site that provides an excellent showcase for the millions of historical website thumbnails we’ve collected over the years.

It’s also typical of the kinds of engineering problems that seem relatively straightforward until you try them at web scale.

Most of us know that browsing the web with Internet Explorer version 7 can be difficult. If you browse carefully you may be able to avoid the problem sites, but sooner or later you’re bound to trip up. Intentionally trying to visit every webpage on the ‘net would be downright silly.

And yet, that’s precisely what we’ve been doing for years to generate the website thumbnails you see on our Whois product. It’s also how we’ve built a database of more than 254,819,641 website screenshots (and counting!).

It’s a messy business, aided somewhat by virtualization technologies and a carefully engineered, home-built queueing architecture. Yet it still presents significant engineering challenges and non-obvious business questions.

How do you teach computers to know whether a website has changed “significantly” since you last looked at it, so you don’t store a bunch of duplicate images? (Hint: read about perceptual hashes and Hamming distance.)
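To make the hint concrete, here is a minimal sketch of one simple perceptual hash (an "average hash") and a comparison that counts the bits that differ between two hashes (their Hamming distance). It is an illustration only, not a description of our production pipeline: the function names, the 8x8 grid, and the threshold of 5 bits are all assumptions chosen for the example, and the input is assumed to be a grayscale image given as rows of 0-255 values.

```python
from typing import List, Sequence

# Assumed input format: rows of 0-255 grayscale pixel values.
Gray = Sequence[Sequence[int]]


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per grid cell, set when the
    cell's average brightness is above the mean of all cells."""
    height, width = len(pixels), len(pixels[0])
    cells: List[float] = []
    for row in range(size):
        y0 = row * height // size
        y1 = max((row + 1) * height // size, y0 + 1)
        for col in range(size):
            x0 = col * width // size
            x1 = max((col + 1) * width // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def changed_significantly(old: Gray, new: Gray, threshold: int = 5) -> bool:
    """Hypothetical rule: treat the page as changed when more than
    `threshold` hash bits differ (the threshold is an arbitrary example)."""
    return hamming_distance(average_hash(old), average_hash(new)) > threshold


if __name__ == "__main__":
    # Synthetic 32x32 "screenshots": dark left half, bright right half.
    left_right = [[0] * 16 + [200] * 16 for _ in range(32)]
    # Same layout, uniformly a little brighter (e.g. a rendering difference).
    brighter = [[p + 3 for p in row] for row in left_right]
    # A different layout: dark top half, bright bottom half.
    top_bottom = [[0] * 32 for _ in range(16)] + [[200] * 32 for _ in range(16)]

    print(changed_significantly(left_right, brighter))    # small change
    print(changed_significantly(left_right, top_bottom))  # layout change
```

Because the hash only records which regions are brighter than average, small rendering differences tend to leave it unchanged, while a real layout change flips many bits. Where to set the threshold is exactly the kind of judgment call the question above is getting at.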

How do you decide how tall an image to capture? For that matter, how do you capture the part of the page that extends beyond the visible screen?

If you want your screenshot to capture what most people would see when they visit the site, which web browser and operating system do you use?

Most sites are not as OCD about cross-browser support as we are. At one time, IE7 was the best browser to target since it had the broadest support, which is why we selected it as the ‘default thumbnail browser.’ Now, after reviewing our stats, we think it’s time for an upgrade, maybe even to Firefox or Chrome.

That’s one of many things we’re changing in our thumbnail system, a system that already produces much more than just a bunch of images. Our engineers conceived a nifty tool that discovers interesting domain names mentioned in news feeds and highlights their screenshots on the site’s landing page. They also took several of their latest ideas and experimented with them on the search tool. It’s still a work in progress, but you can already use it to reveal interesting insights about a domain (try searching for “hertz” to see what their home page looks like in different TLDs).

We’re also moving quickly to expand our infrastructure, improve our capture rate, and add new servers to support the features we’re planning to add. We already had 20 virtual servers capturing screenshots; soon that number will increase to 40, with more supporting servers coming online shortly thereafter.

Now the fun part begins – we get to hear what you think of it, what your ideas are, and what novel usage patterns you come up with. Send us your feedback to or comment here.


Category: Domain Tools Updates

About the Author

Mark has spent more than eight years at DomainTools helping major brand holders, cyber security companies, large Internet organizations and leading incident responders investigate online threats with DNS and Whois data. He has held engineering and product leadership roles at the company, led business development and partner integration activities, and pioneered sales relationships with major public and private organizations. Mark now leads partnership discussions with leading cybersecurity product companies and manages relationships with DomainTools customers in the public sector. He is based in Seattle, WA.

Comments (9)

  1. sdgmontreal says:

    Terrific new site! I’m sure this will be a very popular addition to DomainTools’ suite of products!

    In terms of feedback, after having played on the site for a while, I did find I was often having to re-type (almost) identical queries into the search box.

    As an example, when exploring a typical domain (e.g. WidgetThingy.ext), I found I was searching/typing ~6 variations in order to fully explore both my site and my competitors’ sites:

    widget thingy
    widget thingys

    If the last query was to remain in the search box, I could simply add/change a few letters (versus having to re-type the (almost) identical term) to perform the next search. Might not seem like much — but after having done this type of investigation for just 5-6 of my domains, it quickly became apparent this would be a great timesaver!

    Thanks again for a great product!

  2. Monica says:

    Thanks for the feedback, sdgmontreal! We’ve shared this with our product team.

  3. infosoporte2_1511 says:

    I agree with sdgmontreal. I tried to do the same with several whose name was and the result is fast and effective. Just one detail is missing for it to be perfect: support for Spanish names. I guess that will be very difficult.