Behind The Scenes with ReverseMX

August 29th, 2011 by Jason

At DomainTools, we often throw around ideas for new data we could collect and combine with our existing data sets to uncover interesting information.

One idea that had been floating around was to look up and store mail server (MX) records for all domain names we know of in the popular top-level domains (TLDs). This would enable us to find relationships between domains that share mail servers, show mail server host names that resolve to the same IP address, and compare the numbers of domains that use hosted mail services like Google Apps for Domains and Microsoft’s Exchange Online.
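As a rough illustration of one of these relationships, the sketch below groups mail server host names by the IP addresses they resolve to. It assumes the A-record lookups have already been done and are passed in as a dictionary; the function name `hosts_sharing_ips` is hypothetical, and this is not the actual ReverseMX code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def hosts_sharing_ips(host_ips: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each IP address to the MX host names that resolve to it.

    Only IPs shared by two or more distinct host names are returned,
    since those are the interesting relationships.
    """
    by_ip: Dict[str, Set[str]] = defaultdict(set)
    for host, ips in host_ips.items():
        name = host.lower().rstrip(".")  # normalize "MX.Example.com." -> "mx.example.com"
        for ip in ips:
            by_ip[ip].add(name)
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}
```

For example, if `mail.a.example` and `mx.b.example` both resolve to 192.0.2.10, the result maps that address to both names, while a host with an unshared IP is left out.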

In addition to MX data, we also wanted to crawl TXT records of domains to collect SPF rules. We think SPF rules are a good idea, but since they are optional, it’s interesting to know how many domains elect to use them. For example, we can now see how many domains hosted on Google’s mail servers are publishing correct SPF records.
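A minimal sketch of that kind of check, assuming each domain’s TXT records have already been fetched and any multi-string TXT records joined into single strings. `find_spf` and `authorizes_google` are hypothetical names; `include:_spf.google.com` is the mechanism Google documents for its hosted mail, but treating its presence as "correct" is a simplifying assumption.

```python
from typing import Iterable, Optional

# Google's documented SPF include for mail sent through its servers.
GOOGLE_INCLUDE = "include:_spf.google.com"


def find_spf(txt_records: Iterable[str]) -> Optional[str]:
    """Return the domain's SPF rule, or None if there is not exactly one.

    An SPF rule is a TXT record whose first token is "v=spf1"; publishing
    more than one is an error under the SPF specification.
    """
    spf = [r for r in txt_records if r.lower().split()[:1] == ["v=spf1"]]
    return spf[0] if len(spf) == 1 else None


def authorizes_google(spf_rule: Optional[str]) -> bool:
    """True if the SPF rule includes Google's SPF record with a pass qualifier."""
    if spf_rule is None:
        return False
    mechanisms = spf_rule.lower().split()[1:]  # skip the "v=spf1" version tag
    # "+" is the default qualifier, so "include:x" and "+include:x" are equivalent.
    return any(m in (GOOGLE_INCLUDE, "+" + GOOGLE_INCLUDE) for m in mechanisms)
```

A fuller checker would also follow `include:` and `redirect=` chains and validate the rest of the record syntax; this only answers the narrow question of whether Google’s servers are authorized.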

There was a lot of interest at DomainTools for this information and I was given some time to create a proof of concept. This quickly turned into a new DomainTools site called ReverseMX.

To find more details about what MX and SPF records are and how they are used, be sure to check out the definitions of terms and FAQs we have added.

ReverseMX was built in two parts. The first is what we call the ‘backend’ work where we built a distributed system to resolve MX DNS records, parse that data into a large Hive table, aggregate it in Hive and combine it with other data sets, and finally build MySQL tables for the website. The ‘frontend’ part of this product is a website powered by Django on top of these custom built tables.

We already use a Hadoop cluster for all sorts of batch data processing. As an experiment for this project, we decided to build a DNS crawler on top of Hadoop. Admittedly, this isn’t the best use of our powerful Hadoop infrastructure, but we had some idle nodes and we decided to use them.

Our DNS crawlers are implemented as a Hadoop map function that takes a domain name and ‘maps’ it to a DNS response containing the MX records for that domain. We use Hadoop streaming, so the crawlers are simple Python scripts that read domain names from stdin and write DNS responses to stdout. To get enough throughput from the crawlers, we perform asynchronous DNS requests using the ADNS library and its Python module. This worked so well that we had to rate-limit our requests so as not to put too much load on any DNS server.
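The mapper pattern described above can be sketched as follows. This is not the original crawler: it swaps ADNS for Python’s `asyncio` and takes the MX lookup as an injected coroutine (`resolve_mx`, hypothetical), so any asynchronous resolver can be plugged in. A semaphore caps the number of in-flight queries and a simple limiter spaces out request starts; the `domain<TAB>status<TAB>mx,mx` output format is also an assumption.

```python
import asyncio
import sys
import time
from typing import Awaitable, Callable, Iterable, List, Optional, Tuple

# Hypothetical hook: any coroutine that returns the MX host names for a domain.
Resolver = Callable[[str], Awaitable[List[str]]]
Result = Tuple[str, Optional[List[str]]]


class RateLimiter:
    """Space request starts at least 1/rate seconds apart."""

    def __init__(self, rate: float) -> None:
        self.interval = 1.0 / rate
        self.next_slot = time.monotonic()
        self.lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self.lock:
            now = time.monotonic()
            if self.next_slot > now:
                await asyncio.sleep(self.next_slot - now)
            self.next_slot = max(self.next_slot, now) + self.interval


async def lookup(domain: str, resolve_mx: Resolver,
                 limiter: RateLimiter, sem: asyncio.Semaphore) -> Result:
    async with sem:              # cap the number of in-flight queries
        await limiter.wait()     # and the rate at which they start
        try:
            return domain, await resolve_mx(domain)
        except Exception:        # timeout, NXDOMAIN, SERVFAIL, etc.
            return domain, None


async def crawl(domains: Iterable[str], resolve_mx: Resolver,
                rate: float = 100.0, max_in_flight: int = 50) -> List[Result]:
    # Created inside the running loop so the lock and semaphore bind to it
    # (needed on Python versions before 3.10).
    limiter = RateLimiter(rate)
    sem = asyncio.Semaphore(max_in_flight)
    return await asyncio.gather(*(lookup(d, resolve_mx, limiter, sem) for d in domains))


def format_result(domain: str, hosts: Optional[List[str]]) -> str:
    """One output line: domain, OK/FAIL status, comma-separated MX hosts."""
    status = "FAIL" if hosts is None else "OK"
    return f"{domain}\t{status}\t{','.join(hosts or [])}"


def main(resolve_mx: Resolver, batch_size: int = 1000) -> None:
    """Streaming entry point: domains on stdin, one result line per domain on stdout."""
    def flush(batch: List[str]) -> None:
        for domain, hosts in asyncio.run(crawl(batch, resolve_mx)):
            print(format_result(domain, hosts))

    batch: List[str] = []
    for line in sys.stdin:
        if line.strip():
            batch.append(line.strip().lower())
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:
        flush(batch)
```

In a streaming job, the script’s entry point would call `main(resolve_mx=...)` with a real asynchronous resolver. Reading stdin in batches keeps memory bounded for large input splits; the rate and concurrency defaults are placeholders, not the values used for ReverseMX.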

Because we use Hadoop, distributing the crawler across multiple nodes was as simple as running a large input file of domain names through our DNS mapper. The Hadoop streaming utility handles the complicated tasks of splitting the work and distributing it among the nodes in the cluster. We just had to write the Python scripts that accept a domain name, perform the lookup, and return a result, which Hadoop then writes to its native file system, HDFS.

To get Hadoop to play nicely as a crawler, a few extra steps were needed. First, we turned off speculative execution to stop two nodes from crawling the same data. Second, because the full crawl takes around 40 hours, we split it into many Hadoop jobs. That let us stop scheduling crawl jobs at certain times of the day and interleave other Hadoop jobs with the crawl jobs. Splitting the crawl into multiple jobs also helps if a job fails because of network or hardware problems: with a single job for the full crawl, each node would be crawling tens of millions of domains, and if a node failed, its entire task would need to be re-run.
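One way to sketch the job splitting: cut the domain list into fixed-size chunks and build one map-only streaming command per chunk, with speculative execution disabled. The jar path, the HDFS paths, the mapper name `mx_mapper.py` and the `plan_jobs` helper are hypothetical; the property names are the Hadoop 0.20/1.x ones (newer releases call the speculative setting `mapreduce.map.speculative`).

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Placeholder path; the streaming jar's location depends on the Hadoop install.
STREAMING_JAR = "/usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar"


def chunks(domains: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` domains (one list per job)."""
    it = iter(domains)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def streaming_command(job_id: int, input_path: str, output_path: str) -> List[str]:
    """Argument list for one map-only Hadoop streaming crawl job."""
    return [
        "hadoop", "jar", STREAMING_JAR,
        # Generic -D options must precede the streaming options.
        "-D", "mapred.map.tasks.speculative.execution=false",  # never crawl a split twice
        "-D", f"mapred.job.name=mx-crawl-{job_id:04d}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", "mx_mapper.py",
        "-file", "mx_mapper.py",   # ship the script to every node
        "-numReduceTasks", "0",    # map-only: mapper output is the crawl result
    ]


def plan_jobs(domains: Iterable[str], chunk_size: int,
              hdfs_root: str = "/crawl") -> List[Tuple[List[str], List[str]]]:
    """Pair each chunk of domains with the command for its job."""
    jobs = []
    for job_id, chunk in enumerate(chunks(domains, chunk_size)):
        in_path = f"{hdfs_root}/in/{job_id:04d}"
        out_path = f"{hdfs_root}/out/{job_id:04d}"
        jobs.append((chunk, streaming_command(job_id, in_path, out_path)))
    return jobs
```

A small scheduler could then upload each chunk and submit its command one at a time (for example with `subprocess.run`), pausing during busy hours so other Hadoop jobs can run in between. A failed job only means re-running one chunk.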

When the full crawl is complete, the raw DNS data is run through a second Hadoop map step: a parser that removes invalid responses and writes the data out in columns, ready to be loaded into Hive.
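A parser of this kind might look like the sketch below. It assumes raw crawler lines of the form `domain<TAB>status<TAB>mx1,mx2,...` (a hypothetical format), drops failed lookups and malformed host names, and emits one tab-separated `(domain, mx_host)` row per mail server, which suits a Hive table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`.

```python
import re
import sys
from typing import Iterator, Set, Tuple

# One DNS label: letters, digits and inner hyphens, at most 63 characters.
LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def valid_hostname(name: str) -> bool:
    """Loose syntax check: at least two labels, 253 characters at most."""
    labels = name.split(".")
    return (0 < len(name) <= 253 and len(labels) >= 2
            and all(LABEL_RE.fullmatch(label) for label in labels))


def parse_line(line: str) -> Iterator[Tuple[str, str]]:
    """Yield (domain, mx_host) rows from one raw crawler line; skip bad data."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3 or parts[1] != "OK":
        return  # malformed line or failed lookup
    domain = parts[0].strip().lower().rstrip(".")
    if not valid_hostname(domain):
        return
    seen: Set[str] = set()
    for mx in parts[2].split(","):
        mx = mx.strip().lower().rstrip(".")
        if valid_hostname(mx) and mx not in seen:
            seen.add(mx)
            yield domain, mx


def main() -> None:
    """Streaming entry point: raw lines on stdin, Hive-ready TSV on stdout."""
    for line in sys.stdin:
        for domain, mx in parse_line(line):
            print(f"{domain}\t{mx}")
```

Normalizing case and the trailing dot here means that later aggregation sees `ASPMX.L.GOOGLE.COM.` and `aspmx.l.google.com` as the same mail server.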

With this data in Hive we can, for example, build a MySQL table of mail servers for the website by querying distinct mail servers along with a cluster-aware auto-increment function for the primary key. The output of these Hive queries is what is loaded into MySQL as a table. For performance, pre-calculated common queries like counts of domains that use a certain mail server are also exported to tables.
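The Hive queries themselves are not shown here; as a single-machine illustration of what they compute, the sketch below takes `(domain, mx_host)` pairs, assigns each distinct mail server a sequential ID, and pre-computes the number of distinct domains per server. On the cluster, the IDs would come from the cluster-aware auto-increment function rather than `enumerate`; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_mail_server_tables(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[List[Tuple[int, str]], Dict[int, int]]:
    """Build two website tables from (domain, mx_host) rows.

    Returns:
      servers: (server_id, mx_host) for each distinct mail server
      counts:  server_id -> number of distinct domains using that server
    """
    domains_by_host: Dict[str, Set[str]] = defaultdict(set)
    for domain, host in rows:
        domains_by_host[host].add(domain)  # sets ignore duplicate rows
    # Sequential IDs stand in for the cluster-aware auto-increment function.
    servers = list(enumerate(sorted(domains_by_host), start=1))
    counts = {server_id: len(domains_by_host[host]) for server_id, host in servers}
    return servers, counts


def to_tsv(servers: List[Tuple[int, str]]) -> str:
    """Tab-separated rows, the format MySQL's LOAD DATA INFILE reads by default."""
    return "".join(f"{server_id}\t{host}\n" for server_id, host in servers)
```

Pre-computing the counts this way means the website can answer "how many domains use this mail server?" with a primary-key lookup instead of a `COUNT` over millions of rows.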

The front-end side was a standard Django implementation, although we decided not to use Django’s models to access our custom-built tables. This is the second website we’ve built with the Django framework (DailyChanges was the first), and our engineers have been very happy with it.

ReverseMX has been my pet project, and I have really enjoyed building it. As a backend engineer, I live for creating and processing huge data sets and building the tools to visualize and display that data to users. If you are a software developer and this sounds like fun, there’s good news: we are currently hiring! We have three open positions in the engineering department:

Director of Engineering
Python/PHP Engineer
JavaScript Engineer

Lastly, if you have any feedback regarding ReverseMX, feel free to comment on this blog, on our Twitter and Facebook pages, or via email at memberservices@DomainTools.com. Thank you in advance for your feedback!

Posted in Domain Tools Updates

How to Use Alerts to Get the Scoop on Your Competitors

August 25th, 2011 by Susan Prosser

Have you ever read a story on a news site or blog about a well-known company planning a new product or service, based on the domain names it has recently registered? Have you ever wondered how the writer came across that information?

For example, last week TechCrunch spotted that Google had become the proud owner of Android.me. Gaming blogs were also filled with the news that Activision had registered over a dozen domains related to possible future games in its Call of Duty franchise. And the news that Warner Bros is fighting for the domain TheHangover3.com strongly suggests it is planning another movie sequel.

One way to discover this kind of information would be to do a random Whois search every day on the domains you guess a company might want to register. If you have that much time to kill, good luck!

Fortunately, there are quicker ways. DomainTools subscribers who sign up for one of our domain monitoring tools, such as Registrant Alert, receive timely data about the companies that interest them, delivered directly to their inboxes every day.

These tools are not only useful for bloggers or fans of particular brands. If you’re a company in a competitive marketplace, knowing which domain names your rivals are registering or buying could prove to be priceless business intelligence. Registrant Alert quite simply emails you every day with a complete list of the domain names that have just started using your chosen keywords in the Whois record. The alerts cover newly registered domains (such as the Call of Duty domains Activision defensively registered), deleted domains, and domains that have changed ownership (such as Android.me).

Registrant Alerts are very easy to set up; if you’ve ever used a Google Alert, it’s just as simple. If you are interested in what Apple has planned, monitoring for “Apple Inc” will alert you whenever the company shows up as the registrant of a domain. Be careful not to make your query too broad, though, or you may receive too many false positives.
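The idea behind such an alert can be illustrated with a simplified sketch that compares two daily snapshots of domain-to-registrant data. This is not how Registrant Alert is implemented; the snapshot format, the case-insensitive substring match and the `registrant_alert` function are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical snapshot format: domain name -> registrant string from Whois.
Snapshot = Dict[str, str]


def registrant_alert(keyword: str, yesterday: Snapshot, today: Snapshot) -> Dict[str, List[str]]:
    """Classify domains that newly match the keyword today, or were dropped."""
    kw = keyword.lower()

    def matches(registrant: str) -> bool:
        return kw in registrant.lower()  # case-insensitive substring match

    alerts: Dict[str, List[str]] = {"new": [], "changed_owner": [], "deleted": []}
    for domain in sorted(today):
        if not matches(today[domain]):
            continue
        if domain not in yesterday:
            alerts["new"].append(domain)            # newly registered
        elif not matches(yesterday[domain]):
            alerts["changed_owner"].append(domain)  # registrant changed to a match
    for domain in sorted(yesterday):
        if domain not in today and matches(yesterday[domain]):
            alerts["deleted"].append(domain)        # matching domain was dropped
    return alerts
```

With the keyword “Apple Inc”, a hypothetical pineapple.example registered to “Pineapple Farms LLC” is ignored; shorten the keyword to “Apple” and it shows up as a new match, which is the kind of false positive an overly broad query produces.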

That’s just one way DomainTools enables you to keep track of what your favorite companies – or your competitors – are doing with domain names. If you are more technically minded, you could use Name Server Alert or one of our other monitoring tools, but I will discuss those in a future post.

Next time, I will look at how companies differ in the timing of their product-related domain name registrations, and why there’s no one-size-fits-all strategy.

Posted in Alerts, Compete, Domain Tools Updates, Domainers, In The News

Was Your Domain Used for Porn? How to Avoid a Costly Mistake

August 18th, 2011 by Susan Prosser

With the launch of .xxx domain names coming soon, I thought now would be a good time to address an important topic sometimes overlooked by domain buyers: how to avoid accidentally purchasing a domain that was once used for pornography.

Almost as long as the web has been around, companies have been selling content filtering software. Parents and network admins can use it to stop their kids, employees or users from accessing inappropriate web sites at work and at home, or in colleges, schools and libraries.

It can be quite difficult to get a domain name removed from one of these legacy block-lists, especially if the company that originally compiled the list is no longer around. If you buy a domain that is on a block-list, you may find yourself cut off from some potential customers.

As a result, if you plan to invest in a domain name that was once used to host pornographic content, you may find that its resale value is not what you thought. The same can be said if you are interested in purchasing a domain for the value it has in adult traffic. So it’s important to know what a domain has been used for before deciding whether to buy it and how much to offer.

As you can see from the small number of premium names already released by the .xxx registry, it’s sometimes not easy to tell whether a domain has hosted adult content just by looking at the domain name itself.

It should be obvious what you will find if you point your browser to casting.xxx or muscle.xxx, which were some of the first .xxx domains to be sold, but can you say the same about casting.com or muscle.com? They could be porn, or they could just as easily belong to a Hollywood casting agency or be used to sell dietary supplements and home gym equipment.

Common dictionary words sometimes have special meanings in the adult entertainment world that might not be obvious to somebody from outside that industry, which is why it’s important to do your research before making an offer.

Adult content publishers often trade under generic-sounding company names, so a simple historical Whois search might not be enough to alert you to the domain’s past usage.

That’s one of the reasons why DomainTools offers a comprehensive screenshot history with most Whois queries. Not only can you see who owned a domain name in the past, you can also very quickly check to see what it was used for.

Take the generic-sounding domain WebmasterAccess.com, for example. It could be used to host a forum for webmasters to exchange technical tips, it could be a web hosting company, or it could be used as a jobs site for designers and developers.

In fact, it’s owned by a large adult entertainment publisher and is used to promote a porn webmaster show. The site may be almost safe-for-work today, but the DomainTools screenshot history clearly shows that as recently as January this year it contained very adults-only imagery. It’s easy to see that just from the thumbnails in our archive, too – you don’t need to look at the full-sized capture if you don’t want to!

If that domain were for sale and you were thinking about buying it to develop or resell, that’s important background information you’d need to know.

Posted in Domain Industry, Domain Tools Updates, Domainers, Whois

How Whois Busted the “IE users are dumb” Hoax

August 4th, 2011 by Susan Prosser

If you’re a DomainTools customer, you already know the value of Whois for researching the history of domain names, but not everybody is as savvy.

A hoaxer this week managed to fool some of the world’s most respected news organizations into reporting that Internet Explorer users are “dumber” than users of other browsers, and it was a Whois search that eventually blew the story open.

Dozens of outlets – including CNN, the BBC and Forbes – fell for a story put out by a fake Canadian company called AptiQuant, which claimed to have proved scientifically that IE users have below-average IQs.

AptiQuant said in a press release that it had offered free online IQ tests to over 100,000 people and then correlated the scores with the browser used to take the test. IE users, it said, were found to have much lower IQ scores than everybody else.

The media rapidly picked up the meme and ran with it. Headlines such as “If You’re Reading This On Internet Explorer, You’re Probably Dumb” and “Dumb people use Internet Explorer, survey says” were among the hundreds around the world that AptiQuant’s news generated.

But the story was completely bogus, as a simple Whois search could have revealed in an instant.

After the initial wave of reports, readers started doing a bit of digging. Most of AptiQuant’s web site content, they discovered, had been copied and pasted from a French company called Central Test. Even the photographs of AptiQuant’s non-existent staff had been copied.

But here’s the kicker: Whois shows that the domain name aptiquant.com was only registered on July 14 this year. That’s in contrast to the web site itself, which had content claiming to date back to 2005.
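Checking this yourself takes only a few lines. The sketch below sends a plain Whois query over TCP port 43 (the protocol described in RFC 3912) and compares the registration date with a date a site claims for itself. It assumes the .com registry server `whois.verisign-grs.com` and an ISO-style `Creation Date: YYYY-MM-DD...` line; Whois output formats vary between registries and over time, so the regex is an assumption, and the function names are hypothetical.

```python
import re
import socket
from datetime import date
from typing import Optional

# Assumes an ISO-style "Creation Date: 2011-07-14T..." line; formats vary by registry.
CREATED_RE = re.compile(r"^\s*Creation Date:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE | re.MULTILINE)


def whois_query(domain: str, server: str = "whois.verisign-grs.com", timeout: float = 10.0) -> str:
    """Plain Whois (RFC 3912): send the name plus CRLF to TCP port 43, read until close."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        parts = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            parts.append(data)
    return b"".join(parts).decode("utf-8", errors="replace")


def creation_date(whois_text: str) -> Optional[date]:
    """Registration date from a Whois response, or None if no line matches."""
    m = CREATED_RE.search(whois_text)
    return date.fromisoformat(m.group(1)) if m else None


def predates_registration(claimed: date, whois_text: str) -> bool:
    """True if a site claims a date earlier than its domain's registration."""
    created = creation_date(whois_text)
    return created is not None and claimed < created
```

For a site claiming content from 2005, `predates_registration(date(2005, 1, 1), whois_query(domain))` returns True when the domain was registered later, which is exactly the mismatch that exposed this hoax.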

A developer named Tarandeep Gill has now confessed to being behind the hoax. He said that he just wanted to highlight what a pain IE 6.0 can be to support when building web sites.

“We are really surprised that it took so long for people to figure it out, a mere Whois on the domain could have revealed it all,” Gill wrote.

To make things worse, some of the news sites now reporting the hoax have claimed that Gill lives in San Francisco, whereas he in fact lives near Vancouver, Canada – as the Whois record clearly shows!

It’s not just the media that could benefit from making Whois part of their standard research toolkit. Just as reporters were fooled by a hoaxer telling them what they wanted to hear, there are a lot of bad guys out there, with far less frivolous intentions, making “too good to be true” offers.

If you find yourself on a web site that looks a bit fishy, Whois should be your first port of call.

Posted in In The News, Whois

The Newest Members of the DomainTools Team!

August 3rd, 2011 by Monica

Introducing two of the newest members to join the DomainTools team within the past month: Ben and Mike! They’ve added some wonderful new energy and expertise to our office, and we’re excited to have them on board.

Meet Mike (on left) and Ben (on right)

Here’s a little about them…

Ben, our SEO & Analytics manager, graduated from Iowa State University in 2006 with a BS in Management Information Systems. After graduating, he began his SEO career at space150, a digital agency based in Minneapolis. Over the course of 4 years, he handled increasingly large SEO clients such as American Express, Discovery Channel, Dairy Queen, General Mills, Ameriprise Financial, and many others.

In 2010, Ben and his wife relocated to Seattle to explore the Northwest (and escape Minnesota winters!). For the past year, he was the SEO Manager at Point It, a search marketing agency in the Lower Queen Anne area of Seattle. Ben worked with clients such as Microsoft, Clarisonic, Car Toys, and many others.

Ben says that he wanted to work for DomainTools because it offered the opportunity to engage his SEO and analytics expertise with a fun and passionate in-house team. He really liked DomainTools’ philosophy of hiring “fewer, better employees” and also the unique perks like ping pong and lunches twice a week. Ben also adds that thanks to his new coworkers, he’s already seen a 214.8% increase in his Foodie IQ since starting last Monday and he fully expects that trend to continue.

Mike, our newest engineer, has been working with web applications and related technologies for almost 10 years. He graduated from Montana State University with a BS in Computer Science and still has fond memories of late-night coding sessions in the lab (EPS 254). Prior to joining DomainTools’ engineering team, he was with Infogears Inc., a software company based in Bozeman, Montana. Mike enjoys exploring the city in his free time, trying new eats and drinks and experiencing all that Seattle has to offer.

Mike says that he was eager to join DomainTools for many reasons, including being able to work with a talented team, solve difficult problems and, of course, the opportunity to have an impact on the products so many people rely on.

Interested in joining our team or know of someone who is interested? Good news, we still have a few positions open:

User Experience Designer
Front-end JavaScript Engineer
Back-end Python/PHP Engineer

Here is a quick Twitter or Facebook announcement that you can copy and paste to help us spread the word:

Calling all UX Designers/Engineers looking for a job w/ an exciting #Seattle company in Belltown! Check out DomainTools: http://goo.gl/RhZbU

Posted in Domain Tools Updates