JavaScript SEO

This post contains the slides and some commentary from a presentation I gave at the MountainWest JavaScript conference on March 17th, 2014 in Salt Lake City, UT. If you don't like this blog format, you can view the slides here or watch the video recording here.

JavaScript SEO

Hello, my name is Jeff Whelpley and I am here to talk about the "right" way to build client side JavaScript Web Applications so that they are not only indexed by Google, but will eventually rank highly.

GetHuman.com

I am the Chief Architect at GetHuman. GetHuman helps you get customer service faster and easier so you never have to wait on hold again. This past year 1 in 8 people in the United States used GetHuman.com, but only a small fraction of those visitors got to our site by typing 'gethuman.com' into their browser.

Google

Most people start with Google and have no idea who we are. They are typically having some sort of issue and need to get in touch with a company for help. So, they typically type in one of the following:

  • {any large company} customer service
  • {any large company} phone number
  • contact {any large company}

If you plug any of these searches into Google, GetHuman.com will almost always appear in the top 5 of the organic search results. Many times in the top 3, sometimes even BEFORE the website of the actual company. Why is that?

Matt Cutts

Let's ask this guy. Matt Cutts is the face of Google when it comes to SEO, but what he would tell you is that you shouldn't even be thinking about SEO. He always stresses that if you just focus on building an awesome website that people love, the site will eventually rank highly.

While I agree with Matt at a high level, it is not quite that simple in practice. Let's take a look at the top 5 reasons why GetHuman.com ranks highly.

GetHuman in the Press

One of the biggest factors is the number of other highly ranked websites that legitimately link to and talk about our website. A long history of great inbound links from sites with high authority is a big factor in our search rankings, but it is not the only one.

No Pogosticking

Another important signal that search engines pay attention to is when a user clicks on a link for your website, but then goes back to the Google search results and clicks on something else or enters a new search.

This is called pogosticking, and the more users that do it, the higher the likelihood that your website is not the best answer for a given search. GetHuman.com is often a "long click" for searches like "{company name} phone number", which means that most searchers click on our website and don't have to click the back button because their problem is solved.

Every millisecond matters

There was a great article in the NY Times a couple years ago talking about how website performance affects user behavior on Google.

These days, even 400 milliseconds — literally the blink of an eye — is too long, as Google engineers have discovered. That barely perceptible delay causes people to search less.

We have optimized the hell out of GetHuman.com and try to make sure the end client page load time (including all resources) is consistently well under 2 seconds (with the initial server response time under 200 milliseconds) for BOTH our server side driven pages and our client side web apps.

Unfortunately, most SPAs out there have terrible initial page load times, and this has a significant impact on search ranking.

Social Sharing

Do a Twitter search for your company. Are there people talking about how useful and relevant you are? Chatter on social media websites can provide proof to search engines that your app solves users' problems.

Be careful here, though. The point is that you need to make your app so good that people will naturally talk about it and share it. Don’t think that the goal is to artificially generate inbound links. Hopefully we all learned that lesson from the recent Rap Genius fiasco.

On Page Optimizations

While Google weighs the other 4 factors more heavily than any on page optimizations, the reality is that what we do on the site influences everything else. So, instead of thinking in terms of hacks just to get Google to rank you higher, you should be thinking: what will help provide my users with a great experience?

Users generally like clean URLs, good and prominent titles, organized site structure, etc. Google cares about good UX, so you should as well.

(of course...this also means that you have to buy into Google's idea of what good UX is, which may conflict with your ideas in some cases...but, hey, who am I to question our all powerful overlord?)

Socrates

Now, as much as I think I understand certain aspects of SEO, the reality is that I don't know anything for sure. As Socrates said:

...the fact is that neither of us knows anything... but he thinks he does know when he doesn't, and I don't know and don't think I do; so I am wiser than he is by only this trifle, that what I do not know I don't think I do.

So, as a disclaimer: I don't think that our ideas at GetHuman about SEO are the absolute truth.

Alt Title Slide

This presentation really presents more of a philosophy that we formed over the past year as we started the move from our old PHP website to a full JavaScript stack. We have been building out a new platform that combines the best of our existing techniques for SEO with the latest and greatest client side JavaScript technologies.

SEO vs SPA

Whenever you talk about client side JavaScript and SEO, there is a fundamental problem: historically, search engines have not indexed content rendered by client side JavaScript. This means that whatever you render on the client using Backbone, Angular, Ember or another framework does not appear in any Google search results.

AJAX Fragment Spec

The most common solution to this problem is the AJAX Fragment Specification, but we quickly realized that it was not a good fit for our needs. We have 4 primary concerns with the Fragment Spec.

Shit URLs

The first issue is that most people implement the Fragment Spec using the hash bang, which means that all pages on their site must have #! in the URL. The problem with this is that, despite what you and your fellow nerds think, humans do care about how URLs look.

We don't recommend the Fragment Spec, but if you are going to use it, there is a way to do it with HTML5 PushState so you don't need the hash bang in the URL.
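
Here is a rough sketch of how that could look with Express (the getSnapshot() helper is hypothetical, not real code from any library). With a <meta name="fragment" content="!"> tag on your PushState pages, Google re-requests each URL with an _escaped_fragment_ query parameter, and middleware like this can serve the snapshot:

    // A hedged sketch, not production code: serve Fragment Spec snapshots
    // for clean PushState URLs. Requires <meta name="fragment" content="!">
    // on each page so Google knows to re-request it with ?_escaped_fragment_=
    var express = require('express');
    var app = express();

    app.use(function (req, res, next) {
        // Google signals a snapshot request with this query parameter
        if (req.query._escaped_fragment_ !== undefined) {
            // getSnapshot() is a hypothetical helper that returns the
            // fully rendered HTML for the requested path
            return getSnapshot(req.path, function (err, html) {
                if (err) { return next(err); }
                res.send(html);
            });
        }
        next(); // normal browsers fall through to the regular app
    });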

Cloaking

Cloaking refers to the practice of presenting different content or URLs to human users and search engines.

This quote comes from the Google Webmaster Guidelines and is something that Google takes very seriously. You don't get off the hook just because you use the Fragment Spec:

In response to an _escaped_fragment_ URL, the origin server agrees to return to the crawler an HTML snapshot of the corresponding #! URL. The HTML snapshot must contain the same content as the dynamically created page.

So, if you give Google a snapshot, you are good, right?

Well...sort of.

I don't have any evidence whatsoever to support this, but anecdotally I have seen many implementations of the Fragment Spec in which the HTML fed to Google is nothing like what the user sees. I think (or perhaps I should say fear) that this is a factor in Google's search rankings.

Headless Browsers Suck

The de facto way to provide an exact snapshot of your client side rendered page to Google is to use a headless browser.

I should clarify that headless browsers are actually awesome...when you are using them for something like testing.

The problem is that they are extremely resource intensive. Running PhantomJS on your production web servers is a serious mistake.

You can run it on a separate offline server or utilize a cloud service provider like Prerender or BromBone, but then you likely will need to live with serving Google pages that have been cached for a day or more. We utilize page caching, but we have a requirement that caching must be on the order of minutes rather than days. That would be very difficult to achieve with a headless browser when you have tens of thousands of pages.
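
To give a feel for the difference, here is a toy sketch (all names hypothetical, not our actual code) of the kind of short-TTL snapshot cache we need, which a headless browser is simply too slow to keep filled at our page counts:

    // A toy in-memory snapshot cache with a TTL measured in minutes,
    // versus the day-or-more staleness typical of headless browser setups
    var cache = {};
    var TTL = 5 * 60 * 1000; // 5 minutes

    function getCachedSnapshot(url, render, cb) {
        var entry = cache[url];
        if (entry && (Date.now() - entry.time) < TTL) {
            return cb(null, entry.html); // still fresh, skip re-rendering
        }
        // render() is the expensive part; with PhantomJS it is far too
        // expensive to re-run this often for tens of thousands of pages
        render(url, function (err, html) {
            if (err) { return cb(err); }
            cache[url] = { html: html, time: Date.now() };
            cb(null, html);
        });
    }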

Universe of SPAs

The biggest issue we have with the Fragment Spec, however, is that there just isn't enough data out there.

Think about it. How many SPAs are there on the web relative to server side rendered websites? Among those SPAs, how many use the fragment spec? Among those, how many do it "right" in all aspects?

Answer: not many.

So, when Google is making predictions about how to answer a user's question, they rely not just on data from your website, but on all the data they have from all the websites similar to yours. In other words, it is easier for Google to predict the accuracy of a server side website than a client side rendered SPA using the fragment spec because there is so much more supporting data.

I don't think this situation will last forever, but it is the reality today.

What if

I was talking with Matias Niemela, a member of the AngularJS core team, about my presentation and he challenged many of my assumptions.

  • What if Google actually has already started to index client side rendered JavaScript pages?
  • What if they are in fact collecting just as much data on SPAs as they do on static server side websites?
  • What if a web app loads very quickly and follows every best practice perfectly?

Sure...if all that were true, then I suppose a Fragment Spec app could theoretically rank highly in a competitive space...

Risk

...but I guess what it comes down to is that I just don't like to take crazy, completely unnecessary risks with my livelihood.

So what is the solution

So what is the solution?

Server prerendering

We believe the solution is to use server pre-rendering.

The idea is that during the initial page load, you serve up an actual web page. All users and search engines would get a server side generated web page. The page would be exactly the same as if you had a completely static server side website, except that you also include references to JavaScript resources that can be downloaded asynchronously in the background.
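
To make that concrete, here is a minimal sketch of what the initial response could look like with Express (the route and the renderPage() helper are hypothetical, not our actual code):

    // A minimal sketch of the initial page load: the server returns a
    // complete, crawlable page, and the client side app loads on top of it
    app.get('/:company/phone-number', function (req, res, next) {
        // renderPage() is a hypothetical server side rendering function
        renderPage(req.params.company, function (err, html) {
            if (err) { return next(err); }
            res.send(
                '<!DOCTYPE html><html><body>' +
                html +                                       // full server rendered content
                '<script src="/js/app.js" async></script>' + // client app downloads in the background
                '</body></html>'
            );
        });
    });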

Server Pre-Rendering

Once the JavaScript for your web app has downloaded, it takes over control. Any subsequent clicks or actions by a user are handled by the web app (using HTML5 PushState). The server is only hit for API calls. Search engines don't run JavaScript, so they would continue to get server rendered pages for every link on the site.
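
On the client side, the minimum needed for that takeover in Angular 1.x is turning on html5Mode; a hedged sketch, assuming a module named 'app' that uses UI Router:

    // A sketch: enable HTML5 PushState so the client app handles navigation
    // without hash bangs once it has loaded
    angular.module('app', ['ui.router'])
        .config(function ($locationProvider) {
            $locationProvider.html5Mode(true); // real URLs; the server must render them too
        });

If you want to see an example of how this would work, check out...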

Twitter

...Twitter.

Do you remember when all Twitter pages used hash bangs? In fact, Twitter played a huge part in the creation of the Google fragment spec. In May of 2012, however, they decided to move from client rendering to server rendering. They said the primary reason was speed. Twitter uses Ruby on the server side, and with their solution the client side JavaScript gets fully rendered pieces of the page from the server.

Isomorphic JavaScript

Since it is possible to have JavaScript on the server now with Node.js, why not have the client and server render in the same way using the same JavaScript code? Well, there are a number of frameworks out there that allow you to do just that.

You will notice that my favorite client side framework, Angular, is not on this list. I heard rumors last year that the Angular team was actually working on server side rendering.

Jeff Whelpley and Misko Hevery

So, in January at ng-conf I asked Misko about it. It turns out the Angular team had indeed been working on an isomorphic component for Angular but they ended up dropping it. They decided the future is on the client side and that search engines will eventually catch up.

D'oh.

Looking on the bright side, Misko did say that they got something working at one point. That means it is technically possible to create an Angular server side rendering engine.

So, how hard could it be?

Epic Fail

It turns out, really hard.

In retrospect, I think my first attempt was a failure because I didn't fully understand the problem. I was trying to create the perfect generic solution right off the bat.

You don't know what you don't know, right?

Fortunately, I realized within a couple weeks that it wasn't going to work, and I shifted my focus toward a simpler, more straightforward solution.

2nd Attempt

Instead of creating a framework right off the bat, I decided to start in my comfort zone. At a high level this meant simply building an Angular app and then manually duplicating all rendering logic from Angular controllers and templates to the server side Node.js controllers and templates.

Not an elegant solution, but I knew 1) it would accomplish my goal of server pre-rendering and 2) it would give me the opportunity to learn about isomorphic JavaScript at a much deeper level so that I could create a more elegant solution in the future.

Client Server Routing

The exact same routing configuration file is used for the client and the server. On the client, this configuration file is passed into the Angular UI Router, and on the server it is passed into the Express router. It was important to make sure the routes were exactly the same because we want to be able to render any route on either the client or the server.
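
Here is a simplified sketch of that idea (route names and the renderRoute() dispatch are hypothetical; our real config has more to it):

    // routes.js - one route definition shared by client and server
    var routes = [
        { name: 'home',    url: '/',         controller: 'HomeCtrl' },
        { name: 'company', url: '/:company', controller: 'CompanyCtrl' }
    ];

    // client: register every route with Angular UI Router
    angular.module('app').config(function ($stateProvider) {
        routes.forEach(function (route) {
            $stateProvider.state(route.name, {
                url: route.url,
                controller: route.controller
            });
        });
    });

    // server: register the same URLs with the Express router
    routes.forEach(function (route) {
        app.get(route.url, function (req, res, next) {
            renderRoute(route, req, res, next); // hypothetical server side render dispatch
        });
    });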

Component Focused

Everything in my Angular client side app is component-focused. I broke down every page into smaller and smaller pieces so that when I added the server side, I would be replacing one small component at a time instead of one huge, complicated page.
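
For example (names hypothetical), one of those small components might look like this:

    // A sketch of one small component: small directives like this let us
    // swap rendering over to the server one piece at a time
    angular.module('app').directive('companyHeader', function () {
        return {
            restrict: 'E',                      // used as <company-header>
            templateUrl: 'company-header.html',
            scope: { company: '=' }             // isolated scope per component
        };
    });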

Client Server Controllers

The controllers are very similar but do deviate from each other from time to time. Still, I tried to keep the code looking the same line-for-line whenever possible.
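
To give a feel for it, here is a hypothetical pair (not our actual code, and dataService is an assumed shared service) kept as close to line-for-line as possible:

    // client: Angular controller
    angular.module('app').controller('CompanyCtrl', function ($scope, dataService) {
        dataService.getCompany($scope.companyName).then(function (company) {
            $scope.company = company;
        });
    });

    // server: Node.js controller that mirrors the client version line-for-line
    module.exports = function companyCtrl(scope, dataService) {
        return dataService.getCompany(scope.companyName).then(function (company) {
            scope.company = company;
        });
    };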

Client Server Templates

In order to do a like-for-like comparison, we created our own JavaScript-based template engine. The client templates are compiled into HTML using a Grunt plugin and then packaged into the main JS file. I had to create some server side functions to mimic the Angular syntax, but I got pretty close to a like-for-like match.
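
As a rough illustration (div(), h2() and bind() are hypothetical helpers, not the actual engine), a shared template looks something like a tree of function calls that mimic Angular directives:

    // A sketch of one shared template: the same function runs directly on
    // the server and is compiled into HTML for the client by a Grunt plugin
    module.exports = function companyTemplate(scope) {
        return div({ 'class': 'company' }, [
            h2(bind(scope, 'company.name')),   // like <h2 ng-bind="company.name">
            div(bind(scope, 'company.phone'))
        ]);
    };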

Demo

For the demo we will go to answers.gethuman.com. This app is still in beta.

Notice that when you add ?server=true to any page, you see only the server rendered content. Once the Angular client app loads, it re-renders part of the page with the appropriate client side bindings.

DRYing code

So, we now have a working version built the quick and dirty way. The next step is to refactor the separate client/server code into one unified framework.

Takeaways

For the things I want you to take away from this talk, I came up with the 4 I's:

  1. Intrigue - SEO starts with creating an awesome product that people love, but to actually achieve a top ranking in a competitive space you need to follow the Google Webmaster Guidelines.
  2. Initial load - The Fragment Specification can help get your client side web application indexed, but we believe you will rank higher if you use server pre-rendering instead.
  3. Isomorphic - To implement server pre-rendering you can either choose from one of the Isomorphic JavaScript frameworks I mentioned or implement it yourself manually by duplicating your rendering logic on the client and server. The duplication solution is not elegant, but it gets the job done and buys you time to either migrate to another framework or build your own.
  4. Iterate - For software development in general, don't think you can build a perfect tool from scratch in one shot. This is especially true if you don't fully understand what you are trying to build. Your goal with the first couple iterations should be to simply get something working functionally and reveal all unknowns. Only then can you start thinking about what the ultimate version of your tool should look like.

shout outs

Thanks to everyone who provided feedback and suggestions for this presentation. Christian and Adam are my partners in crime at GetHuman. Matias, Lukas and Donn are actually helping me build our framework. Pete and Alex reviewed this presentation and gave me some really good suggestions.

links

thank you



Jeff Whelpley

Chief Architect | GetHuman