Sunday, July 21, 2013

Dashing through (Zero) Windows

One of the neatest devices Sony has ever sold is the Sony Dash.  For those unfamiliar, the Dash is (or was) Sony's take on the Chumby platform.  Most of what it does is now done better by tablet computers, but it is still far and away the best nightstand alarm clock ever conceived, with separate controls for alarms (nearly infinite) and screen wake/sleep (also nearly infinite).  In fact, it serves so well as an alarm clock that I use it to teach my kids when it's OK to be up for the morning. ("If the screen is sleeping, you should be sleeping.  If the screen is awake, you can be awake.")

A few days ago, both of my Dashes mysteriously stopped working, or rather stopped being able to connect to Sony's back-end servers through ssm.internet.sony.tv, which basically turns the Dash into an expensive brick.  We did suffer a power outage, but everything else on my home network recovered OK, and I had no problems reaching anything on the Internet from any of my other devices.  So naturally I assumed the problem was on the Sony end.  While (and after) getting nowhere with Sony's online forum community and tech support, I did a bunch of troubleshooting on my home network.  I double-checked configs on my firewall and home switch.  I rebooted everything.  I wiped and re-configured my firewall.  I even swapped my trusty SSG5 for an SRX210 I had in my lab.  Nothing changed.

So I hypothesized that the problem was in the FiOS network outside my house.  I turned on the hotspot on my mobile phone and connected the Dash to the Internet through it.  Success!  So what could Verizon be doing to cause the problem?  Luckily, I had a Wireshark trace from the Dash during the outage.  For comparison, I traced an HTTP connection to ssm.internet.sony.tv from my laptop, both via my home network and via the mobile hotspot.  When I tried to connect to Sony via my home network, the browser session merely timed out after a while.  When I tried to connect to Sony via my mobile hotspot, I got an immediate message about a bad proxy.  For a plain old browser, that sounded like a successful connection.

So I compared the trace from the Dash to the traces from the laptop.  It took several reads, but I finally saw something interesting in the TCP session setup.  (You can follow along if you like with the trace from the Dash and the trace from the laptop via the mobile hotspot.)  In frame #2 of the Dash trace, you can see that the SYN-ACK packet that comes back from Sony advertises a TCP window size of 0, whereas the SYN-ACK packet in the laptop trace advertises a TCP window size of 5512 (see frame #3).  I double-checked this against the laptop trace via my home network, and the laptop received a SYN-ACK with Win=0 as well when connected to my home network.  So now I definitely had a smoking gun.  Moreover, I had an indication that Verizon probably wasn't the problem; it was more likely my firewall proxying the TCP three-way handshake.  I was using the SRX at this point, so I executed the commands "set security flow tcp-session no-syn-check" and "set security flow tcp-session no-sequence-check" and committed the changes.  I cleared the old sessions to ssm.internet.sony.tv from the SRX session table and fired up the Dash on my home network.  Success!
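
The comparison above comes down to two fields in the TCP header: the flags (is this a SYN-ACK?) and the advertised window (is it 0?).  As a rough illustration only, and not part of the original troubleshooting, here is a minimal Python sketch that decodes those fields from the raw bytes of a TCP header.  The function names, ports, and sequence numbers are hypothetical; only the window values 0 and 5512 come from the traces described above.

```python
import struct

# TCP flag bits (low-order bits of the 16-bit offset/flags field)
SYN = 0x02
ACK = 0x10


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte portion of a TCP header."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Network byte order: ports, seq, ack, offset+flags, window, checksum, urgent
    (src_port, dst_port, seq, ack,
     off_flags, window, _checksum, _urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "flags": off_flags & 0x01FF,  # keep only the 9 flag bits
        "window": window,
    }


def is_zero_window_syn_ack(segment: bytes) -> bool:
    """True if the segment is a SYN-ACK that advertises a window of 0."""
    header = parse_tcp_header(segment)
    is_syn_ack = (header["flags"] & (SYN | ACK)) == (SYN | ACK)
    return is_syn_ack and header["window"] == 0


def build_syn_ack(window: int) -> bytes:
    """Build a sample SYN-ACK header (hypothetical ports and sequence numbers)."""
    data_offset = 5  # header length in 32-bit words, no options
    off_flags = (data_offset << 12) | SYN | ACK
    return struct.pack("!HHIIHHHH", 80, 49152, 1000, 2001, off_flags, window, 0, 0)


if __name__ == "__main__":
    for win in (0, 5512):
        verdict = is_zero_window_syn_ack(build_syn_ack(win))
        print(f"Win={win}: zero-window SYN-ACK? {verdict}")
```

In Wireshark itself, the same fields are exposed as tcp.flags.syn, tcp.flags.ack, and tcp.window_size_value, so a display filter combining them should isolate the suspicious frames without any code.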

What's interesting is that my firewall config was not what changed at the time of the outage; something else did.  The jury is still out as to whether the issue was caused by a change Sony made or one Verizon made, but somebody changed something that broke the default TCP proxy behavior of both my SSG and SRX.  If I ever get to the bottom of that, I'll post an update.

-- Greg