23 Jan 2017
Towards a realtime streaming architecture
Outline of the streaming architecture we are standardising around in the data tribe at Sky Betting & Gaming
09 Dec 2016
When Hadoop tools disagree with each other
We recently saw an 8-year spike on one of our graphs. It caused much amusement when it was tweeted out, but there's actually a good story behind this apparent 8-year lag in data processing.
02 Dec 2016
Guardian SSL Actively Harmful
Earlier this week, the Guardian posted a piece about how they had switched to SSL everywhere, how hard this was, and why it's a great thing. Using SSL/TLS is generally a good thing, but in this case it's actually harmful.
25 Nov 2016
Our Top 10 Big Data News Sources
Keeping on top of an area of technology that moves as rapidly as the big data ecosystem is hard. Our data tribe share some of their resources for keeping up to date.
05 May 2016
Measuring Impala performance using Apache JMeter
Our web performance teams regularly use JMeter to load-test our websites and measure the performance of the various components involved, but it turns out you can also use it to test the performance of a Hadoop data warehouse directly.
30 Mar 2016
Google Phone Numbers in Spark
Our CRM team rely on clean phone numbers to push SMS messages to customers. Various people have tried writing validation logic for this, but surely this is a solved problem.
17 Nov 2015
How to DBA - All Your Base conference experience
Some thoughts from this year’s All Your Base conference on the past, present and future of how we manage databases.
21 Oct 2015
Using ZooKeeper Locks in JRuby
How to use ZooKeeper locks in JRuby
05 Aug 2015
Distributed Database Query Optimisation with Lego
Lego-based notes from a workshop on Hive, going from basic unpartitioned tables through to partitioned Impala tables backed by Parquet, with statistics computed.