Towards a realtime streaming architecture
An outline of the streaming architecture we are standardising on in the data tribe at Sky Betting & Gaming.
When Hadoop tools disagree with each other
We recently saw an 8-year spike on one of our graphs. It caused much amusement when it was tweeted out, but there’s actually a good story behind this apparent 8-year lag in data processing.
Guardian SSL Actively Harmful
Earlier this week the Guardian posted a piece about how they'd switched to SSL everywhere, how hard this was, and why it's a great thing. Using SSL/TLS is generally a good thing, but in this case it's actually harmful.
Our Top 10 Big Data News Sources
Keeping on top of an area of technology that moves as rapidly as the big data ecosystem is hard. Our data tribe share some of their resources for keeping up to date.
Measuring Impala performance using Apache JMeter
Our web performance teams regularly use JMeter to load test our websites and measure the performance of the various components involved, but it turns out you can also use it to directly test the performance of a Hadoop data warehouse.
Google Phone Numbers in Spark
Our CRM team rely on having clean phone numbers to push SMS messages to customers. Various people have tried writing their own logic for this validation, but surely this is a solved problem?
How to DBA - All Your Base conference experience
Some thoughts from this year’s All Your Base conference on the past, present and future of how we manage databases.
Using ZooKeeper Locks in JRuby
How to use ZooKeeper locks in JRuby.
Distributed Database Query Optimisation with Lego
Lego-based notes from a workshop on Hive, going from basic unpartitioned tables through to partitioned Impala tables with stats computed and backed by Parquet.