
Big Data & Politics – Analysing Free-text Docs in HDInsight

In this session we will walk through how we collected States of Jersey Hansard transcripts from the web, analysed them using HDInsight and loaded them into a data warehouse to be queried and visualised.

The transcripts are unstructured, free-text documents with all the errors and inconsistencies a human can devise! So how do we do it? How can we impose some structure on them and turn them into something we can work with?

Technologies we will cover include:

  • HDInsight
  • MapReduce
  • Hive
  • Data Quality Services
  • SQL Server Tabular/DAX
  • Python

This session introduces these technologies, shows how you can use them, and helps you get started.
Presented by Charles Robertson at SQLBits XII
  • Speaker Bio
    Charles has nearly ten years' experience, first as a .NET software developer and then as a BI consultant working across the full stack of Microsoft technologies. Based in Jersey, he works mainly in Financial Services, which often takes him to the UK or Europe. His current professional interests are Big Data and Data Science, and he helps organise a local tech meetup group, techtribes.je. He can be found on Twitter (@charles_jsy) and LinkedIn.
    http://www.altius.je http://feeds.feedburner.com/AltiusConsultingCommunity