Hadoop: Everything you wanted to know but were afraid to ask
This session is an introduction to the Apache Hadoop framework and its benefits for processing large volumes of data, all from the perspective of a SQL Server professional. What are the challenges in the RDBMS world? When do we need to work with large (multi-terabyte) data sets? Why do we need a new data framework?
I will present an overview and comparison of several commercial Hadoop distributions (Cloudera, MapR, Hortonworks, Microsoft, IBM, Intel, and EMC Greenplum), and will discuss Hadoop features, components, and extensions. I will show what is needed to start a small proof-of-concept Hadoop project, including hardware, software, network, installation, configuration, testing, and tuning details. We will walk through several demos on a small 3-node desktop Hadoop cluster, with data access via ODBC from familiar Windows tools (Excel, SQL Server Integration Services, SQL Server Reporting Services, etc.).