In today’s enterprise, Apache Hadoop has found a place alongside established databases and data warehouses.
The Apache Hadoop software library is an open-source framework that allows for the distributed processing of large data sets across clusters of computers. With Apache Hadoop, you can:
- Store massive amounts of modern data (social media, click-stream data, Web logs, sensor data, etc.) cost-effectively across commodity servers with the Hadoop Distributed File System (HDFS)
- Distribute data processing across clusters of coordinated nodes using MapReduce – a batch-processing paradigm that lets you scale up or down without system interruption
- Manage resources using YARN – the Hadoop operating system – to handle non-MapReduce workloads more effectively
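To make the MapReduce paradigm mentioned above more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample log lines are assumptions chosen for illustration. On a real cluster, the framework would run the map and reduce steps in parallel across nodes and perform the grouping step itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (token, 1) pair for every whitespace-separated token in one record."""
    for token in record.lower().split():
        yield token, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (the step a framework performs between phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single total."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    emitted = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical Web-log fragments, used only as sample input.
    logs = ["GET /home", "GET /cart", "POST /cart"]
    print(word_count(logs))
```

Because each call to map_phase depends on only one record and each call to reduce_phase on only one key, the work can be split across many nodes, which is what allows this style of processing to scale with the size of the cluster.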
SAP is committed to providing choice among Hadoop distributions. It has established reseller agreements with Intel and Hortonworks. In addition, SAP solutions have been certified to work with Cloudera and MapR.
SAP HANA platform for Big Data + Hadoop
By using the SAP HANA platform and Hadoop together, you combine instant results with virtually unlimited storage to gain real-time insights into Big Data. SAP supports integration with Hadoop in several ways, including the SAP HANA smart data access capability, the SAP IQ native MapReduce API, the SAP Data Services Hadoop connector, and SAP BusinessObjects BI universes.
Learn how you can manage different types of Big Data - including Hadoop - with a real-time platform delivered by SAP HANA.
SAP HANA: smart data access
The smart data access capability of SAP HANA allows you to access information from remote sources – such as an SAP IQ or Hive data warehouse – using virtualization techniques. With smart data access, you define a virtual table using SAP HANA studio. Query execution is pushed down to the remote server, so that only a minimal amount of data is returned to SAP HANA to complete the query. As a result, you can perform queries across tiered storage without the cost and effort of moving data.
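The effect of pushing query execution down to the remote server can be illustrated with a small sketch. It does not use SAP HANA or Hive: an in-memory SQLite database stands in for the remote source, and the table name (clicks), its columns, and the helper functions are assumptions made up for this example. The point is only to compare how many rows cross the boundary in each approach.

```python
import sqlite3
from typing import Tuple


def make_remote_source() -> sqlite3.Connection:
    """Build an in-memory database that stands in for the remote data store."""
    remote = sqlite3.connect(":memory:")
    remote.execute("CREATE TABLE clicks (user_id INTEGER, page TEXT)")
    # 1,000 hypothetical click records; every tenth one is a visit to /cart.
    rows = [(i % 100, "/cart" if i % 10 == 0 else "/home") for i in range(1000)]
    remote.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    return remote


def count_without_pushdown(remote: sqlite3.Connection, page: str) -> Tuple[int, int]:
    """Fetch every row, then filter and count on the local side.

    Returns (matching rows, rows transferred).
    """
    rows = remote.execute("SELECT user_id, page FROM clicks").fetchall()
    matches = sum(1 for _, row_page in rows if row_page == page)
    return matches, len(rows)


def count_with_pushdown(remote: sqlite3.Connection, page: str) -> Tuple[int, int]:
    """Send the filter and the aggregate to the remote side.

    Returns (matching rows, rows transferred).
    """
    query = "SELECT COUNT(*) FROM clicks WHERE page = ?"
    rows = remote.execute(query, (page,)).fetchall()
    return rows[0][0], len(rows)


if __name__ == "__main__":
    source = make_remote_source()
    print("without pushdown:", count_without_pushdown(source, "/cart"))
    print("with pushdown:   ", count_with_pushdown(source, "/cart"))
```

Both functions return the same count, but the second transfers a single result row instead of the whole table, which is the behavior the paragraph above describes.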
SAP IQ: Native MapReduce API and flexible Hadoop integration
SAP IQ provides a native MapReduce application programming interface (API). With this API, programmers can develop and deploy MapReduce programs natively in SAP IQ to analyze very large data sets covering structured, semi-structured, and unstructured data formats. SAP IQ also offers flexible Hadoop integration and federation techniques – enabling you to combine Hadoop data with structured data for new insights.
SAP Data Services: Hadoop connector
SAP Data Services delivers a Hadoop connector that provides high-performance reading from and loading into Hadoop. With SAP Data Services, you can push down text data processing into Hadoop to extract relevant text from files, Web logs, surveys, social media, geo-location data, and more. This enables you to pre-process large volumes of data in Hadoop to identify relevant text, which can then be rapidly loaded into SAP HANA or SAP IQ for real-time analysis alongside structured data.
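The idea of pre-processing raw text so that only the relevant portion is loaded downstream can be sketched as follows. This is not the SAP Data Services text data processing engine or its API; the keyword pattern, the record layout, and the function name extract_relevant are hypothetical and serve only to show the filter-then-load pattern.

```python
import re
from typing import Dict, Iterable, Iterator, List, Union

# Hypothetical terms of interest; a real job would use far richer extraction rules.
RELEVANT_TERMS = re.compile(r"\b(refund|delay|broken)\b", re.IGNORECASE)

Record = Dict[str, Union[int, str, List[str]]]


def extract_relevant(lines: Iterable[str]) -> Iterator[Record]:
    """Yield a small structured record for each line that mentions a term of interest."""
    for line_number, line in enumerate(lines, start=1):
        terms = sorted({match.lower() for match in RELEVANT_TERMS.findall(line)})
        if terms:
            yield {"line": line_number, "terms": terms, "text": line.strip()}


if __name__ == "__main__":
    # Sample survey comments, invented for this example.
    comments = [
        "Order arrived on time",
        "Package was BROKEN, want a refund",
        "Shipping delay again",
    ]
    for record in extract_relevant(comments):
        print(record)
```

Only the lines that match are turned into records, so the volume handed to the next stage is a fraction of the raw input.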
SAP BusinessObjects BI: Hadoop universe
The SAP BusinessObjects BI platform offers an analytic modeling environment and BI tools. The analytic modeling environment enables you to build data models (called universes) that can be used by the BI tools to answer queries. You can deploy these universes against data in SAP HANA, SAP IQ, or non-SAP data sources including Hive and HDFS. End users can interact with universes using BI tools such as SAP Predictive Analysis, SAP Lumira, and SAP BusinessObjects Explorer.