Big Data Analytics with Hadoop 3

Using Hive

Unlike relational data warehouses, Hive supports nested data models with complex types such as array, map, and struct. Tables can be partitioned on the values of one or more columns with the PARTITIONED BY clause. Moreover, tables or partitions can be bucketed with CLUSTERED BY columns, and data can be sorted within each bucket with SORTED BY columns. Hive organizes data into the following units:

  • Tables: They are very similar to RDBMS tables and contain rows and columns.
  • Partitions: Hive tables can have more than one partition. Each partition is mapped to a subdirectory in the underlying filesystem.
  • Buckets: Data can also be divided into buckets in Hive. Buckets are stored as files within the partition directories in the underlying filesystem.
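
As a minimal sketch of how these pieces fit together, the following HiveQL declares a table with complex column types that is partitioned, bucketed, and sorted within each bucket. The table name page_views, its columns, the bucket count, and the ORC storage format are all hypothetical choices made for illustration; they do not come from the text above.

```sql
-- Hypothetical table: all names and the bucket count are illustrative.
CREATE TABLE page_views (
  user_id    BIGINT,
  url        STRING,
  referrers  ARRAY<STRING>,                       -- complex type: array
  properties MAP<STRING, STRING>,                 -- complex type: map
  geo        STRUCT<country:STRING, city:STRING>  -- complex type: struct
)
PARTITIONED BY (view_date STRING)                 -- one subdirectory per view_date value
CLUSTERED BY (user_id)                            -- rows are assigned to buckets by user_id
SORTED BY (user_id ASC)                           -- rows are sorted within each bucket
INTO 8 BUCKETS
STORED AS ORC;

-- Reading nested values: array index, map key, and struct field access.
SELECT referrers[0]          AS first_referrer,
       properties['browser'] AS browser,
       geo.country           AS country
FROM page_views
WHERE view_date = '2018-01-01';
```

Filtering on the partition column, as in the WHERE clause above, lets Hive read only the matching partition subdirectory instead of scanning the whole table.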

The Hive Query Language (HQL) provides basic SQL-like operations. Here are a few of the tasks that HQL can do easily:

  • Create and manage tables and partitions
  • Support various relational, arithmetic, and logical operators
  • Evaluate functions
  • Download the contents of a table to a local directory, or write the results of queries to an HDFS directory
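
The tasks listed above can be sketched in a few HQL statements. This is an illustrative example only: the table page_views, its columns, the partition value, and both output paths are assumptions introduced here, not values from the text.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING);

-- Manage partitions: add one explicitly, then list the existing ones.
ALTER TABLE page_views ADD IF NOT EXISTS PARTITION (view_date = '2018-01-01');
SHOW PARTITIONS page_views;

-- Relational (>, LIKE), arithmetic (%), and logical (AND, OR) operators,
-- together with a built-in function (length).
SELECT url,
       length(url)  AS url_length,
       user_id % 10 AS user_group
FROM page_views
WHERE view_date = '2018-01-01'
  AND (url LIKE '%/products/%' OR user_id > 1000);

-- Export the contents of a partition to a local directory (path is an assumption).
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/page_views_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT user_id, url
FROM page_views
WHERE view_date = '2018-01-01';

-- Write the results of a query to an HDFS directory (path is an assumption).
INSERT OVERWRITE DIRECTORY '/user/hive/exports/top_urls'
SELECT url, count(*) AS views
FROM page_views
GROUP BY url;
```

Note that INSERT OVERWRITE replaces any existing contents of the target directory, so the output paths should point to locations reserved for these exports.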