# PigLatin
Repository files:

- 1_Sqoop_Import_Command.txt
- 2_example.conf
- 3_pig_demo.txt
- 4_pig_wordcount.pig
- 5_Pig_Schema_Less_Relation.pig
- 6_input.csv
- 7_Pig_Relation_With_Schema.pig
- 8_hive_to_pig.pig
- 9_pig_transformation_input.txt
- 10_pig_transformation_script.pig
- 11_transformation_for_hive.csv
- 12_transform_data_for_hive.pig
- 13_group_in_pig.pig
- 14_group_input_file.csv
- 15_remove_NULL.pig
- 16_NULL_values_input.csv
- 17_input_to_HDFS.txt
- 18_pig_load_to_HDFS.pig
- 19_input_pig_to_hive.csv
- 20_pig_to_hive.pig
- 21_hive_table_creation.hql
- 22_input_for_sort.csv
- 23_sort_in_pig.pig
- 24_input_for_removing_duplicates.csv
- 25_removing_duplicates.pig
- 26_input_parallel_tasks.csv
- 27_SET_multiple_reducers.pig
- 28_PARALLEL_multiple_reducers.pig
- 29_customers_input.csv
- 30_orders_input.csv
- 31_join_operation.pig
- 32_customers_input.csv
- 33_orders_input.csv
- 34_replicated_join.pig
- 35_input_TEZ_mode.txt
- 36_pig_script_tez_mode.pig
- 37_input_UDF_invoke.csv
- 38_UDF_invocation.pig
- 39_hive_query.sql
- 40_hive_managed_table.sql
- 41_input_hive_external_table.csv
- 42_input_partition_hive_table.csv
- 43_hive_partitioned_table.sql
- 44_hive_bucketed_table.sql
- 45_hive_table_with_ORC.sql
- 46_sequence_file_hive.sql
- 47_input_delimiter_hive.tsv
- 48_hive_table_tab_delimiter.sql
- 49_input_to_load_from_local.csv
- 50_create_hive_table_for_local_load.sql
- 51_input_to_load_from_hdfs.csv
- 52_create_hive_table_for_hdfs_load.sql
- 53_create_hive_table_for_SELECT_load.sql
- 54_input_file_for_compressed_data.csv
- 55_hive_table_for_compressed_data.sql
- 56_first_input_file_for_join.csv
- 57_second_input_file_for_join.csv
- 58_first_hive_table_for_join.sql
- 59_second_hive_table_for_join.sql
- 60_input_file_for_subquery.csv
- 61_hive_create_table_for_subquery.sql
- 62_input_file_for_ordering_output.csv
- 63_hive_create_table_for_order_by.sql
- 100_customers.csv
- 200_join.pig
- 200_orders.csv
- README.md
- _config.yml


# Welcome to the HDPCD Repository

You can use this repository to prepare for the Hortonworks Data Platform Certified Developer (HDPCD) certification. The exam objectives are published at https://hortonworks.com/services/training/certification/exam-objectives/#hdpcd

The following objectives are tested by this certification; a short illustrative sketch follows each section's list.

## DATA INGESTION
- Import data from a table in a relational database into HDFS
- Import the results of a query from a relational database into HDFS
- Import a table from a relational database into a new or existing Hive table
- Insert or update data from HDFS into a table in a relational database
- Given a Flume configuration file, start a Flume agent
- Given a configured sink and source, configure a Flume memory channel with a specified capacity
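
As a minimal sketch of the first ingestion objective, assuming a hypothetical MySQL database `retail_db`, table `customers`, and target directory (the exam-style command used in this repository lives in `1_Sqoop_Import_Command.txt`):

```sh
# Import all rows of a relational table into HDFS.
# Connection string, credentials, and paths are placeholder assumptions.
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username retail_user \
  --password-file /user/hdpcd/.sqoop_password \
  --table customers \
  --target-dir /user/hdpcd/customers \
  --num-mappers 1
```

For the Flume objectives, a memory channel with a specified capacity is configured in the agent's properties file. The agent, source, sink, and channel names below (`a1`, `r1`, `k1`, `c1`) are the conventional placeholders from the Flume user guide, not values taken from this repository's `2_example.conf`:

```properties
# Hypothetical agent "a1": netcat source -> memory channel -> logger sink.
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinks.k1.type = logger

# Memory channel with an explicit capacity, as the objective requires.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Given such a file, the agent is started with `flume-ng agent --conf conf --conf-file 2_example.conf --name a1`.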

## DATA TRANSFORMATION
- Write and execute a Pig script
- Load data into a Pig relation without a schema
- Load data into a Pig relation with a schema
- Load data from a Hive table into a Pig relation
- Use Pig to transform data into a specified format
- Transform data to match a given Hive schema
- Group the data of one or more Pig relations
- Use Pig to remove records with null values from a relation
- Store the data from a Pig relation into a folder in HDFS
- Store the data from a Pig relation into a Hive table
- Sort the output of a Pig relation
- Remove the duplicate tuples of a Pig relation
- Specify the number of reduce tasks for a Pig MapReduce job
- Join two datasets using Pig
- Perform a replicated join using Pig
- Run a Pig job using Tez
- Within a Pig script, register a JAR file of User Defined Functions
- Within a Pig script, define an alias for a User Defined Function
- Within a Pig script, invoke a User Defined Function
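
A minimal Pig Latin sketch touching several of the objectives above (loading with a schema, null removal, grouping, reducer parallelism, de-duplication, sorting, and storing to HDFS). The paths and field names are illustrative assumptions; the numbered `.pig` scripts in this repository hold the worked versions:

```pig
-- Load a CSV into a relation with an explicit schema
-- (path and field names are hypothetical).
customers = LOAD '/user/hdpcd/customers.csv'
            USING PigStorage(',')
            AS (id:int, name:chararray, city:chararray);

-- Remove records with null values.
clean = FILTER customers BY (id IS NOT NULL) AND (name IS NOT NULL);

-- Group by city with multiple reduce tasks, then count per group.
by_city = GROUP clean BY city PARALLEL 4;
counts  = FOREACH by_city GENERATE group AS city, COUNT(clean) AS n;

-- Remove duplicate tuples, then sort the output.
unique  = DISTINCT counts;
ordered = ORDER unique BY n DESC;

-- Store the result into a folder in HDFS.
STORE ordered INTO '/user/hdpcd/output/city_counts' USING PigStorage(',');
```

Registering, aliasing, and invoking a UDF follows the pattern below; the JAR path and class name are invented for illustration (see `38_UDF_invocation.pig` for this repository's version):

```pig
-- Register a JAR of User Defined Functions and define an alias for one
-- (JAR path and class name are hypothetical).
REGISTER '/user/hdpcd/udfs/myudfs.jar';
DEFINE TO_UPPER com.example.pig.ToUpper();

upper_names = FOREACH clean GENERATE id, TO_UPPER(name) AS name_upper;
```

Running either script on Tez is then a matter of the launch flag: `pig -x tez script.pig`.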

## DATA ANALYSIS
- Write and execute a Hive query
- Define a Hive-managed table
- Define a Hive external table
- Define a partitioned Hive table
- Define a bucketed Hive table
- Define a Hive table from a select query
- Define a Hive table that uses the ORCFile format
- Create a new ORCFile table from the data in an existing non-ORCFile Hive table
- Specify the storage format of a Hive table
- Specify the delimiter of a Hive table
- Load data into a Hive table from a local directory
- Load data into a Hive table from an HDFS directory
- Load data into a Hive table as the result of a query
- Load a compressed data file into a Hive table
- Update a row in a Hive table
- Delete a row from a Hive table
- Insert a new row into a Hive table
- Join two Hive tables
- Run a Hive query using Tez
- Run a Hive query using vectorization
- Output the execution plan for a Hive query
- Use a subquery within a Hive query
- Output data from a Hive query that is totally ordered across multiple reducers
- Set a Hadoop or Hive configuration property from within a Hive query
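
A hedged HiveQL sketch of a handful of the analysis objectives (specifying a delimiter, loading from a local directory, ORC conversion, Tez, vectorization, the execution plan, and totally ordered output). Table, column, and file names are invented; the repository's `.sql` files contain the exam-style statements:

```sql
-- A tab-delimited, Hive-managed text table
-- (table, column, and path names are hypothetical).
CREATE TABLE customers_text (
  id   INT,
  name STRING,
  city STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load data from a local directory.
LOAD DATA LOCAL INPATH '/tmp/customers.tsv' INTO TABLE customers_text;

-- Create a new ORCFile table from the existing non-ORCFile table.
CREATE TABLE customers_orc STORED AS ORC
AS SELECT * FROM customers_text;

-- Set configuration properties from within the session:
-- run on Tez with vectorized execution.
SET hive.execution.engine=tez;
SET hive.vectorized.execution.enabled=true;

-- Output the execution plan for a query.
EXPLAIN SELECT city, COUNT(*) FROM customers_orc GROUP BY city;

-- ORDER BY yields output that is totally ordered across reducers.
SELECT city, COUNT(*) AS n
FROM customers_orc
GROUP BY city
ORDER BY n DESC;
```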

I hope you find this repository useful. You can visit my LinkedIn profile at https://www.linkedin.com/in/milindjagre/