Thursday, January 14, 2016

Apache Pig Scripts and examples 1



Apache Pig Scripts and examples:
I am listing a few Pig scripting examples, together with their results, to help build a better understanding of Apache Pig coding.

1. Total record count with Apache Pig script:
In this example we read a plain text file, print the total record count on the console, and also store the result on disk.

@ Source data file details:
Name: filesgrads-2.txt
Delimiter: tab
File Header: CDS_CODE ETHNIC GENDER GRADS UC_GRADS YEAR
Total records: 20,747
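It can help to check the source file independently before loading it into Pig. PigStorage has no option to skip a header line, so if the file really begins with the CDS_CODE ... YEAR header listed above, Pig loads that line as an ordinary record and counts it. The Python sketch below is not part of the original post. It assumes the first line is the header and returns the column names together with the number of remaining lines, so the figure can be compared with the Pig result.

```python
def summarize_tab_file(path):
    """Return (header_fields, data_line_count) for a tab-delimited text file.

    Assumption: the first line is a header row. Every remaining line,
    including any blank lines, is counted as one data line.
    """
    with open(path, encoding="utf-8") as handle:
        # Read the header and split it on tabs, dropping the line ending.
        header = handle.readline().rstrip("\r\n").split("\t")
        # Count the lines that follow the header.
        data_lines = sum(1 for _ in handle)
    return header, data_lines
```

For example, summarize_tab_file('/Users/Neo/Downloads/filesgrads-2.txt') returns the six header fields and the number of lines after the header.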

File data sample:

...


@ Running the Pig grunt shell in local mode:
Execute the following at the Unix prompt:

$ pig -x local



@ Output criteria:
Finding the total record count in the file.

@ Pig script:
Please execute the following script at the grunt shell prompt and review the final output.

/* load the data file into a Pig relation (alias) named data */
data = load '/Users/Neo/Downloads/filesgrads-2.txt' using PigStorage('\t');

/* group all records into a single group, held in relation grp_stud_all */
grp_stud_all = GROUP data ALL;

/* apply the COUNT function to the grouped bag, held in relation total_stud_count */
total_stud_count = FOREACH grp_stud_all GENERATE COUNT(data);

/* display the final output on the console */
dump total_stud_count;
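For readers new to GROUP ... ALL, the Python sketch below mirrors what the three statements compute. It is an illustration only, not how Pig executes the script, and the function names are hypothetical. GROUP data ALL yields a single tuple whose group key is the string 'all' and whose bag holds every record; FOREACH ... GENERATE COUNT(data) then emits one tuple holding the size of that bag. Pig's COUNT skips tuples whose first field is null, while COUNT_STAR counts every tuple; the sketch models null as None.

```python
def group_all(relation):
    """Sketch of GROUP <relation> ALL: one ("all", bag) tuple holding every record."""
    return [("all", list(relation))]


def pig_count(bag):
    """Sketch of COUNT: a tuple is skipped when its first field is null (None here)."""
    return sum(1 for t in bag if t and t[0] is not None)


def pig_count_star(bag):
    """Sketch of COUNT_STAR: every tuple in the bag is counted."""
    return len(bag)


def total_count(relation, count_fn=pig_count):
    """Sketch of FOREACH grp GENERATE COUNT(data): one output tuple per group."""
    return [(count_fn(bag),) for _group, bag in group_all(relation)]


if __name__ == "__main__":
    # Hypothetical sample records; None stands in for a null first field.
    sample = [("a", "x"), (None, "y"), ("c", "z")]
    print(total_count(sample))                  # prints [(2,)]
    print(total_count(sample, pig_count_star))  # prints [(3,)]
```

Because GROUP ... ALL puts everything into one group, the result always contains exactly one tuple, which is why the dump shows a single count.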

@ Final output review:
Note that the dump statement displays the output on the grunt shell console.



@ Storing the results:
To write the output to a file on disk, we use the store command with an output location. Issue the following at the grunt prompt:

grunt> store total_stud_count into '/Users/xxx/tot_stud_recs.txt';

Review the messages printed in the grunt shell to check whether the command was successful.



@ Review results from the stored file:
Note that the command above creates a directory named tot_stud_recs.txt. The actual output is stored in a file called part-r-00000 inside that directory, not in a file named tot_stud_recs.txt.
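Because store writes a directory rather than a single file, a small helper can gather the results from whatever part files it contains. The Python sketch below is an assumption-based illustration, not part of Pig. It relies on store's default PigStorage writing tab-delimited text, one tuple per line, and reads every file matching part-* (part-r-00000 in this example), ignoring marker files such as _SUCCESS if present.

```python
from pathlib import Path


def read_pig_output(output_dir):
    """Collect rows from every part-* file in a Pig store output directory.

    Assumes the default PigStorage format: one tuple per line, fields
    separated by tabs. Files not matching part-* (e.g. _SUCCESS) are skipped.
    """
    rows = []
    # Sort so that multiple part files are read in a stable order.
    for part in sorted(Path(output_dir).glob("part-*")):
        with part.open(encoding="utf-8") as handle:
            for line in handle:
                rows.append(line.rstrip("\r\n").split("\t"))
    return rows
```

For example, read_pig_output('/Users/xxx/tot_stud_recs.txt') would return the stored count as a list containing one row of string fields.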



Thanks!

