Sunday, December 14, 2014

How not to die trying: the first Hortonworks Sandbox tutorial

Hi friends!!!

Now that I've been playing with Big Data, I've run into countless problems trying to finish the first tutorial of the Hortonworks Sandbox. Honestly, I can't understand how a company can 1) build a VM, 2) publish a tutorial that doesn't work, and 3) run discussion forums that can't solve the problems. It reminds me of those math books that said "we leave the proof of theorem XXX to the reader, since it is TRIVIAL".

Everything goes fine with Beeswax; it's only when we try to run the Pig example that we realize our life is over.

The first error is the ERROR 1070 hell. It has a super easy solution, but it took me quite a while to figure out: you only need to enter the argument -useHCatalog in the "pig arguments" section (and use the new package name of HCatLoader).

The second problem is with the Hive libraries. The missing jars (slf4j-api, commons-lang3 and hive-hcatalog-hbase-storage-handler) must be added to the hive.tar.gz archive stored in HDFS under /apps/webhcat/.

Finally, for some strange reason, you need to change the value of the templeton.hive.properties property. The easiest way is to start Ambari and replace the value in the Hive section. Following the wizard's recommendation, I also changed the values of hive.tez.container.size and hive.tez.java.opts.

Here are the steps to fix each error:

---- ERROR (1) ----
pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
---- STEP BY STEP (1) ----

  1. At the end of the "pig script" section there is a text box labeled "pig arguments".
  2. Type -useHCatalog and press Enter.
  3. The argument should be highlighted in gray, and a new empty text box should appear.
  4. In the script, use org.apache.hive.hcatalog.pig.HCatLoader(); instead of org.apache.hcatalog.pig.HCatLoader(); (the class now lives in the org.apache.hive.hcatalog.pig package).
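If you keep the Pig script in a file, the package change from step 4 can also be applied with a small Python sketch. It assumes the only problem is the old org.apache.hcatalog.pig prefix (the same rename applies to HCatStorer); the function names, the file name my_script.pig and the table name my_table are hypothetical. The -useHCatalog argument from steps 1 and 2 is the same flag the pig command line accepts.

```python
# Old and new package prefixes for HCatalog's Pig classes (HCatLoader, HCatStorer).
OLD_PREFIX = "org.apache.hcatalog.pig."
NEW_PREFIX = "org.apache.hive.hcatalog.pig."


def fix_hcatalog_package(script: str) -> str:
    """Return the Pig script with the old HCatalog package prefix replaced.

    Running it twice is harmless: the new prefix does not contain the old one.
    """
    return script.replace(OLD_PREFIX, NEW_PREFIX)


def pig_command(script_path: str) -> list[str]:
    """Command line that runs a Pig script with the HCatalog jars on the classpath."""
    return ["pig", "-useHCatalog", script_path]


if __name__ == "__main__":
    # Hypothetical one-line script; "my_table" is a placeholder table name.
    before = "a = LOAD 'my_table' USING org.apache.hcatalog.pig.HCatLoader();"
    print(fix_hcatalog_package(before))
    print(" ".join(pig_command("my_script.pig")))
```

Plain string replacement is enough here because the old prefix is never a substring of the new one, so already-fixed scripts are left untouched.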

---- ERROR (2) ----
ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/lib/commons-lang3-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory
Error: Could not find or load main class hive.metastore.sasl.enabled=false

---- STEP BY STEP (2) ----

This happens because the hive.tar.gz archive doesn't contain the jars slf4j-api-*.jar, commons-lang3-*.jar and *hbase-storage-handler-*.jar. (The last line of the log, about hive.metastore.sasl.enabled, is fixed in ERROR (3).)
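Before patching anything, you can check which jars are missing from a local copy of the archive (step 3 below copies it out of HDFS). The paths in the ls errors suggest that YARN unpacks the archive into a directory named hive.tar.gz, so the jars must sit under hive/lib and hive/hcatalog/lib inside it. A minimal Python sketch, assuming the archive was built from a top-level hive directory; the function names are hypothetical.

```python
import fnmatch
import tarfile

# Glob patterns taken from the "ls: cannot access" lines of ERROR (2),
# relative to the directory the archive is unpacked into.
REQUIRED_PATTERNS = [
    "hive/lib/slf4j-api-*.jar",
    "hive/lib/commons-lang3-*.jar",
    "hive/hcatalog/lib/*hbase-storage-handler-*.jar",
]


def missing_patterns(names: list[str]) -> list[str]:
    """Return the required patterns that match none of the given member names."""
    # An archive built from "./hive" may store names as "./hive/...".
    normalized = [name.removeprefix("./") for name in names]
    return [
        pattern
        for pattern in REQUIRED_PATTERNS
        if not any(fnmatch.fnmatchcase(name, pattern) for name in normalized)
    ]


def missing_jars(archive_path: str) -> list[str]:
    """List the required jar patterns absent from a local .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return missing_patterns(tar.getnames())
```

If the archive is in the state the log describes, missing_jars("hive.tar.gz") should return all three patterns, and an empty list once the steps below have been applied.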

  1. Start an SSH session to the sandbox with the command: ssh root@127.0.0.1 -p 2222
  2. The password is: hadoop
  3. Copy the current hive.tar.gz from HDFS to your current directory: hadoop fs -copyToLocal /apps/webhcat/hive.tar.gz
  4. Extract the archive, which contains a folder named hive: tar zxvf hive.tar.gz

  5. Copy "slf4j-api-1.7.5.jar" into hive/lib by typing: cp /usr/hdp/2.2.0.0-1084/hadoop/lib/slf4j-api-1.7.5.jar ./hive/lib
    1. If the jar is not at that path, go to the root directory: cd /
    2. Find the file's path by typing: find | grep "slf4j-api-1.7.5.jar"
    3. Go back to your working directory (for me it is root's home): cd ~
  6. Copy "commons-lang3-3.1.jar" into hive/lib by typing: cp /usr/hdp/2.2.0.0-1084/pig/lib/commons-lang3-3.1.jar ./hive/lib
    1. If the jar is not at that path, go to the root directory: cd /
    2. Find the file's path by typing: find | grep "commons-lang3"
    3. Go back to your working directory (for me it is root's home): cd ~

  7. Open any web browser, such as Chrome.
  8. Search Google for "maven repository" and open the Maven Central search site (search.maven.org).
  9. Search for the hbase-storage-handler jar. The version matters: use "0.13.0", because "0.12.x" and "0.13.1" don't work. Type: a:"hive-hcatalog-hbase-storage-handler" AND v:"0.13.0"


  10. Download the jar file by clicking the "jar" link.
  11. Move the file to any folder you like (mine is "BigData").
  12. Open the VirtualBox Manager.
  13. Select the Sandbox VM and click the "Settings" button.
  14. Go to the "Shared Folders" section.
  15. Add the folder where the *hbase-storage-handler*.jar is located (mine is "BigData").
  16. Check the "Auto-mount" option.

  17. Back in the SSH session, create the lib folder by typing: mkdir ./hive/hcatalog/lib
  18. Copy the hbase-storage-handler jar by typing: cp /media/sf_BigData/hive-hcatalog-hbase-storage-handler-0.13.0.jar ./hive/hcatalog/lib
    1. My shared folder is mounted as sf_BigData; check the name of yours.
  19. The permissions should be OK; otherwise type:
    1. chmod 777 ./hive/lib/*.*
    2. chmod 777 ./hive/hcatalog/lib/*.*
  20. Remove the old local hive.tar.gz by typing: rm hive.tar.gz
  21. Re-create the archive from the hive directory by typing: tar -zcvf hive.tar.gz hive

  22. Back up the existing hive.tar.gz in HDFS by typing: hadoop fs -cp /apps/webhcat/hive.tar.gz /apps/webhcat/hive_bkp.tar.gz
  23. Remove the current hive.tar.gz from HDFS by typing: hadoop fs -rm /apps/webhcat/hive.tar.gz
  24. Upload the new hive.tar.gz to /apps/webhcat/ in HDFS by typing: hadoop fs -copyFromLocal hive.tar.gz /apps/webhcat/
  25. Try your Pig script again; the error should be gone!!
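Steps 7 to 11 can also be done without the search page by building the jar's download URL from the standard Maven repository layout (group path / artifact / version / artifact-version.jar). A minimal Python sketch; the group ID org.apache.hive.hcatalog is an assumption, so compare it with the search result from step 9 before relying on it, and the function name is hypothetical.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"

# Coordinates of the jar from step 9. The group ID is an assumption; compare it
# with the search result before downloading.
GROUP_ID = "org.apache.hive.hcatalog"
ARTIFACT_ID = "hive-hcatalog-hbase-storage-handler"
VERSION = "0.13.0"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a jar URL following the standard Maven 2 repository layout."""
    group_path = group_id.replace(".", "/")  # "a.b.c" -> "a/b/c"
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


if __name__ == "__main__":
    print(maven_jar_url(GROUP_ID, ARTIFACT_ID, VERSION))
```

If the sandbox VM has Internet access, fetching the printed URL from inside the VM with a tool such as wget or curl should put the jar directly in the guest, which would make the shared-folder steps 12 to 16 unnecessary.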

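For reference, steps 3 to 24 can be condensed into a short Python sketch. It assumes all three jars have already been placed in one local folder (the hypothetical JAR_DIR below) and that the hadoop command is on the PATH, as it is inside the sandbox; the function names are hypothetical too. Instead of extracting to disk, it copies every member of the old archive into a new one and appends the jars under hive/lib and hive/hcatalog/lib. By default it only prints the HDFS commands; pass --apply to actually run them.

```python
import subprocess
import sys
import tarfile
from pathlib import Path

# Where each jar must end up inside the archive (steps 5, 6 and 18).
JAR_DESTINATIONS = {
    "slf4j-api-1.7.5.jar": "hive/lib",
    "commons-lang3-3.1.jar": "hive/lib",
    "hive-hcatalog-hbase-storage-handler-0.13.0.jar": "hive/hcatalog/lib",
}

HDFS_ARCHIVE = "/apps/webhcat/hive.tar.gz"
HDFS_BACKUP = "/apps/webhcat/hive_bkp.tar.gz"
JAR_DIR = Path("jars")  # hypothetical local folder holding the three jars


def repack(src_archive: Path, jar_dir: Path, dst_archive: Path) -> None:
    """Write dst_archive = every member of src_archive plus the missing jars.

    dst_archive must be a different file from src_archive.
    """
    with tarfile.open(src_archive, "r:gz") as src, \
            tarfile.open(dst_archive, "w:gz") as dst:
        for member in src.getmembers():
            # Regular files need their data copied; directories and links carry none.
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
        for jar_name, folder in JAR_DESTINATIONS.items():
            # Each jar keeps the permissions it has on the local disk.
            dst.add(jar_dir / jar_name, arcname=f"{folder}/{jar_name}")


def hdfs_commands(local_archive: Path) -> list[list[str]]:
    """Steps 22 to 24: back up, remove and re-upload the archive in HDFS."""
    return [
        ["hadoop", "fs", "-cp", HDFS_ARCHIVE, HDFS_BACKUP],
        ["hadoop", "fs", "-rm", HDFS_ARCHIVE],
        ["hadoop", "fs", "-copyFromLocal", str(local_archive), "/apps/webhcat/"],
    ]


def main() -> None:
    original, patched = Path("hive_orig.tar.gz"), Path("hive.tar.gz")
    # Step 3: fetch the current archive out of HDFS under a different name.
    subprocess.run(["hadoop", "fs", "-copyToLocal", HDFS_ARCHIVE, str(original)], check=True)
    repack(original, JAR_DIR, patched)  # steps 4 to 21
    for cmd in hdfs_commands(patched):  # steps 22 to 24
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if "--apply" in sys.argv:
        main()
    else:
        # Dry run: only print the HDFS commands that main() would execute.
        for cmd in hdfs_commands(Path("hive.tar.gz")):
            print(" ".join(cmd))
```

Copying members instead of extracting keeps the archive layout exactly as it was (including the top-level hive directory), which is what the paths in the ERROR (2) log expect.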

---- ERROR (3) ----

Error: Could not find or load main class hive.metastore.sasl.enabled=false
---- STEP BY STEP (3) ----

  1. Open your web browser and go to http://127.0.0.1:8000/about/
  2. Find the Ambari service and enable it.

  3. Once the service has started successfully, go to http://127.0.0.1:8080/
  4. Log in with:
    1. user: admin
    2. password: admin
  5. Go to the "Hive" section using the left sidebar.
  6. Open the "Configs" tab.
  7. Scroll down to the "Advanced webhcat-site" section.
  8. Find the "templeton.hive.properties" property and replace its value with: hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9083,hive.metastore.sasl.enabled=false
  9. Go to the "Advanced hive-site" section.
  10. Set "hive.tez.container.size" to: 512
  11. Set "hive.tez.java.opts" to: -server -Xmx410m -Djava.net.preferIPv4Stack=true
  12. Click the "Save" button.
  13. Restart all services.
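The values from steps 8, 10 and 11 can be derived rather than typed by hand. The Python sketch below joins the metastore settings into the comma-separated key=value list that templeton.hive.properties holds, and builds the Tez JVM options from the container size. The 80% heap-to-container ratio is an assumption (a commonly cited rule of thumb that reproduces the wizard's 410m for a 512 MB container), and the function names are hypothetical.

```python
# Metastore settings from step 8, in the same order.
WEBHCAT_HIVE_PROPERTIES = {
    "hive.metastore.local": "false",
    "hive.metastore.uris": "thrift://localhost:9083",
    "hive.metastore.sasl.enabled": "false",
}


def templeton_hive_properties(props: dict[str, str]) -> str:
    """Join settings into the comma-separated key=value form used in step 8."""
    for key, value in props.items():
        # Assumption: commas separate entries, so this sketch rejects them in values.
        if "," in key or "," in value:
            raise ValueError(f"comma not allowed in {key}={value}")
    return ",".join(f"{key}={value}" for key, value in props.items())


def tez_java_opts(container_mb: int, heap_ratio: float = 0.8) -> str:
    """Build hive.tez.java.opts with a heap of heap_ratio * container size (MB)."""
    heap_mb = round(container_mb * heap_ratio)  # 512 * 0.8 = 409.6 -> 410
    return f"-server -Xmx{heap_mb}m -Djava.net.preferIPv4Stack=true"


if __name__ == "__main__":
    print(templeton_hive_properties(WEBHCAT_HIVE_PROPERTIES))
    print(tez_java_opts(512))
```

Keeping the heap below the container size leaves headroom for the JVM's non-heap memory, so the container is less likely to be killed for exceeding its limit.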

Try your Pig script again; the error should be gone!!