Hi friends!!!
Now that I have been playing with Big Data, I have run into countless problems trying to finish the first tutorial of the Hortonworks Sandbox. Honestly, I cannot understand how a company 1) builds a VM, 2) publishes a tutorial that does not work, and 3) runs discussion forums that fail to solve the problems. It reminds me of those math books that said "we leave the proof of theorem XXX to the reader since it is TRIVIAL".
Everything works fine with Beeswax; it is only when we try to run the Pig example that we realize our life is over.
The first error is the hell of error 1070, which has a super easy solution but took me quite a while to figure out. You only need to type -useHCatalog in the "pig arguments" section.
The second problem is with the Hive libraries. You need to add the missing jar files to the hive.tar.gz file located in HDFS under /apps/webhcat/
Finally, for strange reasons, you need to change the value of a templeton property.
Here are the steps to fix each error:
---- ERROR (1) ----
pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
---- STEP BY STEP (1) ----
- At the end of the "pig script" section, there is a text box that says "pig arguments".
- Type -useHCatalog and then press Enter
- It should highlight in gray and add a new empty text box
- In the script itself, use org.apache.hive.hcatalog.pig.HCatLoader(); instead of org.apache.hcatalog.pig.HCatLoader();
[Screenshots: the Pig script before and after the change]
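If the Pig script lives in a file rather than only in the Hue editor, the class rename can also be applied from the shell. This is just a minimal sketch: the function name fix_hcat_loader and the file name in the usage note are made up for illustration, and it only touches the HCatLoader class name mentioned above.

```shell
# Hypothetical helper (not part of Pig or Hue): rewrite the old HCatLoader
# class name to the new one inside a Pig script file.
fix_hcat_loader() {
    script=$1
    tmp=$(mktemp) || return 1
    # Dots are escaped so they match literally. The new name does not contain
    # the old one, so running this twice changes nothing the second time.
    sed 's/org\.apache\.hcatalog\.pig\.HCatLoader/org.apache.hive.hcatalog.pig.HCatLoader/g' \
        "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$script"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

For example, fix_hcat_loader my_script.pig (hypothetical file name). When running Pig from the command line instead of Hue, the same -useHCatalog argument is passed as a flag: pig -useHCatalog -f my_script.pig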
---- ERROR (2) ----
ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/lib/commons-lang3-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/hue/appcache/application_1418520658982_0007/container_1418520658982_0007_01_000002/hive.tar.gz/hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory
Error: Could not find or load main class hive.metastore.sasl.enabled=false
---- STEP BY STEP (2) ----
This happens because hive.tar.gz does not contain the jars slf4j-api-*.jar, commons-lang3-*.jar and *hbase-storage-handler-*.jar.
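To see which of these jars are really missing, you can list the archive and look for each pattern from the error messages. A minimal sketch, assuming a local copy of hive.tar.gz is already available (for example, fetched from HDFS as in the steps that follow); the function name missing_hive_jars is made up for illustration:

```shell
# Hypothetical helper: print each expected jar pattern that has no match in
# the tarball listing. Returns 1 if anything is missing, 2 if the archive
# cannot be read, 0 otherwise.
missing_hive_jars() {
    tarball=$1
    listing=$(tar -tzf "$tarball") || return 2
    status=0
    # The patterns mirror the three paths from the "ls: cannot access" lines.
    for pattern in \
        'hive/lib/slf4j-api-.*\.jar$' \
        'hive/lib/commons-lang3-.*\.jar$' \
        'hive/hcatalog/lib/.*hbase-storage-handler-.*\.jar$'
    do
        if ! printf '%s\n' "$listing" | grep -q "$pattern"; then
            echo "missing: $pattern"
            status=1
        fi
    done
    return $status
}
```

Running missing_hive_jars hive.tar.gz before and after the fix gives a quick check that the repacked archive has everything the container is looking for.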
- Start a SSH session to the sandbox using the command: ssh root@127.0.0.1 -p 2222
- The password is: hadoop
- Copy the current hive.tar.gz to your current directory: hadoop fs -copyToLocal /apps/webhcat/hive.tar.gz
- Decompress the file to a folder named hive: tar zxvf hive.tar.gz
- Copy the "slf4j-api-1.7.5.jar" to the hive/lib by typing: cp /usr/hdp/2.2.0.0-1084/hadoop/lib/slf4j-api-1.7.5.jar ./hive/lib
- If you do not find the jar, type: cd /
- Look for the file path by typing: find | grep "slf4j-api-1.7.5.jar"
- Go back to your working directory (mine is root's home directory, so I type: cd ~)
- Copy the "commons-lang3-3.1.jar" to the hive/lib by typing: cp /usr/hdp/2.2.0.0-1084/pig/lib/commons-lang3-3.1.jar ./hive/lib
- If you do not find the jar, type: cd /
- Look for the file path by typing: find | grep "commons-lang3"
- Go back to your working directory (mine is root's home directory, so I type: cd ~)
- Open any web browser, such as Chrome
- Search Google for the Maven repository by typing: maven repository
- Search for the hbase-storage-handler jar. The version must be "0.13.0", because versions "0.12.x" and "0.13.1" do not work. Type: a:"hive-hcatalog-hbase-storage-handler" AND v:"0.13.0"
- Download the jar file by clicking the "jar" link
- Move the file to any folder you like (mine is "BigData")
- Go to the VirtualBox console
- Select the Sandbox VM and press the "Settings" button
- Go to the "Shared Folders" section
- Add the folder where the *hbase-storage-handler*.jar is located (mine is "BigData")
- Check the auto-mount option
- Create the "lib" folder by typing: mkdir ./hive/hcatalog/lib
- Copy the hbase-storage-handler by typing: cp /media/sf_BigData/hive-hcatalog-hbase-storage-handler-0.13.0.jar ./hive/hcatalog/lib
- My folder is sf_BigData; check the name of yours
- The permissions should be OK; otherwise, type:
- chmod 777 ./hive/lib/*.*
- chmod 777 ./hive/hcatalog/lib/*.*
- Remove the previous hive.tar.gz by typing: rm hive.tar.gz
- Repack the hive directory into a new archive by typing: tar -zcvf hive.tar.gz hive
- Backup the previous hive.tar.gz file in hdfs by typing: hadoop fs -cp /apps/webhcat/hive.tar.gz /apps/webhcat/hive_bkp.tar.gz
- Remove current hive.tar.gz by typing: hadoop fs -rm /apps/webhcat/hive.tar.gz
- Upload the new hive.tar.gz file to /apps/webhcat/ in HDFS by typing: hadoop fs -copyFromLocal hive.tar.gz /apps/webhcat/
- Try your pig script again, the error should be gone!!
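The unpack, copy and repack part of these steps can also be scripted. This is only a sketch of one way to do it, not a replacement for the steps above: the function name add_jar_to_hive_tarball is made up, it uses a temporary directory instead of the current one, and it assumes the archive has a top-level hive/ directory as described above.

```shell
# Hypothetical helper: unpack TARBALL, copy JAR into DEST_SUBDIR (a path
# relative to the archive root, e.g. hive/lib), and repack the archive in place.
add_jar_to_hive_tarball() {
    tarball=$1
    dest_subdir=$2
    jar=$3
    workdir=$(mktemp -d) || return 1
    # Repacking is done relative to workdir (-C) so the archive keeps its
    # top-level hive/ directory. mkdir -p covers hive/hcatalog/lib, which
    # may not exist yet.
    tar -xzf "$tarball" -C "$workdir" &&
        mkdir -p "$workdir/$dest_subdir" &&
        cp "$jar" "$workdir/$dest_subdir/" &&
        tar -czf "$tarball" -C "$workdir" hive
    rc=$?
    rm -rf "$workdir"
    return $rc
}

# On the sandbox, using the paths from the steps above:
#   hadoop fs -copyToLocal /apps/webhcat/hive.tar.gz
#   add_jar_to_hive_tarball hive.tar.gz hive/lib /usr/hdp/2.2.0.0-1084/hadoop/lib/slf4j-api-1.7.5.jar
#   add_jar_to_hive_tarball hive.tar.gz hive/lib /usr/hdp/2.2.0.0-1084/pig/lib/commons-lang3-3.1.jar
#   add_jar_to_hive_tarball hive.tar.gz hive/hcatalog/lib /media/sf_BigData/hive-hcatalog-hbase-storage-handler-0.13.0.jar
#   hadoop fs -cp /apps/webhcat/hive.tar.gz /apps/webhcat/hive_bkp.tar.gz
#   hadoop fs -rm /apps/webhcat/hive.tar.gz
#   hadoop fs -copyFromLocal hive.tar.gz /apps/webhcat/
```

The HDFS commands are left as comments because they only make sense inside the sandbox; the jar paths are the ones from my VM and may differ on yours.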
Full log is attached here
---- ERROR (3) ----
Error: Could not find or load main class hive.metastore.sasl.enabled=false
---- STEP BY STEP (3) ----
- Open your web browser and go to http://127.0.0.1:8000/about/
- Look for the Ambari service and enable it
- Once the Service is successfully started, go to http://127.0.0.1:8080/
- Login by typing
- user: admin
- password: admin
- Go to the "Hive" section, using the left bar
- Go to the "Configs" tab
- Scroll down to the "Advanced webhcat-site" section
- Go to the "templeton.hive.properties" value and edit it by typing the following: hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9083,hive.metastore.sasl.enabled=false
- Go to the "Advanced hive-site" section
- Go to "hive.tez.container.size" value and edit it by typing: 512
- Go to "hive.tez.java.opts" value and edit it by typing: -server -Xmx410m -Djava.net.preferIPv4Stack=true
- Press the Save Button
- Restart All Services
Try your pig script again, the error should be gone!!
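The error message shows the last entry of the templeton.hive.properties value being read as a class name, so before pasting the value into Ambari it may be worth checking that it is one comma-separated list of key=value pairs with no stray spaces. A minimal sketch of such a check; the function name check_hive_properties is made up and this is not something WebHCat or Ambari provides:

```shell
# Hypothetical sanity check: split a templeton.hive.properties value on
# commas and confirm every entry looks like key=value without spaces.
check_hive_properties() {
    value=$1
    status=0
    old_ifs=$IFS
    IFS=,
    set -f    # no filename expansion while the unquoted value is split
    for entry in $value; do
        case $entry in
            *" "*|=*|*=) echo "bad entry: '$entry'"; status=1 ;;
            *=*)         echo "ok: $entry" ;;
            *)           echo "bad entry: '$entry'"; status=1 ;;
        esac
    done
    set +f
    IFS=$old_ifs
    return $status
}

# The value used in the steps above:
check_hive_properties 'hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9083,hive.metastore.sasl.enabled=false'
```

A value with a space after a comma, or with an entry that has no "=", is reported as a bad entry and the function returns a non-zero status.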
Thanks, it was helpful. Only the first error I had to fix with the latest sandbox version (2.2).
very very helpful. I was able to solve my issues with the help of this blogpost. Many many thanks.
Thanks a ton... It was very helpful
In my case using single node sandbox can't find the /apps/webhcat directory. where should I copy the modified hive.tar.gz ?
Thanks a lot for the information.
Hello Amit. My apologies, I didn't see your post before. Do you still need some help?
Thanks dude!
thanks