
Showing posts from May, 2016

Web Crawling using Apache Nutch + MySQL + Solr

If you have come here, you probably already know what Nutch is and are looking for a test setup that integrates it with MySQL and Solr or Elasticsearch. So, let's get started. I am not going to walk you through the steps myself. Instead, here is an excellent link that helped me run a test crawl in a couple of hours. Technically, it should have taken no more than half an hour, but I had to spend extra time figuring out the issues mentioned below on my Mac (OS X El Capitan).

First, follow the steps in this link:
https://anil.io/post/92/apache-nutch-2-2-mysql-and-solr-5-2-1-tutorial

While trying to make things work, you may come across the following issues. I have provided a solution for each below.

1) Don't ignore this step [STEP 2 in the link above]. Note that a shell assignment must not have spaces around the "=":

export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
export NUTCH_JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"

2) Issue when creating the database nutch in MySQL. [STEP 3.2 in the li...
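For step 1, here is a minimal shell sketch, assuming bash on macOS with a Java 8 JDK installed. The function name set_nutch_java_home and the fallback to an already-set JAVA_HOME are my own illustrative additions, not part of the linked tutorial. The function exports both variables without spaces around "=" and stops with an error if it cannot find an executable bin/java.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export JAVA_HOME and NUTCH_JAVA_HOME for Nutch.
# Assumption: /usr/libexec/java_home exists only on macOS; elsewhere we
# fall back to an already-set JAVA_HOME (an illustrative choice).
set_nutch_java_home() {
  local version="${1:-1.8}"
  local home=""

  if [ -x /usr/libexec/java_home ]; then
    # Ask macOS for the JDK matching the requested version.
    home="$(/usr/libexec/java_home -v "$version" 2>/dev/null)" || home=""
  fi

  # Fallback: reuse JAVA_HOME if the caller already set one.
  if [ -z "$home" ]; then
    home="${JAVA_HOME:-}"
  fi

  # Refuse to continue unless the directory contains a runnable java binary.
  if [ -z "$home" ] || [ ! -x "$home/bin/java" ]; then
    echo "No usable Java $version installation found" >&2
    return 1
  fi

  # No spaces around '=' in shell assignments.
  export JAVA_HOME="$home"
  export NUTCH_JAVA_HOME="$home"
}
```

Because the function exports into the calling shell, source the file (or paste the function into ~/.bash_profile) rather than running it as a separate process, then call set_nutch_java_home 1.8 before starting Nutch.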