**How to install dataparksearch onto Ubuntu Feisty (x86_64) LTS - command line only**

I have not tried it under Ubuntu, but here is a simpler approach that I used under Debian:

- Fetch a 4.49 snapshot of dpsearch.
- Untar the tarball.
- Go to the dpsearch-4.49-xxx directory.
- Edit debian/rules and change the configure line to match the other options that you might want (e.g. change --with-pgsql to --with-mysql if using MySQL).
- Type "fakeroot dpkg-buildpackage -uc -us -sa" to build the Debian package (it will be one directory above the current one and called dpsearch-xxx.deb, where xxx depends on the version etc.).
- Install the package with "sudo dpkg -i dpsearch-xxxx.deb".
- The configuration files will be in /usr/etc/dpsearch, the var directory in /usr/var/lib/dpsearch, etc.
- If you are running a local repository, the package can be added to it using the "dpsearch-xxx.changes" file. This keeps it nicely in the Debian package management system.

Will improve this later to automatically build the version for each db.

Amit

What about "aptitude install dataparksearch"?

- sudo su (easier to do now... !)
- apt-get update
- apt-get install nano
- add new debs (see separate doc) to /etc/apt/sources.list (use nano /etc/apt/sources.list, make changes, ctrl-O to save, ctrl-X to exit)
- apt-get update
- apt-get install make
- apt-get install apache2
- apt-get install php5
- apt-get install libapache2-mod-perl2
- apt-get install libapache2-mod-perl2-dev
- apt-get install zlib1g-dev
- apt-get install zlib1g (might already be installed)
- apt-get install mysql-server
- apt-get install libmysqlclient15-dev (to get mysql.h)
- cd to toplevel
- mkdir downloaded_software
- wget http://www.dataparksearch.org/dpsearch-4.46.tar.gz
- tar -zxf dpsearch-x.x.tar.gz
- mysqladmin create search
- cd dpsearch-x.x
- apt-get install gcc
- apt-get install aspell
- apt-get install aspell-en (English dictionary)
- ./install.pl and answer the questions! (there is a problem with aspell at the moment... it fails every time)
- make
- make install (run as root!)
- cd /usr/local/dpsearch/etc
- cp indexer.conf-dist indexer.conf
- nano indexer.conf and change DBAddr. Make sure it is correct... if you haven't touched mysql you can use mysql://root@localhost...blah (ctrl-O to save, ctrl-X to exit)
- cp langmap.conf-dist langmap.conf
- cp search.htm-dist search.htm
- cp stopwords.conf-dist stopwords.conf
- cp sections.conf-dist sections.conf
- nano sections.conf and remove the # from the lines you wish to use; this is used by the spider to add weight to certain sections.
- nano search.htm and make DBAddr the same as you put in indexer.conf; you can also edit any of the HTML if you wish (scroll down the file to find it).
- nano stopwords.conf, scroll down to the line "StopwordFile stopwords/ja.sl" and place a # at the start of that line.
- /usr/local/dpsearch/sbin/indexer -Ecreate
- it SHOULD bring back something like "blah, blah, blah - 42 queries sent, 42 succeeded, 0 failed". If so :)
- mkdir /var/www/cgi-bin
- chmod 777 /var/www/cgi-bin (again, just to make it work)
- cp /usr/local/dpsearch/bin/search.cgi /var/www/cgi-bin/search.cgi
- chmod 777 /var/www/cgi-bin/search.cgi (just to make sure ;)
- cd /etc/apache2/sites-available/
- nano default
- find and change the following code:

      ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
      <Directory "/usr/lib/cgi-bin">
          AllowOverride None
          Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
          Order allow,deny
          Allow from all
      </Directory>

  to this code:

      ScriptAlias /cgi-bin/ /var/www/cgi-bin/
      <Directory "/var/www/cgi-bin">
          AllowOverride None
          Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
          Order allow,deny
          Allow from all
      </Directory>

* save and exit nano
* /etc/init.d/apache2 restart
* fire up a web browser and point it to http://your-server/cgi-bin/search.cgi (use the host name or IP address of your own server)
* if you see a normal (but simple) web page with a search box, then well done! That's the hardest part over :o)
* if you see a load of gobbledygook, go back over the bit about /etc/apache2/sites-available/default again. It means the CGI is being presented as plain text and not being executed as a program. Check the permissions on both the file and the directory.
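The permission check from the last step can be sketched as a small shell function. This is only an illustration: `check_cgi` is a made-up helper name, not something shipped with dataparksearch or Apache, and the `-x` tests are evaluated for whichever user runs the script, which may differ from the user Apache runs as.

```shell
#!/bin/sh
# Hypothetical helper (not part of dataparksearch): report whether a CGI
# file and its directory look usable. The tests apply to the user running
# this script, which is an assumption; Apache may run as a different user.
check_cgi() {
    cgi="$1"
    dir=$(dirname "$cgi")
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    if [ ! -x "$dir" ]; then
        echo "directory not searchable: $dir"
        return 1
    fi
    if [ ! -f "$cgi" ]; then
        echo "missing file: $cgi"
        return 1
    fi
    if [ ! -x "$cgi" ]; then
        echo "file not executable: $cgi"
        return 1
    fi
    echo "ok: $cgi"
}
```

With the paths used above, it would be called as `check_cgi /var/www/cgi-bin/search.cgi`. If it reports "ok" and the browser still shows raw output, the ScriptAlias block is the more likely culprit.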
Now we have to bang in some URLs to search and off we go. To do this:

- cd /usr/local/dpsearch/etc/
- nano urls (make a file called urls to hold info about the URLs...)

Then you need to add a URL and tell the search engine how to search it. A simple one could be something like this (copy and paste the following if you wish):

    #############
    #
    # Simple URL List
    #
    #############
    #
    # scan the C4 news website index page only
    Period 7d
    # scan again in 7 days.
    Server page http://www.channel4.com/news/
    #
    #
    ######################

- ctrl-O to save and then ctrl-X to quit nano...
- Now you have to connect the urls file to the indexer.conf file by adding a link to the urls file. First scroll down the indexer.conf file and find the section on Server, something like this:

      ##########################
      #Server [Method] [Subsection..........
      # This is the main command..........
      #to describe web-space.......
      #..........
      #..........
      #..........
      ############

  Just add the following line at the end of that section (note there is no leading hash!):

      Include urls

* ctrl-O then ctrl-X to save and exit nano
* Next you need to index the site. In the shell type /usr/local/dpsearch/sbin/indexer. This will show you in real time the page being scanned, and should return some information once complete.
* Pop along to your web front end (that cgi-bin/search.cgi page) and try typing in something off the CH4 news website; hopefully it will return some results.
* Read some of the documents in the /docs/samples folder (in the datapark folder) and start by changing some of the elements within the URL.
* Take a look at http://www.dataparksearch.org/dpsearch-indexcmd.en.html for more info on URL settings, and read the bit about URLs in the instructions provided by dpsearch (see /usr/local/dpsearch/docs/ for more info).

Hopefully this has been useful in some way. I use it for specific search engines and because I like the technology around search engines.
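If you would rather script the "Include urls" step than edit indexer.conf by hand, something like the following might do. It is a rough sketch under assumptions: `add_include` is a made-up helper name, and it appends the line at the end of the file instead of at the end of the Server section as described above, so check that this placement suits your configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of dataparksearch): append "Include urls"
# to a config file only if the line is not already there. Appending at the
# end of the file is a simplification of the manual step described above.
add_include() {
    conf="$1"
    if grep -q '^Include urls$' "$conf"; then
        echo "already present"
    else
        # Leading newline guards against a file that lacks a final newline.
        printf '\nInclude urls\n' >> "$conf"
        echo "added"
    fi
}
```

With the paths used above, it would be called as `add_include /usr/local/dpsearch/etc/indexer.conf`. Running it twice is harmless, because the second call finds the line and leaves the file alone.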
Another search engine I have used and dabbled with is phpdig.net, and I have managed to get it to index around 300 websites (a huge index for a PHP-based system) with an average return time for a search request of around 5 seconds. The dataparksearch engine is much quicker...

Next I will be looking at the Java-based search engine Nutch: http://en.wikipedia.org/wiki/Nutch

Adam