Swank Wiki
Swank v0.04.04

Swank::ElasticSearch

All indexing and searching capabilities are provided by the Swank::ElasticSearch class.

It uses Search::Elasticsearch to talk to an ElasticSearch server or OpenSearch server, which can run locally.

For small web sites, the server can run on the local machine, listening only on localhost (127.0.0.1) so that no other machine can reach it. In that case the extra security setup for ElasticSearch is not needed.

Special Features:

Any page with a "noindex" field set to true will not be indexed.

Link href values are indexed in a "_link" field, so links to a specific page can be found by doing $sys->search(' _link:"/my/page" ').

Code is indexed in a "_code" field, so you can search for pages containing certain code strings.

Dates are detected by their ISO format and are indexed by day only (without the time) for efficiency.
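A minimal sketch of the day-only normalization described above, assuming ISO 8601 values such as 2024-03-15T10:30:00Z and the YYYYMMDD form used in date searches. The helper name iso_to_day is hypothetical and is not part of Swank; the real detection rules may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Swank): reduce an ISO 8601 date or
# datetime string to a day-only YYYYMMDD key. Returns undef for values
# that are not an ISO date.
sub iso_to_day {
    my ($value) = @_;
    return undef unless defined $value;
    if ( $value =~ /\A(\d{4})-(\d{2})-(\d{2})(?:[T ]\d{2}:\d{2}(?::\d{2})?\S*)?\z/ ) {
        return "$1$2$3";    # drop the time part, keep year, month, day
    }
    return undef;
}

print iso_to_day('2024-03-15T10:30:00Z'), "\n";    # 20240315
print iso_to_day('2024-03-15'), "\n";              # 20240315
print defined iso_to_day('next tuesday') ? "date\n" : "not a date\n";
```

Storing one key per day keeps range searches over dates cheap, at the cost of losing the time of day in the index.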

TODO: there are plans to support per-field indexing options (private, unindexed, keyword, etc.), but that depends on field-level metadata being implemented.

Also, Lucene has no syntax for searching in non-parsed keyword fields.  If you need an exact value, with nothing in front of or behind it, you must check the values returned by the search yourself.
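The check on returned values can be sketched as a simple post-filter. This is only an illustration: the hash references below are hypothetical stand-ins for page objects, and exact_matches is not a Swank function.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for page objects returned by a search; real
# Swank page objects may expose their fields differently.
my @hits = (
    { path => '/a', status => 'open' },
    { path => '/b', status => 'open source' },
    { path => '/c', status => 're-open' },
);

# Keep only the hits whose field is exactly the wanted value.
sub exact_matches {
    my ( $field, $wanted, @pages ) = @_;
    return grep { defined $_->{$field} && $_->{$field} eq $wanted } @pages;
}

my @exact = exact_matches( 'status', 'open', @hits );
print join( ', ', map { $_->{path} } @exact ), "\n";    # /a
```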

Advanced and/or dubious searches

See the Lucene (Java) documentation for the full query syntax.

Date search:   date:[YYYYMMDD TO YYYYMMDD]

All records with "field" defined (not sure if/why this works):  field:[0 TO 0]

All records without "field" defined, by getting all pages and removing those with fields (all pages have a path field):  path:[0 TO 0] AND NOT field:[0 TO 0]

Field defined but empty???

Requires:

Search::Elasticsearch

ElasticSearch or OpenSearch

Provides:

search( 'search string', [ options ... ] )

Provides the search function for the Swank system. Returns a Swank::ElasticSearch::Results object.

'search string' is a Lucene search string.  Syntax summary:

word  -- does a full text search for word in any field

field:word  -- does a search for word in the given field only

word AND word  -- does a boolean search. AND, OR, and NOT must be upper case

"a phrase" / field:"a phrase"  -- searches for an exact phrase

[begin TO end] / field:[begin TO end]  -- does a range search
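Search strings in this syntax can be assembled with a couple of small helpers. The sketch below assumes the standard Lucene range form with the TO keyword, as in the field:[0 TO 0] examples elsewhere on this page; phrase_query and range_query are hypothetical names, not part of Swank.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Swank) that build query strings.

# field:"a phrase" -- escapes backslashes and double quotes in the phrase.
sub phrase_query {
    my ( $field, $phrase ) = @_;
    ( my $escaped = $phrase ) =~ s/(["\\])/\\$1/g;
    return sprintf '%s:"%s"', $field, $escaped;
}

# field:[begin TO end] -- range search between two values.
sub range_query {
    my ( $field, $begin, $end ) = @_;
    return sprintf '%s:[%s TO %s]', $field, $begin, $end;
}

# Boolean operators must be upper case.
my $query = join ' AND ',
    phrase_query( '_link', '/my/page' ),
    range_query( 'date', '20240101', '20241231' );

print "$query\n";
# _link:"/my/page" AND date:[20240101 TO 20241231]
```

The resulting string would then be passed as the first argument to search().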

options may be:

sort => 'fieldname [desc], fieldname ...'  -- sorts results by the given field name. The default is to sort by RELEVANCE.

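sort => \&sub  -- a sort subroutine may also be given, subject to the usual restriction on sort subroutines passed to other Perl classes: $a and $b will not work, because they are package variables of the package that calls sort. Declare the subroutine with a ($$) prototype and read the two items from @_ instead: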

                     sub ($$) { $_[0] cmp $_[1] } 

refresh => 1  -- closes and reopens the internal Lucene objects.
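The sort subroutine option can be tried out in plain Perl before passing it to search(). The sketch below uses hash references as hypothetical stand-ins for page objects and a made-up title field; only the ($$) prototype technique comes from the description above.

```perl
use strict;
use warnings;

# A comparator with the ($$) prototype receives the two items in @_,
# so it works no matter which package ends up calling sort.
my $by_title = sub ($$) {
    my ( $left, $right ) = @_;
    return lc( $left->{title} ) cmp lc( $right->{title} );
};

# Hypothetical stand-ins for page objects.
my @pages = (
    { title => 'zebra' },
    { title => 'Apple' },
    { title => 'mango' },
);

my @sorted = sort $by_title @pages;
print join( ' ', map { $_->{title} } @sorted ), "\n";    # Apple mango zebra
```

With Swank, the same code reference would presumably be passed as sort => $by_title.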

Overrides:

write()  -- indexes page objects after they are written.

delete()  -- de-indexes deleted objects.

Support pages:

/search

/searchbox

Swank::ElasticSearch::Search

This helper class does the actual searching and indexing.  It should not be necessary to access it directly for any reason.

API:

index()  -- called by Swank::ElasticSearch::write to index a page.

delete() -- called by Swank::ElasticSearch::delete to un-index a page.

search() -- called by Swank::ElasticSearch::search to do searches.

reindex() -- clears the index and reindexes all pages returned by $sys->storage->_enumerate

optimize() -- optimizes the index.

Swank::ElasticSearch::Results

Encapsulates the results from a Lucene search.

API:

length() -- number of hits returned by the search.

next() -- returns the page object for the next hit in the search.

get( index )  -- returns a specific hit in the search, numbered 0 .. length()-1

all() -- returns a list of page objects for all hits from the search.
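The four methods above can be exercised with a small stand-in class. Mock::Results below is hypothetical and only mirrors the documented method names; in particular, returning undef from next() once the hits are exhausted is an assumption, and plain strings stand in for page objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in mirroring the documented Results API; the real
# Swank::ElasticSearch::Results may differ in its details.
package Mock::Results;

sub new {
    my ( $class, @hits ) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}
sub length { return scalar @{ $_[0]{hits} } }    # number of hits
sub get    { return $_[0]{hits}[ $_[1] ] }       # hit at index 0 .. length()-1
sub all    { return @{ $_[0]{hits} } }           # every hit as a list

# Assumption: next() returns undef once all hits have been returned.
sub next {
    my ($self) = @_;
    return undef if $self->{pos} >= $self->length;
    return $self->{hits}[ $self->{pos}++ ];
}

package main;

my $results = Mock::Results->new( '/home', '/about', '/contact' );

print "hits: ", $results->length, "\n";    # hits: 3
while ( defined( my $page = $results->next ) ) {
    print "next: $page\n";
}
print "last: ", $results->get( $results->length - 1 ), "\n";    # last: /contact
```

In real use the object would come from $sys->search(...) rather than from a constructor call.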