Solr Plugin

Enterprise Search Engine for Foswiki based on Solr

About Solr

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface.

Installation

The installation procedure below assumes that you install Solr and Foswiki on the same Linux server.

Foswiki plugin installation

You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.

Open configure, and open the "Extensions" section. "Extensions Operation and Maintenance" Tab → "Install, Update or Remove extensions" Tab. Click the "Search for Extensions" button. Enter part of the extension name or description and press search. Select the desired extension(s) and click install. If an extension is already installed, it will not show up in the search results.

You can also install from the shell by running the extension installer as the web server user: (Be sure to run as the webserver user, not as root!)
cd /path/to/foswiki
perl tools/extension_installer <NameOfExtension> install

If you have any problems, or if the extension isn't available in configure, then you can still install manually from the command-line. See https://foswiki.org/Support/ManuallyInstallingExtensions for more help.

Download Solr

The current plugin requires Solr 9 or later. Download version 9.8.0 from the Solr download page.

Extract software, create user, install system service

tar xzf solr-9.8.0.tgz solr-9.8.0/bin/install_solr_service.sh --strip-components 2 
./install_solr_service.sh ./solr-9.8.0.tgz -n

Note that the -n option prevents the service from starting yet. We will start it once everything has been adjusted to our needs.

Configure Solr service

First relocate logs to a standard unix directory: mv /var/solr/logs /var/log/solr. Edit /etc/default/solr.in.sh and append the following lines:

SOLR_HEAP="1024m"
SOLR_LOGS_DIR=/var/log/solr
SOLR_OPTS="$SOLR_OPTS -Djetty.host=localhost -Ddisable.configEdit=true"
SOLR_TIMEZONE=GMT+1
ENABLE_REMOTE_JMX_OPTS=false
SOLR_HOST="127.0.0.1"
SOLR_JETTY_HOST="127.0.0.1"
SOLR_IP_ALLOWLIST=127.0.0.1
SOLR_ADMIN_UI_DISABLED=true
SOLR_REQUESTLOG_ENABLED=false
SOLR_MODULES=langid

Edit /var/solr/log4j2.xml:

  • set log level from info to warn

Install Foswiki configuration set

cd /var/solr/data
cp -r <foswiki-dir>/solr_9/cores .
mkdir configsets
cd configsets
ln -s <foswiki-dir>/solr_9/configsets/foswiki_configs
chown -R solr:solr /var/solr
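
Before starting the service it can be worth confirming that the symbolic link created above actually resolves. The following is a minimal sketch, not part of SolrPlugin: the helper name check_configset_link is hypothetical, and the path is the one used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path is a symlink
# whose target is an existing directory.
check_configset_link() {
  if [ -L "$1" ] && [ -d "$1" ]; then
    echo "ok: $1 resolves to a directory"
  else
    echo "broken or missing: $1"
    return 1
  fi
}

# Path taken from the installation steps above; "|| true" keeps the script going.
check_configset_link /var/solr/data/configsets/foswiki_configs || true
```

A dangling link usually means that <foswiki-dir> was given as a relative path or that the plugin files have been moved since.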

Updating from a previous configuration set

An updated SolrPlugin might come with a newer configuration set, i.e. newer schema.xml or solrconfig.xml files. Make sure that the files shipped with an update are installed on the Solr server as well. This is taken care of automatically when the foswiki_configs directory is linked into the Solr server's configsets directory. Note, however, that any local changes you made to these files will be overwritten by the update. Either create a config set of your own and adjust the core definition to use it, or merge your changes into the standard foswiki_configs set of files after each update.

Increasing the security limits

You may get a warning along the following lines when starting the Solr service in the next step:

[WARN] *** Your open file limit is currently 1024.
 It should be set to 65000 to avoid operational disruption.
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh

To increase the file limit, create the file /etc/security/limits.d/solr.conf with the following content:

solr             soft    nofile          65000
solr             hard    nofile          65000
solr             soft    nproc           65000
solr             hard    nproc           65000

The warning should be gone when starting the service.
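
To see whether the new limit is in effect without waiting for the warning, the open file limit can be compared against the 65000 threshold from the message above. This is a sketch under the assumption that it is run in a fresh login shell of the solr user; the helper name limit_ok is hypothetical.

```shell
#!/bin/sh
# Threshold taken from the Solr warning quoted above.
required=65000

# Hypothetical helper: succeed if the given limit satisfies the threshold.
limit_ok() {
  # "unlimited" always satisfies the requirement
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$required" ]
}

current=$(ulimit -n)
if limit_ok "$current"; then
  echo "open file limit OK: $current"
else
  echo "open file limit too low: $current (want $required)"
fi
```

Run as a different user, the script reports that user's limit, which says nothing about the solr account.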

Start solr service

service solr start

Test

cd <foswiki-dir>/tools
./solrindex topic=Main.WebHome

… should produce Indexing Main.WebHome

cd <foswiki-dir>/bin
./rest /SolrPlugin/search

… should return a JSON response from Solr showing the recently indexed topic

Skin integration

SolrPlugin comes with a skin overlay - called solr - that will replace the upper left search boxes in PatternSkin with a solr-driven auto-suggest search box. To switch that on use

   * Set SKIN = solr, pattern

in your SitePreferences.

Note that you don't need to enable the solr skin overlay if you are using NatSkin, as it supports SolrPlugin out of the box.

Preference settings

There are a couple of preference settings that you may set in your SitePreferences in order to customize some basic parameters of the solr search user interface:

| *Parameter* | *Description* | *Default* |
| SOLR_DATEFORMAT | date format for search results, see JQMomentContrib for documentation | dddd, Do, MMMM YYYY, HH:mm |
| SOLR_DEFAULTSORT | default sorting order of search results | score desc |
| SOLR_DEFAULTWEB | the web to search in, e.g. %BASEWEB%; all performs a global search | all |
| SOLR_EXACTSEARCH | boolean switch to select between two kinds of search; set this to true to get a sharper result set based on your query | false |
| SOLR_EXTRAFILTER | solr query filter added on top of the user-specified query | |
| SOLR_INSTANTSEARCH | boolean switch to fire up search while you type | false |
| SOLR_TOPICSEARCH | boolean switch to enable topic search by default | false |
| SOLR_NUMROWS | default number of search results returned per page | 10 |
| SOLR_QUERYFIELDS | specify the qf solr parameter; note that this will disable SOLR_EXACTSEARCH settings | see solrconfig.xml |
| SOLR_INCLUDEWEB | regular expression of webs to be listed in the web facet | |
| SOLR_EXCLUDEWEB | regular expression of webs not to be listed in the web facet | |

Commandline scripts

There is a set of tools to interact with the Solr index from the commandline. These can be used to index Foswiki manually, as we did in the tests above, as well as to search for or delete specific documents in the index.

The set of tools comes in two variants: one for normal single-host Foswiki installations and one for virtual hosting using VirtualHostingContrib.

The virtual-hosting aware scripts have a prefix virtualhost-... and take an optional host=<domain> parameter to specify the virtual domain to interact with. When it is not specified, the script is executed for each domain in turn as configured in VirtualHostingContrib. The only exception is solrjob (see below).

solrindex / virtualhosts-index

cd <foswiki-dir>/tools
./solrindex ...

| *Parameter* | *Description* | *Default* |
| web="..." | the web to be indexed; if undefined all webs will be indexed | all |
| topic="<web>.<topic>" | the topic to be indexed; use this parameter to index one specific topic | |
| mode="full/delta" | mode of operation: full will unconditionally index all content as specified by web or topic; delta will only index content that has changed since the last time the script was run | delta |
| optimize="on/off" | optimize the Solr database by de-fragmenting its internal segments for better performance; this is normally not required unless a full indexing of larger chunks of content is performed; note that optimizing the Solr index might require considerable time and I/O resources on the filesystem of the server | off |
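
Assuming the web parameter takes a single web name, as the table suggests, indexing a handful of webs means calling solrindex once per web. The sketch below shows one way to do that. The index_webs helper and the DRY_RUN switch are hypothetical and not part of SolrPlugin; by default the helper only prints the commands it would run from <foswiki-dir>/tools.

```shell
#!/bin/sh
# Hypothetical wrapper: run solrindex once per web, or just print the commands.
# DRY_RUN=1 (the default here) only echoes what would be executed.
index_webs() {
  mode=$1
  shift
  for web in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "./solrindex web=$web mode=$mode"
    else
      ./solrindex web="$web" mode="$mode" || return 1
    fi
  done
}

# Print the commands for a full index of two example webs.
index_webs full Main System
```

Set DRY_RUN=0 and run it from the tools directory as the web server user to actually index.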

solrdelete / virtualhosts-delete

cd <foswiki-dir>/tools
./solrdelete ...

| *Parameter* | *Description* | *Default* |
| <lucene-query> | delete topics matching the query | do nothing |

For instance, to empty your index completely, use (quoted so that the shell does not expand the wildcard):

./solrdelete '*:*'

solrjob

cd <foswiki-dir>/tools
./solrjob ...

This tool is a wrapper around solrindex and will use either solrindex or virtualhost-solrindex depending on the host commandline parameter. It is mainly used in cronjobs. In contrast to solrindex, a locking and throttling strategy is used to prevent multiple indexers from being started simultaneously.

| *Parameter* | *Description* | *Default* |
| -f / --file <file-path> | index the topic that the given file points to | |
| -h / --host <virtual-domain> | specifies the virtual domain to operate on (only makes sense when running VirtualHostingContrib); or specify all to perform the operation on all known virtual hosts | |
| -m / --mode full/delta | mode of operation (see solrindex above) | delta |
| -t / --throttle <seconds> | number of seconds to wait until the indexing process is started; note that any other calls to solrjob are prevented from entering the indexing loop as well | 5 |

Using Solr search on the commandline

cd <foswiki-dir>/bin
./rest /SolrPlugin/search ...

Parameter Description Default

TODO

Setting up an indexing strategy

Before SolrSearch can return results, you need to index your content completely, and then do so repeatedly to keep up with changes in the Foswiki content base. This can be achieved in several ways:

  1. full indexing: index all of the content from start to end
  2. delta indexing: index topics that changed since the last time (delta) indexing was performed
  3. realtime indexing: monitor changes in the Foswiki store and fire up indexing as close to the actual change event as possible
  4. online indexing: index content changes as part of the content being saved

We will discuss these strategies and outline their advantages. A combination of some of them then makes up the recommended indexing strategy for Foswiki content.

Full indexing

./solrindex mode=full optimize=on

This will crawl all webs, topics and attachments and submit them to the Solr server, which will build up the search index. This can take a considerable amount of time depending on the amount of content and the number of users registered to your site, so you may prefer to do it at a quiet time.

Note that full indexing is required the first time you install SolrPlugin. From then on you will be able to use delta indexing to update the index incrementally as content changes in Foswiki.

It is recommended to perform a full indexing only once a week or preferably at longer intervals.

Delta indexing

./solrindex 

This will inspect all of the content base and check for changes since the last time the content was added to Solr. Any updated content will be added to the index as required. The delta indexing procedure will also go through the whole index and delete those documents whose original topic has been removed from the Foswiki content base.

Delta indexing is a relatively fast operation that is best performed every 15 minutes or so. Don't shorten the intervals of delta indexing too much as that would create additional load on the server where no content is found to be delta-indexed.

Realtime indexing

This mode of operation requires a separate service to be installed called foswiki-watch. This is a Perl script that monitors any actions in Foswiki's events.log.

Note that this is only a "near-realtime" indexing behavior, as the script that performs the indexing is configured to throttle the procedure for a given amount of time, defaulting to 5 seconds. So any change to the content will show up within about 5 seconds after the event.

Assuming you are running Foswiki on a Linux server that uses systemd, use the following commands to install the foswiki-watch service:

cp <foswiki-dir>/tools/systemd/foswiki-watch.service /etc/systemd/system
cp <foswiki-dir>/tools/systemd/foswiki-watch.defaults /etc/default/foswiki-watch

Configure /etc/default/foswiki-watch to match your installation. Available settings:

  • FOSWIKI_ROOT: the path to your foswiki, e.g. /var/www/foswiki
  • FOSWIKI_WATCH_EVENTS_LOG: file to watch, e.g. /var/www/foswiki/working/logs/events.log
  • FOSWIKI_WATCH_PARALLEL: number of indexers to start in parallel at max, default 1
  • FOSWIKI_WATCH_THROTTLE: number of seconds to wait before starting an indexer, default 1 second
  • FOSWIKI_WATCH_VHOSTING: boolean switch to operate in a virtual hosting setting, default 0
  • FOSWIKI_WATCH_DEBUG: boolean switch to enable debugging

If you are running Foswiki using VirtualHostingContrib and all your vhosts are located in /var/www/vhosts then set the FOSWIKI_WATCH_EVENTS_LOG to a glob path such as /var/www/vhosts/*/working/logs/events.log to watch all event logs of all vhosts. Don't forget to enable FOSWIKI_WATCH_VHOSTING.
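
Put together, a virtual hosting setup as described above might end up with an /etc/default/foswiki-watch along these lines. This is only an illustration: the values are examples, the setting for FOSWIKI_WATCH_DEBUG is an assumption, and the KEY=value syntax should be checked against the foswiki-watch.defaults file shipped with the plugin.

```shell
# /etc/default/foswiki-watch: illustrative values for a virtual hosting setup
FOSWIKI_ROOT="/var/www/foswiki"
# glob path covering the event logs of all vhosts
FOSWIKI_WATCH_EVENTS_LOG="/var/www/vhosts/*/working/logs/events.log"
FOSWIKI_WATCH_PARALLEL=1
FOSWIKI_WATCH_THROTTLE=1
# must be enabled when watching several vhosts
FOSWIKI_WATCH_VHOSTING=1
# assumption: 0 switches debugging off
FOSWIKI_WATCH_DEBUG=0
```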

Finally enable and start the service with

systemctl enable foswiki-watch
systemctl start foswiki-watch

Indexing is then reported to the system's log service.

Online indexing

Not recommended, however …

This mode of operation refers to a way to update the search index immediately as part of the save operation performed by Foswiki on behalf of the user.

The biggest advantage here is that changes to the content base show up in the search index immediately, reflecting the exact changes being made. Note, however, that this can cause significant performance issues when interacting with Foswiki, as indexing a topic can take quite some time.

There are a couple of flags to switch on/off online indexing in your configuration.

Enable / disable indexing content as part of a save operation:

$Foswiki::cfg{SolrPlugin}{EnableOnSaveUpdates} = 0;

Enable/disable updates when a new attachment has been uploaded:

$Foswiki::cfg{SolrPlugin}{EnableOnUploadUpdates} = 0;

Enable/disable updates when a topic or attachment has been moved or deleted:

$Foswiki::cfg{SolrPlugin}{EnableOnRenameUpdates} = 0;

Setting up cronjobs

The following crontab entries set up

  • a full indexing every Saturday midnight and
  • a delta indexing every 15 minutes

0 0 * * 6 <foswiki-dir>/tools/solrjob --mode full
*/15 * * * * <foswiki-dir>/tools/solrjob --mode delta 

Add --host all to index all virtual hosts, or --host <hostname> to index a single virtual host.

Recommendations

By now we have several ways to keep up with changes in Foswiki while indexing it into an external database such as Solr.

Each of the above methods has its own pros and cons. Your own business requirements might also significantly influence how and when you schedule crawling the content. Some of the criteria to keep in mind are:

  • size of content base
  • speed of indexing content determined by server resources
  • interactive performance as perceived by the user
  • real-time requirements for updates in search results
  • changes in access control structures such as:
    • new users being registered to Foswiki,
    • changing membership in user groups,
    • changing clearance of user groups for specific content

What to keep in mind for full indexing

Changes in access control structures in particular might affect clearance to content on a broader scale. The indexing procedure caches the current authorization for a specific piece of content along with it. A change to access control, independent of any change to the content itself, therefore renders the access control information cached in the Solr index incorrect until this content is indexed again. This is not a problem when the ACL of a single document is altered, as this document is re-indexed as part of the change event. However, no such re-indexing is triggered automatically when the membership of a user group changes, or when a group is granted more or less authorization for content. Such changes will only be reflected the next time a full indexing is performed.

Access control structures might even change entirely outside of Foswiki when using LdapContrib or PluggableAuthContrib, where users and groups are provided by an external identity provider. These user and group records immediately affect whether Foswiki grants access to documents (there is some caching involved here as well, but let's ignore this for now). Only after the affected documents have been indexed again will a search on the index include or exclude content in line with what users can access when visiting the page directly.

Therefore a regular full indexing is required, presumably once a week or once a day during off times.

The runtime of a full indexing run depends on the size of your content base as well as the size of the user base. Both directly affect indexing throughput. It is strongly recommended to plan full indexing during off times when the system isn't used otherwise. Also, make sure that two full indexing runs don't overlap, as that would steadily increase the load on the servers involved.

In those cases where a full indexing run over the whole content base exceeds off times (e.g. it starts Friday night and has not finished by Monday morning), you will need to add more server resources. There are multiple ways to do so. Step one would be to use separate servers for Foswiki and Solr. Please read up on how to scale Solr beyond the single-node installation outlined in the configuration above.

Correctness of search index

A search index might show "incorrect" results for example when the content it indexes doesn't actually exist anymore. So users get a positive search hit but won't be able to access the content anymore: both content base and search index are out of sync. Keeping the search index "correct" is of importance for any indexing strategy.

A search index might also be "incorrect" when it doesn't reflect the access rights a user has on the content itself. That is, the search engine shall only return search results for content that the user has clearance for. No search result shall ever be returned for content that the user isn't allowed to access or even to know exists.

In SolrPlugin any Foswiki ACLs are added to the Solr database while content is indexed. So ACLs are checked as an additional filter on any search operation that an authenticated user might perform.

Correctness of the search index, as we discuss it now, is more concerned with the time it takes for a content change in Foswiki to be indexed and added to the Solr database, i.e. with how long the two are out of sync.

There are two general categories for indexing content that we want to compare now:

  • online indexing: index content as part of the interaction performed by the user
  • offline indexing: perform content indexing independent from the user interacting with the system online

Offline indexing is performed by the solrindex script as well as the solrjob wrapper. Both might be used in a cronjob or by the foswiki-watch service as described above.

Online indexing comes at a price that we should keep in mind before switching it on.

Indexing will be part of a save, delete or rename operation performed by the user and thus directly increases the time the user perceives the system to take while applying content changes.

You have to decide for yourself how to trade interactive performance against the negative side effects of an "incorrect" search index. It is recommended to accept a short period during which the search index is not quite up to date rather than to slow down the interactive performance of the system by hooking the indexing procedure into the online operations of Foswiki.

Using Solr for WebSearch, WebChanges and WikiUsers

It is recommended to replace Foswiki's default AutoViewTemplatePlugin with AutoTemplatePlugin. This will allow you to replace the default WebSearch, WebChanges and SiteChanges as well as WikiUsers with a Solr-driven interface for better usability and performance.

Configure AutoTemplatePlugin by adding the following {ViewTemplateRules}:

$Foswiki::cfg{Plugins}{AutoTemplatePlugin}{ViewTemplateRules} = {
...
  'WebChanges' => 'WebChangesView',
  'SiteChanges' => 'SiteChangesView',
  'WebSearch' => 'SolrSearchView',
  'WikiUsers' => 'SolrWikiUsersView',
...
};

The SolrWikiUsersViewTemplate implements a person search driven by Solr. It allows you to facet on properties as defined in the UserForm such as:

  • filter by location
  • filter by profession
  • filter by organization

There is a specific configuration option for Foswiki to detect which topics are actually user profile pages.

$Foswiki::cfg{SolrPlugin}{PersonDataForm} = '(*UserForm)';

Any topic that has a UserForm attached to it will participate in the person search interface at %USERWEB%.WikiUsers. Note that the value of {SolrPlugin}{PersonDataForm} specifies a Solr filter query that might be customized and extended as required. For example, to also include any topic that has a PersonTopic DataForm attached to it, use:

$Foswiki::cfg{SolrPlugin}{PersonDataForm} = '(*PersonTopic OR *UserForm)';

Finally, you'll need to make this configuration accessible in wiki applications such as the WikiUsers view template. Add '{SolrPlugin}{PersonDataForm}' to the {AccessibleCFG} list as in

$Foswiki::cfg{AccessibleCFG} = [
    '{ScriptSuffix}',
    '{LoginManager}',
    '{AuthScripts}',
...
    '{SolrPlugin}{PersonDataForm}',
];

Macros

SolrPlugin comes with a set of search macros tailored to the extensive capabilities of Solr's responses to search queries. All of them make use of the same set of options to render a response as listed in SOLRSEARCH.

SOLRSEARCH

This is the most important macro. It allows you to interact with the Solr server and display results within wiki applications. An example search looks like this:
%SOLRSEARCH{"test"
  format="   1 $web.$topic$n"
  sort="date desc"
}%

This will list the 10 most recently changed topics that match the string "test".

To list the 20 most recently changed topics that have the string "test" in their name use:
%SOLRSEARCH{"topic_search:test"
  format="   1 $web.$topic$n"
  sort="date desc"
  rows="20"
}%

SOLRSEARCH allows you to use the full power of the Lucene query language. This works with syntactically correct boolean queries like "title:foo OR body:foo". Consult the Lucene Query Syntax guide to learn more about how to form more complicated queries.

SOLRSEARCH also allows you to run a query in dismax mode. The dismax query parser only supports a subset of the Lucene syntax, but is highly tolerant of all sorts of strange user input. The query syntax it uses is familiar to many search engine users, and supports +/- and quotes for grouping words. The edismax mode adds several more powerful features, though still short of what is offered by the full Lucene syntax.

| *Parameter* | *Description* | *Default* |
| id | a search can optionally be cached for the duration of the current request, for example using id="solr1"; further calls to %SOLRFORMAT can make use of the cached solr response to render it independently of the location of the %SOLRSEARCH call on the wiki page | |
| search | query string: depending on the search type this can either be free-form text (type=dismax), a valid lucene query (type=standard) or a combination of both (edismax) | *:* |
| type | dismax/edismax/standard: query type | standard |
| fields | list of fields to be returned in the result; by default all fields in solr documents are returned; communication between Foswiki and the solr search can be optimized by specifying only those fields that you are interested in while rendering the response | *, score |
| *Flags:* |||
| jump | on/off: jump to the topic specified explicitly in the search string | on |
| lucky | on/off: jump to the first result found | off |
| highlight | switch on/off highlighting of found terms | off |
| spellcheck | switch on/off spellchecking to propose alternative spellings in case no search result was found | off |
| *Pagination:* |||
| start | integer index within the result from where to start listing results | 0 |
| rows | maximum number of documents to return | 10 |
| *Filter parameters:* |||
| web | filter by web: this can be any webname | all |
| contributor | filter by contributor to a topic | |
| filter | lucene query to filter results | |
| extrafilter | additional lucene filter query (see SolrSearchBaseTemplate for the difference between filter and extrafilter) | |
| reverse | on/off - reverses sorting if switched on; note: this overrides the sorting order specified in sort | off |
| sort | sorting expression; examples: score desc, date desc, createdate, topic_sort | |
| checktopics | on/off - if enabled, found topics that don't exist anymore are excluded | off |
| *Dismax Parameter:* |||
| boostquery | a raw query string (in the solr query syntax) that will be included with the user's query to influence the score. example: type:topic^1000 will boost results of type topic | see solrconfig.xml and SolrSearchBaseTemplate |
| queryfields | list of fields and their boosts giving each field a significance when a term was found in them. the format supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that fieldOne has a boost of 2.3, fieldTwo has the default boost, and fieldThree has a boost of 0.4 … this indicates that matches in fieldOne are much more significant than matches in fieldTwo, which are more significant than matches in fieldThree | see solrconfig.xml and SolrSearchBaseTemplate |
| phrasefields | list of fields and their boosts similar to queryfields. this parameter may contain fields and boosts that phrases (specified in quotes) are matched against. boosting those fields higher than their counterpart in queryfields allows you to prefer phrase matches over separate word matches | see solrconfig.xml and SolrSearchBaseTemplate |
| *Grouping:* |||
| group | name of the field to group results by | |
| groupfunction | | |
| groupquery | | |
| groupsort | | score desc |
| grouplimit | | 1 |
| groupoffset | | |
| *Faceting:* |||
| facets | list of facets to be rendered during search; each facet can be a title=name pair specifying the facet name and the title label used to display it in the result; example: %MAKETEXT{"Webs"}%=web, %MAKETEXT{"Topic type"}%=field_TopicType_first_s | |
| facetquery | query to be used for a facet query | |
| facetoffset | used to page through a list of facets being returned by a search | |
| facetlimit | maximum number of values to be displayed per facet; this is a list of pairs name=integer specifying a per-facet limit; example: 50, tag=100, contributor=10, category=10 will constrain the global limit of facet values to be returned to 50, tags to 100, and list the top 10 contributors in the hit set as well as the 10 most used categories | 100 |
| facetmincount | minimum frequency of a facet to be included in the result | 1 |
| facetprefix | prefix string of a facet to be included | |
| facetdatestart | part of a date facet describing the start of a time interval | NOW/DAY-7DAYS |
| facetdateend | part of a date facet describing the end of a time interval | NOW/DAY+1DAYS |
| facetdateother | part of a date facet describing the time intervals excluding the one specified with facetdatestart and facetdateend | before |
| hidesingle | comma separated list of facets to be hidden if there's only one choice left | |
| disjunctivefacets | list of facets that are queried using OR; so searching within one facet will expand the search instead of drilling down | facet values are combined using AND |
| combinedfacets | list of facets where values are queried in each of them using OR; for example listing field_ProjectMembers_lst and field_ProjectManager_s will result in a lucene filter of the form field_ProjectMembers_lst:WikiGuest OR field_ProjectManager_s:WikiGuest | |
| *Formatting results:* |||
| correction | format string for corrections proposed by the spellchecker | Did you mean <a href='$url'>$correction</a> |
| header | format string prepended to the result | |
| format | format string used to render each hit in the result set | |
| nullformat | format string used when no results were found | |
| separator | format string used to separate hit results rendered using format | |
| footer | format string appended to the result | |
| header_interesting | format string prepended to more-like-this queries (see %SOLRSIMILAR) | |
| format_interesting | format string used to render more-like-this results | |
| separator_interesting | format string used to separate hit results in more-like-this queries | |
| footer_interesting | format string appended to more-like-this queries | |
| include_interesting | regular expression terms must match in a more-like-this result | |
| exclude_interesting | regular expression terms must not match in a more-like-this result | |
| header_group | format string for grouped results | |
| format_group | format string for grouped results | |
| separator_group | format string to separate results in grouped results | |
| footer_group | format string for grouped results | |
| include_group | regular expression groups must match | |
| exclude_group | regular expression groups must not match | |
| header_<facet> | format string prepended to a facet result | |
| format_<facet> | format string used to render a facet value | |
| separator_<facet> | format string used to separate facet values | |
| footer_<facet> | format string appended to facet results | |
| include_<facet> | regular expression facet values must match to be displayed | |
| exclude_<facet> | regular expression facet values must not match to be displayed | |

SOLRFORMAT

When a solr response has been cached using the id parameter to SOLRSEARCH, it can be reused by subsequent calls to %SOLRFORMAT.

%SOLRSEARCH{"test" 
  id="solr1"
  facets="web,contributor"
  facetlimit="web=10, contributor=10"
}%

<noautolink>
*Results:*
%SOLRFORMAT{"solr1"
  format="   1 [[$web.$topic][$topic]]$n"
}%

*Webs:*
%SOLRFORMAT{"solr1"
  format_web="   * $key ($count)$n"
}%

*Contributors:*
%SOLRFORMAT{"solr1"
  format_contributor="   * $key ($count)$n"
  exclude_contributor="UnknownUser|AdminGroup|AdminUser|RegistrationAgent|TestUser"
}%
</noautolink>

SOLRSIMILAR

SOLRSIMILAR returns a list of topics similar to the current one.

| *Parameter* | *Description* | *Default* |
| "..." | query string referencing the document(s) to return similar ones for | id:System.SolrPlugin |
| like | list of fields used to compute similarity | category, tag |
| fields | list of fields and their boost value to be included in result items | web, topic, title, score |
| filter | restricts results to those matching this filter | type:topic |
| include | switches on/off inclusion of the matched document found in the query parameter | off |
| rows | maximum number of results to return | 100 |
| boost | | |
| mintermfrequency | | |
| mindocumentfrequency | | |
| minwordlength | | |
| maxwordlength | | |

SOLRSCRIPTURL

Returns a link to a SolrSearch with the given parameters pre-set.

| *Parameter* | *Description* | *Default* |
| "..." or search | search string to render a link for | |
| id | get a link to the search defined by SOLRSEARCH | |
| topic | name of the search topic to jump to | WebSearch |
| union | a list of fields whose values can be selected in a union (using an "or" operator) | |
| multivalue | a list of fields that may be searched by multiple values | |
| start | | |
| sort | | |
| <field_name> | any field defined in solr's schema.xml | |


Solr indexing schema

SolrPlugin comes with a custom schema to index general Foswiki data as defined in the <solr-home-dir>/conf/schema.xml file. It offers support for generic DataForm values, so any new DataForm definition allows you to use its formfields for faceting directly, without changing configurations or having to reindex the content.

The process of indexing content is configured on the Foswiki side which will crawl all webs, topics and their attachments thus creating lucene documents which are then sent over to the solr server. A lucene document is made up of fields of a certain type which defines the way the document should be processed by the solr server. This is configured in the schema.xml file.

While the schema is able to cover all Foswiki related data it is still kept generic enough to be used for non-wiki content as well.

Field types

This is the list of the most common field types used in the default schema. See the schema.xml for more exotic field types like point and location, useful for spatial search.

Type Description
string not analyzed, but indexed/stored verbatim
boolean boolean values (true, false)
binary the data should be sent/retrieved in as Base64 encoded strings
int, float, long, double default numeric field types. for faster range queries, consider the tint/tfloat/tlong/tdouble types
date the format for this date field is of the form 1995-12-31T23:59:59Z, and is a more restricted form of the canonical representation of dateTime. The trailing "Z" designates UTC time and is mandatory. Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z All other components are mandatory. Note: for faster range queries, consider the tdate type
text_ws a text field that only splits on whitespace for exact matching of words
text a general text field that has reasonable, generic cross-language defaults: it tokenizes with StandardTokenizer, removes stop words from the case-insensitive "stopwords.txt", and lowercases. At query time only, it also applies synonyms.
text_std same as text but without processing stopwords and synonyms
text_generic a general unstemmed text field, good if one does not know the language of the field; same as text but also splits words on case change while generating word parts; this field type is useful when searching for parts of a WikiWord
text_substr general substring decomposition
text_prefix substring decomposition starting at the front of the string
text_suffix substring decomposition starting at the back of the string
text_spell generic text analysis for spell checking
text_sort this is a text field suitable for sorting alphabetically
text_rev a general unstemmed text field that indexes tokens normally and also reversed, to enable more efficient leading wildcard queries.
type a list of strings used to analyse different media types. these are analysed using the system's mime types table to generate meaningful values; for example a gif image would be of type "gif", "image" and "attachment"
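
The date format required by the date and tdate types can be produced from application code before a value is sent to solr. The following is a minimal Python sketch, not part of SolrPlugin; the helper name to_solr_date is hypothetical, and it assumes that datetimes without timezone information are already in UTC.

```python
from datetime import datetime, timezone


def to_solr_date(dt: datetime) -> str:
    """Format a datetime as a solr date string in UTC with a trailing 'Z'."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    text = dt.strftime("%Y-%m-%dT%H:%M:%S")
    if dt.microsecond:
        # Fractional seconds are optional; truncate to milliseconds.
        text += ".%03d" % (dt.microsecond // 1000)
    return text + "Z"


if __name__ == "__main__":
    print(to_solr_date(datetime(1995, 12, 31, 23, 59, 59)))
    print(to_solr_date(datetime(1995, 12, 31, 23, 59, 59, 999000)))
```

The two calls print the two example values given in the date row of the table above.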

Fields

Name Type Multivalued Stored Description
access_granted string multivalued   this field controls view access of users to this topic or attachment in the search index; every query is augmented with an ACL check against this field; only users listed in this field are allowed view rights; special value is "all" when there are no view restrictions
edit_granted string multivalued   field holding the change rights of a user on this topic or attachment
attachment string multivalued stored list of all attachment names of this topic
author string   stored the name of the person that changed the document most recently
author_title string   stored title name of the person that changed the document most recently
catchall text_generic multivalued stored copy-field that gathers content from (almost) all fields; this is the default search field for the "standard" query parser; note that fields to be queried can be configured per request using the "dismax" handler
category string multivalued stored list of categories this document is in; note: this field will only be used if Foswiki:Extensions/ClassificationPlugin is installed, which populates it with the list of all categories up to TopCategory; content of this field is copied to category_search as well (see dynamic fields below)
comment text_generic   stored comment field of an attachment
concept string multivalued stored support for uima processing chain
container_id string   stored id of containing document, e.g. the topic this is a comment or attachment for
container_title string   stored title name of containing document
container_topic string   stored topic of containing document
container_url string   stored url of containing document
container_web string   stored web of containing document
contributor string multivalued stored list of users that contributed to this topic at some point in time
createauthor string   stored author of the initial version of this document
createauthor_title string   stored title name of the initial author of this document
createdate tdate   stored date when the initial version of this document was created
date tdate   stored time the document was last changed
form string   stored name of the form attached to the current topic
icon string   stored icon to identify the rendition for this document
id string   stored unique identifier for each document; this is the external id usable in applications; there's an internal solr document id not related to this field
language string   stored language of the current document; this may be specified explicitly using the CONTENT_LANGUAGE preference, or set to "detect" to let the solr update chain detect the language automatically
macro string multivalued   list of wiki macros being used in this topic
name string   stored filename of an attachment
outgoing string multivalued stored list of all outgoing links; this information is used to detect backlinks
parent string   stored parent topic of the current topic
phonetic phonetic multivalued   holds the phonetic analysis of the most important search fields
charnorm text_charnorm multivalued   result of the character normalization analysis
preference string multivalued stored this field catches all topic preferences. each preference is captured in a dynamic field as well (see dynamic fields below)
sentence text_generic multivalued stored support for uima processing chain
size tint   stored size of an attachment in bytes
spell text_spell multivalued   used for spellchecking
state string     used by comments or any other application that tracks specific states of a document, such as "new", "unapproved", "approved", "draft", "unpublished", "published", …
text_prefix text_prefix multivalued   holds substring analysis of the most important search fields, starting at the front
text_suffix text_suffix multivalued   holds substring analysis of the most important search fields, starting at the back
summary text_generic   stored this is a plainified summary of the topic text
tag string multivalued stored list of tags assigned to this document; note: this field will only be used if Foswiki:Extensions/ClassificationPlugin is installed; content of this field is copied to tag_search as well (see dynamic fields below)
text text_generic     document text
thumbnail string   stored url to thumbnail representation of this document; mostly used for images
timestamp tint   stored epoch time when the document was added to the index
title string   stored title of a document; a topic title is read from a TopicTitle formfield, a TOPICTITLE preference variable or defaults to the topic name itself; for attachments this is the filename with the extension stripped off
topic string   stored name of the topic
type type   stored holds the type facet of the document; this is "image" for all kinds of images, "video" for all kinds of videos, "topic" for Foswiki topics and the verbatim file extension for everything else; note: plugins like Foswiki:Extensions/MetaCommentPlugin might use specific types as well (like "comment" in this case)
url string   stored url used to access the document being indexed
version float     current version of the topic
webcat string   stored combined web-category facet
web string   stored name of the web this document is located in
webtopic string   stored concatenation of the web and topic part
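
The ACL check described for the access_granted field can be pictured as a filter that is attached to every query. The following Python sketch only illustrates the idea; it is not the plugin's actual implementation, and the function name acl_filter is hypothetical. It assumes the filter accepts documents that list either the special value "all" or the current user.

```python
def acl_filter(user: str) -> str:
    """Build a filter clause matching documents the given user may view."""

    def quote(value: str) -> str:
        # Quote the value as a phrase, escaping backslashes and double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped

    # "all" marks documents without view restrictions.
    principals = ["all", user]
    return "access_granted:(%s)" % " OR ".join(quote(p) for p in principals)


if __name__ == "__main__":
    print(acl_filter("JohnDoe"))
```

A clause like this could be passed to solr as a filter query (the fq request parameter), so that it restricts the result set without influencing relevance scoring.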

Dynamic fields

Dynamic fields are generated based on the content properties of the document to be indexed. They are specified using a wildcard pattern in schema.xml. When a document is indexed, the wildcard is expanded to create a proper field name. Dynamic fields make it possible to analyze fields in specific ways based on their name, and to cover fields that aren't known in advance, such as the names of all formfields of any DataForm that could ever be invented.

When SolrPlugin is about to index a DataForm attached to a topic, it tries to guess the data type of each formfield, since Foswiki normally does not specify any type information within a DataForm definition. There are two exceptions: (1) date formfields are mapped to a *_dt field for the ISO date and an *_i field for the epoch seconds; (2) checkbox, select, radio and textboxlist formfields are potentially multi-valued and are thus indexed in a *_lst field.

Every other formfield is stored in an *_s field as well as in *_search, *_prefix, *_substr, *_sort and *_std fields. These all capture the same content, each with a slightly different analysis of the text.

DataForm formfields are mapped to lucene document fields by prepending the field_ prefix, which prevents name clashes with other dynamic fields generated on the fly. For example, a formfield ProjectManager will be stored in field_ProjectManager_s and field_ProjectManager_search. Likewise, a select+multi formfield ProjectMembers will be stored in field_ProjectMembers_lst, as it is a multivalued field.

If a formfield name already ends in one of the suffixes below (_i, _l, _f, _dt, etc.), this suffix is used instead of any heuristics for deriving the best field type for the lucene field. That way DataForm fields, although untyped by Foswiki, can still be indexed type-specifically.

Similarly topic preferences are indexed using a preference_* prefix.

Name Type Multivalued Stored Description
*_i tint   stored fields with a _i suffix are indexed as an integer number
*_l tlong   stored fields with a _l suffix are indexed as a long integer
*_f tfloat   stored fields with a _f suffix are indexed as a float
*_d tdouble   stored fields with a _d suffix are indexed as a double precision float
*_b boolean   stored true, false
*_s string   stored dynamic field for unanalyzed text
*_std string not stored dynamic field for standard analysis, i.e. stopwords not being removed
*_t text_generic   stored generic text
*_dt tdate   stored a dateTime value
*_lst string multivalued stored this field is used for any multi-valued formfield in DataForms, like select, radio, checkbox, textboxlist
preference_* string   stored preference values such as preference_NAMEOFPREFERENCE_t
*_search text_generic   stored generic text, optimized for searching
*_sort text_sort   stored text optimized for sorting alphabetically
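
The mapping from DataForm formfields to dynamic field names described in this section can be sketched in Python as follows. This is an illustration of the rules as documented here, not the plugin's actual code: the function name solr_field_names is hypothetical, and the set of suffixes treated as "already typed" is an assumption taken from the table above.

```python
# Assumption: suffixes that mark a formfield name as already typed.
KNOWN_SUFFIXES = ("_i", "_l", "_f", "_d", "_b", "_s", "_t", "_dt", "_lst")

# Formfield types that may hold several values.
MULTI_VALUE_TYPES = {"checkbox", "select", "radio", "textboxlist"}

# Suffixes used for every other formfield.
TEXT_SUFFIXES = ("_s", "_search", "_prefix", "_substr", "_sort", "_std")


def solr_field_names(name: str, formfield_type: str) -> list:
    """Return the dynamic field names a formfield would be indexed under."""
    base = "field_" + name
    if name.endswith(KNOWN_SUFFIXES):
        # The name carries its own type suffix: no guessing needed.
        return [base]
    kind = formfield_type.split("+")[0]  # "select+multi" -> "select"
    if kind == "date":
        return [base + "_dt", base + "_i"]  # ISO date and epoch seconds
    if kind in MULTI_VALUE_TYPES:
        return [base + "_lst"]
    return [base + suffix for suffix in TEXT_SUFFIXES]


if __name__ == "__main__":
    print(solr_field_names("ProjectManager", "text"))
    print(solr_field_names("ProjectMembers", "select+multi"))
```

For the ProjectManager and ProjectMembers examples used earlier in this section, the sketch yields field_ProjectManager_s, field_ProjectManager_search (among others) and field_ProjectMembers_lst respectively.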

Copy fields

Finally, after all field types have been defined, some fields are created by copying a source field to a destination field using the copyField feature of solr. So while most of a lucene document is created explicitly by the crawler and indexer, some fields are created automatically to facilitate specific search applications. The destination fields are then analysed using the dynamic field definitions given above.

Source Destination
attachment catchall
attachment charnorm
attachment phonetic
attachment spell
category catchall
category category_search
category charnorm
category phonetic
comment catchall
comment charnorm
comment phonetic
comment spell
concept catchall
concept charnorm
concept phonetic
concept spell
field_* catchall
field_* charnorm
field_* phonetic
field_* spell
form catchall
form charnorm
form phonetic
form spell
name catchall
name charnorm
name phonetic
name spell
name name_std
name name_search
tag catchall
tag charnorm
tag phonetic
tag tag_search
text catchall
text charnorm
text phonetic
text spell
text text_prefix
text text_std
text text_ws
text text_suffix
text text_substr
title catchall
title charnorm
title phonetic
title spell
title title_first_letter
title title_prefix
title title_search
title title_sort
title title_std
title title_suffix
title title_substr
topic catchall
topic charnorm
topic phonetic
topic spell
topic topic_search
topic topic_sort
topic topic_std
type catchall
type charnorm
type phonetic
web spell
webtopic webtopic_search
web web_search
web web_sort
web web_std
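
The effect of these copy rules on a single document can be modelled with a few lines of Python. Solr performs the copying itself at index time, so the sketch below is only a toy model for understanding the table: the function name apply_copy_fields is hypothetical, only a handful of the rules are included, and field values are represented as plain lists.

```python
from fnmatch import fnmatchcase

# A small subset of the (source, destination) rules from the table above.
COPY_RULES = [
    ("title", "catchall"),
    ("title", "title_search"),
    ("title", "title_sort"),
    ("field_*", "catchall"),
    ("web", "web_search"),
]


def apply_copy_fields(doc: dict, rules=COPY_RULES) -> dict:
    """Return a copy of doc with destination fields filled from sources."""
    result = {name: list(values) for name, values in doc.items()}
    for source_pattern, destination in rules:
        # Iterate over the original document only: copies are not chained,
        # i.e. a destination field is never used as a source again.
        for name, values in doc.items():
            if fnmatchcase(name, source_pattern):
                result.setdefault(destination, []).extend(values)
    return result


if __name__ == "__main__":
    doc = {
        "title": ["Project Plan"],
        "web": ["Main"],
        "field_ProjectManager_s": ["JohnDoe"],
    }
    for name, values in sorted(apply_copy_fields(doc).items()):
        print(name, values)
```

In this model the catchall field ends up holding both the title and the formfield value, which is why a search on the default field can find content from many different source fields.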

---++ Templates

---+++ Structure of !SolrSearchBaseTemplate

---+++ Replacing !WebSearch and !WebChanges

---+++ Creating custom search interfaces

Dependencies

Name Version Description
Foswiki::Plugins::MultiLingualPlugin >=4.10 Required
Foswiki::Contrib::JQMomentContrib >=1.0 Required
Foswiki::Contrib::JQPhotoSwipeContrib >=1.0 Required
Foswiki::Contrib::JQSerialPagerContrib >=2.0 Required
Foswiki::Contrib::JQTwistyContrib >=1.0 Required
Foswiki::Contrib::StringifierContrib >=6.00 Required
Foswiki::Plugins::AutoTemplatePlugin >=1.0 Optional
Foswiki::Plugins::ClassificationPlugin >=1.0 Optional
Foswiki::Plugins::DBCachePlugin >=1 Optional
Foswiki::Plugins::FilterPlugin >=2.0 Required
Foswiki::Plugins::FlexWebListPlugin >=1.91 Required
Foswiki::Plugins::ImagePlugin >=3.0 Required
Foswiki::Plugins::JQueryPlugin >=6.00 Required
Foswiki::Contrib::CacheContrib >=0 Required
Linux::Inotify2 >=2 Required
HTML::Entities >=3.64 Required
JSON::XS >=2.231 Required
LWP::UserAgent >=5.820 Required
Moo >=2.00 Required
Types::Standard >=1.00 Required
XML::Easy >0 Required
Foswiki::Plugins::TopicTitlePlugin >1.00 Required for Foswiki < 2.2

Change History

27 Jan 2025: shorten highlight fragment; use nobody.png from JQueryPlugin instead of NatSkin
17 Jan 2025: added support for solr-9
14 Mar 2023: replaced iwatch with foswiki-watch service
26 Jan 2022: gave up on stopwords: removed stopwords filter from the solr schema
26 Sep 2019: performance improvements of indexer; implemented instant search; new crawler interface to index not only wiki content but also external sources; new filesystem crawler; eased configuring the search interface with preference variables; extended solr schema to cope with multiple data sources; improved handling of substring searches in text fields; require validation and authentication in rest handlers; removed hardcoded WorkflowPlugin support (plugins need to hook into the indexing api instead); added support for formfields of type number, percent, currency and bytes; improved indexing of used macros in a page; improved detection of outgoing links while indexing; changed handling of admin rights on content, i.e. not granting admin rights on external sources; improved autosuggestion dropdown search; added api to iterate over facet values while ignoring access rights
31 Jan 2019: reduce amount of presumably unrelated search results; improved language detection in solr; added fields name_std and name_search for better searchability of attachments; don't display wiki markup in search result summaries; added field macro to capture use of wiki macros
10 Oct 2018: mime types are now multivalued, e.g. an image is now tagged type: ["gif", "image", "attachment"]; better support for attachments listed in the autosuggest drop down box; the rudimentary type mapping is now based on the system mime types table and not using a typemap file in solr's config anymore; removed dependency on Image::Magick; fixed error exceeding the max string length in solr; the form name will now be used when no TopicType field is present to construct the TopicType facet; fixed support for ALLOWWEBVIEW = *
13 Aug 2018: new alphabetical navigation for wiki users; fixed searching for summary; replaced jquery.scrollto with native scroll api; make number of items suggested configurable in jquery.autosuggest drop-down box
07 Jun 2018: new index fields author_title, createauthor_title, title_first_letter; added support for indexing arbitrary meta data; added support for ListyPlugin; added toggle "exact search" to search interface; depending on new TopicTitlePlugin now; fixed keyboard interaction of autosuggest box; fixed sorting facet values by title; much improved relevancy sorting
09 Jan 2018: added support for jquery.i18n; improved solr schema for better findability; fixed solr sidebar in subwebs
18 Sep 2017: replacing text_substring with text_prefix and text_suffix to improve substring matching; truncate document values larger than 32k to prevent solr from crashing; use flexbox for people search interface; fixed creating urls to ImagePlugin rest interface to generate thumbnail previews
23 Jan 2017: converted WebServices::Solr to Moo; fixed documentation for iwatch realtime indexing; documentation of SOLRSCRIPTURL macro; using jquery.i18n for javascript translations now; new facet filter to search in facet values; improved indexing of user profile pages and their thumbnail image; indexing image geometry now; improved jquery.autosuggest widget; improved ToggleFacetWidget; improved boosting of query ingredients; mapping all office documents to a combined attachment type (document, presentation, spreadsheet, chart, …); better support for plenv in system services and cron jobs
18 Oct 2015: fixed backwards compatibility with pre-unicode Foswiki; bring back solr::queryfields in SolrSearchBaseTemplate; fixed language facet to properly match language tags to their name; improved layout of search results as well as autosuggestion widget; removed workflow facet from default search; fixed icon mapping for topics that don't come with an icon defined in their TopicType; don't try to encode html entities without a code point in utf8; don't remove all macros from topic text, just some; removed dependency on MimeIconsPlugin as we are using fontawesome now; improved formula for sorting results by reference; fixed sorting in ajax-solr; fixed exposing/hiding parameters in ajax-solr; improved findability of content; i.e. when containing stop words only in the title; removed unused /browse search handler from solr config
01 Oct 2015: improve default layout of search results; moved unsafe inline-javascript into a js file of its own
21 Sep 2015: cache stringified attachments using Cache::FileCache now and added api to purge/clear cache regularly; removed IndexExtensions config parameter to let the stringifier decide on supported file formats; added support for Foswiki:Extensions/LikePlugin boosting search results by social preferences
17 Jul 2015: added support for Foswiki-2.0 ; indexing workflow and state facets supporting Foswiki:Extensions/WorkflowPlugin; added author_url to solr schema; added google image and video mime types mapping them to "image" while indexing
27 Feb 2015: upgraded to solr-5.0.0
29 Sep 2014: moved to jsrender for templating, replacing the deprecated jquery.tmpl
29 Aug 2014: fix mailto links in WikiUsers view template; fully specify rest security; fixed creating of working area for timestamps db; improved indexing of list values; fixed encoding error in SOLRSEARCH/FORMAT; use SOLR_EXTRAFILTER preference setting in auto-suggest widget as well; fixed applying strings and defaults in solrDictionary class; fixed applying extra-filters in SolrSearch; harvest facet headings for translations
28 May 2014: implemented new ACL style compatible with Foswiki >= 1.2
14 Jul 2013: added support for PiwikPlugin
14 Mar 2013: improved indexing performance; added configurable http timeouts talking to the solr backend; fixed language mappings for multilingual content; fixes due to latest changes in jquery.moment
17 Oct 2011: fixed WebServices::Solr to only encode to utf8 if needed; fixed handling character encoding on a pure utf8 foswiki; fixed schema for spell correction
29 Sep 2011: improved schema.xml: replaced StandardTokenizer with WhitespaceTokenizer, using new ClassicTokenizer and ClassicFilter to feed the spellchecker, switched spellchecker to JaroWinklerDistance and lowered the frequency threshold for a term to be added to the spellchecker; building the spellchecker when optimizing the index now; fixed detecting the content language
28 Sep 2011: added multilanguage support per document; fixed default values in %SOLRSIMILAR; speeding up indexing by better caching ACLs; implemented mapping facet values to any other label during query time; added Language facet to default search interface
26 Sep 2011: improved default boosting in dismax to prefer topic hits a lot stronger than attachments; improved default cache settings for better default performance; added support to distribute updates and search in a master-slave setup; added boostquery, queryfields, phrasefields parameter to customize boosting and sorting; improved default schema while documenting it
21 Sep 2011: upgrading to solr-3.4.0; fixed utf8 handling; added jump and i-feel-lucky options; made hidesingle configurable per facet; added disjunctivefacets and combinedfacets; fixed handling of date fields; support new ui::autocomplete in JQueryPlugin; using type-specific icons in Foswiki:Extensions/MimeIconPlugin if installed; fixed quoting lucene queries; indexing outgoing links to support fast backlinks; adding fields createauthor, language and collection to schema; disabling phonetic boost in schema by default; be more robust in case of malformed DataForm definitions; copying every string field into a search field also to allow exact as well as fuzzy search; enhancing normalizeWebTopicName to create uniform web names using dots, not slashes everywhere; fixed parsing inline topic permissions; externalized sidebar pager into a new plugin of its own: Foswiki:Extensions/JQSerialPagerContrib; upgrading to WebService::Solr-0.14 … which now requires CPAN:XML::Easy instead of CPAN:XML::Generator; lots of improvements to SolrSearchBaseTemplate; now supporting Foswiki:Extensions/InfiniteScrollContrib in SolrSearch; documentation improvements
19 Apr 2011: shipping a multicore setup by default; added support for Foswiki:Extensions/VirtualHostingContrib; fixed utf8 recoding; some usability improvements to faceted search interface; fixing illegal control characters in output (Oliver Schaub)
16 Dec 2010: added state field to schema used for approval workflows; added solrjob to ease cronjobbing indexing; added docu how to use iwatch for almost-realtime indexing; fixed dependencies to include Foswiki:Extensions/FilterPlugin as well; fixed mapping facet values to their display title in search interface; fixed delta updates not properly removing outdated attachment entries when these were moved/renamed; and some minor html improvements
03 Dec 2010: fixed solr-based WebChanges and SiteChanges using PatternSkin
01 Dec 2010: adjustments due to changes in stringifier api; fixed removal of deleted webs from search index
22 Nov 2010: fixes integration with pattern skin
18 Nov 2010: initial public release