Pages

Monday 16 June 2014

SQL Server: SPOTLIST Optimization Tool

A stoplist contains a list of words that are ignored in full-text searches. These words are also known as noise words. When a full-text index is created, DBA/Developer can include the WITH STOPLIST clause to specify the stoplist that should be used for the full-text index. These stopwords are not included in the full-text index. By default, SQL Server uses a system stoplist that contains meaningless words like articles and prepositions. However, DBA/Developer can create DBA/Developer own stoplist and add words to it. For example, the following statements create a new stoplist, named CustomStoplist, from the system stoplist and add the word unknown to the stoplist:

CREATE FULLTEXT STOPLIST CustomStoplist
FROM SYSTEM STOPLIST;
ALTER FULLTEXT STOPLIST CustomStoplist ADD 'unknown';

After creating the stoplist, DBA/Developer specifies the stoplist when creating the full-text index as follows:

CREATE FULLTEXT INDEX ON dba.Product(Details)
KEY INDEX
PK_Product_ProductID WITH STOPLIST = CustomStoplist;

Previous versions of SQL Server used noise-word files instead of stoplists. Any modifications to the default noise-word file will not be included in the system stoplist when the database is upgraded to SQL Server 2008. DBA/Developer must manually modify the system stoplist after upgrading to SQL Server 2008.DBA/Developer should not use a full-text search filter. Full-text search filters are used when performing full-text searches on varbinary, varbinary(max), image, and xml data type columns. The filter processes the document to obtain the text information that should be used in the search. DBA/Developer should not use a thesaurus. Thesaurus files identify words with similar meanings. Thesaurus files are used with queries that use the FREETEXT predicate and the FREETEXTTABLE function, and with queries that use the CONTAINS predicate and the CONTAINSTABLE function with the FORMS OF THESAURUS clause. A global thesaurus file, named tsGlobal.xml, and a thesaurus file for each language are located in the SQL_Server_install_path\Microsoft SQL
Server\MSSQL10.MSSQLSERVER\MSSQL\FTDATA\ folder. These default thesaurus files have XML elements commented out initially, but DBA/Developer can modify them as required to add or remove synonyms. DBA/Developer should not use a dictionary. A dictionary is not a valid full-text index component. SQL Server does use a thesaurus file to identify synonyms, but does not use a dictionary.

No comments:

Post a Comment