DEV Community

Qing

Data Types(4)

Bit String Types

Bit strings are strings of 1s and 0s. They can be used to store bit masks.

openGauss supports two bit string types: bit(n) and bit varying(n), where n is a positive integer.

A value of the bit type must match the length n exactly; storing a shorter or longer bit string reports an error. A value of the bit varying type has a variable length up to the maximum length n; longer strings are rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length limit means unlimited length.
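
The length rules above can be modelled in a few lines. What follows is a minimal Python sketch, not openGauss code: the helper names `fits_bit` and `fits_bit_varying` are hypothetical, and the sketch only mirrors the storage rule described here (it does not model explicit casts).

```python
from typing import Optional


def _require_bits(value: str) -> None:
    # A bit string may contain only the characters 0 and 1.
    if any(ch not in "01" for ch in value):
        raise ValueError(f"not a bit string: {value!r}")


def fits_bit(value: str, n: int = 1) -> bool:
    """Model of bit(n): the length must equal n; bare bit means bit(1)."""
    _require_bits(value)
    return len(value) == n


def fits_bit_varying(value: str, n: Optional[int] = None) -> bool:
    """Model of bit varying(n): any length up to n; no n means no limit."""
    _require_bits(value)
    return n is None or len(value) <= n


print(fits_bit("101", 3))           # exact length for bit(3)
print(fits_bit("10", 3))            # too short for bit(3)
print(fits_bit_varying("10", 5))    # shorter than the limit
print(fits_bit_varying("101011", 5))  # longer than the limit
```

In the database a failed check surfaces as an error rather than a boolean; the boolean return value is only a convenience of this model.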

Text Search Types

openGauss offers two data types that are designed to support full text search. The tsvector type represents a document in a form optimized for text search. The tsquery type similarly represents a text query.

·tsvector

The tsvector type represents a retrieval unit, usually a textual column within a row of a database table, or a combination of such columns. A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word. Sorting and deduplication are done automatically during input. The to_tsvector function is used to parse and normalize a document string.

When a string is converted to tsvector, it is split into entries, duplicate entries are eliminated, and the remaining entries are stored in sorted order.
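
As a rough illustration of "sorted list of distinct lexemes", here is a toy Python model. It is not how openGauss implements tsvector: the function name `toy_tsvector` is hypothetical, quoting is not handled, and the sort key (length first, then alphabetical) simply follows the ordering this article describes, so treat it as an assumption.

```python
from typing import List


def toy_tsvector(text: str) -> List[str]:
    """Split on whitespace, drop duplicates, and sort the entries."""
    words = text.split()
    distinct = set(words)  # each lexeme is kept only once
    # Assumed ordering: shorter entries first, ties broken alphabetically.
    return sorted(distinct, key=lambda w: (len(w), w))


print(toy_tsvector("a fat cat sat on a mat and ate a fat rat"))
```

The input contains twelve words but only nine distinct ones, so the result has nine entries.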

tsvector segments a string by spaces, and the segmented lexemes are sorted based on their length and alphabetical order. To represent lexemes that contain whitespace or punctuation, surround them with quotation marks.

When an entry contains a single quotation mark ('), wrap the whole string literal in double dollar signs ($$).

Optionally, integer positions can be attached to lexemes, written after a colon (for example, fat:2).

A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values range from 1 to 16383. Duplicate positions for the same lexeme are discarded.
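
The position rules can be sketched as a small Python helper. This is a toy model with a hypothetical name (`normalize_positions`). Capping larger values at 16383 follows the behaviour PostgreSQL documents for its tsvector type; whether openGauss does the same is an assumption here.

```python
from typing import Iterable, List

MAX_POSITION = 16383  # upper bound stated in the text


def normalize_positions(positions: Iterable[int]) -> List[int]:
    """Return the positions of one lexeme with duplicates discarded."""
    cleaned = set()
    for pos in positions:
        if pos < 1:
            raise ValueError("positions start at 1")
        # Assumption: values above the maximum are capped, not rejected.
        cleaned.add(min(pos, MAX_POSITION))
    return sorted(cleaned)


print(normalize_positions([3, 1, 3, 20000, 16383]))
```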

Lexemes that have positions can further be labeled with a weight, which can be A, B, C, or D. D is the default and therefore is not shown in output.

Weights are typically used to reflect the document structure, for example, by marking title words differently from body words. Text search ranking functions can assign different priorities to the different weight markers.
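
To make the "D is the default and is not shown" rule concrete, here is a minimal Python sketch that formats one lexeme with weighted positions. The function name `format_entry` and the tuple layout are hypothetical; the output format imitates the `'lexeme':position` notation used for tsvector values.

```python
from typing import List, Tuple


def format_entry(lexeme: str, positions: List[Tuple[int, str]]) -> str:
    """Render a lexeme and its (position, weight) pairs as text."""
    parts = []
    for pos, weight in positions:
        if weight not in "ABCD" or len(weight) != 1:
            raise ValueError(f"unknown weight: {weight!r}")
        # Weight D is the default, so it is omitted from the output.
        parts.append(str(pos) if weight == "D" else f"{pos}{weight}")
    return f"'{lexeme}':" + ",".join(parts)


print(format_entry("fat", [(2, "B"), (4, "C")]))
print(format_entry("cat", [(5, "D")]))
```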

The tsvector type itself does not normalize words; it stores the words it is given as they are. For most English text search applications, raw text such as The Fat Rats would be considered non-normalized, so it should usually be passed through to_tsvector to normalize the words appropriately for searching.

·tsquery

The tsquery type represents a retrieval condition. A tsquery value stores the lexemes to be searched for and combines them using the Boolean operators & (AND), | (OR), and ! (NOT). Parentheses can be used to enforce grouping of the operators. The to_tsquery and plainto_tsquery functions normalize lexemes before converting them to the tsquery type.

In the absence of parentheses, ! (NOT) binds most tightly, and & (AND) binds more tightly than | (OR).
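
The precedence rule can be checked with a small recursive-descent evaluator. This is a toy Python sketch, not the openGauss parser: the function name `matches` is hypothetical, weights and prefix markers are not supported, and a document is represented simply as a set of lexemes.

```python
import re
from typing import Set


def matches(query: str, lexemes: Set[str]) -> bool:
    """Evaluate a query using !, &, | and parentheses against a lexeme set."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|!()]", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():  # binds more tightly than |
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # ! binds most tightly
        if peek() == "!":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        return take() in lexemes

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


# a | b & c is read as a | (b & c), so a document containing only "a" matches.
print(matches("a | b & c", {"a"}))
```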

Lexemes in a tsquery can be labeled with one or more weight letters, which restrict them to match only tsvector lexemes that carry one of those weights. For example, fat:ab matches fat only where it has weight A or B.
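
The weight restriction can be modelled as a set intersection. The sketch below is a toy Python model with hypothetical names (`weighted_match`, and a dict that maps each lexeme to the set of weights found on its positions); it is not the openGauss matching code.

```python
from typing import Dict, Set


def weighted_match(lexeme: str, query_weights: str,
                   vector: Dict[str, Set[str]]) -> bool:
    """True if the lexeme is present with at least one requested weight."""
    stored = vector.get(lexeme)
    if stored is None:
        return False  # the lexeme does not occur at all
    wanted = set(query_weights.upper())
    # No weight letters in the query means any weight is acceptable.
    return not wanted or bool(stored & wanted)


# Toy document: "fat" occurs with weight C, "cat" with weights A and D.
doc = {"fat": {"C"}, "cat": {"A", "D"}}
print(weighted_match("fat", "ab", doc))
print(weighted_match("cat", "ab", doc))
```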

Also, lexemes in a tsquery can be labeled with an asterisk (*) to specify prefix matching.

For example, the query super:* matches any word in a tsvector that begins with “super”.
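
Prefix matching amounts to a "starts with" test over the lexemes of a document. A minimal Python sketch, with the hypothetical helper name `prefix_match` and made-up sample lexemes:

```python
from typing import Iterable


def prefix_match(prefix: str, lexemes: Iterable[str]) -> bool:
    """True if any lexeme begins with the given prefix."""
    return any(lexeme.startswith(prefix) for lexeme in lexemes)


print(prefix_match("super", ["supernova", "star"]))
print(prefix_match("super", ["sun", "star"]))
```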

Note that prefixes are first processed by text search configurations, which means that to_tsvector('postgraduate') @@ to_tsquery('postgres:*') returns true. This is because postgres is stemmed to postgr, which then matches postgraduate.
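
The two steps, normalizing the prefix and then comparing, can be sketched as follows. This is a toy Python model: `TOY_STEMS` is a hypothetical stand-in for a real text search configuration and contains only the single stem mentioned in this article.

```python
from typing import Iterable

# Hypothetical stand-in for a stemmer; only postgres -> postgr is modelled.
TOY_STEMS = {"postgres": "postgr"}


def normalize(word: str) -> str:
    """Lowercase the word and apply the toy stem table."""
    lowered = word.lower()
    return TOY_STEMS.get(lowered, lowered)


def prefix_query_matches(query_word: str, document_words: Iterable[str]) -> bool:
    """Normalize the query word first, then use the result as a prefix."""
    prefix = normalize(query_word)
    return any(normalize(w).startswith(prefix) for w in document_words)


# "postgres" becomes "postgr", which is a prefix of "postgraduate".
print(prefix_query_matches("postgres", ["postgraduate"]))
```

Without the normalization step the comparison would use the prefix "postgres", and "postgraduate" would not match.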

Likewise, when 'Fat:ab & Cats' is converted with to_tsquery, its words are normalized before they are stored in the tsquery value.
