We have been working hard over the last few weeks to improve our filtering engine, both in efficiency and in features.
Firstly, let’s cover the differences between the last iteration of the
Changes to CONTAINS operator
We have a new, more efficient way of searching for keywords and phrases. It replaces the old implementations of the CONTAINS, CONTAINS_WORD and CONTAINS_PHRASE operators and merges them into a single CONTAINS operator (CONTAINS_WORD and CONTAINS_PHRASE are retained for backwards compatibility). This change should not affect the behaviour of most rules, and in some cases it will improve the results, as CONTAINS now matches whole words rather than fragments of words (avoiding the Scunthorpe problem). If, however, you were using CONTAINS explicitly to search for fragments of words, we now provide the SUBSTR operator, which retains the old behaviour of CONTAINS.
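To make the difference concrete, here is a minimal Python sketch of whole-word matching versus substring matching. It is only an illustration of the behaviour described above, not the engine's implementation: the function names `contains_word` and `substr` are hypothetical, and case-insensitive matching is an assumption.

```python
import re


def contains_word(content: str, term: str) -> bool:
    """Whole-word match, in the spirit of the new CONTAINS (assumed case-insensitive)."""
    # The term must not be directly preceded or followed by a word character.
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, content, flags=re.IGNORECASE) is not None


def substr(content: str, term: str) -> bool:
    """Plain substring match, in the spirit of SUBSTR (assumed case-insensitive)."""
    return term.lower() in content.lower()


text = "My primary phone is ancient"
print(contains_word(text, "RIM"))  # whole-word search ignores the "rim" inside "primary"
print(substr(text, "RIM"))         # substring search finds it
```

The word "primary" contains "rim", so a substring search for the brand RIM reports a match where a whole-word search does not.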
Changes to text operator arguments
The other change to the existing CSDL is how we handle text arguments (i.e. “quoted” text) used by the operators that work on text fields. We have added escape sequences for certain characters within an operator’s argument: , \ : " <newline> <carriage_return> <tab>. This means that the CSDL compiler will no longer accept a single \ on its own. To use \ in your search, it needs to be escaped like this: \\
For most of the text-based operators, this change will not affect existing rules. However, for the regular-expression operators REGEX_PARTIAL and REGEX_EXACT, there is a high probability that their arguments will need to be changed, because the \ character is so common in regular expressions.
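If you generate rules programmatically, the backslash change can be handled with a one-line helper. The sketch below covers only the \\ escape described above; the helper name `escape_backslashes` is hypothetical, and the escape sequences for the other characters are not shown because their mappings are not listed in this post.

```python
def escape_backslashes(raw: str) -> str:
    """Double every backslash so it survives as a literal \\ in a CSDL text argument."""
    return raw.replace("\\", "\\\\")


# A regular expression as you would normally write it...
pattern = r"\d+ (fish|chips)"
# ...and as it would need to appear inside the quoted operator argument.
print(escape_backslashes(pattern))
```

Apply the helper once, to the raw pattern. Running it on an already-escaped argument doubles the backslashes again.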
We have made this change to enable users to search for terms containing the control characters used by the new operators we have added, such as the , and : separators in their arguments. See here for mappings.
Now, time to look at all the shiny new bits.
Introducing the ANY operator
Now that our alpha users have had time to play about with CSDL, they have started to create streams that search for ever-increasing numbers of terms, from @users and brands to an exceedingly long rule searching for rude words that we didn’t even know existed. One thing these rules have in common is that they are all very long chains of interaction.content CONTAINS “term” clauses connected by ORs, which are rather cumbersome to use. Like all good developers, our team likes to do as little typing as it can get away with, and thus we came up with the ANY operator. It lets you specify a comma-separated list of terms to search for (using the new CONTAINS implementation) and returns true as long as at least one of the items in its argument matches the target.
For example, searching for phone manufacturers used to be written like this:
interaction.content CONTAINS "HTC" OR interaction.content CONTAINS "Nokia" OR interaction.content CONTAINS "RIM" OR interaction.content CONTAINS "Apple" OR interaction.content CONTAINS "Samsung" OR interaction.content CONTAINS "Sony"
This can now be shortened to:
interaction.content ANY "HTC,Nokia,RIM,Apple,Samsung,Sony"
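If your term lists live in code, both forms are easy to generate. The Python sketch below builds the old OR chain and the new ANY clause from the same list. The helper names are hypothetical. Because the escape mappings are not listed in this post, the sketch takes the cautious route of rejecting any term that contains a character needing an escape sequence, rather than guessing at the escaped form.

```python
from typing import Iterable

# Characters that need an escape sequence inside a CSDL text argument.
NEEDS_ESCAPING = set(',\\:"\n\r\t')


def build_or_chain(target: str, terms: Iterable[str]) -> str:
    """The old style: one CONTAINS clause per term, joined with OR."""
    return " OR ".join('{} CONTAINS "{}"'.format(target, term) for term in terms)


def build_any(target: str, terms: Iterable[str]) -> str:
    """The new style: a single ANY clause with a comma-separated argument."""
    terms = list(terms)
    for term in terms:
        if NEEDS_ESCAPING & set(term):
            raise ValueError("term needs escaping first: {!r}".format(term))
    return '{} ANY "{}"'.format(target, ",".join(terms))


brands = ["HTC", "Nokia", "RIM", "Apple", "Samsung", "Sony"]
print(build_or_chain("interaction.content", brands))
print(build_any("interaction.content", brands))
```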
Introducing the NEAR operator
This is another operator that was born out of our users’ feedback. When searching for multiple terms that all have to be present for a match to be successful, it sometimes helps to require that the terms are close to each other, particularly when processing a stream of blog posts, which can run to several thousand words per interaction.
By using the NEAR operator you can specify two or more words that have to be present, as well as the maximum number of words that they can be apart from each other.
interaction.content NEAR "fish,chips:1"
This will match “fish chips”, “fish and chips”, “fish n chips” and “fish & chips”.
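For the curious, here is a rough Python sketch of the proximity check, written to reproduce the fish-and-chips example. It is not the engine's implementation and rests on several assumptions: the function name `near_match` is hypothetical, matching is case-insensitive, punctuation such as & is not counted as a word, word order does not matter, and for more than two terms the limit applies to the gap between each neighbouring pair.

```python
import itertools
import re


def near_match(content: str, argument: str) -> bool:
    """Check a NEAR-style argument such as "fish,chips:1" against some text."""
    # Split "fish,chips:1" into the term list and the maximum gap.
    terms_part, _, distance_part = argument.rpartition(":")
    terms = [term.strip().lower() for term in terms_part.split(",")]
    max_gap = int(distance_part)

    # Assumption: a "word" is a run of word characters, so "&" is dropped.
    tokens = re.findall(r"\w+", content.lower())
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if any(not found for found in positions):
        return False  # every term has to be present

    # Try each way of picking one occurrence per term.
    for combo in itertools.product(*positions):
        ordered = sorted(combo)
        # b - a - 1 is the number of words sitting between two neighbours.
        if all(b - a - 1 <= max_gap for a, b in zip(ordered, ordered[1:])):
            return True
    return False


for text in ["fish chips", "fish and chips", "fish with mushy peas and chips"]:
    print(text, near_match(text, "fish,chips:1"))
```

The brute-force search over occurrences is fine for a sketch, but it would be a poor fit for blog posts running to thousands of words. A production matcher would want a sliding window instead.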