Migrating to Indexed Task Search

Hello everybody.

Some time ago I was faced with the challenging task of migrating a task application from Standard Task Search (STS) to Indexed Task Search (ITS). And I have to say: challenging it was. It was proper material for exploration.

There are some pitfalls when migrating to ITS. Hopefully, after reading this article you will recognize and address them in due time.

I will highlight the problems that I faced and the solutions I found for them. The difficulty of the migration depends heavily on the level of customization in the task application you are trying to migrate.

But first, let’s look at the technical differences between Standard Task Search (STS) and Indexed Task Search (ITS).

 

Difference between STS and ITS

 

First of all, my recommendation when developing a task application from scratch is to use ITS, especially if the number of fields you are going to search on is low.

This type of search performs better than the standard one, and if you are building the task application from the ground up there is no reason not to use ITS.

Task Engine has received several performance enhancements in the meantime and a new “face”, as stated by Mark Imel here. This, however, will be the subject of another post, so we will not dive into it now.

Although the recommendation is to use ITS, most of us were developing task applications before webMethods 8.2, when ITS was introduced. Back then the only option was STS, so let’s see how STS works.

 

STS technical overview

 

The important point is that the Task Engine stores all the business data of the tasks in the T_TASK_DATA database table.

The structure of the database table is the following:

The TASK_ID is the primary key of the table and TASK_DATA is a binary large object (BLOB) that holds the business data for the task.

This structure has the advantage that the search can be performed over all business data contained in all task instances.
The great disadvantage, however, is that in order to make the task data information available, the BLOB has to be deserialized.
The duration of this operation might not be a problem if the inbox contains few tasks, but it certainly becomes an issue if the number of task instances is large.

If you are still using a webMethods version older than 8.2, a solution to the restrictions above (although it is more of a workaround) is to also store the values of frequently searched fields in the TaskInfo fields.

The values of the TaskInfo fields are stored in the T_TASK database table, and searching on them is quite fast as there is no deserialization involved.

So, if you know that a lot of searches are made on a certain field (e.g. order number), you can save the value of the field not only in the business data but also in a TaskInfo field (let’s say TASK_DESCRIPTION or CUSTOM_TASK_ID). This solution will do the job; however, it has two issues:

  • It requires a custom implementation so that when the field value changes, the change is made in both TaskData and TaskInfo.
  • It distorts the meaning of the TaskInfo fields. In this specific example, CUSTOM_TASK_ID will lose its original meaning and will store something totally unrelated.
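
The first issue can be pictured with a minimal sketch in plain Java. The TaskRecord class and its field names are hypothetical stand-ins and not Task Engine API; the only idea shown is that a single setter writes the searched value to both places, so the two copies cannot drift apart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a task: business data (what ends up in the
// T_TASK_DATA BLOB) plus one TaskInfo field (a plain column in T_TASK).
class TaskRecord {
    private final Map<String, Object> businessData = new HashMap<>();
    private String customTaskId; // TaskInfo field reused as a search key

    // Single entry point for changing the order number, so the business
    // data and the TaskInfo copy are always updated together.
    void setOrderNumber(String orderNumber) {
        businessData.put("orderNumber", orderNumber);
        customTaskId = orderNumber;
    }

    String getOrderNumber() {
        return (String) businessData.get("orderNumber");
    }

    String getCustomTaskId() {
        return customTaskId;
    }
}
```

Any code path that changes the order number without going through such a setter reintroduces the inconsistency, which is why this remains a workaround.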

From the Integration Server, the following service can be used to do an STS: pub.task.taskclient:searchTasks.

 

ITS technical overview

 

If the task application is configured to use an indexed search provider then the storage schema of the business data changes.
The business data fields will continue to be stored in the BLOB, as for STS. Additionally, indexed business data fields are stored in a database table in the run-time environment.

For each task type that is published to the My webMethods Server there is one database table, even if these task types belong to the same process or operate on the same business data.

When a task type with indexed business data is published for the first time, the Task Engine dynamically creates the database table with one column for each indexed field in the task type.
The indexed fields mentioned above are the fields that appear on the Search page of the task application (mandatory) or on the Results page (optional).

The indexed field table has the following naming convention: T_TASK_I_<ID>, where <ID> is an identifier that actually contains a portion of the task type id.

The structure of the T_TASK_I_<ID> database table is the following (for a task that has 3 index fields):

Several aspects are worth noting here:

  • The columns of the table do not have business relevant names. The mapping between the column names and the business fields represented by these columns is available in the Task Engine Administration tab of the MWS Admin Console.
  • All the columns have indexes on them for fast search.
  • The Task Engine maintains both database tables (T_TASK_DATA and T_TASK_I_<ID>) and updates them dynamically as new tasks are queued or existing tasks are deleted from the system.

The great advantage of ITS is the performance. Because the search does not need to deserialize a BLOB, but just does a query/join on a rather simple database table with indexes set for every column, the time it needs to execute will be small (even if the task inbox contains a large number of tasks).

A downside to this approach is that the indexed fields should be carefully chosen because the search will work only for these fields and will ignore the rest.
Obviously, an indexed search can return only the business data fields which are indexed.
If a field is not indexed, a search for that field will return no results.

Since the business data is kept in 2 places (T_TASK_DATA table and T_TASK_I_<ID>), you can use one of the following Integration Server services:

  • pub.task.taskclient:searchTasksIndexed – designed specifically for searching indexed tasks.
  • pub.task.taskclient:searchTasks – service used also for the STS.

Both of them will return the correct results, but the first service will do it faster.

 

Migration to ITS

 

This part will not cover the basic actions that are needed to migrate from STS to ITS. These actions (which include the actual indexing, changes to the search portlet and search results managed beans, results table bindings, etc.) are covered in the webMethods BPM Task Development Help resource and can be checked in chapter 11 (for version 8.2) or chapter 10 (for version 9.8).

 

Max Results

 

For STS it is possible to configure the properties of a task type inbox to specify a limit to the number of tasks returned in the search results or to select the No Maximum checkbox, which basically returns all tasks that match the search criteria.

When the task type associated with the task type inbox is configured with an indexed search provider, clearing or selecting the No Maximum checkbox has no effect; all search results are always returned.

This means that for ITS the max-results notion loses its meaning. It is therefore replaced by the paging size notion. The total number of tasks available is counted and then all searches are executed with the paging limit set for the results table.

The default paging size of the search provider is 50. Ways to change this value include:

  • hard-coding another value in the search provider bindings
  • setting it at runtime based on the results table configuration

An example of how this works can be found below:

Task Type Inbox A has 45 tasks available and the results table is set to display 20 tasks per page.

When the first search is executed there are 2 operations called (count and fetch):

  • count will return the number of tasks that match the search criteria
  • fetch will return a sorted page of results

and, in this case, the first 20 tasks are read and displayed in the results table.

When page 2 of the results table is selected, the second set of 20 tasks is read and displayed.

When page 3 of the results table is selected, the remaining tasks are read and displayed, meaning the last 5 tasks.
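
The arithmetic of this example can be written down as a small sketch. This is plain Java and not Task Engine code; PagingSketch and its method names are made up for illustration, and the value 45 stands for whatever the count operation returns.

```java
class PagingSketch {
    // Number of pages needed to show totalCount tasks, pageSize per page.
    static int pageCount(int totalCount, int pageSize) {
        return (totalCount + pageSize - 1) / pageSize; // ceiling division
    }

    // Number of tasks a fetch returns for the given 1-based page.
    static int tasksOnPage(int totalCount, int pageSize, int page) {
        int offset = (page - 1) * pageSize;
        return Math.max(0, Math.min(pageSize, totalCount - offset));
    }

    public static void main(String[] args) {
        int total = 45;    // result of the "count" operation
        int pageSize = 20; // page size of the results table
        for (int page = 1; page <= pageCount(total, pageSize); page++) {
            System.out.println("page " + page + ": "
                    + tasksOnPage(total, pageSize, page) + " tasks");
        }
    }
}
```

Running it prints 20, 20 and 5 tasks for pages 1, 2 and 3, matching the walkthrough above.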

In order to optimize the search, the search provider paging size value should be updated at runtime with the results table page size.

 

 

Re-indexing

 

If any changes have been made to the indexed business data (adding/removing a field or changing the type of a field) the Task Engine will:

  • detect this automatically when the task application is published to the MWS.
  • drop the existing index table for the task type (and all its data).
  • create a new (empty) table.

After this operation, re-indexing the tasks is mandatory. The re-indexing procedure runs in the background, so MWS can be used while it is ongoing.

However, note that between publishing the modified task type and the completion of the re-indexing procedure, any searches run for this task type will return incomplete or no results.

 

Indexing “special” fields

 

When using ITS, indexing of primitive type fields is supported. What is not supported is the indexing of lists. With ITS it is possible to index only a specific element of the list (e.g. the 1st, 2nd or nth element). So be aware of this restriction if you want to index lists.

A workaround for this problem is to store your concatenated list elements in a separate field and index that field. If you take this approach you have to be careful about the following:

  • always update the indexed field when the list is updated (i.e. items are changed, added or deleted).
  • use this workaround only if the list does not have many elements and/or the elements are not large. Otherwise, you will run into index size restrictions (see the Index size section below).
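
A minimal sketch of this workaround in plain Java follows. The OrderItems class, the itemsIndexed field and the "|" separator are all assumptions made for illustration; the point is only that every change to the list also recomputes the concatenated field that would be indexed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for a list field plus its concatenated, indexable copy.
class OrderItems {
    private static final String SEPARATOR = "|"; // assumed not to occur in items
    private final List<String> items = new ArrayList<>();
    private String itemsIndexed = ""; // the field that would be indexed

    void add(String item) {
        items.add(item);
        refreshIndexedField();
    }

    boolean remove(String item) {
        boolean removed = items.remove(item);
        refreshIndexedField();
        return removed;
    }

    // Recompute the concatenated value after every change to the list.
    private void refreshIndexedField() {
        itemsIndexed = String.join(SEPARATOR, items);
    }

    String getItemsIndexed() {
        return itemsIndexed;
    }
}
```

The concatenated value grows with every item, which is exactly why the size restriction matters for this workaround.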

 

Index size

 

As far as I have seen, this restriction applies only to Microsoft SQL Server. So, if you are using Oracle you might be out of the woods.

Basically, the restriction says that the size of an indexed field cannot exceed 900 bytes. Therefore, it is not recommended to index fields that by their nature tend to be large:

  • fields that contain the concatenation of other (potentially large) fields.
  • fields that contain comments, remarks or any other content whose size you cannot establish or restrict.
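
To get a feeling for which values would hit this limit, a check like the following can help. It is a sketch only: the 900 byte limit is the one quoted above, while the encoding actually used by the index column is an assumption you have to verify for your database, which is why the charset is a parameter.

```java
import java.nio.charset.Charset;

class IndexSizeCheck {
    static final int MAX_INDEX_BYTES = 900; // limit quoted in the text

    // True if the value, encoded with the given charset, fits in the limit.
    static boolean fitsInIndex(String value, Charset charset) {
        return value.getBytes(charset).length <= MAX_INDEX_BYTES;
    }
}
```

With a two-byte encoding such as UTF-16LE, for example, 450 plain Latin characters already fill the 900 bytes, so the limit in characters can be much lower than it first appears.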

 

Number of indexes

 

The Internet is filled with articles about the benefits and the disadvantages of using DB indexes. The main benefit is clear: faster search queries.

Among the disadvantages:

  • disk space: many indexes will use more disk space.
  • slower inserts/updates/deletes: as with the original data, also the index values have to be maintained; this will result in a decrease in performance for the operations that modify the data (INSERT/UPDATE/DELETE).

The classic recommendation is to use indexes for fields that are searched a lot, yet rarely updated.

In the context of webMethods ITS, where the developer does not have immediate access to the DB level, the recommendation is to index only the searchable fields. There is no benefit in indexing fields that are not on the search page.

 

Updating saved searches

 

If you changed the search provider properties during the migration, you will most probably need to change the saved searches as well. Otherwise, they will not work, or will work incorrectly, after the migration.

There are multiple ways to update saved searches:

  • All at once: You can write an IS service that retrieves and updates all of the saved searches and that will be run once after the release. The main disadvantage here is that a bug in the service will corrupt a lot of saved searches.
  • One at a time: You can update a saved search when it is loaded by the user. This approach has a low risk but, at the same time, there is no clear deadline by which all the saved searches will have been migrated. Therefore, your custom update code will have to stay in place for a long time before it can be safely removed.
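
Whichever option you choose, it helps if the update itself can safely be applied more than once. The sketch below is a pure illustration in plain Java: it assumes a saved search can be represented as a map of criterion name to value and that the migration only renames criteria, and the names in RENAMED are invented. How saved searches are really stored and what has to change in them depends on your application.

```java
import java.util.HashMap;
import java.util.Map;

class SavedSearchMigration {
    // Hypothetical mapping from old (STS) criterion names to new (ITS) ones.
    static final Map<String, String> RENAMED = Map.of(
            "orderNo", "orderNumber",
            "dept", "departmentId");

    // Returns a migrated copy. Names that are already in the new form are
    // left untouched, so migrating twice gives the same result as once.
    static Map<String, String> migrate(Map<String, String> criteria) {
        Map<String, String> migrated = new HashMap<>();
        for (Map.Entry<String, String> e : criteria.entrySet()) {
            String key = RENAMED.getOrDefault(e.getKey(), e.getKey());
            migrated.put(key, e.getValue());
        }
        return migrated;
    }
}
```

Because the function returns a copy and is idempotent, it can be called every time a saved search is loaded (one at a time) or in a single batch run (all at once) without special bookkeeping.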

 

Sorting

 

If you have a small application with relatively few fields in the search results table, the migration from STS to ITS will be easy from the sorting point of view: you only need to modify the binding expression for the sort property of each table column.

 

from STS sort property:

 

to ITS sort property: 

 

The sort property changes also for the TaskInfo fields. For the list of new values, please check BPM_Task_Development_Help.

However, if your application is big, it might be the case that you have converters in the search results table and that the sorting is done after the converters are applied to the received data. Examples of such converters would be:

  • converters that transform an amount value from string (how it is saved in the task data) to double (how it should be treated in the results list).
  • converters that return a readable, pretty name for a certain id based on a well-known mapping. For example, let’s assume that the requirement is to store the department information on the task. This can be done by saving the unique department id in the task and having a mapping from department id to department name, and possibly other information, in a different table. This way, if the department information changes, you will not need to update all the tasks to reflect the change, but only the mapping table. Converters of this kind are very important, as end users want to see and sort by department name and not by some internally defined department ids that mean nothing to them.
  • converters that make some kind of translation on the input data (language translations or of another kind).

For such cases, just changing the binding expression for the “sort” property will not work as expected. Why?
Because there is a big difference in how sorting is done in STS vs. ITS.

For tasks using STS, this sorting is done by the table control at the custom inbox application level.
This means that the search result table is responsible for sorting the data it receives as an outcome of running the search.

By contrast, when ITS is used, the sorting is done directly in the database; therefore, sorting on translated values is not possible out of the box.

Changing the sort property binding expression as mentioned above will result in sorting on the original value (department id) and not on the translated value (department name).
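
The difference between the two orders is easy to reproduce in a few lines of plain Java. The department ids and names below are invented for illustration and DEPARTMENT_NAMES plays the role of the mapping table; nothing here is Task Engine API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SortKeySketch {
    // Hypothetical mapping table: department id -> department name.
    static final Map<String, String> DEPARTMENT_NAMES = Map.of(
            "D01", "Sales",
            "D02", "Accounting",
            "D03", "Logistics");

    // Order as a database sort on the stored column produces it: by id.
    static List<String> sortByStoredId(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.<String>naturalOrder());
        return sorted;
    }

    // Order the end user expects: by the converted (displayed) name.
    static List<String> sortByDisplayedName(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparing((String id) -> DEPARTMENT_NAMES.get(id)));
        return sorted;
    }
}
```

For the ids D03, D01 and D02, sorting on the stored id gives D01, D02, D03 (displayed as Sales, Accounting, Logistics), while sorting on the displayed name gives D02, D03, D01 (Accounting, Logistics, Sales). The first order is what a sort in the database delivers.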

So how can you sort on the translated value (whatever that means in the examples above)? One solution is the following:

  • in your custom subclass of TaskIndexedSearchContentProvider override the refreshPage method.
  • in the refreshPage method:
    • retrieve the tasks as they were returned by the search.
    • sort them according to your custom sorting needs.
    • set the sorted tasks list back on the page.

 

By now you have probably identified the major restriction of this approach: it only sorts the tasks at the page level.
The sort will not work across all the tasks in the inbox; it will only sort the tasks on the current page.
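
The effect can be simulated without any Task Engine classes. The sketch below is plain Java with invented data: it takes the display names in the order the database returned the tasks, cuts them into pages and sorts each page on its own, which is all that a re-sort inside refreshPage can achieve.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PageLevelSortSketch {
    // Split an already ordered result list into pages, then sort each page
    // on its own with the given comparator, as a per-page re-sort would.
    static <T> List<List<T>> sortEachPage(List<T> dbOrdered, int pageSize,
                                          Comparator<? super T> displayOrder) {
        List<List<T>> pages = new ArrayList<>();
        for (int from = 0; from < dbOrdered.size(); from += pageSize) {
            int to = Math.min(from + pageSize, dbOrdered.size());
            List<T> page = new ArrayList<>(dbOrdered.subList(from, to));
            page.sort(displayOrder);
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Display names in the order the database returned the tasks
        // (sorted by the stored id, not by these names).
        List<String> dbOrdered = List.of("Sales", "Accounting", "Logistics", "Audit");
        System.out.println(
                sortEachPage(dbOrdered, 2, Comparator.<String>naturalOrder()));
    }
}
```

With a page size of 2 this prints [[Accounting, Sales], [Audit, Logistics]]: each page is sorted, but Audit stays on the second page although it should come before Sales.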

Of course, this restriction applies only to the fields that have special behavior/store data differently (like in the examples above).

Unfortunately, I did not find a solution to the page-based sorting problem described above. Please investigate this restriction before investing too much work in the migration.

 

Best practices

 

  • make sure that the DB user configured for the MWS has DROP privileges.
  • check whether the Task Engine and MWS fix levels you are running have known framework incidents. If needed, compare your fix levels with the later ones released by SAG.
  • the field types used in the task data should be consistent with the values they hold (i.e. don’t store numbers, booleans, etc. as strings in the task data).
  • if it is too late to follow the previous recommendation, then when defining the indexed fields you can still decide to index a field according to the values it holds. For example, even though a field is string in the task data, it can be indexed as Integer if you know that it holds just numeric values.
  • do not index large strings for 2 reasons:
    • you rarely search for fields that store user comments or remarks
    • you will most probably run into index size problems
  • do not index fields that correspond to unbounded text areas, as you can never know how much text the user has stored in them.
  • to verify that ITS performs faster than STS, run load tests before and after the migration.
  • keep track of the task indexing time. In a production environment with a lot of tasks, this time cannot be ignored and has to be planned for in the release to-do list.
  • don’t forget to update the saved searches.

 

And finally, we have reached the end of this long, long post. I hope you will find it helpful.

It was mostly targeted at people still working with webMethods version 8.2, but people working with higher versions might benefit from it as well.

If you take one thing away from this post, let it be this: the fewer customizations and deviations from the standard you have in your task application, the easier the migration to ITS will be.

Thanks for reading. If you have some questions or comments, please drop me a line below.

 
