Hello, my dear readers, and welcome back. Today we release number 5 of the series, and let me tell you something: I like the number 5. Why?
Maybe because I was born on a 5.
Or maybe because in some school grading systems 5 is the maximum grade. Also in the ones where 10 is the maximum grade, 5 means you passed.
Or maybe it is just because I used to listen to Lou Bega and his Mambo No. 5 in my teenage years.
Whatever the reason might be, let’s make this post a great one.
Last time we talked about performance improvements with special emphasis on caching, audit logging and pipeline management.
Today we will finish the performance topics because, well, nobody likes stuff half-done.
Performance considerations (Cont.)
Using transformers
I think we established in the last post one strong point of transformers: they are useful when doing service caching.
If you want to catch up with this topic, go ahead and read the Service Caching section from part 4, available here. (Hint: do not skip the comments.)
Pros for using transformers:
- easier pipeline management: only the data from the service signature will be passed to the service (not all the pipeline data at the invocation point)
- service caching: as explained above; however, there are other ways to achieve service caching
- multiple transformers / MAP step: if you want to call multiple services (or the same service multiple times) in a single step then using transformers is the way to go; in such cases transformers actually improve readability
- better performance: transformers scope down the data sent to the service; because less data gets passed around, the performance is better.
Cons for using transformers:
- diminished readability*: you have to drill into a MAP step to see what service is invoked and open the transformer to see the signature; not all the information is available at first glance
- no implicit mapping: you have to map the inputs and the outputs explicitly
*The easy solution is to add a relevant comment to the MAP step so it is clear what the transformer does.
Regarding performance, I would question everything (as the blog tagline says: Question. Learn. Explore!).
If you have performance problems do some profiling and identify the bottleneck.
Do not replace direct invocations with transformers as a default. Do load tests in both cases. Review and analyze the results. Adjust accordingly.
As a personal opinion: Would I use transformers every time? Probably not!
I like to use them for what I feel is their purpose: transformation, i.e., invoking mapping services that take some input data and map it (with transformation if necessary) to some output data.
Using the Service Scope property
The Service Scope property is used to limit the pipeline data sent to invoked services.
The scope of the pipeline does not have to be reduced to the exact service signature, but it usually is.
I personally rarely use this property; I use it only when I cannot use a transformer.
Why I do not like it:
- diminished readability: it is easy to overlook the Scope property; afterward you find yourself with scoped-down pipeline data without immediately knowing why.
- error-prone during development: do you feel that the Scope property is too close to the Comments field? Cuz I do. On several occasions (usually when in a hurry) I have written the comments in the Scope field. Vice versa is possible as well.
How do you choose between direct service invocations, transformers, and the Service Scope property?
Well, review your use case and see which style fits best at that time based on your requirements.
There is no absolute right, nor absolute wrong.
Usage of pub.storage services
The services from the folder pub.storage of the WmPublic package should not be used.
These services are used internally by IS to persist data in a short-term store in case of server restarts.
Actually, there is a recommendation from SAG in the webMethods Integration Server Built-In Services Reference that goes like this:
SAG does not recommend the usage of this functionality for the following cases:
- large volumes of data
- large data records
- to permanently store data
People tend to use this as an all-purpose database that can fit multiple use cases as it is rather flexible.
I know it might be tempting to use this functionality because it is, well, right there, but I would advise against it.
You will probably find better options that fit your use case.
If you want more details on the locking mechanism that IS uses for this functionality, you can find them in the corresponding chapter of the above-mentioned document.
Debug Log
To avoid performance problems, the default debug log service (pub.flow:debugLog) should not be used.
And performance is just one side of the problem. You cannot develop a professional application and rely on the default logging provided.
Logging is an important aspect of any application and has to be designed in such a way as to provide maximum value with a minimal footprint.
You will probably want your logs separated per package, configurable and easily manageable.
Consulting these logs should be easy and should quickly give you the information you want.
Let’s face it, you are not checking the logs for fun. You probably have a bug to investigate and navigating the logs should not add extra complexity or take too much time.
pub.flow:debugLog will write to the server log and I guess that is the last thing you want.
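To give you an idea of what a custom, package-level log could look like, here is a minimal sketch of a Java service that writes to its own file via java.util.logging instead of the server log. The service name (logMessage), its inputs (message, level) and the file location are my own assumptions for illustration, not a product API:

```java
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

import com.wm.app.b2b.server.ServiceException;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataUtil;

public final class PackageLog {

    // One logger per package, writing to its own file instead of the server log.
    private static final Logger LOG = Logger.getLogger("MyPackage");

    static {
        try {
            FileHandler handler = new FileHandler("./logs/MyPackage.log", true);
            handler.setFormatter(new SimpleFormatter());
            LOG.addHandler(handler);
            LOG.setUseParentHandlers(false); // do not propagate to the default log
        } catch (Exception e) {
            // fall back to the default handlers if the file cannot be opened
        }
    }

    // Java service body: reads "message" and "level" from the pipeline.
    public static final void logMessage(IData pipeline) throws ServiceException {
        IDataCursor cursor = pipeline.getCursor();
        try {
            String message = IDataUtil.getString(cursor, "message");
            String level = IDataUtil.getString(cursor, "level"); // e.g. "INFO"
            LOG.log(level == null ? Level.INFO : Level.parse(level), message);
        } finally {
            cursor.destroy();
        }
    }
}
```

In a real setup you would make the file location and the log levels configurable per package, but the point stays the same: keep your application logs out of the server log.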
Timeouts
I personally like timeouts. Probably in the same way I like the number 5 🙂
They are always so helpful, letting you grab something from the fridge….
All jokes aside, if a service offers a timeout option, I say you use it!
Core reason: if the service is trying to connect to another system and has to wait for the default timeout, it will consume IS resources during this time.
So the next time you are invoking a service that has a timeout option (pub.client:ftp, pub.client.ftp:login, pub.client:http, etc.), put a value there!
Pay special attention to the FTP cases, as there the default is to wait forever, and forever is a really long time.
But what value to set there? Well, it certainly depends on the application landscape and the number of systems you are integrating.
If you are integrating multiple systems in a chained fashion, the timeout settings should look like a funnel.
This means that the systems that are upstream of you have a bigger timeout and the ones downstream have a smaller timeout.
If this is not respected then, in borderline scenarios, the upstream system will time out before you are able to respond.
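To make this concrete, here is a minimal sketch of calling pub.client:http from a Java service with an explicit timeout. The invocation uses the standard Service.doInvoke API; the URL is made up, and I am assuming the timeout is expressed in milliseconds, so double-check the input names and units in the Built-In Services Reference for your IS version:

```java
import com.wm.app.b2b.server.Service;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataFactory;
import com.wm.data.IDataUtil;

public final class HttpWithTimeout {

    public static IData callDownstream() throws Exception {
        IData input = IDataFactory.create();
        IDataCursor cursor = input.getCursor();
        IDataUtil.put(cursor, "url", "http://downstream.example.com/api"); // made-up URL
        IDataUtil.put(cursor, "method", "get");
        // Funnel rule: this value must be smaller than the time our own
        // upstream caller is willing to wait for us.
        IDataUtil.put(cursor, "timeout", "30000"); // assumed to be milliseconds
        cursor.destroy();
        return Service.doInvoke("pub.client", "http", input);
    }
}
```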
Combine map steps
Having extra or unnecessary MAP steps can affect performance.
I have mentioned some guidelines on how to structure MAP steps here (especially the Bring MAP steps together chapter).
Go with those guidelines and you should be OK.
You will want to find the best compromise between performance and readability.
Loop operations
For performance reasons, it is not recommended that you use the services pub.list:appendToDocumentList and pub.list:appendToStringList.
The performance degradation is noticeable when you are trying to append a large number of records (1000+) and, of course, it depends on the resources you allocate to the JVM.
You will feel no pain in the case of small lists.
The problem comes from the way these services are implemented. There is a lot of array allocation going on there.
To not duplicate information, I will point you to some very good TechCommunity posts on this topic: this one and this one and, also, this one. 🙂
So, what to do in this case?
Well, if the size of the list will not increase very much you can continue to use the appendTo… services.
If not, you can contemplate the following options:
- check whether the PSUtilities package has a service that matches your needs
- do not add records to a list in a loop; use the Output array property of the LOOP construct; use this if the output array has the same size as the input array
- implement your own Java service that stores the records in the collection of your choice (ArrayList, LinkedList, etc.); see the sketch after this list
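For the third option, here is a minimal sketch of such a Java service pair; the names (addToList, toDocumentList, list, record) are hypothetical and only for illustration. One service appends a record to an ArrayList inside the LOOP in amortized constant time, the other converts the list to a document list once, after the LOOP:

```java
import java.util.ArrayList;
import java.util.List;

import com.wm.app.b2b.server.ServiceException;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataUtil;

public final class ListBuilder {

    // Call inside the LOOP: appends one record without copying the whole
    // array, unlike pub.list:appendToDocumentList.
    @SuppressWarnings("unchecked")
    public static final void addToList(IData pipeline) throws ServiceException {
        IDataCursor cursor = pipeline.getCursor();
        try {
            List<IData> list = (List<IData>) IDataUtil.get(cursor, "list");
            if (list == null) {
                list = new ArrayList<IData>();
                IDataUtil.put(cursor, "list", list);
            }
            list.add(IDataUtil.getIData(cursor, "record"));
        } finally {
            cursor.destroy();
        }
    }

    // Call once after the LOOP: converts the list to a document list.
    @SuppressWarnings("unchecked")
    public static final void toDocumentList(IData pipeline) throws ServiceException {
        IDataCursor cursor = pipeline.getCursor();
        try {
            List<IData> list = (List<IData>) IDataUtil.get(cursor, "list");
            if (list != null) {
                IDataUtil.put(cursor, "documentList", list.toArray(new IData[0]));
            }
        } finally {
            cursor.destroy();
        }
    }
}
```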
In the first link from the TechCommunity forum, Percio describes other options as well, along with a performance comparison.
Of course, nothing is more relevant in these cases than rigorous testing (both functional and performance).
Create and run your tests and they will speak the truth.
Copy by value vs. Copy by reference
So much is written (at least in the Java world) on this topic. The StackOverflow site is filled with questions on when variables are passed by value and when by reference.
So how does this hot topic look for the Integration Server?
Definition
When the source or target variable is a string, IS will copy the value of the source variable in the target variable => copy by value.
For other variable types (including document types), IS will create a reference to the source variable and set the value of the target variable to this reference => copy by reference.
The copy by value definition has an exception: when the source is a string and the target is an object, a copy by reference will be performed.
Although not mentioned in the docs (I did not find it), a simple test shows that primitive values (i.e., objects with a defined Java wrapper type) are copied by value as well.
Other things to know
Copying by reference offers a superior performance compared to the copy by value in terms of memory usage and execution time. That is the main reason this topic appears in the performance guidelines.
Copying by reference has a logical side effect but nonetheless a side effect that you should take into consideration. Because the target variable holds a reference to the source variable, subsequent changes to the source variable will affect the target.
This can be easily tested with a Flow service containing 3 MAP steps:
Prerequisite: have 2 documents with the same structure in the pipeline (for easy testing, the documents might contain just a string)
- Set the value of the source document content to “value1”
- Link the source document to the target document
- Modify the value of the source document content to “value2”
Aftermath: if you run the service you will see that the target document content is now also “value2”
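The same behavior can be reproduced in plain Java with the IData API. Here is a minimal, illustrative sketch (not an actual Flow service) that mirrors the 3 MAP steps:

```java
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataFactory;
import com.wm.data.IDataUtil;

public final class CopySemanticsDemo {

    public static void main(String[] args) {
        IData source = IDataFactory.create();

        IDataCursor sc = source.getCursor();
        IDataUtil.put(sc, "content", "value1"); // MAP step 1: set the source
        sc.destroy();

        IData target = source; // MAP step 2: linking documents copies the
                               // reference, not the data

        sc = source.getCursor();
        IDataUtil.put(sc, "content", "value2"); // MAP step 3: modify the source
        sc.destroy();

        IDataCursor tc = target.getCursor();
        // Prints "value2": the target sees the change made through the source.
        System.out.println(IDataUtil.getString(tc, "content"));
        tc.destroy();
    }
}
```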
What can you do with this info
Well, you cannot change the way the copying is done, that’s for sure.
But you can change the way you define your service signatures.
Let’s take an example. Let’s assume you have a service that takes the address details of a client.
How would one define the input of the service? Well, there are (at least) 2 options:
Option 1: Define every field of the address (street, number, flat number, district, etc.) as a String and add it to the service input
Option 2: Define a document type, put the address fields in it and add the document type to the service input
The number of address fields might be large (maybe 10, maybe more). Copying all of these fields by value is less performant than copying one document type by reference.
Therefore, the best option would be Option 2 (for both performance and readability).
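For illustration, here is a minimal sketch of Option 2 seen from the Java side; the field names are made up. No matter how many fields the address document holds, only a single reference is copied when it moves through the pipeline:

```java
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataFactory;
import com.wm.data.IDataUtil;

public final class AddressExample {

    // Option 2: wrap all the address fields in one document.
    static IData buildAddress() {
        IData address = IDataFactory.create();
        IDataCursor c = address.getCursor();
        IDataUtil.put(c, "street", "Main Street");
        IDataUtil.put(c, "number", "5");
        IDataUtil.put(c, "flatNumber", "10");
        IDataUtil.put(c, "district", "Center");
        c.destroy();
        return address;
    }

    public static void main(String[] args) {
        IData pipeline = IDataFactory.create();
        IDataCursor c = pipeline.getCursor();
        // One reference is copied, regardless of how many fields the
        // document contains; with Option 1, each String field would be
        // copied by value individually.
        IDataUtil.put(c, "address", buildAddress());
        c.destroy();
    }
}
```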
Now, where do you draw the line between using just strings and having a wrapper document type for them? What is the maximum number of strings that a service should expose?
Other people might have other answers, but mine is: 3. (You thought it would be 5, didn’t you?)
In my opinion, having up to 3 string fields in the service input is acceptable. Anything more and it is better to have a wrapper document type for them.
Unused services
Delete the services that are not used.
You will gain 2 benefits from this:
- No memory will be unnecessarily allocated in the JVM => performance improvement
- Other developers will not lose time analyzing why these services still exist => maintenance improvement
Well, there you go. Number 5 is done. We are close to wrapping up this wonderful journey that we had so far.
I was planning to include the security part in this post. But after talking and talking and talking some more about performance I found the post growing too big.
Therefore, the security guidelines will be part of the next (and last) post.
Meet me next time for the Coding Guidelines parting post.
Until then, as always,
Happy exploring,
Tury
PS: Are you tired of not knowing when my next post is going to come out?
No worries! Hit the Sign Me Up! button and you will receive my posts as soon as they are published.
Do you think my ideas would benefit someone you know? Share it! Spread the word! Sharing is caring 🙂
Comments
Hi,
As for MAP transformers, I tend to create structure-building services that fall under these categories:
– encapsulate known (logical) structures (reduce manual mapping)
– have a potential for reuse (will be used in another similar transformation)
In the first case I sometimes need to create a reverse transformation, so I end up storing the two services in the same place, which helps me find them at a later opportunity (so they tend to be reused often).
Best regards
“better performance: transformers scope down the data sent to the service; because less data gets passed around, the performance is better.”
As you note a few lines after this, measure to determine. In my mind, the difference between a direct invoke and going via a transformer has got to be so minuscule as to be a non-factor. For me, the use, or non-use, of a transformer is for reasons other than performance.
My rule of thumb: if the map step will map multiple items and use one or more transformers, use transformers. If you create a map step and the ONLY thing it does is use one transformer, do direct invoke instead for readability.
“Having extra or unnecessary MAP steps can affect performance.”
Have you measured this? As above, my impression is that any such performance hit is negligible. Combine for readability/maintainability, not performance.
“For performance reasons, it is not recommended that you use the services pub.list:appendToDocumentList and pub.list:appendToStringList.
The performance degradation is noticeable when you are trying to append a large number of records (1000+) and, of course, it depends on the resources you allocate to the JVM.”
The number of records at which performance drops off varies. These days, 1000+ really isn’t a problem at all (the JVMs have gotten really good at array realloc). Might I offer that, instead of recommending these be avoided, you point out the performance aspects to be considered and evaluated. For many scenarios, those 2 services are just fine — indeed, I use them more than not. As stated in the thread you mentioned: “Do *not* assume one approach will be faster than another. Measure.” 🙂
Nice set of articles!
Hi Rob,
Thank you for the comment and for your ideas.
I agree with you. Measurements are essential to see what solution works best for every integration scenario.
The tests that I did were not “formal”, but they offered some answers and validations:
For the unnecessary MAP steps topic
– the performance was better by between 30% and 50% when deleting unnecessary MAP steps (11 unnecessary MAP steps)
– although the percentage is high, this only translates into a response time faster by 1-3 ms (so, not that much)
– when a huge data model is passed through the unnecessary MAP steps, the performance is worse
– when only strings or small doc types are passed through the unnecessary MAP steps, the performance is the same in all scenarios
As I said in one of the posts, the desire is to find the best compromise between performance and readability, and certainly useless MAP steps affect readability more than performance.
For the transformer performance topic
– the tests that I did here showed little improvement (in my test setup)
– I believe that there might be scenarios in which this justifies itself, but giving up on readability in preparation for such a scenario is not a good idea
– I agree with your rule of thumb as it goes along the lines of what I said in part 5 in the Using Transformers PRO/CON section
For the appendTo… topic
– Here I did not actually test. The Tech forum and the SoftwareAG docs describe the design of these services as not being tailored to working with a huge number of records (and the definition of huge has changed in recent years)
– I agree that these services can be used as long as they do not degrade performance. In most cases, using them is the most convenient solution.
– The idea is that we should be aware of the restrictions that come with these services. They might be a fit for one integration but yield poor performance for another. It is highly dependent on the setup and integration scenario.
– 100% agree with your “Do not assume…Measure!” statement. It speaks the truth and it is catchy!