Real time = real problem: it’s up to developers to make the choice – on
Ben Forta
Most Web-based applications operate in real time. Add an article to a database and it shows up immediately on content pages. Update a user address and the new contact information is available immediately. Add or remove an employee and the phone directory is correct when next viewed. Real-time data in a real-time world. That’s a good thing, isn’t it?
Real Time Is Real Expensive
In an ideal world, real-time everything would indeed be a good thing. But we don’t live in an ideal world. As appealing as always being up-to-date is, real time comes with a real cost:
* Performance: To put it quite simply, nothing eats up performance as much as real-time processing. The more dynamic your application, the worse it will perform. After all, static pages don’t suffer from performance problems, ever.
* Scalability: As an extension of the above, the more real-time processing your application performs, the less it will scale. If you make your applications work harder, they’ll be able to do less concurrently; there’s no way around that one.
If you were to analyze all the timings and debug output from your ColdFusion applications, you’d undoubtedly find that more time is spent processing <cfquery> tags than anything else (possibly even everything else combined). In other words, eliminate all your <cfquery> tags and your application will fly. And while avoiding databases is not at all practical (or even advisable), understanding the price of real-time data interaction is important.
Of course, database access and <cfquery> are not the only culprits; you likely use Web services and CFX tags and calls to COM and Java and more, and all of those can impact performance too. But databases are a good place to start because they’re so prevalent and because making changes (where appropriate) is actually not that difficult.
Is Real Time Really Necessary?
Obviously, some applications (or parts thereof) must be real time. Could you imagine eBay auctions if bids were updated only periodically? How do you think users would react to Amazon.com listing products as in stock only to send out an “oops” e-mail after checkout? What would users do if they changed their AOL or Yahoo or MSN passwords only to find that the password change took some unknown length of time to take effect? All of these sites utilize real-time processing to ensure the best customer experience; some operations simply must occur in real time.
And that’s key–some operations, but not all. If you were to place an online classified listing on any of the major classified sites (including Yahoo) you’d find that your ad did not appear instantaneously. Rather, listings are updated with current information at regular intervals. Similarly, online user directories obtain new data on a regular basis, but online listings sometimes take months to be updated.
So why are some operations performed in real time and others not? Simply because there is a tradeoff to be made–the more data is real time, the greater the hit on performance and scalability. As such, developers of applications have to choose between the two, and for many, the choice is to not perform real-time processing unless it is absolutely required.
But most ColdFusion developers don’t make the choice at all. ColdFusion makes implementing real-time processing so easy (much easier than implementing anything non-real time) that they go the real-time route by default. That’s a real problem.
Reducing Database Reads
I’m not going to be able to cover every real-time scenario in this column, but I would like to point out some ideas you should think about as a starting point. The simplest (and remarkably effective) change you can make to your application involves the reading of data from database tables. <cfquery> is a very powerful tag; it lets you access all sorts of databases easily, maybe too easily. And so developers tend to overuse <cfquery>, often rereading data that likely has not changed (or has changed with changes that need not be utilized immediately).
I covered reducing database access via caching extensively in a column entitled “Caching in on Performance” (CFDJ, Vol. 1, issue 2). As explained there:
Where would you use caching within your applications? Here are some examples:
* Almost every form that prompts for an address displays a list of states. Those states should never be hard coded (even though there’s no 51st state scheduled to join the U.S. at this time); instead, state lists should be populated by a query against a states table. But as that states list doesn’t change often (it’s 40 years since Hawaii came on board), reading it from the database every time it’s needed is a waste of database resources. The states list is thus a primary candidate for caching.
* Employee lists are another good example. While it’s true that employee lists can change frequently, it’s doubtful that they change so often that they have to be read from the database each time they’re needed (if they do, do yourself a favor: find a new employer, and quickly). Caching employee lists for a few hours will reduce database activity, and the only penalty is that personnel changes won’t be immediately reflected in your lists.
Even though frequently retrieved data is likely cached by the database server itself, retrieving the data again is obviously more resource intensive than not requesting it at all. Furthermore, as ColdFusion usually isn’t running on the same box as the database server, eliminating unnecessary database requests can also reduce network traffic between the two machines, which in turn further eliminates potential performance bottlenecks.
ColdFusion provides two different ways to cache database reads:
* Query-based caching using CACHEDWITHIN.
* Variable-based caching in which queries are stored in persistent scopes.
I am not going to explain these here; refer to the previously mentioned column to learn more.
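For readers without that column at hand, a minimal sketch of the two approaches might look like the following. The datasource name companyDSN, the states table and its columns, and the six-hour lifetime are all assumptions made for illustration, not details from the earlier column.

```cfml
<!--- Query-based caching: ColdFusion may reuse the stored result for up to
      six hours instead of going back to the database.
      (companyDSN and the states table are hypothetical.) --->
<cfquery name="states" datasource="companyDSN"
         cachedwithin="#CreateTimeSpan(0, 6, 0, 0)#">
    SELECT state_code, state_name
    FROM states
    ORDER BY state_name
</cfquery>

<!--- Variable-based caching: run the query once and keep the result in the
      APPLICATION scope, where later requests can reuse it. --->
<cfif NOT IsDefined("application.states")>
    <cfquery name="statesFromDb" datasource="companyDSN">
        SELECT state_code, state_name
        FROM states
        ORDER BY state_name
    </cfquery>
    <!--- Lock the shared scope while writing to it. --->
    <cflock scope="application" type="exclusive" timeout="10">
        <cfset application.states = statesFromDb>
    </cflock>
</cfif>
```

The first form leaves expiry to ColdFusion; the second leaves it to you, so you decide when (if ever) application.states is refreshed.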
Reducing Dynamic Processing
Beyond database access, you likely have entire blocks of your application that are being generated programmatically in real time, but that perhaps need not be. Take, for example, the above-mentioned employee list. Not hitting the database unnecessarily is a great first step, but you also loop through the results creating output and embedding formatting. Does that really need to occur on each and every page request?
ColdFusion provides several options that may be used to reduce dynamic processing, and you may use any or all of them (or roll your own). In order of granularity:
* <cfsavecontent> can be used to save the results of any processing to a variable, perhaps a variable in a persistent scope. This allows developers to mark blocks of code that are executed after timeouts or on specified intervals. Using <cfsavecontent>, it is possible to cache Web services results, file reads, returned parsed XML, and much more.
* <cfcache> can be used to cache entire pages so that the generated page output is saved and served up on future requests until a specified timeout. Using <cfcache>, an entire page can be returned in what is effectively a <cfinclude> and a few other CFML statements.
* It is also possible to save generated CFM pages as static HTML files so that no application server processing need occur at all at runtime.
Of course, each of these options requires that you give something up; if you serve cached content you are serving old (not real-time) content. But depending on what your app is, that may be entirely acceptable. And if so, all you have to lose are performance and scalability problems.
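As a rough illustration of the first option, the employee list could be rendered once and reused for an hour. This is only a sketch: the datasource companyDSN, the employees table and its columns, the variable names application.employeeHtml and application.employeeHtmlBuilt, and the 60-minute interval are all hypothetical.

```cfml
<!--- Rebuild the employee list markup only if it has never been built
      or is at least 60 minutes old. --->
<cfif NOT IsDefined("application.employeeHtml")
      OR DateDiff("n", application.employeeHtmlBuilt, Now()) GTE 60>

    <cfquery name="employees" datasource="companyDSN">
        SELECT first_name, last_name, phone
        FROM employees
        ORDER BY last_name, first_name
    </cfquery>

    <!--- Capture the generated output in a variable instead of sending it. --->
    <cfsavecontent variable="employeeHtml">
        <table>
        <cfoutput query="employees">
            <tr>
                <td>#HTMLEditFormat(last_name)#, #HTMLEditFormat(first_name)#</td>
                <td>#HTMLEditFormat(phone)#</td>
            </tr>
        </cfoutput>
        </table>
    </cfsavecontent>

    <!--- Store the markup and its build time in the shared scope. --->
    <cflock scope="application" type="exclusive" timeout="10">
        <cfset application.employeeHtml = employeeHtml>
        <cfset application.employeeHtmlBuilt = Now()>
    </cflock>
</cfif>

<!--- Every request, rebuilt or not, just outputs the stored markup. --->
<cfoutput>#application.employeeHtml#</cfoutput>
```

Page-level caching needs far less code. Assuming a one-hour lifetime is acceptable, a single tag at the top of the page is enough:

```cfml
<!--- Serve the saved page output for up to one hour before regenerating. --->
<cfcache action="cache" timespan="#CreateTimeSpan(0, 1, 0, 0)#">
```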
Using Delayed or Batch Processing
One of the most important (and least trivial) concepts in the real-time discussion is the use of delayed or batch processing. The best way to understand the idea is via examples. So:
* You need to import data from a text file into your database. You know that reading the file and then inserting (or updating) each row using a <cfquery> within a <cfloop> is slow and highly error prone. So you use a batch upload utility (like SQL Server’s bcp) to dump all the data into a new empty table. A scheduled event on the database server checks for data in this table, and if any is present, fires off a stored procedure that reads each row, validates it, breaks it up into the appropriate relational components, and then performs the database INSERT or UPDATE operations as needed.
* You want contact information in your database tables to be clean and consistent–all states are two-letter abbreviations in upper case (with no trailing period), all names are first letter capped, titles like Mr. and Mrs. must have a trailing period, phone numbers must be formatted in a specific way, leading and trailing spaces on all fields must be removed, and so on. Initially you did all that cleanup in CFML before the SQL INSERT, but then you realized that in doing so you were not only hurting performance, you also were not cleaning up any data that did not originate in a ColdFusion application. And so you change the app so that data is written as is, and whenever an INSERT or UPDATE occurs, you set a row level flag named DIRTY to true. You then create a database scheduled event that runs once a day and performs all the cleanup for any rows where DIRTY is true, and then upon completion sets DIRTY to false to flag rows as clean.
* Your e-commerce site allows users to pay by credit card. You know that most failed credit card transactions fail because of poor data entry (bad number or expiration date, for example) and so you do basic error checking to ensure that the credit card number is valid post data entry. But you do not actually submit the credit card information for approval while the user waits. Rather, using the assumption that most credit card transactions do not fail (especially from known repeat customers) you thank the customer for the order and place the credit card transaction in a queue. You’ll notify the customer via e-mail only if there is a problem. This ensures that the application is responsive. It can also prevent double billing (which could occur if a customer were to submit a form twice), and creates a better user experience. (FYI, you may be surprised to know that some of the largest e-commerce sites on the Net do just this.)
In all of these examples, some processing is postponed and/or batched. The result? Not only are the applications faster and more scalable, but the developers also have greater control over exactly what operations occur and when.
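The credit card example might be sketched as follows. The column does not say which "basic error checking" is meant; the Luhn checksum used here is one common choice and is an assumption, as are the function name passesLuhn, the form field names, the payment_queue table, the datasource companyDSN, and the two included templates.

```cfml
<cfscript>
// Luhn checksum (an assumed stand-in for "basic error checking"): catches
// most mistyped card numbers without contacting the card processor.
function passesLuhn(cardNumber) {
    var digits = REReplace(cardNumber, "[^0-9]", "", "ALL");
    var total = 0;
    var doubleIt = false;
    var i = 0;
    var d = 0;
    if (Len(digits) LT 13) return false;
    // Walk the digits right to left, doubling every second one.
    for (i = Len(digits); i GTE 1; i = i - 1) {
        d = Mid(digits, i, 1);
        if (doubleIt) {
            d = d * 2;
            if (d GT 9) d = d - 9;
        }
        total = total + d;
        doubleIt = NOT doubleIt;
    }
    return (total MOD 10) EQ 0;
}
</cfscript>

<cfif NOT passesLuhn(form.cardNumber)>
    <!--- The cheap real-time check failed: ask the user to re-enter. --->
    <cfinclude template="card_error.cfm">
<cfelse>
    <!--- Queue the transaction instead of waiting for approval.
          (payment_queue and its columns are hypothetical.) --->
    <cfquery name="queuePayment" datasource="companyDSN">
        INSERT INTO payment_queue
            (order_id, card_number, card_expiry, status, queued_at)
        VALUES (
            <cfqueryparam value="#form.orderId#" cfsqltype="cf_sql_integer">,
            <cfqueryparam value="#form.cardNumber#" cfsqltype="cf_sql_varchar">,
            <cfqueryparam value="#form.cardExpiry#" cfsqltype="cf_sql_varchar">,
            'PENDING',
            <cfqueryparam value="#Now()#" cfsqltype="cf_sql_timestamp">
        )
    </cfquery>
    <cfinclude template="thank_you.cfm">
</cfif>
```

A separate scheduled job (not shown) would then work through the PENDING rows, submit each for approval, and e-mail the customer only when a charge is declined.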
Conclusion
Real time has become the norm by default, not by necessity. And real time causes real problems. Database caching, dynamic output caching, and delayed or batch processing are all concepts that can (and should) be leveraged so as to improve application performance and scalability. The truth is, there is no right or wrong here–everything is a tradeoff. Not all options will always be usable (you’d not want to use delayed batched credit card processing, for example, if you are selling access to a paid Web site). As a developer you get to make the real-time versus non-real-time choice. The important thing is that you actually make the choice. And I think you’ll find that most parts of most applications actually need not function in real time at all.
Ben Forta is Macromedia’s senior product evangelist and the author of numerous books, including ColdFusion MX Web Application Construction Kit and its sequel, Advanced ColdFusion MX Application Development, and is the series editor for the new “Reality ColdFusion” series. For more information visit www.forta.com.
ben@forta.com
COPYRIGHT 2003 Sys-Con Publications, Inc.
COPYRIGHT 2003 Gale Group