Cover V13, i14

Peer Deep with DTrace

Sun Microsystems, Inc.

Track, tune, and troubleshoot your systems in real time with Sun's new dynamic tracing framework, part of the Solaris™ 10 OS.

18.May.04 -- Imagine if any question you had about your systems could be magically answered -- instantly. Imagine how much easier it would be to find system bottlenecks or understand complicated performance issues. That's the dramatic effect that DTrace, a comprehensive dynamic tracing framework for the Solaris Operating System (Solaris OS), can have on your data center.

DTrace is one of several revolutionary new technologies found in the Solaris 10 OS, which is available for preview now through the Sun Software Express Program. Built into the Solaris OS with 25,000 probes in the kernel alone, DTrace is a boon to developers, system administrators, and IT managers.

More powerful than any other tool in the industry, DTrace is an unmatched dynamic tracing framework for troubleshooting your network and tuning system performance -- in real time. DTrace lets you see your entire Solaris OS system in an entirely new way, revealing systemic problems that were previously invisible and helping you fix performance issues that used to go unresolved. With DTrace, you can:

  • Examine the behavior of user programs and the Solaris OS and quickly identify the root causes of system and application bottlenecks
  • Highlight trends and patterns to tune systems for best performance
  • Track down performance problems across many layers of software
  • Locate the cause of aberrant behavior
  • Write reusable scripts for common or complex routines
  • Specify the data DTrace collects, the actions it takes, and the conditions under which it should take those actions

Across-the-Board Return on Investment (ROI)

    "By providing a thorough understanding of your systems' behavior, DTrace can lead to phenomenal network speed gains, slashed support costs, and exceptionally effective tuning," says Greg Papadopoulos, Sun Chief Technical Officer. "Simply put, DTrace is one of the most significant innovations in operating systems in the last decade."

DTrace can help you pare your IT budget in several key ways:
  • Performance problems can be understood on production machines; there's no need to waste time and money reproducing them in a separate test bed.
  • Bottlenecks can be identified and fixed in minutes or hours instead of days.
  • Existing systems can handle more users or transactions.
  • Service availability can be improved.
For example, when a server at Sun experienced degraded performance, DTrace found and isolated a rogue application in just 20 minutes -- a task that might have taken 30 hours before DTrace. That Sun server now supports 30 percent more desktops.

    "The wins from DTrace can be so significant that orders-of-magnitude performance increases are realized in production. This immediately translates to a bottom line benefit for the business unit," says Jarod Jenson, chief systems architect of Aeysis, a performance consultancy in Houston.

The Mother of Invention

In 1997, Sun's Bryan Cantrill, now a senior staff engineer in Solaris Kernel Development, and his team were working feverishly on a performance problem that cropped up in the just-introduced Sun Enterprise 10000 server. While running a benchmark, the server mysteriously slowed down for a period of time. Six sleepless days later, the team finally discovered the problem's root cause: a "totally knuckle-headed" configuration mistake had set up the server to act as a router.

    "I came away shaken," declares Cantrill. "This was a problem that any customer could have, but they wouldn't have the luxury of kernel developers working around-the-clock writing custom code to understand the problem. We had to find a better way." After two and a half years of intense development, Cantrill and his team built that better way: DTrace.

With DTrace, Sun is making available several innovative features not found in other tracing software:
  • You can safely use DTrace on production machines, as well as in development and test bed systems, in real time.
  • DTrace provides a single view of the software stack, from kernel to application.
  • You do not have to modify applications -- or even restart them -- before putting DTrace into action.
  • DTrace fully instruments the operating system.
    "There's a class of problems for which you spend most of your time theorizing what might be happening and then trying to use the available monitoring tools to prove or disprove those theories," explains Philip Beevers, a developer at Surrey, U.K.-based royalblue, a leading supplier of global financial trading software.
    "This is true particularly for performance problems in complex production environments, which you can't easily reproduce and where traditionally you can't add your own instrumentation. With DTrace, developers can create tools which are tailored to proving or disproving those theories."

Boon for Developers

Developers can use DTrace to analyze and optimize application performance. DTrace makes testing and tuning more effective, with shorter test cycles. That yields lower support costs.

For example, when a financial institution applied DTrace to one of its business-critical applications, it uncovered a serious scalability problem. The institution was able to fix the problem in less than a day and netted "more than a 10 times throughput increase," according to Cantrill.

In another first in the industry, DTrace lets programmers see the interaction between their applications and the kernel by observing the flow of control across the user/kernel boundary. And with DTrace's easy-to-learn D language, you can build custom programs to dynamically instrument the system and provide immediate, concise answers to arbitrary questions about the operating system and user programs.
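To make the idea of a custom D program more concrete: a D program is built from clauses, each pairing a probe description with an optional predicate (written between slashes) and an action block (written in braces). The following is only a toy model of that shape in Python, not D and not how DTrace itself works. The `Event` and `Clause` classes, the `run_clause` function, and the sample data are hypothetical names invented for this sketch; only the probe string follows DTrace's provider:module:function:name form.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One probe firing. Field names are hypothetical, chosen for this sketch."""
    probe: str      # which probe fired, e.g. "syscall::read:entry"
    execname: str   # name of the process that triggered it


@dataclass
class Clause:
    """A probe name, a condition, and an action to run when both match."""
    probe: str
    predicate: Callable[[Event], bool]
    action: Callable[[Event], None]

    def fire(self, event: Event) -> None:
        # The action runs only if the probe matches AND the condition holds.
        if event.probe == self.probe and self.predicate(event):
            self.action(event)


def run_clause(events: Iterable[Event]) -> Dict[str, int]:
    """Count read() entries per process, ignoring a process named 'sshd'."""
    counts: Counter = Counter()
    clause = Clause(
        probe="syscall::read:entry",
        predicate=lambda e: e.execname != "sshd",
        action=lambda e: counts.update([e.execname]),
    )
    for event in events:
        clause.fire(event)
    return dict(counts)


# Made-up sample data standing in for live probe firings.
SAMPLE_EVENTS: List[Event] = [
    Event("syscall::read:entry", "oracle"),
    Event("syscall::write:entry", "oracle"),
    Event("syscall::read:entry", "sshd"),
    Event("syscall::read:entry", "httpd"),
    Event("syscall::read:entry", "oracle"),
]

print(run_clause(SAMPLE_EVENTS))  # prints {'oracle': 2, 'httpd': 1}
```

The point of the sketch is the separation of concerns: which probe to watch, under what condition, and what to record are each stated independently, so a question about the system can be changed by editing one small clause rather than rebuilding a tool.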

System administrators can use DTrace in real time on a production system because it is designed so that tracing cannot accidentally disrupt the system. While active, DTrace only minimally affects the system by dynamically selecting just the probe points you need. To further minimize its impact, DTrace never requires a reboot, forced failure, special diagnostic mode, or other changes to your system, applications, or user accounts. Because you can resolve problems more quickly than was ever before possible, you'll endure fewer and shorter service interruptions.
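The idea of selecting only the probe points you need can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not a description of DTrace's internals: the `ProbeRegistry` class, its methods, and the probe names are all hypothetical. It shows probes that exist everywhere but do nothing until one is explicitly enabled, and that return to doing nothing once disabled, with no restart of the "system" being observed.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class ProbeRegistry:
    """Toy model: every probe is present, but disabled probes have no handler."""

    def __init__(self, names: Iterable[str]) -> None:
        # None means "disabled": firing the probe records nothing.
        self._handlers: Dict[str, Optional[Callable[[str], None]]] = {
            name: None for name in names
        }

    def enable(self, name: str, handler: Callable[[str], None]) -> None:
        if name not in self._handlers:
            raise KeyError(f"unknown probe: {name}")
        self._handlers[name] = handler

    def disable(self, name: str) -> None:
        if name not in self._handlers:
            raise KeyError(f"unknown probe: {name}")
        self._handlers[name] = None

    def fire(self, name: str, payload: str) -> None:
        handler = self._handlers.get(name)
        if handler is not None:   # only enabled probes do any work
            handler(payload)

    def enabled(self) -> List[str]:
        return sorted(n for n, h in self._handlers.items() if h is not None)


def demo() -> Tuple[List[str], List[str]]:
    """Enable one probe out of three, run a workload, then disable it."""
    registry = ProbeRegistry(["io:start", "io:done", "sched:on-cpu"])
    seen: List[str] = []
    registry.enable("io:start", seen.append)
    for name in ["io:start", "sched:on-cpu", "io:done", "io:start"]:
        registry.fire(name, name)
    registry.disable("io:start")
    registry.fire("io:start", "io:start")  # ignored: probe is disabled again
    return seen, registry.enabled()


print(demo())
```

In this model the running workload never changes; only the registry's table of handlers does, which is the property the paragraph above describes at the level of the whole operating system.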

    "[An] Oracle [server] was eating CPU under a low load, and it was very difficult to determine why," says Peter Baer Galvin, chief technologist of Corporate Technologies, an enterprise systems integrator in Burlington, Mass. "After a lot of debugging and experimenting on [the] Solaris 8 [OS] without DTrace, we found that the problem was actually the application server that was calling the database server. With DTrace, this probably could have been solved in an hour, rather than in a week."

No Comparison

Every major UNIX vendor, as well as Microsoft, offers some form of tracing, but no method stacks up against DTrace.

    "At first glance, many [users] try to infer commonalities to existing observability tools. It does not take long, however, to realize that a new paradigm has been established," says Jenson. "At every site that I have used DTrace, a line of developers and support personnel forms asking for their application to be next. DTrace is a competitive advantage that everyone should utilize."

IBM AIXTrace, Linux Trace Toolkit, and Microsoft Event Tracing for Windows are the most widely used alternatives. Each records a small amount of predefined data at a few predefined points. These tools can be used only for questions that can be answered with these points or with the data that they provide. In contrast, DTrace has tens of thousands of probes, can instrument running applications without restarting them, and can record arbitrary data at each probe -- all on production systems. With DTrace, you can query the system arbitrarily, receive a precise answer in seconds, and take immediate action to resolve the problem.

    "I've been using DTrace to fetch details of disk I/O. Now I find systems without DTrace can feel uncomfortable," says Brendan Gregg, a UNIX developer and security consultant in Sydney, Australia. "I keep wanting to fetch more details, and they aren't there. You could say that DTrace is addictive."

Every programmer, every system administrator, and every IT manager faces inexplicable performance problems that bog down their systems, bleed network resources, and squander the company's money. But you don't need magic to tackle these problems. All you need is DTrace.