Visit Heroix at http://www.heroix.com
Charting Life in the IT Environment

>> Single Pane of Glass Monitoring - What does it mean today?

August 19, 2008

The notion of a single pane of glass – being able to view your entire network and infrastructure from one console – is not new to IT, but over the years it has come to mean different things to different people. On one level it can be as simple as having a more intelligent view of network connectivity: if you lose connectivity to a couple dozen servers at the same time because a router failed, you receive a single alert, not a couple dozen. That kind of correlation has been a giant leap toward root cause analysis, and monitoring software such as Longitude applies it in a number of ways to help detect and diagnose multi-symptom problems (check out tip 4 in our June 9 blog entry on preventing monitoring false alarms).

Today, with IT organizations now focused on delivering business services, the pane of glass is being viewed from a higher level. Whether the underlying cause is the network, a server, a router, a database, or a web site, IT staff need to know what business activity is compromised so they can respond appropriately. Furthermore, IT needs to be able to report to management on performance from a business perspective.

This requires the ability to collect and correlate an ever widening array of performance and availability metrics, and many organizations find themselves struggling with a piecemeal approach that relies on a patchwork of open source software, shareware, point products, and in-house scripts. Longitude allows you to collect and correlate data from a wide range of sources:

  1. Windows, Unix, and Linux operating systems (including VMware)
  2. Databases, including Microsoft SQL Server, Oracle, and MySQL
  3. Web Servers, including Microsoft IIS and Apache Web Server
  4. Microsoft Exchange Server
  5. J2EE™ Application Servers, including BEA WebLogic®, IBM WebSphere®, and JBoss®
  6. Cisco & any Network Device that uses a MIB
  7. SNMP Traps
  8. DHCP
  9. Infrastructure components, including Active Directory, Citrix, Dell OpenManage™, HP Systems Insight Manager (HP SIM), IBM Director
  10. Protocol Availability
  11. Syslog & Windows Event Logs
  12. End User Experience (Synthetic Web Transactions)
Longitude Event Monitor Showing Business Units
Longitude Real Time Statistic Dashboard for ERP Application

Longitude can then combine data from any of these sources in tailored Event Displays or Real Time Statistic Dashboards (aka “single pane of glass”) according to the business services you support.

Better Pane of Glass

Furthermore, Longitude elevates the pane of glass using Service Level Agreements. A Longitude SLA allows you to group together all the disparate components that support the multi-tiered applications underlying critical business processes, and it monitors for degradations in performance or availability of the service as a whole.

For example, if a mission critical application depends on the availability and performance of a web server, application server, back end database, network connectivity and bandwidth, Longitude enables you to define a service level agreement that represents the convergence of all the underlying operational components.

If any single component is down or operating out of acceptable tolerance, it is reflected in the status of the overall SLA. Longitude can then report and alert – in real time or historically – exactly what was out of compliance, for how long, and how severely. This helps you provide better service in several ways:

Longitude SLA for Multi-Tiered Application
  1. First, because it incorporates all of the components that support the business service, Longitude eliminates finger-pointing and cuts resolution time by showing you exactly what is causing the problem.
  2. Second, by allowing you to specify degraded as well as unacceptable levels of performance for each component, Longitude can alert you before end users are affected, and even take corrective action if desired.
  3. Third, by allowing IT staff to drill down into underlying issues, Longitude puts actionable information into the right hands.
  4. Finally, by allowing you to annotate SLAs with information about outages and remedies taken (see blue pin in screen shot), SLAs also provide the foundation for more meaningful management reporting.
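The roll-up idea described above can be sketched in a few lines. This is only an illustration of the "worst component wins" logic, not Longitude's implementation: the `Component` class, the threshold fields, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    OK = 0
    DEGRADED = 1
    UNACCEPTABLE = 2


@dataclass
class Component:
    """One monitored tier of a service (hypothetical model)."""
    name: str
    value: float            # latest measurement, e.g. response time in ms
    degraded_at: float      # at or above this value the component is degraded
    unacceptable_at: float  # at or above this value it is out of tolerance

    def status(self) -> Status:
        if self.value >= self.unacceptable_at:
            return Status.UNACCEPTABLE
        if self.value >= self.degraded_at:
            return Status.DEGRADED
        return Status.OK


def sla_status(components):
    """Return (overall status, names of the components responsible for it)."""
    worst = max((c.status() for c in components), default=Status.OK)
    if worst == Status.OK:
        return worst, []
    return worst, [c.name for c in components if c.status() == worst]


# Sample data, invented for illustration only.
erp = [
    Component("web server response ms", 180, 500, 2000),
    Component("app server response ms", 650, 500, 2000),
    Component("database query ms", 40, 200, 1000),
]
overall, culprits = sla_status(erp)
print(overall.name, culprits)
```

Because the degraded level sits below the unacceptable level, the overall SLA changes state while the service is still usable, which is what makes early alerting possible.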



Posted by Heroix Support

>> Resolve problems faster - get proactive with SLAs

July 22, 2008

SLAs (Service Level Agreements) often call to mind images of historical reporting and compliance – essentially documenting performance (or problems with it) after the fact. While that’s part of the picture, it causes many IT organizations without formal reporting requirements to overlook their benefit in proactive monitoring. So if you think you don’t need SLAs, think again.

Traditionally, SLAs have been used to measure the availability of specific services and to report on the percentage of time a given service is up or down. Longitude builds on this concept by allowing you to define Service Level Agreements to track anything from a simple up/down status to the overall health of an entire multi-tiered application. For example, if a mission critical application depends on the availability and performance of a web server, application server, back end database, network connectivity and bandwidth, Longitude enables you to define a service level agreement that represents the convergence of all the underlying operational components.


If any single component is down or operating out of acceptable tolerance, it is reflected in the status of the overall SLA. Longitude can then report and alert – in real time or historically – exactly what was out of compliance, for how long, and how severely. This helps you be proactive in two ways. First, because it incorporates all of the components that support the business service, Longitude eliminates finger-pointing and cuts resolution time by showing you exactly what is causing the problem. Second, by allowing you to specify degraded as well as unacceptable levels of performance for each component, Longitude can alert you before end users are affected, and even take corrective action if desired. You can learn more in our SLA Best Practices Guide.
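For the traditional, historical side of an SLA, the arithmetic is simple enough to show. The sketch below computes the percentage of a reporting period that a service was up, given a list of outage intervals. The function name, the 99.9% target, and the outage times are assumptions made for the example; they do not come from Longitude.

```python
from datetime import datetime


def availability_percent(period_start, period_end, outages):
    """Percent of the period the service was up.

    `outages` is a list of (start, end) datetimes, assumed not to overlap.
    Outages are clipped to the reporting period before being counted.
    """
    total = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        start = max(start, period_start)
        end = min(end, period_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (total - down) / total


# Invented example: two outages during July 2008, against a 99.9% target.
july_start = datetime(2008, 7, 1)
july_end = datetime(2008, 8, 1)
outages = [
    (datetime(2008, 7, 10, 2, 0), datetime(2008, 7, 10, 3, 30)),
    (datetime(2008, 7, 20, 14, 0), datetime(2008, 7, 20, 14, 45)),
]
pct = availability_percent(july_start, july_end, outages)
compliant = pct >= 99.9
print(f"{pct:.3f}% available, compliant: {compliant}")
```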


Posted by Heroix Support

>> Lose the False Alarms: 5 Tips for Better Performance Monitoring

June 9, 2008

“We started getting so many alerts we couldn’t tell what to pay attention to.”

Sound familiar?

Unfortunately, any IT monitoring effort can come with a snag: factory settings that are too high, too low, or just not applicable to your workload. Whether you are using commercial software or working with an open source or home-grown monitoring solution, over-notification can actually make you less productive and allow real, sometimes serious, problems to fall through the cracks.

If you are considering implementing a monitoring solution – or looking to improve what you are already doing – here are five common pitfalls and how you can avoid them.

1. Watch out for “one size fits all” thresholds.
Different workloads require different performance thresholds, and unless your monitoring software is tailored to your environment, you will end up with false alarms for applications where high utilization is the norm.

Save yourself from headaches by addressing this the first time you receive what might be considered an “over eager” alert. In Longitude, you can change settings right from the event monitor dashboard, as soon as you see the problem.


Screen shot showing threshold adjustment

Furthermore, Longitude helps you determine appropriate thresholds by calculating minimum, maximum, and average workload values for any threshold you may need to adjust. This saves you time and takes the guesswork out of configuring Longitude. You can even view workload values and change thresholds globally or for a subset of servers, all in one step, so configuring a few, hundreds, or even thousands of servers is quick and simple.
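To see why workload statistics take the guesswork out of a threshold, here is a minimal sketch. The summary mirrors the minimum, maximum, and average values mentioned above; the "observed maximum plus 10% headroom" rule is purely a hypothetical heuristic for illustration, not how Longitude chooses thresholds.

```python
from statistics import mean


def workload_summary(samples):
    """Minimum, maximum, and average of the observed workload values."""
    return {"min": min(samples), "max": max(samples), "avg": mean(samples)}


def suggest_threshold(samples, headroom=0.10, ceiling=100.0):
    """Hypothetical rule: observed maximum plus some headroom, capped."""
    return min(ceiling, max(samples) * (1 + headroom))


# Invented CPU utilization samples (%) for a server that normally runs hot.
cpu = [72, 85, 78, 81, 84]
summary = workload_summary(cpu)
threshold = suggest_threshold(cpu)
print(summary, round(threshold, 1))
```

A generic 80% CPU threshold would fire constantly on this server, even though 72 to 85% is its normal operating range.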

Screen shot showing minimum, maximum and average workload values

2. Filter out “non-problems.”
Just as there may be threshold values specific to your environment, there may also be individual components or even whole classes of problems that you do not want reported. For example, there may be specific Windows services, Unix/Linux file systems, or network interfaces that are not considered mission critical. Longitude allows you to specify filters based on component names as well as performance characteristics, so you can skip data collection for those you do not wish to monitor.
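As a rough sketch of name-based filtering, the snippet below skips any component whose name matches an exclusion pattern. The patterns and component names are made up, and shell-style wildcards via Python's `fnmatch` module are an assumption for the example; Longitude's own filter syntax may differ, and filtering on performance characteristics is not shown.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusions: a scratch file system, the loopback interface,
# and a Windows service that is not mission critical.
EXCLUDE_PATTERNS = ["/mnt/scratch*", "lo", "Print Spooler"]


def should_collect(component_name, exclude_patterns=EXCLUDE_PATTERNS):
    """True unless the name matches one of the exclusion patterns."""
    return not any(fnmatchcase(component_name, p) for p in exclude_patterns)


components = ["/", "/var", "/mnt/scratch1", "eth0", "lo"]
monitored = [c for c in components if should_collect(c)]
print(monitored)
```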

Screen shot showing data collection filter

3. Avoid repetitive notification for persistent problems.
Some problems take time to correct. When you or someone else on the staff will be working on an issue for a period of time, repeated reminders are not only unnecessary, but annoying and distracting.

Longitude allows you to suppress notification – again, right from the event monitor – to allow for repair time. If you decide that an event is not applicable to your environment, you can disable it entirely; should the situation change, you can simply re-enable it.
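The difference between suppressing an event for a while and disabling it outright can be sketched as follows. The `NotificationGate` class and the event name are hypothetical and exist only to illustrate the two behaviors.

```python
from datetime import datetime, timedelta


class NotificationGate:
    """Tracks events that are temporarily suppressed or disabled outright."""

    def __init__(self):
        self._suppressed_until = {}  # event name -> time when alerts resume
        self._disabled = set()

    def suppress(self, event, now, duration):
        """Silence an event for a fixed repair window."""
        self._suppressed_until[event] = now + duration

    def disable(self, event):
        self._disabled.add(event)

    def enable(self, event):
        self._disabled.discard(event)

    def should_notify(self, event, now):
        if event in self._disabled:
            return False
        resume_at = self._suppressed_until.get(event)
        return resume_at is None or now >= resume_at


gate = NotificationGate()
start = datetime(2008, 6, 9, 9, 0)
gate.suppress("disk_full:/var", start, timedelta(hours=2))
during_repair = gate.should_notify("disk_full:/var", start + timedelta(minutes=30))
after_repair = gate.should_notify("disk_full:/var", start + timedelta(hours=3))
print(during_repair, after_repair)
```

Suppression expires on its own, so a problem that is still present after the repair window starts alerting again without anyone having to remember to turn it back on.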

Screen shot showing event shutoff

4. Don’t be fooled by multi-symptom problems.
It’s not uncommon for a single problem to exhibit multiple symptoms. For example, if a router is down, it may “look” like all the systems it serves are down, resulting in multiple alerts that are in reality all attributable to the same root cause. Better visibility into underlying causes eliminates event clutter and speeds time-to-resolution.

Using correlated events, Longitude can determine the root cause of a problem and avoid the duplicate notifications. In the case of the router outage, Longitude can recognize this situation by correlating the state of individual servers with the state of the router, and send just one notification (suppressing individual server notifications) if the router malfunctions.
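A simplified version of that correlation looks like this. It assumes you already know which upstream device each server is reached through; the `depends_on` map and the device names are invented, and Longitude's actual correlation rules are not shown here.

```python
def correlate(down, depends_on):
    """Group unreachable devices under their most upstream failed device.

    down:       set of device names that currently look unreachable
    depends_on: dict mapping a device to the upstream device it relies on
    Returns a dict of root cause -> sorted list of suppressed dependents.
    """
    roots = {}
    for device in down:
        root = device
        seen = {root}
        # Walk upstream for as long as the parent is also down.
        while depends_on.get(root) in down and depends_on[root] not in seen:
            root = depends_on[root]
            seen.add(root)
        roots.setdefault(root, [])
        if root != device:
            roots[root].append(device)
    return {root: sorted(deps) for root, deps in roots.items()}


depends_on = {"srv1": "router1", "srv2": "router1", "srv9": "router2"}
down = {"router1", "srv1", "srv2", "srv9"}
alerts = correlate(down, depends_on)
print(alerts)
```

Four symptoms become two notifications: one for router1 (with srv1 and srv2 suppressed) and one for srv9, whose router is still up and which therefore has a problem of its own.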

Screen shot showing correlated event

5. Remember: Some problems are time-based.
Depending on when the symptom occurs, an issue may or may not require attention. For example, if your virus scan runs at 1 AM and causes a spike in CPU usage for two hours at that time, you would not want to be notified during that time period. Or, if you need to notify different personnel at different times of day, it makes sense to notify only those staff on duty at any given time. Longitude accomplishes this by allowing you to schedule notifications for different events. You can also have non-problems eliminated from the event database altogether during specified periods such as system maintenance windows.
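Both cases, quiet periods and shift-based routing, come down to checking a time of day against a window. Here is a minimal sketch; the event name, team names, and shift times are assumptions for illustration. The virus scan window follows the example above (1 AM for two hours).

```python
from datetime import time

# Hypothetical schedule: no CPU alerts during the nightly virus scan,
# and two support shifts that split the day.
QUIET_WINDOWS = {"cpu_high": (time(1, 0), time(3, 0))}
SHIFTS = [
    ("day-team", time(8, 0), time(20, 0)),
    ("night-team", time(20, 0), time(8, 0)),
]


def in_window(t, start, end):
    """True if t falls in [start, end), including windows that wrap midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_notify(event, t):
    window = QUIET_WINDOWS.get(event)
    return window is None or not in_window(t, *window)


def on_duty(t):
    return [team for team, start, end in SHIFTS if in_window(t, start, end)]


print(should_notify("cpu_high", time(1, 30)), on_duty(time(2, 0)))
```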

Screen shot showing notification schedule

Solution or Shelfware?
Automated performance monitoring holds great potential for any IT organization striving to maintain high levels of service for their critical business applications, but experience shows that “factory” settings – even those based on industry best practices – can lead to over-alerting that is annoying, distracting, and counter-productive. Many overwhelmed IT organizations ignore or even decommission monitoring software because it is just too difficult to tune to their unique environment.

As the above examples show, properly tailored monitoring software can filter out false alarms and alert staff to true problems before they affect business processes. This saves staff time and money and allows IT to focus on strategic organizational objectives rather than on constantly finding and fixing problems after they’ve occurred.

>> Monitoring non-Cisco devices

June 3, 2008

Q: Longitude has a built-in solution for monitoring Cisco network devices. Can Longitude monitor non-Cisco network devices?

A: Yes. Longitude’s built-in Cisco solution uses a standard RFC1213 MIB for data collection, so in many cases, you can use it to monitor non-Cisco devices out of the box. The Cisco solution proactively monitors key performance metrics including bandwidth utilization, IP packet errors, TCP errors, TCP retransmits, UDP errors, and queue lengths. Longitude alerts you when there is a problem, and also provides pre-configured, on-demand reports and graphs to help you understand performance trends and ensure maximum availability. For more information about built-in monitoring of Cisco and other network devices, please consult the Data Sheet for the Cisco Solution (http://www.heroix.com/downloads/pdf/Longitude_Network.pdf).
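For readers curious how a metric like bandwidth utilization is typically derived from RFC1213 data, here is a sketch of the usual calculation from two samples of an interface octet counter (such as ifInOctets) and the interface speed (ifSpeed, in bits per second). The function names and sample values are ours, and this shows the general technique rather than Longitude's internals. It assumes a 32-bit counter that wrapped at most once between polls.

```python
COUNTER32_MODULUS = 2 ** 32  # RFC1213 octet counters are 32-bit and wrap


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets transferred between two polls, allowing for one counter wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, seconds, if_speed_bps):
    """Share of link capacity used over the polling interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (seconds * if_speed_bps)


# Invented samples: a 100 Mbit/s interface polled 60 seconds apart.
usage = utilization_percent(1_000_000, 38_500_000, 60, 100_000_000)
print(f"{usage:.1f}%")
```

On fast links a 32-bit counter can wrap more than once between polls, so shorter polling intervals (or 64-bit counters, where the device offers them) give more reliable numbers.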

If you wish to monitor items not collected by the built-in solution, then Longitude’s SNMP Studio enables you to monitor any SNMP-based device or application, including switches, routers and other hardware devices, as well as middleware and custom applications. The SNMP Studio also provides an interface for browsing Management Information Base (MIB) files. SNMP Studio comes pre-loaded with a variety of MIBs, and additional MIBs can be added easily. Creating a solution in SNMP Studio is as simple as browsing a MIB tree to select SNMP objects for collection and then filling in brief forms in order to configure integrated events and reports for Longitude to create. For more information, please consult the Data Sheet for the SNMP Studio (http://www.heroix.com/downloads/pdf/Longitude_SNMP_Studio.pdf).

Posted by Alison Murphy, Senior Technical Support Engineer

>> Managing Longitude Database Size

March 25, 2008

Q: How do I manage the size and disk usage of the Longitude database?

A: Longitude uses an open source SAP database, which is automatically created with 3 GB allocated on the drive you choose when you install Longitude. The database will auto-expand on that drive when it reaches either 80% full or less than 100 MB free. You can manually extend it on the same drive, but that is rarely needed given Longitude’s self-maintaining features.
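The expansion rule can be written out as a small check. This is only our reading of the two conditions stated above, expressed in megabytes; the function and parameter names are hypothetical and the real trigger logic inside the database may differ.

```python
def needs_expansion(allocated_mb, used_mb, full_ratio=0.80, min_free_mb=100):
    """True when the database is at least 80% full or has under 100 MB free."""
    free_mb = allocated_mb - used_mb
    return used_mb / allocated_mb >= full_ratio or free_mb < min_free_mb


allocated = 3 * 1024  # the default 3 GB allocation, in MB
print(needs_expansion(allocated, 2000), needs_expansion(allocated, 2500))
```

With the default 3 GB allocation, the 80% condition is reached first, at roughly 2,458 MB used, while more than 600 MB is still free.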

Do not gauge database consumption based on what Windows shows as the size of the \Longitude\sapdb\indep_data\wrk\FZEDB1\DATA0001 file. Even if Windows shows that file at 3 GB, the database is not necessarily nearing the full 3 GB allocated.

You can check the consumption by logging into the WebDbm:

http://localhost:7230/webdbm

Username: dbm
Password: {the password you specified for the original Longitude user during installation}

See the screen shot below.

Posted by Greg Savas, Technical Support Engineer

Screen shot showing database size

© 2008 Heroix | Email: info@heroix.com