Archive

Posts Tagged ‘SCOM’

The Operations Manager 2007 R2 Admin Resource Kit is now available

15/06/2011

The Microsoft Operations Manager product team has announced the availability of the System Center Operations Manager 2007 R2 Administration Resource Kit on their blog.

The System Center Operations Manager 2007 R2 Admin Resource Kit includes three tools designed to help improve the Operations Manager Administrator experience.

Included: Scheduled Maintenance Mode, Clean Mom, and MP Event Analyzer.

  • Scheduled Maintenance Mode – Ability to schedule and manage maintenance mode in the management group.
  • Clean Mom – Helps remove all installed R2 components.
  • MP Event Analyzer – Helps with functional and exploratory testing and debugging of event-based management pack workflows such as rules and monitors.

The resource kit can be found on the Microsoft Download site.

Categories: OpsMgr

SCOM Authoring console and the mysterious Microsoft.SystemCenter.Library.mp version 6.1.7221.61

07/06/2011

Updated – 13/08/2011

Microsoft released a new KB article that lets us download the latest version of the Microsoft.SystemCenter.Library management pack, so we can now reference it in the Authoring Console.

NOTE: Do not import the Microsoft.SystemCenter.Library management pack into your management group. You do not have to do this. In fact, in a very limited set of circumstances, importing this management pack can result in the need to restore the Operations Manager database from backup.

________________________________________________________________________

After upgrading my SCOM server to CU4, I could not open any management pack in the SCOM Authoring Console. The following message appeared: “Referenced management pack not found…”

OK, let’s try to locate the file. It is not part of the CU4 installation files, and it is not part of any MP available on System Center Marketplace (the new Pinpoint site). The only file with the same name is located under “c:\program files\system center operations manager 2007”, and its date points to an old file (unfortunately, MP files have no version property).

What is really happening here?
Is it possible that the SCOM team updated the MP without replacing the file?

The answer we are looking for is in the CU4_Database.sql script: the management pack is updated directly via SQL!

That means that every time we export an unsealed MP, the referenced version of Microsoft.SystemCenter.Library.mp will be 6.1.7221.61.

Workaround:

1. Do not use the Authoring Console “Import MP from..” option.

2. Export the MP from the SCOM console.

3. Open the XML file in a text editor and change the version of the referenced MP from:

<Reference Alias="SystemCenter">
  <ID>Microsoft.SystemCenter.Library</ID>
  <Version>6.1.7221.61</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>

To:

<Reference Alias="SystemCenter">
  <ID>Microsoft.SystemCenter.Library</ID>
  <Version>6.1.7221.0</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>

4. Save the file and open it in the Authoring Console.
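
Step 3 can also be scripted when there are many exported MPs to fix. The following is a minimal sketch in Python, assuming the exported MP is a plain XML text file whose reference block is laid out like the one above. The function name pin_reference_version and the SAMPLE constant are hypothetical, introduced only for illustration. It edits the text with a regular expression rather than re-serializing the XML, so everything outside the matched version number is left untouched.

```python
import re


def pin_reference_version(xml_text, mp_id, old_version, new_version):
    """Return xml_text with the <Version> of the <Reference> to mp_id changed.

    Only a <Reference> block whose <ID> equals mp_id and whose <Version>
    equals old_version is modified; all other text is returned unchanged.
    """
    pattern = re.compile(
        r"(<Reference\b[^>]*>\s*"
        r"<ID>" + re.escape(mp_id) + r"</ID>\s*"
        r"<Version>)" + re.escape(old_version) + r"(</Version>)"
    )
    # A function replacement avoids backreference clashes with the digits
    # at the start of new_version.
    return pattern.sub(lambda m: m.group(1) + new_version + m.group(2), xml_text)


# The reference block from the post, used here as sample input.
SAMPLE = """<Reference Alias="SystemCenter">
  <ID>Microsoft.SystemCenter.Library</ID>
  <Version>6.1.7221.61</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>"""

if __name__ == "__main__":
    print(pin_reference_version(
        SAMPLE, "Microsoft.SystemCenter.Library", "6.1.7221.61", "6.1.7221.0"))
```

For a real file, read its text, pass it through the function, and write the result to a copy, keeping the original export as a backup.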

Categories: OpsMgr

Operations Manager 2007 R2: The Notification challenge

02/06/2011

Recently, more and more customers have been asking for more granular and complex notifications in SCOM 2007. As you probably know, notification in Operations Manager is based on 4 different channels (E-Mail, SMS, IM and Command). These 4 channels can be used to deliver alerts based on specific criteria (the criteria are part of a subscription).

Subscription = what to send + to + how

The criteria in a subscription have changed since OpsMgr SP1, and luckily the OpsMgr team added some great features to the product, such as sending alerts “raised by a specific rule or monitor” or “raised by any instance in a specific group or class”. Despite these changes, it is still hard to set up alerts based on the needs of an enterprise, especially in a large and complex environment.

That’s why I can clearly say that it is hard to get notification to work the way we want, and without fully understanding the object-oriented class model of OpsMgr it is even harder.

I made a list of what most of my customers want:

1. Ability to manage and maintain notification information in a reliable and simple way.

2. Ability to limit (in some cases) the notification to only one alert. For example, take a server with the IIS role that hosts several web sites, all of which are monitored. When IIS stops working, we get alerts for every web site hosted on that server.

3. Ability to trace which alert was sent to a recipient, when and how (Mail, SMS, etc.).

4. Ability to set up an on-call list (duty roster).

5. Ability to let IT personnel route alerts to others during vacations (like an out-of-office mechanism).

6. Ability to ensure notifications reach the designated personnel using two-way communication and escalation.

Most of the above are not present in Operations Manager, and given the problems we had, we needed to look for alternatives.

One of the products I tested was SNS++ from Highnet Systems. In the beginning we needed to use a command channel to send our notifications, but after several meetings with the Highnet team they started to develop a connector that works with the universal connector that ships with SCOM R2.

Now, after successfully implementing SNS++ (very simple to do, I must add) at several customers, most of my notification problems are no longer handled in Operations Manager 2007. All alerts are forwarded to SNS++, and with the product’s Smart Routing feature we can deliver any alert to any recipient based on message filters in SNS++.

For more information you can visit Highnet Systems.

Categories: OpsMgr

Speed up console launch – Operations Manager 2007 (SCOM R2)

30/05/2011

Recently I came across complaints from customers about the SCOM console: it launches more slowly than before. Every time I checked these complaints in my lab or at other customer sites, I could not reproduce the problem.

While looking into another problem I had recently, I came across this post. After an upgrade to CU4, environments that are not connected to the Internet experience a slow console launch. The problem, as I understood it, is related to settings for .NET applications.

Using the solution proposed by François Dufour, the console launches as fast as before.

Categories: OpsMgr

VMware Management With Big 5 Enterprise Management Systems from Microsoft, HP, IBM, CA and BMC

20/04/2011

Veeam Software, a VMware Technology Alliance partner, develops innovative products for virtual infrastructure management and data protection. VMware monitoring options include the nworks Management Pack™ for Microsoft System Center Operations Manager.

The nworks MP provides continuous monitoring of enterprise VMware environments. It features a centrally managed, distributed architecture for scalability and automatic failover and load balancing for high availability.

The nworks MP integrates fully with both VMware and System Center. It enables all System Center functionality: alerts, topology diagrams, dashboards, reporting, auditing, notifications, responses and automation for all VMware components. It provides a detailed VMware health model, including metrics such as memory pressure and disk IOPS that are only available from Veeam.

The nworks MP also includes a comprehensive knowledge base that serves as your VMware knowledge book, just as Microsoft provides in all its MPs.

While the nworks MP itself collects VMware data agentlessly, it also fully integrates data from Operations Manager agents running inside virtual machines, providing full end-to-end visibility of virtualized applications and services.

For more details, see: http://www.veeam.com/vmware-microsoft-esx-monitoring/features.html

In March 2011 Vanson Bourne, an independent market research organization, conducted an online survey of 253 CIOs whose organizations used VMware on the topic of virtualization and IT management.

According to the survey, the predominant management framework used in the enterprise today is Microsoft System Center (55 percent), followed by IBM Tivoli (20 percent) and HP OpenView (11 percent).

Categories: OpsMgr

How to stop false heartbeat alerts for DMZ servers

12/04/2011

One of the strange things in OpsMgr is the relationship between the Health Service Watcher object and the Root Management Server (RMS). A common mistake is to think that when we point a server to a gateway server (GW) or a Management Server (MS), the GW or MS is responsible for alerting us about the availability of the monitored server.

This is not the case in the current version of OpsMgr (hopefully the next version will help us deal with it better). All Health Service Watcher objects are placed on the RMS, and if the GW server is down we will get a lot of “Computer not reachable” and “Health Service Heartbeat Failure” alerts for servers that are up and running!

Let’s start with a common scenario where we have a GW server that is connected to an MS through a firewall (FW).

In this scenario, when the MS, GW or FW is down, we will get a lot of false alarms in the console alerting us that all agents behind the FW are down.

We have 2 options to work around this:

Option 1: Add another GW server (GW2) and set all agents to fail over to it in case GW1 fails (How to failover an agent\GW). Keep in mind that if the FW or the network devices that connect the GW to the MS fail, you will still get all the unwanted alerts.

Option 2: Create an override for the 2 monitors “Computer not reachable” and “Health Service Heartbeat Failure”, and create a rule on the GW server that catches an event when a monitored server is down.

1. Create a group that contains all Health Service Watcher (agent) objects in the DMZ. In my case it was easy: I just needed to exclude the agents of all my internal domains.

2. Go to the Authoring pane, search for the 2 monitors “Computer not reachable” and “Health Service Heartbeat Failure”, and set an override for the group created in step 1.

3. Create an event rule that catches the following event and assign the rule only to the GW server.

[Screenshot: event rule criteria]

In the end you will have 2 overridden monitors and a new event rule.

I hope this helps you lower the number of false notification alerts.

Categories: OpsMgr

StateChangeEvent table and grooming problem in the OperationsManager database

23/02/2011

I have noticed this problem several times at different customer sites: the OperationsManager DB is growing and the grooming process is skipping the StateChangeEvent table.

There are several good blogs that explain how the grooming process works, such as those by Steve Rachui, Kevin Holman and Daniele Grandini. I used their knowledge to free some of the space used by the OperationsManager database.

To verify the space used in the DB, run the standard report named “Disk Usage by Top Tables” in SQL Server Management Studio.

The following query indicates which monitors are the noisiest. Keep in mind that some of them are noisy as a result of config churn, and those must be tuned.

select distinct top 50 count(sce.StateId) as NumStateChanges, m.MonitorName, mt.typename AS TargetClass
from StateChangeEvent sce with (nolock)
join state s with (nolock) on sce.StateId = s.StateId
join monitor m with (nolock) on s.MonitorId = m.MonitorId
join managedtype mt with (nolock) on m.TargetManagedEntityType = mt.ManagedTypeId
where m.IsUnitMonitor = 1
group by m.MonitorName,mt.typename
order by NumStateChanges desc

Oops… This seems to be a lot of state change events!

Since we had already changed some of the discoveries to deal with config churn, we wanted to find out how long data is kept in the database. To do so, run the following query; keep in mind that the result is in days.

SELECT DATEDIFF(d, MIN(TimeAdded), GETDATE()) AS [Current] FROM statechangeevent
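
A note on reading that number: with the day datepart, T-SQL DATEDIFF counts the calendar-day boundaries crossed between the two timestamps, not the elapsed 24-hour periods. A rough Python equivalent is sketched below; the helper name datediff_days and the timestamps are made up for illustration.

```python
from datetime import datetime


def datediff_days(start, end):
    """Mimic T-SQL DATEDIFF(d, start, end): count midnight boundaries crossed."""
    return (end.date() - start.date()).days


# Made-up timestamps: the oldest row was added late in the evening.
oldest_row = datetime(2011, 2, 1, 23, 50)
now = datetime(2011, 2, 23, 0, 10)

print(datediff_days(oldest_row, now))  # 22, although only just over 21 full days elapsed
```

So a result that is one day above the configured retention can simply be a boundary effect rather than a sign that grooming is behind.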

It’s time to lose some weight! (The query is at the bottom of the page.)

Here are the results:

Noisiest monitors in the database

Age of the oldest state change data, in days

As you can see, the data is now aligned with the grooming settings.

To delete the old data from the StateChangeEvent table, run the following:

USE [OperationsManager]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
BEGIN

    SET NOCOUNT ON

    DECLARE @Err int
    DECLARE @Ret int
    DECLARE @DaysToKeep tinyint
    DECLARE @GroomingThresholdLocal datetime
    DECLARE @GroomingThresholdUTC datetime
    DECLARE @TimeGroomingRan datetime
    DECLARE @MaxTimeGroomed datetime
    DECLARE @RowCount int
    SET @TimeGroomingRan = getutcdate()

    SELECT @GroomingThresholdLocal = dbo.fn_GroomingThreshold(DaysToKeep, getdate())
    FROM dbo.PartitionAndGroomingSettings
    WHERE ObjectName = 'StateChangeEvent'

    EXEC dbo.p_ConvertLocalTimeToUTC @GroomingThresholdLocal, @GroomingThresholdUTC OUT
    SET @Err = @@ERROR

    IF (@Err <> 0)
    BEGIN
        GOTO Error_Exit
    END

    SET @RowCount = 1  

    -- This is to update the settings table
    -- with the max groomed data
    SELECT @MaxTimeGroomed = MAX(TimeGenerated)
    FROM dbo.StateChangeEvent
    WHERE TimeGenerated < @GroomingThresholdUTC

    IF @MaxTimeGroomed IS NULL
        GOTO Success_Exit

    -- Instead of the FK DELETE CASCADE handling the deletion of the rows from
    -- the MJS table, do it explicitly. Performance is much better this way.
    DELETE MJS
    FROM dbo.MonitoringJobStatus MJS
    JOIN dbo.StateChangeEvent SCE
        ON SCE.StateChangeEventId = MJS.StateChangeEventId
    JOIN dbo.State S WITH(NOLOCK)
        ON SCE.[StateId] = S.[StateId]
    WHERE SCE.TimeGenerated < @GroomingThresholdUTC
    AND S.[HealthState] in (0,1,2,3)

    SELECT @Err = @@ERROR
    IF (@Err <> 0)
    BEGIN
        GOTO Error_Exit
    END

    WHILE (@RowCount > 0)
    BEGIN
        -- Delete StateChangeEvents that are older than @GroomingThresholdUTC
        -- We are doing this in chunks in separate transactions on
        -- purpose: to avoid the transaction log to grow too large.
        DELETE TOP (10000) SCE
        FROM dbo.StateChangeEvent SCE
        JOIN dbo.State S WITH(NOLOCK)
            ON SCE.[StateId] = S.[StateId]
        WHERE TimeGenerated < @GroomingThresholdUTC
        AND S.[HealthState] in (0,1,2,3)

        SELECT @Err = @@ERROR, @RowCount = @@ROWCOUNT

        IF (@Err <> 0)
        BEGIN
            GOTO Error_Exit
        END
    END   

    UPDATE dbo.PartitionAndGroomingSettings
    SET GroomingRunTime = @TimeGroomingRan,
        DataGroomedMaxTime = @MaxTimeGroomed
    WHERE ObjectName = 'StateChangeEvent'

    SELECT @Err = @@ERROR, @RowCount = @@ROWCOUNT

    IF (@Err <> 0)
    BEGIN
        GOTO Error_Exit
    END 
Success_Exit:
Error_Exit:   
END
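
The core of the script is the WHILE loop, which deletes old rows in chunks of 10,000 so that each chunk runs in its own transaction and the transaction log does not grow too large. The pattern can be tried out safely away from the OperationsManager database. The following is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for SQL Server. The table state_change_event, its single column, and the helper groom_in_batches are hypothetical and not part of the OpsMgr schema.

```python
import sqlite3


def groom_in_batches(conn, threshold, batch_size=10000):
    """Delete rows older than threshold in batches; return total rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM state_change_event WHERE rowid IN ("
            " SELECT rowid FROM state_change_event"
            " WHERE time_generated < ? LIMIT ?)",
            (threshold, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state_change_event (time_generated TEXT)")
# 25 made-up rows dated 2011-01-01 .. 2011-01-25 (ISO date strings sort by date).
conn.executemany(
    "INSERT INTO state_change_event VALUES (?)",
    [("2011-01-%02d" % day,) for day in range(1, 26)],
)
conn.commit()

deleted = groom_in_batches(conn, threshold="2011-01-19", batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM state_change_event").fetchone()[0]
print(deleted, remaining)  # 18 7
```

With a batch size of 10, the 18 old rows are removed in two batches, and a third, empty batch ends the loop, which mirrors the @RowCount check in the T-SQL above.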

Categories: OpsMgr