Do More With Less: SQL CMS and MSX

5

March 11, 2015 by Mike Hillwig

I’m not the typical DBA. The environment I support is in hosting, meaning I support a larger number of environments per DBA than nmost people I know. My team focuses more on the infrastructure and less on the application. I have dozens of the same application across many servers. The next DBA to have my job will look at the environment and say “This is not normal.”

When I first started this job, I realized that we had a problem with how we were doing things. We had maintenance plans everywhere. When we tried to change anything, it would take hours or days to make the smallest change across dozens of servers. That had to change. That’s when I stumbled across SQL Agent Multi-Server Administration. When I figured out how this thing worked, my world forever changed.

There are two out-of-the box tools that SQL Server provides that make my job a lot easier, and that’s the Central Management Server and the SQL Agent Mutli-Server Administration. The premise of this presentation is that these two tools and minimize the administrative overheead for DBAs.

These two tools, while completely independent, compliment each other incredibly well. What they do is completely independent of each other, but when you use them right, they pack a one-two punch.

The Central Management Server

This is one of the most handy tools I work with on a daily basis. If you need to query multiple servers, that’s precisely what this tool is made to do. I’d love to know how this tool evolved, but I’d love to find out. My guess is that some developer at Microsoft needed to query a few servers at once, realized that it was difficult. They probably spent a day or two writing the core internals and it became the tool we have today.

The use cases are many. I’ve used it for security tasks like creating a new SQL login across multiple instances, providing reports for auditors, and even finding a specific SQL Agent job that I knew I put somewhere.  Just recently, I had an incident come into my queue of our service desk software stating that a SQL Agent job had failed. The problem is that it was missing the server where it had failed. I knew the name of the job that failed, and I knew the date of the failure. I just didn’t know what server it had run on. With a few lines of code, I was able to query msdb.dbo.sysjobhistory on every one of my servers to find that thing.

What makes this tool even more powerful is that it allows you to break your servers into categories. In my production environment, we categorize based on the application and then the environment type, such as dev, test, or prod. Within a few clicks, I can query every one of my SSRS servers.
An example of querying multiple servers with the CMS

But wait… There’s more! Microsoft had made it incredibly easy for us to automate the population of our CMS catalog. With two stored procedures,  sp_sysmanagement_add_shared_server_group and sp_sysmanagement_add_shared_registered_server, we can do this without digging to far into the GUI. This data lives in sysmanagement_shared_server_groups and sysmanagement_shared_registered_servers. This makes it much easier for you to integrate with a Configuration Management Database (CMDB) or some other type of master catalog. In fact, I’m in the process of writing the automation to do just that in my environment.

One of the added benefits of the Central Administration Server is that the security is handled by each target server. Even if someone has access to your CMS, it doesn’t necessarily guarantee they have access to the targets. Your security and auditing are still valid.

There is the ability to configure CMS to use SQL logins with earlier versions of the CMS. Don’t. Just don’t.

And if you use Policy Based Management, the CMS plays really nicely with PBM. Make sure you check out the Enterprise Policy Management Framework on Codeplex.

SQL Agent Multi-Server Administration

SQL Agent Multi-Server Administration can be a controversial tool among people who have played with it. I’m actually one of the few people I know who have used this tool in a production environment. We live and die by this thing.

With only three DBAs, we manage over 130 instances. With that ratio of servers to DBAs, it’s pretty clear that I don’t have time to log into every server every day to do a health check. I have a very simple philosophy with server checklists. We automate that crap out of that. If we need to check things on a regular basis, we have a SQL Agent job that does it for us. I have all kinds of SQL scripts to check the craziest things. I was showing my environment to someone at the PASS Summit last year, and they were amazed at the number of jobs we deploy just to monitor our servers.

The problem with SQL Agent jobs is that they tend to sprawl. And keeping them consistent becomes brutal. That’s where Multi-Server Administration (referred to as MSX) comes in very handy. It allows you to define one server as the master server and it owns all of the jobs. From there, you can enroll target servers, and then chose what targets what get what jobs.

There are two problems with MSX. First, it tends to be a little quirky. If something doesn’t go quite right, it can put your MSX-TSX relationship for that target server in a blocked state. Second, when you deploy a job to a target server, the WHOLE job gets deployed, including the schedule. This can be tricky if you deploy IO-intensive jobs to multiple servers at once. Your storage administrator (or the SAN itself) may get a little bent out of shape.

The good news is that both of these limitations can be easily worked around. Give me some time, and I’ll document more about the blocking situation. It’s not as dire as it sounds. And as far as the scheduling thing goes, I’m a fan of using a two-job approach. I will frequently deploy IO-intensive jobs without a schedule. Then I will create a job locally that uses msdb.dbo.sp_start_job to call the master job. That allows me the benefit of keeping my scripts consistent while still having the benefit of scheduling flexibility at the instance level.

One of the questions I often hear about MSX is about the inevitable looming disaster of the master server disappearing. First of all, you should practice recovering that server, including the MSDB database. However, the good news here is that when you deploy a job to the target server, that target server will keep doing what it’s supposed to do, even when the master server goes down. That means if your master server does go south, you can rebuild a new one and then migrate each of your targets to the new master. In three years, that’s not yet been necessary in my environment. However, should it happen, my team will be ready to recover the master server and migrate if necessary. As part of my CMDB integration, I’m moving to a new server. That will will include a new MSX server.

An example of the SQL Agent job targets

 

There are two gotchas of MSX that I really want to stress. One is that it absolutely requires AD authentication. This simply will not work with SQL Server authentication, nor should it. The other one is security. Be very careful who has the ability to manage SQL Agent jobs on your master server. Someone who can create a SQL Agent job on the master server will be able to deploy that job to a target server, even if they don’t have access to the target. That means if their job has a CREATE LOGIN or GRANT statement, it will run as the target server’s SQL Agent service account. That could be a security gap if you have segregation of duty issues.

  • Brian K

    Very nice article, I’ve seen you discuss this a few years back in the ‘burgh, and I’m in a situation where I might be able to leverage some of this, although limited by the fact I’m dealing with six domains.

    The OCD in me requires me to point out that your VM should be named BEDROCK and your seventh instance should be DINO – particularly since you’re a pug lover.

  • Andy

    One of the really cool MSX features that you didn’t mention in this post is about job history.

    On the Job Activity Monitor of the MSX, you can see the most recent outcome of the job from each Target server. It won’t show the full job history, but being able to quickly see if any job failed on it’s most recent run is really helpful if you’re managing a lot of jobs on a lot of servers.

  • Mike, one more question for you – how do you handle security? I’m getting the “MSX enlist failed”, which seems to want me to either change the registry on both servers to have SSL encryption be either 1 or 0, or go through what seems to be a major change (so major they mention having the SQL Service fail to start)

  • Wish I were closer so I could see it at one of the SQLSats up there. Thanks for the slides – any recordings? : )

    Also, would love to know any of the other gotchas; think I’m going to do a small-scale rollout of a maint job to a few servers

    Thanks!
    Michael

  • William Hug

    In this video I see you using a script to add servers to the central management server, the full text of the script is not shown. May I get the script you use?

    https://youtu.be/PfI6gruMX0o