The problem:
You don’t automate as much as you could, or get as much out of the many tools (systems) you use to manage your infrastructure as you could, because you don’t have time. You want it done, but it’s not high enough on your unmanageably long to-do list. You’d still like to add health checks for components you don’t currently monitor, a wish that resurfaces at least occasionally when you have an outage. Your infrastructure tools aren’t all on the latest and greatest stable version. You certainly don’t have time to evaluate whether other tools are worth using, much less install them. Because of this you have small inefficiencies. You end up manually executing menial tasks because they aren’t frequent enough to justify teaching a lower-paid minion to do them or figuring out how to automate them. Once or twice a year you go through a huge effort to patch your servers. You know there are tools for automated (or semi-automated) patching, but you don’t use them because you simply haven’t had the time. Fortunately, you only occasionally have to generate infrastructure reports or totals for things like inventory, licensing and audit. I could go on, but in short: you’d like better, fancier tools and don’t have time for them.
Chances are excellent that you spend some time installing agents and infrastructure and writing administrative scripts out of necessity. We bet that you, just like every other company, install 3rd-party software (monitoring, backup control systems, etc.) to make your environment easier to manage and more reliable. Our guess is that most of this is relatively standard stuff, just set up so it works in your environment. We also guess you have done at least some custom scripting to reduce manual work on tasks that many, many companies perform, written so it works for your company. How many of your management tools or systems are actually unique to your organization? Could they be done better? Could they be upgraded, or are they already on the current version? Furthermore, aside from testing and configuration, you do mostly the same administrative setup and maintenance tasks as countless admins at other companies. If these tasks are standard, can’t someone just do them once, correctly, and give everybody a way to install and validate the result?
The solution:
A bundled suite of useful management services (monitoring, a backup control system, an NFS share, inventory reporting, security and logging agents, and scripts) that you just drop in. You provide a Linux server (your standard build) and give the suite SSH access, either with keys or by running a script, so it can install our agents on every server in your environment. You only do this once. The suite then configures itself. Firewall it from the internet and keep it fully under your control. Our agent collects most of the configuration information it needs (server names, etc.) and installs the agents that talk to the suite. This is intended as a duplicate management infrastructure: you keep your existing management tools and use whichever one is more useful (or familiar) for the task at hand.
We use (mostly) open source software, so we can make this really cheap given the scope of what we install. We write the software once for many, many companies, so we get economies of scale, even though we also have to write automated installation and configuration scripts. Our solution is designed to duplicate most of the management infrastructure you already have. The point is to give you at least something of value you don’t already have, and to let you ignore the parts you don’t like. Aside from customization (monitoring threshold updates, etc.), this is intended to be very low maintenance for you.
Where it gets really interesting:
We can write custom scripts for you. Actually, they aren’t really custom (or untested), because most of your custom scripts do the same tasks as the next company’s. Here’s how: the suite has access to every server, tons of environment-specific information, and a standard set of components (this suite) it can depend on. That means we (not you) can now write your “custom scripts,” which really means any menial task common to most organizations. We bet we’ll write more scripts, and write them better, than you would. (No offense intended.)
This has the potential to become a slightly disruptive technology within some IT organizations. We put our scripts in a web GUI with 1) execution history and output/failures (audit and quality control), 2) a record of what was done and by whom, and 3) access control over who can run what. If you can train people on when to do (and not do) something, they can be given access to tasks that were previously out of reach because of access or skill barriers. Provided change control is respected, people can push buttons in a GUI instead of asking you to do the work. Interns can add disk space. Project managers can run server health checks and restart applications (at 1 AM). We can also require a change control ticket before high-risk scripts are run.
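To make the access-control and change-control idea more concrete, here is a minimal Python sketch of how a script runner behind such a GUI might gate each run. It is an illustration under stated assumptions, not our actual implementation. Every name in it (ScriptRunner, the role names, the CHG- ticket format, the placeholder lambdas standing in for real remote commands) is hypothetical. A real version would execute commands on your servers and persist the history rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Script:
    name: str
    action: Callable[[], str]  # hypothetical stand-in for the real remote task
    high_risk: bool = False    # high-risk scripts require a change ticket


@dataclass
class AuditEntry:
    script: str
    user: str
    ticket: Optional[str]
    ok: bool
    output: str
    when: datetime


@dataclass
class ScriptRunner:
    # Hypothetical policy: role -> names of scripts that role may run.
    permissions: dict[str, set[str]]
    scripts: dict[str, Script]
    history: list[AuditEntry] = field(default_factory=list)

    def run(self, user: str, role: str, name: str, ticket: Optional[str] = None) -> str:
        # Access control: who can run what.
        if name not in self.permissions.get(role, set()):
            raise PermissionError(f"role {role!r} may not run {name!r}")
        script = self.scripts[name]
        # Change control: refuse high-risk runs that have no ticket.
        if script.high_risk and not ticket:
            raise PermissionError(f"{name!r} is high risk and needs a change ticket")
        try:
            output, ok = script.action(), True
        except Exception as exc:  # record failures too, for quality control
            output, ok = f"failed: {exc}", False
        # Audit trail: what was done, by whom, with which ticket, and the result.
        self.history.append(
            AuditEntry(name, user, ticket, ok, output, datetime.now(timezone.utc))
        )
        return output


if __name__ == "__main__":
    runner = ScriptRunner(
        permissions={"intern": {"add_disk_space"}, "pm": {"health_check", "restart_app"}},
        scripts={
            "add_disk_space": Script("add_disk_space", lambda: "disk extended"),
            "health_check": Script("health_check", lambda: "all services up"),
            "restart_app": Script("restart_app", lambda: "app restarted", high_risk=True),
        },
    )
    print(runner.run("pat", "pm", "health_check"))                     # prints "all services up"
    print(runner.run("pat", "pm", "restart_app", ticket="CHG-1234"))   # prints "app restarted"
```

In this sketch, denied attempts raise an error and are not recorded; whether refused attempts should also land in the audit history is a design choice a real deployment would need to make.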
Not all of your work is unique to your company. Why do you have to do it if someone else is already doing it!?