July 12, 2006
Impact 2006: Wednesday Session 5: Scalable Reliable Vista Infrastructure
This is another session, presented by North Carolina State University, on setting up a scalable Vista system.
This session talked a bit more about operational and planning tasks, as well as hardware issues.
Planning should be campus-wide. You need to communicate that the learning management system is a mission-critical institutional system, just like HR or SIS, and that it requires basic infrastructure spending for hardware upgrades.
They recommend blocking out one six-hour downtime every two weeks to install service packs, etc. Major upgrades require more time. Use this downtime even when it isn't needed, so that customers get used to the system being unavailable during those times.
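As a rough illustration of that schedule, here is a minimal Python sketch that generates the biweekly six-hour windows and checks whether a given time falls inside one. The first window's date and start time (a Sunday at 2:00 AM) are hypothetical placeholders; the session did not specify when the windows should start.

```python
from datetime import datetime, timedelta

# Window length and spacing follow the session's recommendation;
# the first start time used below is a hypothetical placeholder.
WINDOW_LENGTH = timedelta(hours=6)
WINDOW_INTERVAL = timedelta(weeks=2)


def maintenance_windows(first_start, count):
    """Return (start, end) pairs for the next `count` biweekly windows."""
    return [
        (first_start + i * WINDOW_INTERVAL,
         first_start + i * WINDOW_INTERVAL + WINDOW_LENGTH)
        for i in range(count)
    ]


def in_window(moment, windows):
    """True if `moment` falls inside any scheduled window (end exclusive)."""
    return any(start <= moment < end for start, end in windows)


if __name__ == "__main__":
    first = datetime(2006, 7, 16, 2, 0)  # hypothetical: Sunday 2:00 AM
    for start, end in maintenance_windows(first, 4):
        print(start.strftime("%a %Y-%m-%d %H:%M"), "to", end.strftime("%H:%M"))
```

Publishing the same list to customers ahead of time is what makes the "get them used to it" part work.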
They recommend splitting the public network (communications between users and the application servers) from the private network (communications between the cluster, load balancer, and database). This makes it easier to monitor the traffic separately. If we find a lot of database activity, we can upgrade the internal network. If we find a lot of user hits, we can upgrade the external network.
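To tell which network is the bottleneck, you need per-interface traffic numbers. A minimal sketch, assuming Linux servers where the public network is on eth0 and the private network on eth1 (both interface names are hypothetical), that parses two snapshots of /proc/net/dev and reports how busy the busier direction of a link was:

```python
# Hypothetical interface assignment: eth0 carries the public network
# (users <-> application servers), eth1 the private network
# (cluster <-> load balancer <-> database).
PUBLIC_IFACE = "eth0"
PRIVATE_IFACE = "eth1"


def parse_proc_net_dev(text):
    """Parse Linux /proc/net/dev text into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is bytes received; field 8 is bytes transmitted.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization_percent(before, after, iface, seconds, link_mbps):
    """Utilization of the busier direction (rx or tx) as a percent of link speed.

    Assumes a full-duplex link, so each direction is compared to the full
    link rate. Counter wraparound is not handled in this sketch.
    """
    rx0, tx0 = before[iface]
    rx1, tx1 = after[iface]
    busiest_bytes = max(rx1 - rx0, tx1 - tx0)
    link_bits_per_second = link_mbps * 1_000_000
    return 100.0 * busiest_bytes * 8 / (seconds * link_bits_per_second)
```

In practice you would read /proc/net/dev, wait a fixed interval, read it again, and compare the numbers for PUBLIC_IFACE and PRIVATE_IFACE. A graphing tool like Cacti typically collects the same interface counters over SNMP instead.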
It is important to track usage over time to identify spikes and ongoing trends, so we can plan for expanding the cluster well in advance.
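One simple way to turn usage history into a planning number is to fit a straight-line trend and estimate when it crosses the cluster's capacity. This is a minimal sketch, assuming one sample per period (for example, weekly peak concurrent sessions) and a capacity figure you would have to determine yourself through load testing. The linear fit and the "more than 1.5 times the mean" spike rule are my own simplifications, not something presented in the session.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, capacity):
    """Estimated number of further periods until the trend line reaches capacity.

    Returns None when usage is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing_x = (capacity - intercept) / slope
    return max(0.0, crossing_x - (len(values) - 1))


def spikes(values, factor=1.5):
    """Indices of samples more than `factor` times the series mean (crude heuristic)."""
    mean = sum(values) / len(values)
    return [i for i, v in enumerate(values) if v > factor * mean]
```

For example, weekly peaks of 100, 200, 300 and 400 sessions against a hypothetical capacity of 800 give periods_until(...) == 4.0, i.e. roughly four weeks of headroom.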
Setup and Recovery
Try to remove human error from the equation by writing setup scripts that automate installation and configuration as much as possible. This ensures the nodes are compatible and gives us the ability to rebuild nodes quickly in an emergency.
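The session didn't show its scripts, but one piece that is easy to automate is confirming that a rebuilt node matches the rest of the cluster. A minimal sketch, assuming each node's configuration lives under a directory you can copy or mount for comparison (the directory layout is hypothetical), that fingerprints files with SHA-256 and reports any file that differs or is missing between nodes:

```python
import hashlib
from pathlib import Path


def fingerprint_node(config_dir):
    """Map each file under config_dir (by relative path) to its SHA-256 digest."""
    root = Path(config_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_drift(fingerprints):
    """Given {node: {file: digest}}, return sorted file names that are
    missing on some node or whose contents differ between nodes."""
    all_files = set()
    for files in fingerprints.values():
        all_files.update(files)
    drift = []
    for name in sorted(all_files):
        # A missing file shows up as None, so it also counts as a difference.
        digests = {files.get(name) for files in fingerprints.values()}
        if len(digests) > 1:
            drift.append(name)
    return drift
```

Running this after every scripted rebuild gives a quick yes/no answer to "is this node the same as the others?" before it goes back into the load balancer.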
We need to monitor two things: reliability and scalability. Reliability is monitored using tools like Nagios, which can alert on low disk space, server status, etc. Scalability can be monitored using Cacti, which shows trend analysis, time-of-day spikes, etc. (I've never heard of Cacti but should look into it.) Currently we can get some of this information using web log analysis tools like Analog, but we also need to monitor database usage, internal and external network usage, and disk usage over time on both the application and database servers.
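Nagios checks are small programs that print a one-line status and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A minimal Python sketch of a free-disk-space check following that convention; the 20% and 10% thresholds are hypothetical, and the stock check_disk plugin from the Nagios plugins package already covers this case, so treat this only as an illustration of how such a check works.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a Nagios status (thresholds hypothetical)."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_free_space(path, warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, one-line status message) for the filesystem at `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: filesystem reports zero size"
    free_pct = 100.0 * usage.free / usage.total
    code = classify(free_pct, warn_pct, crit_pct)
    return code, f"DISK {LABELS[code]} - {path}: {free_pct:.1f}% free"
```

A wrapper script would print the message and call sys.exit with the code so that Nagios can read the status.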
Backup and Recovery
They recommended backing up the database to disk rather than to tape so the backup completes quickly. This could include taking a database snapshot, checkpointing, etc. We need to discuss this with the DBAs to find the option that provides the best recovery capabilities at the best price.
Posted by kvl014 at July 12, 2006 10:17 PM