OIT Investigating E-mail, ACES problems

Disruptions appear to be related to separate hardware and software malfunctions

An ongoing investigation into the root causes of recent unrelated technical problems with university e-mail and online senior fall class registration is centering on separate hardware and software malfunctions, according to staff in Duke's Office of Information Technology (OIT). They said the technical difficulties occurred during significant upgrades to Duke's information technology infrastructure.

According to Duke's Senior Information Technology Architect, Michael Gettes, problems began March 3, when Duke's aging central acpub e-mail system locked up during an upgrade, delaying mail delivery overnight for almost nine hours.

New e-mail problems arose March 23 when OIT staff discovered a corrupted file system. As a result, the e-mail of about a dozen users appeared as unread, and mail routing for a few hundred other users was temporarily disrupted.

Simultaneously, two drives failed in the Sun 3510 disk arrays, which serve as a backup system by keeping identical copies of every file on two different disks in the e-mail system. OIT staffers repaired the file system and restored service in about 10 hours, ensuring that no e-mail messages were lost.

Regarding the failure in OIT's backup hardware, John Board, faculty representative on the Information Technology Advisory Council, said, "From what I understand of the problem, that which cannot happen, happened."

On March 25, OIT began seeing similar problems in another file system containing e-mail belonging to about 7,000 users, forcing an emergency shutdown of the mail server just after 5 p.m. Repairs took 12 hours to complete, delaying mail delivery for the 7,000 users of that particular file system, according to Gettes.

Concerned about the reliability of the system, OIT ordered, installed and tested new Sun 3510 disk arrays over the weekend of March 27. This installation interrupted e-mail service on acpub accounts for about 12 hours in the middle of a Saturday night, when usage is minimal.

"Even though we scheduled the downtime at the time we considered to be least disruptive, mail is something you never want to have interrupted," said Tracy Futhey, Duke's vice president for information technology.

In advance of these service shutdowns, OIT notified affected e-mail users and kept them updated via an informational Web site at www.oit.duke.edu/helpdesk/service/updates.php.

Unrelated technical problems also interrupted the online senior registration for fall classes via ACES. On March 26, just as registration was about to open at 7 a.m., the Oracle database supporting the ACES Web interface used up all of the memory available to the system, causing it to lock up.

The University Registrar sent a mass e-mail at 8:30 a.m. notifying all seniors of the problem; registration was rescheduled for the following Monday, March 29. According to OIT technical staff, they and the Student Information Services and Systems office worked throughout the weekend to diagnose the memory problem and implement a fix, and senior registration was completed without incident and in record time.

"This was a really tough couple of weeks for our systems," said Futhey. "Between the e-mail and ACES problems, we probably had a dozen folks putting in sixty to eighty hour weeks."

According to Gettes, OIT has postponed its e-mail upgrades, with about 35 percent of e-mail accounts left to relocate, to avoid further changes to the environment until the systems are fully stabilized. OIT technical staff members have increased the computer memory allocated for ACES, and a group of university administrators is evaluating the registration process and will recommend improvements this summer that will not compromise system reliability.

"Sometimes computer hardware fails or develops unforeseen problems such as computer bugs that unfortunately affects the user," Ginny Cake, OIT's director of customer service, said. "Planned outages may be necessary if and when we need to perform updates or patches and to make hardware and software upgrades to meet the changing demands of Duke, such as improving the speed of e-mail distribution. We try our best to contact the affected users in advance to educate them about the problem and what is being done to fix it."

Gettes said taking great care in making significant improvements to the university's information technology environment -- such as upgrading Duke's mail servers to more powerful systems and working to improve the ACES operating platform -- is necessary to better serve end-users.

"Duke is undergoing a bit of a renaissance with respect to information technology and how it is used in the academic and business life of the institution," he said. "Improvements to Duke's IT environment allow us to better concentrate on the service we provide with the technical and human resources we have at Duke to respond to the challenges and opportunities of continuing in our role as a world-class education leader."