Tuesday, August 19, 2008

Unable to Switch User Error - su: no shell

We had some users complaining that they could not switch (substitute) user with su. Here is the error message they were receiving: "su: no shell." At first I thought the users had inadvertently locked out their accounts. But after querying nisplus and checking the file-based users, I didn’t observe any locked accounts. I tried switching to various users from root and received the same error. Then I tried switching user on a different workstation - no problem. The problem was tied to a particular box.

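Then we used the truss command to trace the system/library calls. It pointed to an unexpected access/permission issue: the /usr directory was set to 600. A directory with mode 600 has no execute (search) bit, so non-root users cannot reach anything beneath it, which is presumably why su could not start a shell. Frankly speaking, the permission problem was somewhat of a surprise, since everything had been working fine the previous day. At any rate, we changed the permissions and things were back to normal.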

# truss su esofthub
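
Once truss points at a permission problem, a quick way to find the offending directory is to walk up from the shell's path and look at each directory's mode. The following is only a rough sketch, not what we actually ran: check_dirs is a made-up helper name, /usr/bin/sh is an assumed shell path, and it only looks at the "other" permission bits in the ls output.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): walk from a path up to /
# and flag every directory that "other" users cannot search (no x bit).
check_dirs() {
    p=$1
    while :; do
        if [ -d "$p" ]; then
            # First field of ls -ld is the mode string, e.g. drwxr-xr-x
            mode=$(ls -ld "$p" | awk '{print $1}')
            # Character 10 is the execute/search bit for "other"
            other_x=$(printf '%s' "$mode" | cut -c10)
            case $other_x in
                x|t) ;;                                   # searchable
                *) echo "NOT SEARCHABLE: $mode $p" ;;     # e.g. drw-------
            esac
        fi
        # Stop at the root (or at "." for a relative path)
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
}

# Assumed shell path; substitute the shell listed in the user's passwd entry.
check_dirs /usr/bin/sh
```

A /usr set to 600 would show up here as a "NOT SEARCHABLE: drw-------" line. The post does not record which mode was restored; 755 is the usual setting for /usr, so verify against a known-good box before running chmod.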

3 comments:

Matt said...

I'd run chkrootkit, just because I'm paranoid.

How many admins are there on that machine, or people who have access to su or sudo? You should at least verify that this was an inadvertent mistyped command by a valid user.

Good luck

esofthub said...

Yes, it was a valid user and it was an inadvertent fat finger. Thanks for your comment, Matt.

UX-admin said...

Holy smokes. You actually let users log into systems?

Over here, if anybody (including root) EVER has to log in, that's considered an error.

Even in that case, one is only allowed to look, and heavens help one if they even so much as think of modifying something!

What happens next is:

- the problem is diagnosed
- an issue is opened in the tracking system
- a RID (Requirement ID) will be assigned from engineering (me)
- the RID has to pass verification
- if the RID passes verification, a formal document will be produced; if it's a bug, the fix will be built into the specification, for the next release cycle
- if necessary, the manuals will be updated in the Run Time Platform, in the Flash(TM) archive.
- the fix will be generated according to the specification, and built into the RTP Flash(TM) archive.

If the issue is due to a hardware failure, it will be remedied and the system reflashed, if necessary; then all the applications will be automatically installed from packages (*everything* is packaged), and the backup will restore the application data only.

Applications are NEVER restored from backup (they're never backed up, on purpose), as they can, and must, be consistently installable and configurable via packages.

The OS is also NEVER restored, as every system is and MUST be completely identical. All software installation of a specific product must be identical.

Any systems found to be installed or "fixed" to work any other way are immediately reflashed to the Run Time Platform, and the person responsible for "fixing" will have some serious explaining to do... they better have a phenomenal, absolutely valid excuse we've never heard before!

So, if any such thing as interaction with a system happens, it is considered an error, and is reported to engineering immediately.

Principally, anything that is not fully automated, and that is messed with manually is considered an error.