Some information might not apply to the latest versions of Fedora. We should group this information by version.
This page is intended to document problems we encounter with Fedora, hopefully resulting in a better understanding of Fedora functionality and debugging methods. As we identify explanations for certain types of behavior, they will be merged into the "main" documentation or turned into a more traditional troubleshooting list.
General troubleshooting procedure
The best places to look for error messages are:
- The shell from which you started fedora
- nohup.out (usually in the directory from which you started fedora, but may be in server or server/config)
- startup.log (in the log directory)
- other fedora logs (though they're usually useless)
Checking for startup/shutdown
In the "primary" fedora logs (the ones whose names are timestamps):
- When the server is shut down, the log file is closed out and ends with "Server shutdown requested."
- When the server is started, the log file begins with "Server home is...."
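The two markers above can be checked with a short script. This is a minimal sketch in Python, assuming the log is plain text and the marker strings appear exactly as quoted. The function name log_state, the sample lines, and the sample path are hypothetical and not part of Fedora.

```python
def log_state(lines):
    """Classify a primary Fedora log from its first and last non-empty lines.

    Returns a (started, shut_down) pair of booleans.
    """
    # Ignore blank lines so a trailing newline does not hide the last entry.
    entries = [line.strip() for line in lines if line.strip()]
    if not entries:
        return (False, False)
    # Substring match, in case each entry carries a timestamp prefix (assumption).
    started = "Server home is" in entries[0]
    shut_down = "Server shutdown requested." in entries[-1]
    return (started, shut_down)


# Hypothetical log contents, for illustration only.
sample = [
    "Server home is /usr/local/fedora",
    "some request handled",
    "Server shutdown requested.",
]
print(log_state(sample))
```

A log that starts with the startup marker but lacks the shutdown marker at the end suggests the server is still running or did not shut down cleanly.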
The solution to a process that hangs is almost always to increase the Java heap space. Find the batch/shell script that starts your process and raise the -Xmx setting. For most applications, at least 512 MB (for example, -Xmx512m) is desirable.
2005-12-19: Ryan Scherle made a minor change to the FULL_VIEW_XSL for the Hohenberger collection (iudl:10) on rhyme. This change did not take effect.
- Tried restarting Fedora, making another change to the stylesheet, restarting Fedora again. No effect.
- Stopped my local Fedora to make sure it wasn't referencing the wrong stylesheet.
- Manually constructed the equivalent URL, based on contents of the bmech. This worked.
- Purged the collection disseminator and re-added it. This disseminator no longer shows up in the method list!
- Updated the stylesheet datastream in my local Fedora, to see if the behavior was the same. This worked, which was different from the earlier behavior. The only difference was that originally I hit the Import button to update the stylesheet; the second time, I hit Edit and then Import.
- Purging the collection object and re-importing from my local Fedora fixed the behavior on rhyme.
Server not responding
2005-12-19 (ca. 10:40pm): Nagios reported the Fedora server on rhyme was critical. It appears that the server simply stopped responding, and as of 9am on the 20th, is still not responding.
- There are two nohup.out files, one in the home directory for the fedora user, and one in the server/config directory. These are not much help, as they don't include timestamps. The file in the home directory doesn't have any useful information, as the most recent entry is a known user-caused error from earlier in the day. The other file ends with some NullPointerExceptions, all in regular expression classes.
- The main fedora server log shows no errors. There is an entry at 10:33pm with the last successful Nagios query. The next entry is at 9:07am on the 20th (a manually run query to try diagnosing the problem), and shows 0 available connections in the database connection pool.
- There was nothing interesting in the tomcat logs.
- Randall reports that the Oracle instance was backing up at that time, which likely caused the problem.
- Restarting fedora fixed the problem.
Resource Index problems
2006-1-19: Noticed that (on rhyme) there were some Hohenberger objects that did not get listed in an RI search for all Hohenberger items.
- Tried the Fedora rebuilder, but this only resulted in further corruption of the database. It turns out that the currently-released rebuilder doesn't work with Oracle.
- Got access to the main Fedora CVS repository, and built a new copy of the rebuilder. Unfortunately, the new rebuilder is somewhat tied into the other code, so tried bringing all the code up to 2.1.
- Even though the new RI rebuilder works fine with Oracle, the DB rebuilder doesn't, so the database corruption wasn't fully removed.
- Reverted all code back to 2.0.
- Fixed RI on rhyme by moving all content to another machine and then back to rhyme.
Can't import object
If an object cannot be imported from one repository to another, the known possibilities are:
- Incorrect password.
- An extremely long error message, complaining of HttpServiceNotFoundException:[DefaultExternalContentManager] and a "non-200 response code (500)". This means that the server-side beSecurity.xml file (or the XACML files it generates) doesn't have the correct IP address for the server.
- Too many objects are being ingested at once. The ingest process has some memory leaks. It is better to break the ingest into batches of around 500 objects. (Fedora 2.2 is supposed to make this easier.)
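The batching advice in the last item can be sketched as follows. This is a minimal Python illustration, not the ingest tool's own code: the helper name batches and the "demo:" PIDs are hypothetical, and the print call stands in for whatever ingest call your setup uses.

```python
def batches(items, size=500):
    """Yield consecutive slices of items, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical PIDs, for illustration only.
pids = ["demo:%d" % n for n in range(1, 1201)]
for group in batches(pids):
    # Replace this print with the real ingest call for your setup.
    print(len(group), group[0], group[-1])
```

Running each batch as a separate ingest process gives the memory a chance to be released between batches.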
Fedora won't start
ModuleInitializationException / Connection refused
This usually means you are trying to start Fedora with the wrong database profile (and it's trying to connect to the wrong database). Make sure you run "fedora-start dbprofile" where dbprofile is usually oracle for the large Fedora instances, or mckoi for smaller test Fedora instances.
ModuleInitializationException / DefaultAuthorization
This usually means there is something wrong with the database or resource index:
- Make sure the database is running.
- Use "fedora-rebuild dbprofile" to rebuild the database.
- Try starting Fedora again. If you get the same error, use "fedora-rebuild dbprofile" to rebuild the resource index.
The following exceptions:
have been seen with the message referring to a db concurrency error.
Furthermore, the concurrency issues seem to be interfering with the purging of old items. A purge will seem to complete normally, only to have the item remaining in the repository, causing the following exceptions:
That last exception is when the item is gone, but the datastreams for that object still exist. This indicates that when the purge fails (presumably from a concurrency error), the transaction is not properly rolled back.
Details at 11:00.
If there are problems with Unicode, it is best to ensure both the Fedora server and the client are using UTF-8. This can be accomplished by adding the URIEncoding="UTF-8" attribute to the Connector element(s?) in server_fedoraTemplate.xml in Fedora's Tomcat/conf directory, as well as to server.xml (where?). For clients, add request.setCharacterEncoding("UTF-8") where needed.
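The effect of a mismatch can be reproduced outside Fedora. The sketch below uses Python's standard urllib.parse to percent-encode a string as UTF-8 (what a UTF-8 client sends) and then decode it two ways. It assumes the server falls back to ISO-8859-1 when no URIEncoding is set; the sample string is arbitrary.

```python
from urllib.parse import quote, unquote

original = "K\u00f6ln"  # "Köln", an arbitrary non-ASCII sample

# What a UTF-8 client puts in the URL.
encoded = quote(original, encoding="utf-8")

# Server decoding with the matching charset recovers the text.
decoded_utf8 = unquote(encoded, encoding="utf-8")

# Server decoding as ISO-8859-1 (assumed default) garbles it.
decoded_latin1 = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(decoded_utf8 == original)
print(decoded_latin1 == original)
```

If text stored in Fedora shows two odd characters where one accented character was expected, this kind of charset mismatch is a likely cause.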
Fedora Disseminator/Datastream Binding key name
When writing code to create a disseminator, there is a structure used to bind a datastream to a disseminator, which looks something like this:
Notice the "NULLBIND", above. This is not typical; typically, the "NULLBIND" will be the name of the datastream, which is "DC" in this case. This is a case where the binding exists only because Fedora requires at least one datastream binding for a disseminator, even if it is not really needed.
In any case, the "NULLBIND" name is required by the bmech, and if the name is wrong, a non-obvious error will occur. In this example, I inadvertently used "DC" instead of "NULLBIND" in the code, and received the following error from the Ingest Tool output summary:
Examination of the FoXML and the Mets MD will show that both are well-formed and valid.
Also, you will probably see something like this in the Fedora logs:
Examination of that log might seem to indicate that the failure (the first warning) is referring to the uploaded METADATA DS. However, it is not. Fedora has finished with the upload processing, and is now referring to the improper reference in the datastream binding. This failed reference in the DS binding causes the later GeneralException, which proves fatal to the ingest.
It would be nice to see better errors from Fedora for this particular issue.
401 when calling management API (Version 2.1.1 and below)
Removed the JAAS-realm from the server_config template and restarted. Seems to be working. This issue is said to be fixed in Fedora 2.2.
Fedora Admin Client: Unconnected sockets not Implemented Exception (Version 2.2.1)
Symptom: The Fedora Admin Client refuses to add a new "Internal XML" or "Managed" datastream with error message "Unconnected sockets not implemented" or similar.
Configuration: Fedora APIM and the upload servlet are accessed over an SSL-encrypted connection. A self-signed certificate is used; this shouldn't happen if the security certificate is signed by a known authority (such as Verisign). Fedora Version 2.2.1, JDK Version 5.
Description: Normal APIM operations (over SOAP) do not require the client to specify the correct password to the certificate file. However, the upload operation requires the correct password to be specified (for example, in fedora-admin.bat). I'm not sure why we don't run into the same issue in the ingest tool because we don't specify a password there. It might be the case that if a password is specified, it should match the certificate password.
Fix: Change the default truststore password in fedora-admin.bat so that it matches the password of the truststore file.