If the printer connected successfully and you could start a print, the connection settings are normally at least 95% correct. This entry is meant to guide you when a print started fine and later stopped for no apparent reason. That normally does not happen, so there must be a cause, and if that cause is not removed it will often happen over and over again. So if you have frequent unintended stops, follow this guide to find out what is behind them.
The first thing you need to understand is the communication flow. The path from the printer into Repetier-Server consists of the following steps.
- Printer Firmware: The firmware has a communication buffer for incoming and outgoing messages. It uses line numbers and checksums to check incoming messages for communication errors.
- Printer USB: This is either a USB-serial converter that the printer firmware talks to over a serial connection, or a native USB driver integrated in the printer firmware.
- USB Cable: No active component, but electrical noise, bad contacts, bad shielding or just long cables can add communication errors here, so it is a possible source of failure.
- USB Hub (optional): Some users have cascades of USB hubs to get enough USB ports. Each hub has to split and resend data, so it is another possible source of failure.
- USB of your PC, PC Operating System: The OS serial driver sees the device and creates a communication port in the OS to connect to.
- Repetier-Server, connected via the communication port.
As you see, up to 6 instances are involved and each is a possible source of problems of its own. So the main step is to find out which one is causing the problem and what you can do about it.
Log files are among the most important tools for identifying the problem:
- server.log: Shows when a serial connection gets opened or closed. It does not show the reason, but it at least lets you identify the exact timestamp at which the problem happened.
- Server print logging: You can enable this in the printer context menu. Normally it is just unneeded information and extra writes, so it is disabled by default, but when you are investigating why you are losing connections you should enable it. Part of it can also be seen in the console, which is a nice real-time view of problems. The console only shows the last 2000 lines, though, so the information might be gone when you need it.
- Operating system logs: On Linux this is the file /var/log/syslog, which you can also download in the server log window. Note that syslog gets rotated on a daily basis, so a download always contains only the current day.
The first check is connection quality while it works. For a longer print, check a log and count how many resend requests you see from the printer and how many timeouts you see.
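Counting by hand gets tedious on long prints. The following is a minimal sketch of how such a count could be automated. It assumes the log was saved as plain text and that the relevant lines contain the words "resend" and "timeout". The exact wording depends on firmware and server version, so treat the markers and the function name as assumptions and adjust them to what you actually see in your log.

```python
from collections import Counter
from typing import Iterable

# Assumed markers: the exact wording depends on firmware and server version.
MARKERS = ("resend", "timeout")


def count_events(lines: Iterable[str]) -> Counter:
    """Count log lines that mention a resend request or a timeout."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in lines:
        lowered = line.lower()  # match regardless of upper/lower case
        for marker in MARKERS:
            if marker in lowered:
                counts[marker] += 1
    return counts
```

To use it on a saved log, open the file and pass the file object directly, for example `count_events(open("print.log", errors="replace"))`, where `print.log` stands for wherever you saved the log.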
A resend happens when the firmware receives a command with a checksum and the checksum does not match, meaning the line we sent is not identical to the one received. Something between steps 5 and 1 changed the content or dropped part of the data sent.
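To make the checksum idea concrete, here is a small sketch. This guide does not spell out the checksum algorithm. The sketch assumes the scheme commonly used for ASCII G-code by firmwares such as Marlin and Repetier-Firmware: an XOR over all characters before the "*", appended as a decimal number. The function names are made up for illustration.

```python
def gcode_checksum(payload: str) -> int:
    """XOR of all bytes of the text that precedes the '*' (assumed scheme)."""
    checksum = 0
    for byte in payload.encode("ascii"):
        checksum ^= byte
    return checksum


def frame_line(line_number: int, command: str) -> str:
    """Build what the sender transmits: line number, command, checksum."""
    payload = f"N{line_number} {command}"
    return f"{payload}*{gcode_checksum(payload)}"


def is_intact(received: str) -> bool:
    """Firmware-side check: recompute the checksum and compare."""
    payload, separator, checksum_text = received.rpartition("*")
    if not separator or not checksum_text.strip().isdigit():
        return False  # no checksum present or checksum itself was damaged
    return gcode_checksum(payload) == int(checksum_text)
```

With this scheme `frame_line(1, "G28")` yields `N1 G28*18`. If a single character changes on the way, for example to `N1 G29*18`, `is_intact` returns False, and that is the situation in which the firmware asks for a resend.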
Especially with a real serial<->USB converter such a modification happens easily. There is no clock line, so if the clocks of the devices are not running at exactly the same speed they might misinterpret a bit every now and then. If the printer allows switching the baud rate, you can try switching from 115200 to 250000 or the other way around. Sometimes one of them produces fewer errors. Speed-wise it makes little difference, as USB latency is the main speed limiter.
If you get very frequent resends, like one every 5-10 lines, your setting for input buffer size is too high and we are sending more data than the printer firmware can buffer. Reduce it to 63. If that does not help, switch to ping-pong mode, where we only send a new command once the firmware has marked the last sent command as finished.
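The difference between buffered sending and ping-pong mode can be sketched as simple bookkeeping on the sender side. This is only an illustration of the idea, not the actual Repetier-Server implementation. The class and method names are hypothetical, and it assumes each command occupies its length plus one newline byte in the firmware buffer.

```python
from collections import deque


class SendWindow:
    """Tracks how many bytes are sent but not yet acknowledged."""

    def __init__(self, buffer_size: int = 63, ping_pong: bool = False) -> None:
        self.buffer_size = buffer_size
        self.ping_pong = ping_pong
        self.in_flight = deque()  # byte counts of unacknowledged commands

    def can_send(self, line: str) -> bool:
        if not self.in_flight:
            return True  # nothing pending, firmware buffer is assumed empty
        if self.ping_pong:
            return False  # ping-pong: wait for the pending command first
        needed = len(line) + 1  # +1 for the newline (assumption)
        return sum(self.in_flight) + needed <= self.buffer_size

    def sent(self, line: str) -> None:
        self.in_flight.append(len(line) + 1)

    def acknowledged(self) -> None:
        """Call when the firmware confirms the oldest pending command."""
        if self.in_flight:
            self.in_flight.popleft()
```

If `buffer_size` is set larger than what the firmware can really hold, `can_send` says yes too often and data gets dropped on the printer side, which then shows up as resends.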
Timeouts are the result of communication errors in the other direction, from printer firmware to server. For every command we send, we need to receive one line that starts with “ok”. A typical case is that the “o” goes missing, and “k” is not “ok”. So we wait and wait for it, while the firmware is waiting for the next command, not knowing that we never received the “ok”. So once the configured “timeout” seconds have passed and we know the command should be fast, we assume this case, write the timeout message and continue sending data.
Here you need to take care to choose a good value. Modern firmware supports a busy protocol, meaning that if a command takes longer it sends “busy” every 2 seconds, resetting the timeout. In this case you can set the timeout to 3 seconds and get a fast recovery. If this is not supported, the timeout should be 30 seconds, or at least longer than the slowest move you plan to execute. Often this is a z move from top to bottom.
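The interplay of “ok”, “busy” and the timeout value can be sketched as follows. This is a simplified illustration, not the server's actual code. The names are hypothetical, time is passed in as plain seconds to keep the example deterministic, and a busy message is assumed to be any line containing the word "busy". The check whether the command "should be fast" is left out.

```python
class AckWatchdog:
    """Tracks whether the acknowledgement for the last command is overdue."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.waiting = False
        self.last_activity = 0.0

    def command_sent(self, now: float) -> None:
        self.waiting = True
        self.last_activity = now

    def line_received(self, line: str, now: float) -> None:
        text = line.strip()
        if text.startswith("ok"):
            self.waiting = False  # command acknowledged
        elif "busy" in text:
            # Assumed busy marker: firmware is still working, restart the clock.
            self.last_activity = now
        # Anything else (for example a damaged "k") does not count.

    def timed_out(self, now: float) -> bool:
        overdue = (now - self.last_activity) >= self.timeout_seconds
        return self.waiting and overdue
```

With a 3 second timeout, a damaged “k” leaves the watchdog waiting and it fires 3 seconds after the command was sent. A long move that reports busy every 2 seconds never fires it, which is why the short timeout is only safe when the firmware supports the busy protocol.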
Especially if you get a mix of timeouts and resends more frequently than one every 1000 lines (many printers are much better than that, but I think this is the level at which you really should look for the source), you have some electronic interference. This can be a power cable next to an unshielded USB cable, a heater or motor cable close to the communication cable inside the printer, a long cable causing problems, and more.
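As a quick worked example of that rule of thumb, the sketch below turns raw counts into errors per 1000 lines and compares them with the threshold of one. The function names are hypothetical, and the counts are whatever you collected from your own log.

```python
def errors_per_1000_lines(resends: int, timeouts: int, lines_sent: int) -> float:
    """Combined resend and timeout rate, normalised to 1000 sent lines."""
    if lines_sent <= 0:
        raise ValueError("lines_sent must be positive")
    return 1000.0 * (resends + timeouts) / lines_sent


def worth_investigating(resends: int, timeouts: int, lines_sent: int) -> bool:
    """True if errors occur more often than one per 1000 lines."""
    return errors_per_1000_lines(resends, timeouts, lines_sent) > 1.0
```

For instance, 3 resends and 2 timeouts over 250000 lines give 0.02 errors per 1000 lines, which is fine. 40 resends and 10 timeouts over 20000 lines give 2.5 per 1000 lines, which is worth looking into.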
It might also be that the printer itself has a problem. Known problems and signs:
- Firmware detected a recoverable error: In the log/console you normally see a message from the firmware saying that M999 would restart the firmware. Newer server versions also often give a hint depending on the error and firmware.
- Firmware detected a non-recoverable error: Some errors are considered critical by the firmware authors, so the firmware stops and stays in an endless loop. In the console you typically see the reason. The firmware ignores all communication until the printer is reset.
- Printer did reset for some reason: Many firmwares send the reason on the next connection. If you see a message about brown out, the printer had undervoltage. Watchdog means the printer firmware had an unexpected hang.
The next big field is power. The controller board of the printer can be powered either by the printer's main supply alone or by both USB and the main supply. In the latter case it can draw power from the PC the printer is connected to. USB 2.0 allows 500 mA, but the board might draw more, or the USB port might not be able to deliver that much (especially with passive hubs). Especially on a tiny PC like the Raspberry Pi, which is very sensitive to power issues, this can cause all kinds of problems, from a mere warning, to disconnected or non-working printers, up to a crashing operating system. For the Pi we have a hardware info in the Repetier-Server GUI (bolt icon) that shows whether the Pi has suffered or is currently suffering from undervoltage, simply because this is such a frequent cause of problems. Read this article for possible solutions: Undervoltage and throttling of Pi
Linux actively closes USB connections for four reasons:
- You unplugged the USB cable or disabled the device so it stops communicating. That is the obvious one, which you can easily rule out.
- Undervoltage exceeded a certain level. You see a message in /var/log/syslog.
- EMF. Again Linux logs this one in /var/log/syslog.
- Driver crash. This one is actually tricky. If it officially crashes you get a message in syslog, I guess. But in practice the driver often still looks active, only sends data in one direction and keeps the connection open.
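To speed up the search through syslog, a small filter like the following can help. It is only a sketch. The patterns are assumptions based on messages commonly seen from the Linux kernel (for example "USB disconnect", "disabled by hub (EMI?)" and the Raspberry Pi "Under-voltage detected!"), and the wording varies between kernel versions and platforms, so extend the table with whatever your own syslog shows.

```python
import re
from typing import Optional

# Assumed patterns; kernel wording varies between versions and platforms.
PATTERNS = {
    "undervoltage": re.compile(r"under-?voltage", re.IGNORECASE),
    "emi": re.compile(r"disabled by hub \(EMI\?\)", re.IGNORECASE),
    "disconnect": re.compile(r"USB disconnect", re.IGNORECASE),
}


def classify(line: str) -> Optional[str]:
    """Return a label for a syslog line that hints at a USB drop, else None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return None
```

Running `classify` over every line of the downloaded syslog and keeping only the lines with a label gives a short list of candidates to compare with the time of the failure.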
When you see “Connection closed by os” in the print log, the port we were talking to has disappeared. In that case have a look at the syslog to see what Linux reports as the reason. The timestamp will be close to the message timestamp; just account for syslog possibly using a different time zone than our logs, which use local time.
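Matching timestamps across time zones is easy to get wrong by hand. The sketch below shows one way to do it with timezone-aware datetime objects. It assumes you have already parsed both logs into datetime values (the log formats themselves are not covered here), and the function name and the 30 second window are illustrative choices.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def entries_near(
    event_time: datetime,
    entries: Iterable[Tuple[datetime, str]],
    window: timedelta = timedelta(seconds=30),
) -> List[str]:
    """Return syslog messages logged within `window` of the event.

    All datetimes must be timezone-aware, so that a local-time print log
    and a syslog in another zone (for example UTC) compare correctly.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    return [msg for stamp, msg in entries if abs(stamp - event_time) <= window]
```

For example, an event at 14:00:05 in a UTC+2 local zone matches a syslog entry stamped 12:00:03 UTC, but not one stamped 14:00:03 UTC.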
For undervoltage/EMF the port reappears a few seconds later and the printer reconnects. In the server settings you can choose to continue the print in this case. Do this only if the reconnect happens without a reset. We try to prevent a reset on fast reconnect, but not all serial drivers support this. Native USB connections do not reset anyway.
The really annoying case is when the serial port stays open, but communication happens in only one direction or not at all. It is unclear where exactly this happens. As a result you normally see frequent timeouts, so we have added a server option to restart the USB driver on Linux systems (USB Reconnect on Timeout). Early means after one timeout, conservative means after 2 timeouts, to allow for the recoverable timeouts described above. It restarts the server communication and the Linux driver. If you need to unplug and power off the printer to recover in this case, the hang seems to be on the printer side.
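The two modes boil down to a threshold on the number of timeouts. A minimal sketch of that decision follows. The mode names come from the option described above, while the function name and the idea of counting consecutive timeouts are assumptions made for illustration, not a description of the server internals.

```python
# Timeouts tolerated before the USB driver is restarted, per mode.
TIMEOUTS_BEFORE_RESTART = {"early": 1, "conservative": 2}


def should_restart_usb_driver(mode: str, consecutive_timeouts: int) -> bool:
    """Decide whether the timeout count reached the restart threshold."""
    threshold = TIMEOUTS_BEFORE_RESTART[mode.lower()]
    return consecutive_timeouts >= threshold
```

Conservative mode therefore tolerates a single lost “ok”, which the normal timeout handling recovers from on its own, and only restarts the driver when the problem repeats.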
The last and quite unlikely case is a deadlock in Repetier-Server. This happened in the past, but by now we have resolved all known cases and have not found a new one for a long time. Nonetheless I want to mention it for completeness, and in case we have accidentally added one. When it happens, it is quite typical that the GUI does not load completely or seems to hang as well. Page changes are not executed, or content that was there before is missing. If this happened on a Linux system, we would be grateful if you sent us a backtrace of all threads as described here: Debugging crashes/hangs on Linux