2013-12-13

Line rate HTTP server on the OpenBlocks AX3

This article explains how I'm using the OpenBlocks AX3 as a line-rate HTTP server for testing purposes.

Original idea

In 2003, while doing some performance tests on Netfilter, I realized how frustrating it was to always be limited by the load generators' performance. You generally need at least 4-6 machines to load a firewall, with 2-3 HTTP clients and 2-3 HTTP servers. The second one of each is here to ensure that the bandwidth is never limited by a single machine, and the third one is here to prove that the limit reached with the first two cannot be overcome with more clients. And it's generally hard to find that many similar machines: you generally know that some are faster for sending, others for receiving, or that some are more efficient with large packets and others with small packets. In practice you're never totally confident in your own tests.

Two years later, while running some network benchmarks to compare several firewall products for a customer, I faced the same issue again, especially when trying to stress the firewall with many short requests to maximize the connection rate. Then I got the idea of a dummy HTTP server which would only work in packet mode, without creating real TCP sessions. That would make it lighter and improve its ability to get close to line rate. Unfortunately, working with SOCK_PACKET at that time was not really faster than the local TCP stack so I temporarily gave up on this idea.

After I recently became the lucky owner of an OpenBlocks AX3/4 microserver, the prospect of pushing its high networking capabilities to their limits immediately revived my old idea of a stateless server. The platform is very recent and I needed to go deep into some kernel drivers, which explains why it took quite some time to reach a point where it's working.

The OpenBlocks AX3/4 microserver

The OpenBlocks AX3/4 microserver is a very neat device built by Japanese company Plat'Home.
The microserver compared to a 3,5" floppy disk for scale
This fanless device runs a dual-core 1.33 GHz Marvell Armada XP CPU (ARMv7), has 3 GB of RAM, 128 MB of NOR flash, 20 GB of SATA SSD, and, best of all, 4 true Gigabit Ethernet ports (I mean not over USB nor an internal switch nor crippled by design like in many Cortex-A9 based CPUs). In terms of average performance, it is comparable to a dual-core Atom running at the same frequency, though it consumes 4x less power. And indeed, even at full load, it becomes just warm to the touch. The design is robust and compact, so I now carry it everywhere with me as it's a very convenient device for many usages. The only criticism I could make is that it's a bit expensive: it clearly targets the enterprise market, which will value its benefits for building an ideal router, firewall, web server or monitoring device. But even then, many companies will prefer a cheaper low-end x86 box if they don't value the device's strong differentiators.

Where this device really shines is in the area of network communications. The 4 GigE ports are included in the Armada XP itself, so they're much closer to the CPU caches than usual devices which communicate via a PCIe bus. And this design pays off. After hacking the mvneta driver a little bit, it becomes obvious that each port is capable of both sending and receiving in parallel at line rate for all packet sizes, resulting in exactly 1.488 million packets per second (Mpps) in each direction with minimum-sized frames. This is something rare and very hard to achieve with more conventional hardware, so that made me want to try to port some network stress testing tools to this platform.
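
For reference, the 1.488 Mpps figure follows directly from Ethernet framing: a minimum-sized frame is 64 bytes, and each frame also costs 8 bytes of preamble and 12 bytes of inter-packet gap on the wire. A small Python check (the function name is only illustrative):

```python
# Theoretical frame rate of an Ethernet link for a given frame size.
PREAMBLE = 8      # bytes: preamble + start-of-frame delimiter
IPG = 12          # bytes: inter-packet gap
MIN_FRAME = 64    # bytes: minimum Ethernet frame, FCS included

def max_packets_per_second(link_bps, frame_bytes=MIN_FRAME):
    """Return the highest frame rate the wire can carry."""
    frame_bytes = max(frame_bytes, MIN_FRAME)      # short frames are padded
    wire_bits = (frame_bytes + PREAMBLE + IPG) * 8
    return link_bps / wire_bits

pps = max_packets_per_second(1_000_000_000)
print(f"{pps / 1e6:.3f} Mpps")
```

Larger frames lower the packet rate but keep the wire equally full, which is why small packets are the hard case.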

Note that there are other devices using the same family of CPU. I also have a Mirabox running an Armada 370, which is a low-end single-core CPU with a 16-bit memory bus and a smaller cache. It includes two of the same network controllers. What I'm describing here also works with the Mirabox to a certain extent. The limited memory bandwidth and the fact it's a single core prevent this from scaling to multiple ports. The peak performance is also about 10% lower.

Stateless HTTP server : principle

HTTP is a pretty simple protocol when you only look at the exchanges on the wire. It's what I call a "ping-pong" protocol : each side sends one thing and waits for the other side to respond. This is only true for small data transfers, and does not take pipelining into consideration. But for what I need in tests, it's very simple.

I've long been wondering if it was possible to use this "ping-pong" property to build a totally stateless server, which means a server which would only consider the information it gets from the packets and which would not store any session. Looking at what a transfer looks like at the TCP level, it's clear that it is possible. Even when optimized, there's everything there for the job (please consult RFC793 if you have difficulties following these exchanges, as I won't paraphrase it here) :
Basic HTTP fetch
Faster HTTP fetch
For the server, all the information is provided in the client's ACK. If you look at the ACK and compare it to the initial SEQ sent by the server, you can determine exactly what step is being processed, so how to act accordingly. The problem is that after the response is sent, the server does not necessarily know how long the response was, so by how much it could have shifted the next sequence numbers. So the idea was to use only the lower bits of the sequence numbers to store the state. That way, each response size just needs to be adjusted so that the next sequence number matches the value we want to assign it.

For this first implementation, I wanted to support multi-packet responses, so I decided to have a limit of 16 states, resulting in 4 bits for the state and the rest for the transfers. That means that responses have to be rounded up to the next multiple of 16 bytes plus or minus the shift to reach the desired state. In HTTP we can easily do this using headers. So I added an "X-Pad" header which serves exactly that purpose. Another point is that the size of the Content-Length header varies with the size of the response. So we need to adjust X-Pad last. Both the SYN flag and the FIN flag count as one unit in sequence numbers (just like one byte), so when we plan on sending any of them, we must also count one unit. This imposes some constraints on the ordering of the states, but they are easily met. For example, the response contains both the data and the FIN flag. Some clients will ACK the data first, then the FIN. This results in two ACKs offset by exactly one unit. So in order to properly handle these two different acknowledgements, the two respective states must have values that differ by exactly one.
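
The padding arithmetic can be sketched in a few lines of Python. This is only an illustration of the principle, not the module's code: the function names are made up, and `payload_len` is assumed to already include every byte of the response except the padding itself.

```python
STATE_BITS = 4
STATE_MASK = (1 << STATE_BITS) - 1    # 16 states kept in the low bits

def state_from_ack(ack):
    """The client's ACK echoes the server's next sequence number,
    so its low bits tell which step of the exchange we are in."""
    return ack & STATE_MASK

def pad_length(seq, payload_len, target_state, fin=False):
    """Number of padding bytes (0..15) to add so that the sequence
    number following this segment carries target_state."""
    next_seq = seq + payload_len + (1 if fin else 0)   # FIN counts as one unit
    return (target_state - next_seq) & STATE_MASK

# Hypothetical values: a 150-byte response sent with FIN from seq 0x1000.
pad = pad_length(0x1000, 150, target_state=5, fin=True)
print(pad, state_from_ack(0x1000 + 150 + pad + 1))
```

Since 2^32 is a multiple of 16, sequence number wrap-around does not disturb the low bits.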

The beauty of this mechanism is that it even supports HTTP keep-alive (serving multiple objects over the same connection) and resists packet losses since the client will retransmit either a request or an acknowledgement and the server will always do the same thing in response. Note that the multi-packet feature is not totally reliable for two reasons :
  • clients generally wait 40ms before acknowledging one segment, so the transfer is slow, unless segments are sent two at a time, but then we need a reliable way to distinguish their acks and to recover from partial losses
  • if a client's ACK for an intermediate packet is lost, the session will remain stuck as nobody will retransmit.

I found one ugly solution to all of these issues, which can work when the client supports the SACK extension. The principle is to send all segments but the first one so that the client constantly acks the first one and indicates in the SACK extension what parts were received. But this becomes complex, not universally usable and in the end does not provide much benefit. Indeed, when I designed this mechanism, I had objects up to 5-10kB in mind in order to try to fill the wire. I didn't imagine I would saturate a wire with single-packet objects! So a next implementation will probably only use 2 bits to store the 4 states needed to perform a single-packet transfer and will not support the multi-packet mode anymore. Also with only 4 states, we'll be able to send even-sized packets more often than now. The complete state machine looks like this :
Complete state machine

Stateless HTTP server : first implementation

The first implementation of this server was made as a module for Linux kernel 3.10.x. This module registered a dummy interface which responds to any TCP port accessed through it. The concept is ugly but it was easy to implement. The performance was quite good. On the OpenBlocks, 42000 connections per second were achieved this way, using a single external NIC bound to a single CPU core. This means that about 84kcps could be reached with incoming traffic split on two NICs, which was confirmed. This is not bad at all, it's basically the same level of performance that httpterm gives me on a Core2 Duo at 2.66 GHz. But it's not huge. The issue is that the packets have to pass through the whole routing stack, defeating a little bit the purpose of the server. However this mode is convenient to run locally because there is no inter-CPU communication: a response packet is produced for each incoming packet in the context of the sending process.

Stateless HTTP server : NFQueue implementation

The second implementation was done using NFQueue (Netfilter Queue). It's very easy to use and allows packets to be returned very early (in the raw table). So I wanted to give it a try. The result is basically the same as with the interface, except that two CPU cores are involved this time, one for the network and the other one for the user process acting as the server. However for local tests when you have lots of spare cores, it becomes more interesting than the interface version because it reduces the overhead in the network stack, increasing the limit of performance a single process may observe (typically 105k conn/s vs 73k on a Core2 Quad 3 GHz, with one CPU at 100% for the server).

Ndiv framework to the rescue

These numbers are both encouraging and frustrating. They're encouraging because they prove that the mechanism is good and efficient. And they're frustrating because we spend most of our time at places we'd prefer to avoid as much as possible.

So I decided it was time for me to be brave and finish the work I started 6 months ago on my ndiv framework. This is the Ethernet Diverter framework with which I could verify that the mvneta NICs are able to saturate the wire in both directions. Basically it consists in intercepting incoming packets as close as possible to where they're collected in the drivers, and deciding whether to let them pass, drop them or emit another packet in response. I already had an unfinished line-rate packet capture module using it. I had temporarily stopped developing it for lack of time, need and feedback. I needed to implement the ability to forge response packets but I was not happy with its API, which was already difficult to use and inefficient. I presented it in detail to my coworker Emeric Brun, with whom I could define a new "ideal" API that would be optimal for hardware assisted drivers and well balanced so that neither the application nor the driver has too much work to do.

After one full day of work, I could adapt the mvneta driver to the new ndiv API and make it respond to packets! The driver looks like the diagram below with the framework plugged into it. The beige part is the ndiv "application" called by the ndiv-compatible driver.
How NDIV is inserted into the network stack
Among the cool things provided by the framework is the fact that it considers it the role of the driver (or NIC) to validate incoming protocols and checksums, and to compute outgoing checksums if the application needs so. This makes sense because nowadays, most NICs do all this stuff for free and we'd rather not have the application do it. Similarly, if some checksums have to be computed by the driver or NIC on outgoing packets, it's the responsibility of the application to indicate the various header lengths because it already knows them.

Stateless HTTP server as an Ndiv application

After completing the port of ndiv to mvneta, I was absolutely impatient to see the stateless server run directly in the driver as an ndiv application. It did not take long to port it, just a few hours, and these hours were spent changing the sequencing of the code to clean it up since it was not needed anymore to compute checksums in the application.

The results are astonishing. First, when bombarded with a SYN flood from 5 machines, the theoretical limit is immediately reached with 1.488 Mpps in both directions. The CPU usage remains invisible since the periods are too short for the system to measure them. I developed a tool just for this instead.

Second, it appears that line rate is almost always achieved whatever the object size. In keep-alive mode, line rate is achieved for objects of 64 bytes and above, at 564000 requests per second and 94% of one CPU core. Empty responses go higher, 663000 requests per second, but the wire is not full (816 Mbps). The reason is that Ethernet frames are padded to 64 bytes and that for too short responses, there's automatically some padding appended. It is also important at these rates not to forget about Ethernet's preamble (8 bytes) and Inter-Packet-Gap (IPG) of 12 bytes, totaling 20 bytes. This overhead is represented in yellow on the diagram below.
Performance at various object sizes
The transfers in HTTP close mode are excellent as well. The OpenBlocks reaches 340000 HTTP connections per second. This means a connection establishment, an HTTP request, a fast close (FIN then RST). This is 3 packets in one direction, 2 in the other one. The theoretical limit for this test is 496000 connections per second (1.488 M/3). It happens that my client (inject36) sends very large requests (about 166 IP bytes). So if we do the math, we have :
  • 64 + 8 + 12 bytes for the SYN packet = 84 bytes
  • 166 + 14 + 8 + 12 bytes for the request = 200 bytes
  • 64 + 8 + 12 bytes for the RST packet = 84 bytes
So for each request, the clients have to upload 368 bytes on the wire. This times 340000 is almost exactly one gigabit per second (about 125000000 bytes per second). So in practice we're still not saturating the device nor its CPU, just the wire again. Just for the comparison, it's 3 times as fast as what I can achieve on a Core i7 3.4 GHz using httpterm.
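
The arithmetic above is easy to replay. The short Python snippet below only restates the three packet sizes listed in the text and multiplies them by the measured connection rate; the variable names are mine.

```python
PREAMBLE, IPG = 8, 12     # per-frame overhead on the wire, in bytes
MIN_FRAME = 64            # minimum Ethernet frame
ETH_HEADER = 14           # Ethernet header added to the 166 IP bytes

syn = MIN_FRAME + PREAMBLE + IPG               # 84 bytes
request = 166 + ETH_HEADER + PREAMBLE + IPG    # 200 bytes
rst = MIN_FRAME + PREAMBLE + IPG               # 84 bytes

per_connection = syn + request + rst
rate = 340_000                                 # measured connections per second
bytes_per_second = per_connection * rate

print(per_connection, "bytes per connection")
print(bytes_per_second, "bytes/s =", bytes_per_second * 8 / 1e9, "Gbps")
```

The product comes out at about 125.1 million bytes per second, which matches a full gigabit link within the precision of the measured rate.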

Conclusion

The first thing one may note is that I rarely spoke about CPU usage. That's the beauty of this device. The CPU is fast enough that parsing a whole HTTP request and producing the response takes less than 1.4 microseconds, which allows it to be done at line rate. The second point is that the network connectivity inside it is fantastic. I can achieve with this device packet rates that I cannot achieve with some very respectable 10G NICs. Now I urge Marvell to develop a next generation of Armada XP with a 10G NIC on chip! What is absolutely cool is that I finally know I won't ever again have any problem in benchmarks with the server components falling short. Well, I still need the clients... By the way, in theory it is possible to develop a client on the same model. The only thing is that the applications I implement in ndiv are reactive, which means they need some traffic to respond to. So we won't initiate a connection this way. One elegant solution however could be to use a classical SYN flooder on the device to initiate connections to the server, which in turn will respond and solicit the client. But I'm still not completely convinced.

Another thing I'd like to experiment with in the near future is porting the ndiv framework to more NICs (at least my laptop's e1000e) and to the loopback interface, so that we can even use the stateless server when developing on the local machine. I've started the ndiv project with a line-rate packet capture module which is not complete. I'm wondering if other uses can arise from this framework (eg: accelerators, load balancing, bridges, routing, IDS/IPS, etc...). Thus I'm not sure whether it's worth submitting for mainline. Any feedback would be much appreciated.

Concerning the stateless HTTP server itself, it has limited uses beyond test environments. But still I can think about delivering very small objects (favicon, redirects, ...) that fit in a single TCP segment and do not require any security. It can also be used for various types of monitoring devices which are Ethernet-connected and which prefer to report measurements using HTTP to make it easier for their clients to retrieve them. Some system identification or configuration might also be retrieved using such a mechanism embedded in very dumb devices which don't even have an IP stack.

Downloads

The code is available for various Linux kernel versions here. The most up-to-date version is ndiv_v5. The commits are grouped in 6 categories :
  • for mvneta : add support for retrieving the device's MAC address from the boot loader. Not strictly needed but quite convenient as this avoids running with random MAC addresses ;
  • for mvneta: some fixes for the mvneta driver ; they are required.
  • for mvneta: improvements for the mvneta driver ; they are required as well.
  • the NDIV network frame diverter framework. Required of course!
  • driver support for the NDIV framework (currently only mvneta, ixgbe, e1000e, e1000, igb).
  • the SLHTTPD server.

2013-02-24

Mirabox: much better than GuruPlug

This quick review aims at describing my first contact with GlobalScale's Mirabox, how it compares with the other machines I've used before, namely the GuruPlug and the Dockstar, then how I managed to unbrick it.

Switching to the Mirabox

The Dockstar is nice but still limited to one port and has limited performance. After all the overheating issues, GlobalScale finally abandoned the GuruPlug Server Plus and replaced it with the DreamPlug which was a much nicer and safer design. I had one in hands and was considering buying one. I'm among the people who complain a lot about poor quality and point the finger at companies who put awful products on the market and do nothing to fix them. But when these people go back to the blackboard to completely redesign the product, I applaud. So I was OK with ordering a new product from them again.

When wandering on GlobalScale's web site last year, I noticed some teasing for the upcoming Smile Box, then the Mirabox. Both were using the same platform, a new Marvell Armada370 (ARMv7) at 1.2 GHz. The Mirabox has 1GB of RAM, 1GB of NAND flash, 2 GigE ports, a PCIe port, a MicroSD slot, an RTC with a battery and 2 USB3 ports. It looked really nice. It was not much more expensive than the DreamPlug, so I finally decided to order one as well as a JTAG adapter in case things went wrong.

As soon as I received the box, I couldn't resist opening it. I was quite impressed by the quality of the hardware design. There is a very clean PCB with BGA chips on both sides, not a single wire at all, not even a heatsink. The device is very thin, basically the thickness of the RJ45 ports. There are jumpers inside, as well as serial and JTAG connectors that are compatible with the GuruPlug's adapter. Be careful when opening, the small plastic part which conducts the light from the leds sits in an unstable position and is annoying to reinstall. I finally glued it to the case.

Board inside
Enclosure
Unstable parts
WiFi antenna
Board bottom
Board top
Among some nice things I noted, I found that the internal serial ports were connected to the same serial port as the USB console (which goes to a PL2303 chip), and since they're using pull-ups, both are usable simultaneously. The PL2303 chip is powered by the USB and not by the Mirabox so that you don't lose the ttyUSB from the client when you power-cycle the Mirabox. This is much appreciated, the Snowball should adopt such a design. The device does not heat much and the CPU can always be touched. The MicroSD and MMC internal connectors are directly connected to the USB2.0 ports of the SoC. The jumpers are there to change the CPU/cache/DRAM frequencies though I only identified a few of them at the moment.

The Good, the Bad and the Ugly

Debian is installed on the device. I quickly installed haproxy to see if the network performance was any better than with the Dockstar. I noticed that traffic would not flow at all! After installing strace, I discovered that the splice() system call would systematically fail in an unexpected way, meaning that some nasty untested patches were applied to the 2.6.35.9 kernel. So I went to their site to download the sources and found none at all. The only things I could find there were a binary kernel image and a broken boot loader image. (Note that a few days ago, the boot loader image was fixed there, and the kernel and U-Boot sources were finally released on Plugcomputer.org.)

So I continued the tests by disabling splice() in haproxy and found that the performance was very low due to iptables and conntrack being hard-linked and impossible to disable.

So I looked on the net to find another kernel. I found one in Arch Linux ARM. Fine! I tried to boot it: it booted correctly, but UBIFS complained about a lot of errors, then the kernel died because it was unable to mount the root FS. After that the original kernel would also fail to mount the rootfs. Thanks to the captures I had taken earlier, I finally found that the config in the boot loader is wrong about the partition sizes, which must have been hard-coded into their proprietary kernel that ignores the boot loader's settings. And it seems that UBIFS performs some recovery attempts before failing, resulting in a corrupted FS. Pfff.....

I could boot it with a Formilux rootfs and their proprietary kernel to recover the installation by reformatting the partition and reflashing the original rootfs which is provided on their site. I got a bit angry at the product because it's fully proprietary and buggy. I want to use it as a gigabit network sniffer and its network performance sucks because of the proprietary kernel!

First hope

Looking at kernel 3.7-rc sources, I found that the platform was recently introduced by the cool guys at Free Electrons, namely Thomas and Grégory, who are already known for porting Linux to a number of devices. So I contacted them to get some pointers and they told me about the Git repository where all their work is. I could test their latest work and could start hacking the device and providing them some feedback on some of their patches.

Second crash

Unfortunately I managed to boot a kernel with the incorrect partition table a second time and it again destroyed my rootfs and marked a lot of blocks as bad... Except that this time even after several passes of flash_erase and nand_write, there were some remains of "UBI" on some blocks, and I couldn't get rid of the fake bad blocks. The NAND driver is probably still a bit young... I finally used the nand scrub option in the boot loader to completely erase the rootfs partition... Error! This one is buggy too, it randomly erases other areas, and marks block 0 (the boot loader) as bad! So now I must not cut power! I only had one attempt left! I reformatted the whole flash using nand scrub again and all went fine. I had to reflash the U-Boot boot loader, but the one on the site above was defective. I finally found one on another site. It looked right and contained references to the reference design board. I crossed fingers and flashed it, checked that it was correctly flashed, then rebooted... Ouch, bricked!

BootROM 1.08
Booting from NAND flash
BootROM: Bad header at offset 00000000
BootROM: Bad header at offset 00004000
BootROM: Bad header at offset 00008000
BootROM: Bad header at offset 0000C000
...

Second disappointment

My only hope was flashing the NAND via JTAG. I took my GuruPlug JTAG adapter which directly connects to this board and to the GPIO board. But the Armada370 is totally unknown to OpenOCD, and no patches are available. Then I was really angry at GlobalScale. They sold me a JTAG device to unbrick the Mirabox which I cannot use at all because the software to use it does not exist! I contacted the support, and they offered to reprogram it for $25, except that shipping costs are as high as the device's price. And they'd reinstall the same buggy kernel that I cannot use. So I declined and thought that I'd rather find some time to try to reverse-engineer the JTAG TAP and the flash controller on a spare weekend. (Note: I was recently told that GlobalScale is considering sponsoring a port to OpenOCD, which is nice.)

New hope

One or two months after leaving the device unused on my desk, I decided to get it to work using OpenOCD. I downloaded all the doc, read it, started editing some board and target files, and could detect the TAP ID, which means the JTAG link is OK. I managed to reset the board via JTAG, which was good.

While reading the OpenOCD doc, I noticed something changed in an xterm in the background. It was my minicom connected to the Mirabox which stopped scrolling on the "BootROM Bad Header" messages, and which displayed "Trying to boot from UART"! UART? I searched the net and found references to kwboot, which is a tool made for loading firmware into Marvell SoCs via the serial port. In fact, Thomas had suggested it to me a while ago but I forgot since I couldn't figure out how I could use it. Anyway now I got hope again and left OpenOCD to completely focus on kwboot.

Booting an image using kwboot

So I downloaded kwboot. I initially didn't find it alone, it's integrated into U-Boot in this tree and depends on a number of includes from this tree. I was about to clone the whole repository when I found a precompiled binary version here.

This utility sends a "magic" sequence to the BootROM boot loader installed in the Armada370's ROM (or maybe in the small I2C flash that's next to it, I don't know). The magic is an 8-byte sequence : 0xBB 0x11 0x22 0x33 0x44 0x55 0x66 0x77. The boot loader checks for this sequence before the first attempt to boot, and again each time it has looped over the whole flash image (which takes quite long on the 1 GB NAND). So it's better to start sending the sequence with the Mirabox powered down, and then to power it up. That's why it's very important that USB consoles are powered by the PC and not by the device!
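
To make the idea concrete, here is a minimal Python sketch of "keep sending the magic sequence". It is not kwboot's implementation: the function name is made up, serial port setup (speed, timing between repeats) is left out, and an in-memory buffer stands in for the opened serial device.

```python
import io

# The 8-byte magic sequence quoted in the text.
BOOT_MSG = bytes([0xBB, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])

def send_boot_message(port, repeats):
    """Write the magic sequence `repeats` times to a binary stream."""
    for _ in range(repeats):
        port.write(BOOT_MSG)
    port.flush()

fake_port = io.BytesIO()          # stands in for the serial device
send_boot_message(fake_port, 3)
print(fake_port.getvalue().hex())
```

With a real device, the stream would be the serial port opened in binary mode, and the loop would run until the BootROM answers.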

Fixing the boot loader image

The original U-Boot image could not be loaded via kwboot. The reason is that the first byte of the image indicates the boot device. Here I have 0x8B which indicates the image boots from the NAND. To boot on the serial port, we need 0x69. Changing it by hand will not work because there's a checksum. kwboot is able to patch this byte and recompute the checksum. But it looks like the checksumming algorithm has changed on this new platform, because a fixed image does not boot either, and the original image does not show the correct checksum.
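
A tiny Python helper shows what "the first byte indicates the boot device" means in practice. Only the two values named above are known here; the helper names are hypothetical, and the checksum is deliberately left alone since its algorithm on this platform is exactly what was unclear at this point.

```python
# Boot device codes found in the first byte of the image
# (only the two values mentioned in the text).
BOOT_DEVICES = {0x8B: "NAND", 0x69: "UART"}

def boot_device(image):
    """Name the boot device announced by the image's first byte."""
    code = image[0]
    return BOOT_DEVICES.get(code, f"unknown (0x{code:02X})")

def with_boot_device(image, code):
    """Return a copy of the image with its first byte replaced.
    The header checksum is NOT recomputed, so this alone does not
    produce a bootable image."""
    return bytes([code]) + bytes(image[1:])

sample = bytes([0x8B, 0x00, 0x00, 0x00])     # made-up header start
print(boot_device(sample), "->", boot_device(with_boot_device(sample, 0x69)))
```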

I found another tool, kwuartboot. I tried it in case it would handle a different checksumming algorithm, but that did not work either, it failed very similarly. So I concluded that I had to find how to regenerate my boot loader image to boot from the serial port and write the correct checksum.

Find new doimage

I found various incompatible versions of the "doimage" utility used to produce the boot image. I finally found that a version packaged for ArmadaXP was compatible with my Mirabox. (Note that since the patched U-Boot sources have finally been released, there is no need anymore to use the ArmadaXP image). Here's how I managed to rebuild a working doimage utility :

1. retrieve this U-Boot patch

2. rebuild the sources from the patch :

$ mkdir tmp-tools-doimage
$ cd tmp-tools-doimage
$ patch -Ntp1 < ../u-boot-2009.08-mv78460-20110404.patch >/dev/null 2>&1
$ cd tools
$ tar cf - doimage_armada_xp | gzip -9 > ../../doimage_armada_xp.tar.gz
$ cd ../..
$ rm -rf tmp-tools-doimage

3. rebuild the executable from the sources :

$ tar zxf doimage_armada_xp.tar.gz
$ cd doimage_armada_xp
$ make

Find the various parts in the original image

The sources helped me a lot to understand the image format. There are two binary images included in the U-Boot image: one is the DDR3 initialization code, and the other one is the boot loader itself. There are several headers in the image and checksums that are computed on the global image. So it is possible to extract the embedded images, to modify them if necessary, then to reassemble them together and put the 32-bit checksum on them. After probably more than one hundred attempts, I found that I needed the following parts of my original mtd0 partition :
Type     Offset   Length   md5sum        Begins with
DDR3     0x24     48584    f4b165ce...   02 00 00 00 5B 00 00 00
U-Boot   0xC000   675420   c739a005...   12 00 00 EA 14 F0 9F E5
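
Cutting these parts out of a flash dump can be sketched in Python as follows. Several things here are assumptions: the dump file name passed to `extract_all` is up to you, the mapping of the DDR3 part to mtd0-hdr.bin and of the U-Boot part to mtd0-uboot.bin is inferred from the doimage command line, and the DDR3 offset and length are read from the table as 0x24 and 48584, which should be double-checked against the md5sum and the first bytes.

```python
import hashlib

# name -> (offset, length), as read from the table (see assumptions above)
PARTS = {
    "mtd0-hdr.bin":   (0x24,   48584),    # DDR3 initialization code
    "mtd0-uboot.bin": (0xC000, 675420),   # the boot loader itself
}

def extract(dump, offset, length):
    """Return `length` bytes of `dump` starting at `offset`."""
    part = dump[offset:offset + length]
    if len(part) != length:
        raise ValueError("dump is too short for this part")
    return part

def extract_all(dump_path):
    """Write each part to its own file and print what to verify."""
    with open(dump_path, "rb") as f:
        dump = f.read()
    for name, (offset, length) in PARTS.items():
        part = extract(dump, offset, length)
        with open(name, "wb") as out:
            out.write(part)
        # Compare these with the md5sum and "Begins with" columns.
        print(name, hashlib.md5(part).hexdigest(), part[:8].hex())
```

Calling `extract_all` on a dump of the mtd0 partition writes the two files and prints enough to compare against the table.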
These images can then be assembled together using the shiny new doimage utility, to produce a new u-boot.bin image :
$ doimage -T uart -D 0x600000 -E 0x6A0000 -G mtd0-hdr.bin \
  mtd0-uboot.bin u-boot.bin

Still fails to load at 48 kB and prints DDR3 messages on the console

I then tried to load the image this way :
$ kwboot -b u-boot.bin /dev/ttyUSB0

Unfortunately, the Mirabox would still reject this image, but in a new and more consistent way : it systematically rejects the image after 48kB of data, and the green LED "D5" turns on just at the moment of the hang. I switched to kwuartboot to see if I got the same error, and it behaved exactly the same way. So I modified it to display the invalid characters it got that confused it, and I saw "DDR3 Training Sequence". Wow! This means that the DDR3 code was executed, because this line is normally displayed at boot when the Mirabox boots. So I suspected that the DDR3 initialization code is loaded into the cache or some SRAM on the device, which must then initialize DDR3 to load the rest. So if this code is executed early during the boot sequence and displays things on the console port I'm using to upload an image, it is conceivable that it breaks the upload sequence...

Patch the DDR3 thing to talk to ttyS1 instead

So I wondered how to shut that DDR3 code up. Since the device has two serial ports, I thought that it could be easier to make it chat on ttyS1 instead. ttyS0 is located at 0x10000 and ttyS1 is at 0x12000. I looked for occurrences of 0x10000 in the image and patched them to use 0x12000 instead. I was a bit scared because on ARM you don't have many bits for immediate values, and I was afraid of not being able to add a 2 there if some higher bit was set. This was not the case, the values were used as absolute values. I found 4 locations which needed to be patched in mtd0-hdr.bin : 0x1BC0, 0x1C08, 0x1D0C, 0x1D30. I then rebuilt the whole image using doimage :
$ doimage -T uart -D 0x600000 -E 0x6A0000 -G mtd0-hdr-uart1.bin \
  mtd0-uboot.bin u-boot-uart1.bin
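
Hand-patching a binary at a few known offsets is safer when each patch first checks what it is about to overwrite. The Python sketch below shows that approach; the function name is made up, and since the text does not give the exact bytes found at 0x1BC0, 0x1C08, 0x1D0C and 0x1D30, the demonstration runs on placeholder data.

```python
def apply_patches(image, patches):
    """Apply (offset, old_bytes, new_bytes) patches to a copy of image.
    Each patch is refused unless the bytes currently at `offset`
    match `old_bytes`, and the image size never changes."""
    patched = bytearray(image)
    for offset, old, new in patches:
        if len(old) != len(new):
            raise ValueError("a patch must not change the image size")
        if patched[offset:offset + len(old)] != old:
            raise ValueError(f"unexpected bytes at 0x{offset:X}")
        patched[offset:offset + len(new)] = new
    return bytes(patched)

# Placeholder data only: a 16-byte "image" with one patched location.
demo = bytes(4) + b"\xAA\xBB" + bytes(10)
print(apply_patches(demo, [(4, b"\xAA\xBB", b"\xAA\xCC")]).hex())
```

The patched copy would then be saved under a new name such as mtd0-hdr-uart1.bin, keeping the original file untouched for comparison.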

New attempt to boot

Last step was to try to boot the image again :
$ ./kwboot  -b u-boot-uart1.bin -t /dev/ttyUSB0
Sending boot message. Please reboot the target...\
Sending boot image...
0 % [......................................................................]
1 % [......................................................................]
2 % [......................................................................]
...
Well, it goes further than 48kB this time, let's cross fingers... After about 45s, I got this :
99 % [.........................................................]
[Type Ctrl-\ + c to quit]

__   __                      _ _
|  \/  | __ _ _ ____   _____| | |
| |\/| |/ _` | '__\ \ / / _ \ | |
| |  | | (_| | |   \ V /  __/ | |
|_|  |_|\__,_|_|    \_/ \___|_|_|
        _   _     ____              _
       | | | |   | __ )  ___   ___ | |_ 
       | | | |___|  _ \ / _ \ / _ \| __| 
       | |_| |___| |_) | (_) | (_) | |_ 
        \___/    |____/ \___/ \___/ \__| 
** LOADER **

U-Boot 2009.08 (Sep 16 2012 - 22:50:06)Marvell version: 1.1.2 NQ
U-Boot Addressing:
       Code:            00600000:006AFFF0
       BSS:             006F8E40
       Stack:           0x5fff70
       PageTable:       0x8e0000
       Heap address:    0x900000:0xe00000
Board: DB-88F6710-BP
SoC:   MV6710 A1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 600Mhz
       DDR @ 600Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access
PEX 0: Detected No Link.
PEX 1: Root Complex Interface, Detected Link X1
DRAM:   1 GB
       CS 0: base 0x00000000 size 512 MB
       CS 1: base 0x20000000 size 512 MB
       Addresses 14M - 0M are saved for the U-Boot usage.
NAND:  1024 MiB
Bad block table found at page 262016, version 0x01
Bad block table found at page 261888, version 0x01
FPU not initialized
USB 0: Host Mode
USB 1: Host Mode
Modules/Interfaces Detected:
       RGMII0 Phy
       RGMII1 Phy
       PEX0 (Lane 0)
       PEX1 (Lane 1)
phy16= 72 
phy16= 72 
MMC:   MRVL_MMC: 0
Net:   egiga0 [PRIME], egiga1
Hit any key to stop autoboot:  0 
Marvell>>

Yesss! For those who want to experiment with this without bricking their devices, I'm putting this working image here, with a modified prompt to display "Recover>>" instead of "Marvell>>" so that it's always easy to tell what boot loader you're running from.

Now reflash the recovered device

Now that I'm on the device again for the first time in a few months, I don't want to see it go away. It's urgent to erase and reflash it. WARNING! Do not copy-paste what follows if you don't understand what it is about: it will wipe out your whole device and may render it unusable! First, erase the whole flash :
Marvell>> nand erase clean
Marvell>>

Let's check that the flash correctly shows 0xff everywhere :
Marvell>> nand dump 0
Page 00000000 dump:
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ...
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
OOB:                                                                            
        ff ff ff ff ff ff ff ff
        ...
        ff ff ff ff ff ff ff ff
Marvell>>

OK. We'll pre-fill the memory with 0xFF before loading the boot loader there, to avoid flashing crap :
Marvell>> mw.l 0x7000000 0xffffffff 0x00100000
Marvell>> md.b 0x7000000 100
07000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ...............
...
070000f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ...............
Marvell>>
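
A note on the numbers used here: U-Boot reads these arguments as hexadecimal, and the count given to mw.l is a number of 32-bit words, not bytes. The small Python check below (variable names are mine) shows that the fill covers exactly the 0x00400000 bytes later passed to "nand write", and why "md.b ... 100" ends on the row starting at 070000f0.

```python
WORD_SIZE = 4                 # mw.l writes 32-bit words
FILL_COUNT = 0x00100000       # count passed to mw.l above
NAND_WRITE_SIZE = 0x00400000  # size passed to "nand write" further down
DUMP_COUNT = 0x100            # "md.b ... 100": the count is read as hex
BASE = 0x7000000

filled_bytes = FILL_COUNT * WORD_SIZE
last_row = BASE + DUMP_COUNT - 16  # md.b prints 16 bytes per row

print(f"filled: {filled_bytes:#x} bytes ({filled_bytes // 2**20} MiB)")
print(f"covers the nand write size: {filled_bytes >= NAND_WRITE_SIZE}")
print(f"last md.b row starts at {last_row:#010x}")
```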
Now copy the original mtd0 image from a TFTP server (from a file called "mtd0" there) to that location :
Marvell>> tftpboot 0x7000000 mtd0
Marvell>> md.b 0x7000000 100
07000000: 8b 00 00 00 60 4e 0a 00 01 00 00 c0 00 c0 00 00    ....`N..........
07000010: 00 00 60 00 00 00 6a 00 00 02 01 00 00 00 01 0a    ..`...j.........
...
070000f0: aa 25 ad 00 ed 19 2b 60 00 23 ab 20 80 00 c0 19    .%....+`.#. ....
Marvell>>
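
The first two rows of this dump can also be checked offline. The Python sketch below only verifies the leading 8b byte and decodes two words. Reading the little-endian words at offsets 0x10 and 0x14 as load and execution addresses is my own interpretation of the dump, not something stated in this article, although they do equal the -D and -E values passed to doimage earlier.

```python
import struct

# First 32 bytes of the dump shown above (two md.b rows).
HEADER_HEX = (
    "8b 00 00 00 60 4e 0a 00 01 00 00 c0 00 c0 00 00 "
    "00 00 60 00 00 00 6a 00 00 02 01 00 00 00 01 0a"
)
NAND_IMAGE_MARKER = 0x8B  # first byte expected for a NAND flash image


def check_header(raw: bytes) -> dict:
    """Decode the few fields this sketch assumes about the header layout."""
    if raw[0] != NAND_IMAGE_MARKER:
        raise ValueError(f"unexpected first byte {raw[0]:#04x}")
    # Assumption: little-endian 32-bit words at 0x10 and 0x14 hold the
    # load and execution addresses.
    load_addr, exec_addr = struct.unpack_from("<II", raw, 0x10)
    return {"load": load_addr, "exec": exec_addr}


fields = check_header(bytes.fromhex(HEADER_HEX))
print({name: hex(value) for name, value in fields.items()})
```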

The contents look fine (note the 8b at the beginning which means that this is a NAND flash image). If everything looks OK, and only in this case, you can write the memory contents down to the flash, then check that it looks similar to the dump above :
Marvell>> nand write 0x7000000 0 0x00400000
Marvell>> nand dump 0

Then reset the device, it should reboot directly from the flash. If it fails, you just have to retry the procedure above and figure out what went wrong.
Marvell>> reset

Back to hacking the device again

Since the device is fixed and I don't fear losing it anymore, I'm back playing with it. I'm running it with a 3.8 kernel with all development patches from Free-Electrons, as well as a recent attempt I made to port Marvell's NAND Flash Controller driver to this kernel. Right now it works but the code is ugly, contains many copy-pastes and I don't trust it a lot. I found it safer to buy a microSD card and install my FS there.

I could also remove some of Marvell's patches from their ugly kernel (I could get splice() to work again) and rebase it on 2.6.35.14, but at this point in time, I think it does not make sense to spend more time with this dead kernel, better try to get most features working with 3.8 and 3.9-rc. Oh, and BTW, UBIFS managed to destroy my rootfs again using Marvell's kernel when it tried to mount an uncleanly shutdown rootfs! I don't know if it's the FS or the Flash controller which is to blame, but what's certain is that a filesystem driver should not destroy the data it's responsible for, and that it should at least offer tools to fix devices which report errors. Here, after any minimal error, the file system is definitely lost. So that's one more reason for switching to a replaceable microSD for the root FS.
There are still a number of issues that I would like to see fixed in future versions :
  • the mtdparts variable in the boot loader does not match the real partition size, which apparently is responsible for UBIFS corrupting my root FS.
  • the Ethernet MAC addresses at the bottom of the box are different from those on the sticker on the board! I don't know which ones are the right ones, so if I plug this device on a network with another one sharing the same addresses, it could cause trouble.
  • their bogus kernel needs some fixing. I don't understand why they patched (and broke) the splice() system call. Also, having iptables+conntrack hard-linked really is problematic (and there is no NOTRACK target).
  • the boot loader only reserves 4 MB for the kernel, which is too small for development kernels. Anyway you'll probably have to repartition the flash after the rootfs gets corrupted.
  • the internal jumpers and connectors are not documented.
  • I tried a dual-gigE mini-PCIe board (i350-based Jetway ADMPEIDLA) and it did not work, it was not even detected. I don't know which one is the culprit since the NIC works on an Atom board and other cards work on this board.
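
Regarding the mtdparts item in the list above, it helps to see which offsets and sizes a given mtdparts string actually describes before comparing it with what is on the flash. Here is a minimal Python sketch that tabulates them. The sample string is the one from the kernel command line visible in the dmesg capture further down, and the 1024 MiB flash size is what U-Boot reported. The parser is mine and only handles the simple "size(name)" and "-(name)" forms. It does not tell which layout is the right one, it only makes the declared one explicit.

```python
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_mtdparts(spec: str, flash_size: int):
    """Return (name, offset, size) tuples for a simple mtdparts string.

    Only the "size(name)" and "-(name)" forms are handled; explicit
    "@offset" parts and flags such as "ro" are not.
    """
    _device, _, parts = spec.partition(":")
    layout, offset = [], 0
    for part in parts.split(","):
        size_txt, _, name = part.partition("(")
        name = name.rstrip(")")
        if size_txt == "-":
            size = flash_size - offset      # "-" takes whatever is left
        elif size_txt[-1].lower() in UNITS:
            size = int(size_txt[:-1]) * UNITS[size_txt[-1].lower()]
        else:
            size = int(size_txt, 0)         # plain byte count, decimal or 0x...
        layout.append((name, offset, size))
        offset += size
    return layout


SPEC = "armada-nand:4m(uboot),4m(uimage),8m(nv),16m(rescue),480m(rootfs),-(pad)"
FLASH_SIZE = 1024 * 1024**2   # U-Boot reported "NAND:  1024 MiB"

for name, offset, size in parse_mtdparts(SPEC, FLASH_SIZE):
    print(f"{name:8s} offset={offset:#010x} size={size // 1024**2:4d} MiB")
```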

A few captures

Here come a few captures of what people always want to see from a new board :-)

cpuinfo

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
BogoMIPS        : 1196.85
Features        : swp half fastmult vfp edsp vfpv3 vfpv3d16 tls
CPU implementer : 0x56
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0x581
CPU revision    : 1

Hardware        : Marvell Armada 370/XP (Device Tree)
Revision        : 0000
Serial          : 0000000000000000

meminfo

# cat /proc/meminfo
MemTotal:        1034348 kB
MemFree:          998484 kB
Buffers:            4476 kB
Cached:             9820 kB
SwapCached:            0 kB
Active:             7116 kB
Inactive:           8576 kB
Active(anon):       1768 kB
Inactive(anon):       76 kB
Active(file):       5348 kB
Inactive(file):     8500 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        270336 kB
HighFree:         248016 kB
LowTotal:         764012 kB
LowFree:          750468 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          1424 kB
Mapped:             2484 kB
Shmem:               448 kB
Slab:               4388 kB
SReclaimable:        952 kB
SUnreclaim:         3436 kB
KernelStack:         280 kB
PageTables:          152 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      703356 kB
Committed_AS:       5236 kB
VmallocTotal:     245760 kB
VmallocUsed:        6120 kB
VmallocChunk:     237500 kB

lspci

# lspci -nnv
00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (prog-if 00 [Normal decode])
        Flags: bus master, 66MHz, user-definable features, ?? devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Capabilities: [fc] <chain broken>

00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (prog-if 00 [Normal decode])
        Flags: bus master, 66MHz, user-definable features, ?? devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Memory behind bridge: c1000000-c10fffff
        Capabilities: [fc] <chain broken>

02:00.0 USB Controller [0c03]: Device [1b73:1009] (rev 02) (prog-if 30)
        Subsystem: Device [1b73:0000]
        Flags: bus master, fast devsel, latency 0, IRQ 105
        Memory at c1000000 (64-bit, non-prefetchable) [size=64K]
        Memory at c1010000 (64-bit, non-prefetchable) [size=4K]
        Memory at c1011000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: xhci_hcd

iomem

# cat /proc/iomem
00000000-3fffffff : System RAM
  00008000-0044b073 : Kernel code
  00738000-007f6393 : Kernel data
c1000000-c100ffff : xhci_hcd
d0010300-d001031f : d0010300.rtc
d0012000-d001201f : serial
d0018100-d001813f : /soc/gpio@d0018100
d0018140-d001817f : /soc/gpio@d0018140
d0018180-d00181bf : /soc/gpio@d0018180
d0050000-d00504ff : ehci_hcd
d0051000-d00514ff : ehci_hcd

interrupts

# cat /proc/interrupts
           CPU0
 16:      29435  armada_370_xp_irq  armada_370_xp_per_cpu_tick
 17:       2592  armada_370_xp_irq  serial
 23:        684  armada_370_xp_irq  mvneta
 26:          1  armada_370_xp_irq  d0010300.rtc
 27:        231  armada_370_xp_irq  ehci_hcd:usb1
 28:          0  armada_370_xp_irq  ehci_hcd:usb2
105:          0  armada_370_xp_irq  xhci_hcd:usb3
106:          2  armada_370_xp_irq  d0060800.xor
107:          2  armada_370_xp_irq  d0060800.xor
108:          2  armada_370_xp_irq  d0060900.xor
109:          2  armada_370_xp_irq  d0060900.xor
Err:          0

dmesg

Booting Linux on physical CPU 0x0
Linux version 3.8.0-mbx (willy@pcw) (gcc version 4.5.2 (Sourcery G++ Lite 2011.03-41) ) #2 Sun Feb 24 11:58:31 CET 2013
CPU: ARMv7 Processor [561f5811] revision 1 (ARMv7), cr=10c53c7d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine: Marvell Armada 370/XP (Device Tree), model: Globalscale Mirabox
Memory policy: ECC disabled, Data cache writeback
On node 0 totalpages: 262144
free_area_init_node: node 0, pgdat c075ba60, node_mem_map c0b90000
  Normal zone: 1520 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 193040 pages, LIFO batch:31
  HighMem zone: 528 pages used for memmap
  HighMem zone: 67056 pages, LIFO batch:15
pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
pcpu-alloc: [0] 0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096
Kernel command line: console=ttyS0,115200 mtdparts=armada-nand:4m(uboot),4m(uimage),8m(nv),16m(rescue),480m(rootfs),-(pad) ubi.mtd=4 root=/dev/ram0
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
__ex_table already sorted, skipping sort
Memory: 1024MB = 1024MB total
Memory: 1020756k/1020756k available, 27820k reserved, 270336K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
    lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0008000 - 0xc044b074   (4365 kB)
      .init : 0xc044c000 - 0xc0736934   (2987 kB)
      .data : 0xc0738000 - 0xc075c480   ( 146 kB)
       .bss : 0xc075c480 - 0xc07f6394   ( 616 kB)
NR_IRQS:16 nr_irqs:16 16
Aurora cache controller enabled
l2x0: 4 ways, CACHE_ID 0x00000100, AUX_CTRL 0x1a086302, Cache size: 262144 B
sched_clock: 32 bits at 18MHz, resolution 53ns, wraps every 229064ms
Calibrating delay loop... 1196.85 BogoMIPS (lpj=5984256)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
Setting up static identity map for 0x354c20 - 0x354c78
devtmpfs: initialized
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 1024 KiB pool for atomic coherent allocations
irq: Cannot allocate irq_descs @ IRQ33, assuming pre-allocated
irq: Cannot allocate irq_descs @ IRQ69, assuming pre-allocated
irq: Cannot allocate irq_descs @ IRQ102, assuming pre-allocated
Initializing Coherency fabric
bio: create slab <bio-0> at 0
mvebu-pcie pcie-controller.1: PCIe0.0: link down
mvebu-pcie pcie-controller.1: PCIe1.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
pci_bus 0000:00: root bus resource [mem 0xc1000000-0xc8ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci_bus 0000:00: scanning bus
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
pci 0000:00:01.0: calling pci_fixup_ide_bases+0x0/0x11c
pci 0000:00:02.0: [11ab:7846] type 01 class 0x060400
pci 0000:00:02.0: calling pci_fixup_ide_bases+0x0/0x11c
pci_bus 0000:00: fixups for bus
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:01: scanning bus
pci_bus 0000:01: fixups for bus
PCI: bus1: Fast back to back transfers enabled
pci_bus 0000:01: bus scan returning with max=01
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:02: scanning bus
pci 0000:02:00.0: [1b73:1009] type 00 class 0x0c0330
pci 0000:02:00.0: reg 10: [mem 0x42000000-0x4200ffff 64bit]
pci 0000:02:00.0: reg 18: [mem 0x42010000-0x42010fff 64bit]
pci 0000:02:00.0: reg 20: [mem 0x42011000-0x42011fff 64bit]
pci 0000:02:00.0: calling pci_fixup_ide_bases+0x0/0x11c
pci 0000:02:00.0: supports D1
pci 0000:02:00.0: PME# supported from D0 D1 D3hot
pci 0000:02:00.0: PME# disabled
pci_bus 0000:02: fixups for bus
PCI: bus2: Fast back to back transfers disabled
pci_bus 0000:02: bus scan returning with max=02
pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
pci_bus 0000:00: bus scan returning with max=02
pci 0000:00:01.0: fixup irq: got 135
pci 0000:00:01.0: assigning IRQ 135
pci 0000:00:02.0: fixup irq: got 135
pci 0000:00:02.0: assigning IRQ 135
pci 0000:02:00.0: fixup irq: got 105
pci 0000:02:00.0: assigning IRQ 105
pci 0000:00:02.0: BAR 8: assigned [mem 0xc1000000-0xc10fffff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:02:00.0: BAR 0: assigned [mem 0xc1000000-0xc100ffff 64bit]
pci 0000:02:00.0: BAR 0: set to [mem 0xc1000000-0xc100ffff 64bit] (PCI address [0xc1000000-0xc100ffff])
pci 0000:02:00.0: BAR 2: assigned [mem 0xc1010000-0xc1010fff 64bit]
pci 0000:02:00.0: BAR 2: set to [mem 0xc1010000-0xc1010fff 64bit] (PCI address [0xc1010000-0xc1010fff])
pci 0000:02:00.0: BAR 4: assigned [mem 0xc1011000-0xc1011fff 64bit]
pci 0000:02:00.0: BAR 4: set to [mem 0xc1011000-0xc1011fff 64bit] (PCI address [0xc1011000-0xc1011fff])
pci 0000:00:02.0: PCI bridge to [bus 02]
pci 0000:00:02.0:   bridge window [mem 0xc1000000-0xc10fffff]
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
pci 0000:00:01.0: enabling bus mastering
PCI: enabling device 0000:00:02.0 (0140 -> 0143)
pci 0000:00:02.0: enabling bus mastering
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Switching to clocksource armada_370_xp_clocksource
NET: Registered protocol family 2
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP: reno registered
UDP hash table entries: 512 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 512 (order: 1, 8192 bytes)
NET: Registered protocol family 1
pci 0000:02:00.0: calling quirk_usb_early_handoff+0x0/0x8c4
PCI: CLS 64 bytes, default 64
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (junk in compressed archive); looks like an initrd
Freeing initrd memory: 10608K
bounce pool size: 64 pages
squashfs: version 4.0 (2009/01/31) Phillip Lougher
msgmni has been set to 1486
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
armada-370-pinctrl d0018000.pinctrl: registered pinctrl driver
mv_xor d0060800.xor: Marvell XOR driver
mv_xor d0060800.xor: Marvell XOR: ( xor cpy )
mv_xor d0060800.xor: Marvell XOR: ( xor fill cpy )
mv_xor d0060900.xor: Marvell XOR driver
mv_xor d0060900.xor: Marvell XOR: ( xor cpy )
mv_xor d0060900.xor: Marvell XOR: ( xor fill cpy )
Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
d0012000.serial: ttyS0 at MMIO 0xd0012000 (irq = 17) is a 8250
console [ttyS0] enabled
brd: module loaded
loop: module loaded
libphy: orion_mdio_bus: probed
mvneta d0070000.ethernet eth0: mac: f0:ad:4e:01:a5:f3
mvneta d0074000.ethernet eth1: mac: f0:ad:4e:01:a5:f4
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
orion-ehci d0050000.usb: Marvell Orion EHCI
orion-ehci d0050000.usb: new USB bus registered, assigned bus number 1
orion-ehci d0050000.usb: irq 27, io mem 0xd0050000
orion-ehci d0050000.usb: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: Marvell Orion EHCI
usb usb1: Manufacturer: Linux 3.8.0-mbx ehci_hcd
usb usb1: SerialNumber: d0050000.usb
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
orion-ehci d0051000.usb: Marvell Orion EHCI
orion-ehci d0051000.usb: new USB bus registered, assigned bus number 2
orion-ehci d0051000.usb: irq 28, io mem 0xd0051000
orion-ehci d0051000.usb: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: Marvell Orion EHCI
usb usb2: Manufacturer: Linux 3.8.0-mbx ehci_hcd
usb usb2: SerialNumber: d0051000.usb
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
ehci-pci: EHCI PCI platform driver
xhci_hcd 0000:02:00.0: enabling bus mastering
xhci_hcd 0000:02:00.0: xHCI Host Controller
xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 3
xhci_hcd 0000:02:00.0: enabling Mem-Wr-Inval
xhci_hcd 0000:02:00.0: irq 105, io mem 0xc1000000
usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: xHCI Host Controller
usb usb3: Manufacturer: Linux 3.8.0-mbx xhci_hcd
usb usb3: SerialNumber: 0000:02:00.0
xHCI xhci_add_endpoint called for root hub
xHCI xhci_check_bandwidth called for root hub
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
xhci_hcd 0000:02:00.0: xHCI Host Controller
xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 4
usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: xHCI Host Controller
usb usb4: Manufacturer: Linux 3.8.0-mbx xhci_hcd
usb usb4: SerialNumber: 0000:02:00.0
xHCI xhci_add_endpoint called for root hub
xHCI xhci_check_bandwidth called for root hub
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
rtc-mv d0010300.rtc: rtc core: registered d0010300.rtc as rtc0
i2c /dev entries driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
IPv4 over IPv4 tunneling driver
TCP: bic registered
TCP: cubic registered
NET: Registered protocol family 17
8021q: 802.1Q VLAN Support v1.8
VFP support v0.3: implementor 56 architecture 2 part 20 variant 9 rev 6
UBI error: ubi_init: UBI error: cannot initialize UBI, error -19
rtc-mv d0010300.rtc: setting system clock to 2013-02-24 14:03:59 UTC (1361714639)
Warning: unable to open an initial console.
Freeing init memory: 2984K
usb 1-1: new high-speed USB device number 2 using orion-ehci
usb 1-1: New USB device found, idVendor=1a40, idProduct=0101
usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
usb 1-1: Product: USB 2.0 Hub
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usb 1-1.1: new high-speed USB device number 3 using orion-ehci
usb 1-1.1: New USB device found, idVendor=05e3, idProduct=0723
usb 1-1.1: New USB device strings: Mfr=3, Product=4, SerialNumber=0
usb 1-1.1: Product: USB Storage
usb 1-1.1: Manufacturer: Generic
usb-storage 1-1.1:1.0: Quirks match for vid 05e3 pid 0723: 8000
scsi0 : usb-storage 1-1.1:1.0
usb 1-1.2: new high-speed USB device number 4 using orion-ehci
usb 1-1.2: New USB device found, idVendor=05e3, idProduct=0723
usb 1-1.2: New USB device strings: Mfr=3, Product=4, SerialNumber=0
usb 1-1.2: Product: USB Storage
usb 1-1.2: Manufacturer: Generic
usb-storage 1-1.2:1.0: Quirks match for vid 05e3 pid 0723: 8000
scsi1 : usb-storage 1-1.2:1.0
scsi 0:0:0:0: Direct-Access     Generic  STORAGE DEVICE   9451 PQ: 0 ANSI: 0
sd 0:0:0:0: [sda] Attached SCSI removable disk
scsi 1:0:0:0: Direct-Access     Generic  STORAGE DEVICE   9451 PQ: 0 ANSI: 0
sd 1:0:0:0: [sdb] 15523840 512-byte logical blocks: (7.94 GB/7.40 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 03 00 00 00
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1 sdb2 sdb3 < sdb5 > sdb4
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sd 1:0:0:0: [sdb] Attached SCSI removable disk
mvneta d0070000.ethernet eth0: link up

Jumpers

Zoom on jumpers
Jumpers are numbered like this : J7 J4 J6 J5 J9 J8 J1 J2 J3. Pin 1 is at the top, pin 2 in the middle, and pin 3 at the bottom. We'll note 1 for jumpers between pins 1 and 2, and 0 for jumpers connecting pins 2 and 3.

The only combinations I found to work so far are the following ones. Overall we could say that J6/J5/J2 affect the CPU/L2/DDR frequencies, and that J7/J4/J9 prevent the system from booting if changed.

CPU@1200 MHz, L2@600, DDR@600 (Default settings)

J7 J4 J6 J5 J9 J8 J1 J2 J3
 0  1  1  0  0  0  1  0  1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 600Mhz
       DDR @ 600Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1200, L2@800, DDR@400

J7 J4 J6 J5 J9 J8 J1 J2 J3
 0  1  1  0  0  0  1  1  1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 800Mhz
       DDR @ 400Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1000, L2@500, DDR@500

J7 J4 J6 J5 J9 J8 J1 J2 J3
 0  1  0  0  0  0  1  0  1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1000Mhz, L2 @ 500Mhz
       DDR @ 500Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1067, L2@534, DDR@534

J7 J4 J6 J5 J9 J8 J1 J2 J3
 0  1  0  1  0  0  1  0  1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1067Mhz, L2 @ 534Mhz
       DDR @ 534Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1333, L2@667, DDR@667

Note: this configuration did not boot.
J7 J4 J6 J5 J9 J8 J1 J2 J3
 0  1  1  1  0  0  1  0  1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1333Mhz, L2 @ 667Mhz
       DDR @ 667Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access
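
The five configurations above can be cross-checked with a few lines of Python. The sketch below (function and variable names are mine) decodes each pattern using the jumper order given earlier and lists the jumpers whose position changes from one configuration to another. It only covers the combinations listed here, so it says nothing about J7/J4/J9, which were never changed in a working setup.

```python
JUMPER_ORDER = ("J7", "J4", "J6", "J5", "J9", "J8", "J1", "J2", "J3")

# Jumper pattern -> (CPU, L2, DDR) in MHz, as listed above.
CONFIGS = {
    "011000101": (1200, 600, 600),   # default settings
    "011000111": (1200, 800, 400),
    "010000101": (1000, 500, 500),
    "010100101": (1067, 534, 534),
    "011100101": (1333, 667, 667),   # reported above as not booting
}


def decode(pattern: str) -> dict:
    """Map each jumper name to its 0/1 position."""
    return dict(zip(JUMPER_ORDER, map(int, pattern)))


def varying_jumpers(patterns) -> list:
    """Names of the jumpers that take more than one value across patterns."""
    decoded = [decode(p) for p in patterns]
    return [j for j in JUMPER_ORDER if len({d[j] for d in decoded}) > 1]


print(varying_jumpers(CONFIGS))
```

Only J6, J5 and J2 come out as varying, which is consistent with the observation that these three select the CPU/L2/DDR frequencies.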

Useful links

Some news from the GuruPlug

Some of you probably remember my very disappointing experience with the GuruPlug Server Plus 2.5 years ago. This device was overheating and very poorly designed. I was very angry at GlobalScale for their crappy design, and quite clearly I was not the only one, considering the number of people reporting such issues along with various fixing methods.

I finally had to resort to similar methods :-(. I had to remove the dangerous power supply from the box, and use its space to install a power plug, a large heatsink, and a thermally-controlled fan. I bought an external 5V/2A power supply and that resulted in a device that I could carry with me for various network experiments requiring two ports. Recently I also installed a small 4-pin connector to access the I2C bus from the outside to ease development.
Power plug
Fan
Heatsink

Switching to the Dockstar

I discovered the Seagate Dockstar too late, when it was almost impossible to find one. I managed to get my hands on 4 of them, 3 of which were for friends. This device is tiny, uses the same CPU as the GuruPlug, can be powered from a single USB port with an easy mod, has free space to add a serial port, and is cheap. I always carry it with me everywhere and it's my network testing peer. It's much more convenient than the GuruPlug when a single Ethernet port is required, and it can be powered by the GuruPlug's USB port when both are needed. Unfortunately it's impossible to find it now, and if you have one you don't use anymore, feel free to send it to me, I will be really pleased. Here are a few photos of the mods I made to a few devices.