Abstract — The recent release of Altera’s SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown that OpenCL brings performance improvements on a single FPGA device. However, to meet the objectives of high-performance computing, OpenCL needs to be evaluated on multiple FPGAs. To this end, we propose and test a scalable multi-FPGA architecture for high-performance computing. Test results show that peak throughput is achieved when six FPGAs are used. With four FPGAs, throughput per watt improves 5× over a general-purpose processor.