Light at the end of the tunnel for AMD with Istanbul

February 26th, 2009

Intel’s Nehalem is looming on the horizon and promises to take the crown in every aspect of enterprise & high performance computing. Intel has been talking about Nehalem for so long that it almost seems like Nehalem is a couple of years old even though the launch has not yet happened. Talk about marketing!

AMD has taken quite a beating from Intel’s new Core micro-architecture, featured in Intel's latest server processors (from Intel Xeon Woodcrest to Intel Harpertown). AMD Opteron Shanghai managed to hold its ground in the 4P space but lost a lot of market share in the DP space, or in Intel’s latest terminology, the EP (Efficient Platform) space. The MP space is now called EX, or Expandable Platform.

Istanbul, the next Opteron series from AMD, features six cores and a faster HyperTransport interconnect. There has been a lot of official & unofficial news about Istanbul online recently. AMD is maintaining platform compatibility with Istanbul as well: if you already own a Shanghai or Barcelona based Opteron server, you should be able to upgrade to Istanbul processors with as little effort as a BIOS upgrade. The only requirement is that the board support Split Plane Power, also known as Dual Dynamic Power Management (DDPM). DDPM powers the integrated memory controller and the processor cores separately so that they can run at different performance levels, which improves power management.

Initial evaluations of Istanbul suggest that Opteron is still relevant in the High Performance Computing (HPC) & Enterprise markets and is a strong competitor to Intel’s Nehalem processor. An Opteron 4P platform (four socket, quad core, 16way) is demonstrating up to 40 GB/s of memory bandwidth, compared to ~25 GB/s on an AMD Opteron Shanghai system. Read the full report from TechReport here. Istanbul Opteron CPUs feature a number of new technologies that make them strong contenders against Nehalem CPUs. The most notable are HyperTransport 3.0 and HTAssist, AMD’s implementation of a snoop filter.

Following is a brief video about platform compatibility between Shanghai & Istanbul processors. AMD has been delivering platform compatibility much better than Intel for a long time, and the savings to the end user are tremendous. Talk about upgrading to a newer & better processor in under 10 minutes!

Compare that with the cost of replacing an existing server with a brand new one. Then there is the price of the DDR3 memory that Nehalem needs. The DDR2-800 memory that Istanbul will use has pretty much become mainstream, and its pricing should be very competitive compared with DDR3 modules. AMD has let Intel take on the responsibility of stabilizing DDR3 costs and will probably introduce a DDR3-enabled CPU at the right time.

So what does AMD have going for it right now with Istanbul?

Bad economy – Businesses & customers are more sensitive to pricing. Istanbul provides the best value for customers who already own a Shanghai or Barcelona based server: in-socket replacement, very low downtime for upgrades and better performance with just a change of CPU.

DDR2 memory – DDR2 memory is now priced very competitively against DDR3, which brings down the overall cost of the system. Istanbul will use DDR2 instead of the more costly DDR3 memory. This will probably repeat the situation with Fully Buffered DIMMs: FB-DIMMs were costly & ran hot, degrading the overall value of Intel's Core micro-architecture platforms. DDR3 may do the same to Intel’s Nehalem given the current circumstances.

Tried & Tested – AMD Opteron has had an integrated memory controller for a long time. Intel’s Nehalem platforms are brand new, with a totally new architecture & components: a new micro-architecture, new memory architecture, new memory technology and other new components like power distribution & supporting chipsets. Bring that many new components together and it's hard to get everything right the first time. Customers will probably take a wait & see attitude towards the Nehalem platform rather than go gaga over the latest & greatest.

Here is a brief video talking about DDR2 & DDR3

 

Here is another video demonstrating a 4-socket AMD Opteron system much like our own A1403 Quad Opteron 16way server and HiPerStation 4000.

 

And stay tuned to see a 48-core server soon!

References:

http://techreport.com/articles.x/16448

http://www.youtube.com/user/AMDShanghaiExpress

Integrating a Cell based system into ROCKS

December 11th, 2008

After successfully installing Fedora 9 on a Cell based system (Mercury 1U Dual Cell Blade Based System), we next had to integrate it into a ROCKS cluster.

ROCKS decides which kernel image to send to a booting node by looking at the DHCP vendor-class-identifier information. The current DHCP configuration file supports only IA64 (EFI), x86_64, x86 and, of course, network switches. Although ROCKS no longer supports IA64 (Itanium), the code is still there.

The first task is to add the Cell system to the ROCKS database. We decided to add the node as a “Remote Management” appliance rather than as a compute node. Adding it as a compute node would modify the configuration files for SGE or PBS, and the node would always show up with a “down” status. To add it, execute the following command:

insert-ethers --mac <give your mac id here>

When the insert-ethers UI shows up, select “Remote Management” and hit OK. You may also choose to provide your own hostname using the “--hostname” option.

The next task is to identify the vendor class identifier for the Cell system. A quick test showed that the system sends no vendor class identifier. Since we were dealing with only one system, the best option was to match the MAC address of the system with the following elsif block:

        } elsif ((binary-to-ascii(16,8,":",substring(hardware,1,6))="0:1a:64:e:2a:94")) {
                # Cell blade System
                filename "cellbe.img";
                next-server 10.1.1.1;
        }

“cellbe.img” is the kernel image for the Cell system. This has to be copied to “/tftpboot/pxelinux/”.
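Note that binary-to-ascii(16,8,...) prints each octet in lowercase hex without leading zeros (hence "0:1a" and ":e:" in the block above), so the string to match is not the MAC address as it is usually written. The short Python sketch below converts a conventionally written MAC address into that form and renders an elsif block like the one above. The helper names are hypothetical, and the image name and next-server address simply reuse the values from this post.

```python
def dhcpd_mac_string(mac: str) -> str:
    """Render a MAC address the way dhcpd's
    binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)) prints it:
    lowercase hex octets with leading zeros dropped."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets in {mac!r}")
    return ":".join(format(int(octet, 16), "x") for octet in octets)


def cell_elsif_block(mac: str, image: str = "cellbe.img",
                     next_server: str = "10.1.1.1") -> str:
    """Build an elsif block (hypothetical helper) for the given MAC address."""
    match = dhcpd_mac_string(mac)
    return (
        f'        }} elsif ((binary-to-ascii(16,8,":",substring(hardware,1,6))="{match}")) {{\n'
        "                # Cell blade System\n"
        f'                filename "{image}";\n'
        f"                next-server {next_server};\n"
        "        }\n"
    )


if __name__ == "__main__":
    # Prints the same block as above for the MAC address 00:1A:64:0E:2A:94.
    print(cell_elsif_block("00:1A:64:0E:2A:94"))
```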

These changes will be lost if dhcpd.conf is overwritten, which happens every time you execute insert-ethers or use

dbreport dhcpd

to overwrite the file.

You could generate a patch file and apply it to dhcpd.conf as needed, or you could edit

/opt/rocks/lib/python2.4/site-packages/rocks/reports/dhcpd.py

so that the new elsif block is included every time the file is generated.

If you see your Cell system trying to load

/install/sbin/kickstart.cgi

it means your dhcpd.conf file has been overwritten.
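Rather than waiting for the Cell system to fetch kickstart.cgi, you could check dhcpd.conf right after running insert-ethers. The Python sketch below is a minimal example, assuming the ROCKS-generated file lives at /etc/dhcpd.conf (adjust if yours differs) and that the hand-added block matches the MAC string and cellbe.img filename used in this post; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the ROCKS-generated DHCP configuration.
DHCPD_CONF = Path("/etc/dhcpd.conf")


def cell_entry_present(conf_text: str, dhcpd_mac: str,
                       image: str = "cellbe.img") -> bool:
    """True if the hand-added Cell block (MAC match plus filename) is still present."""
    return f'="{dhcpd_mac}"' in conf_text and f'filename "{image}";' in conf_text


def check(path: Path = DHCPD_CONF,
          dhcpd_mac: str = "0:1a:64:e:2a:94") -> Optional[bool]:
    """Return None if the file is missing, else whether the Cell entry survived."""
    if not path.exists():
        return None
    return cell_entry_present(path.read_text(), dhcpd_mac)


if __name__ == "__main__":
    status = check()
    if status is None:
        print(f"{DHCPD_CONF} not found")
    elif status:
        print("Cell entry present")
    else:
        print("Cell entry missing: dhcpd.conf was probably regenerated, re-apply the block")
```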

References:

http://archives.devshed.com/forums/networking-100/cannot-see-the-offer-and-ack-packet-with-ethereal-2063723.html

http://forums.opensuse.org/network-internet/399162-dhcp-client-identifier-matching.html

http://osdir.com/ml/network.dhcp.isc.dhcp-server/2004-05/msg00037.html

Code block (for dhcpd.conf) to log the vendor class identifier and other useful information:

               log(info, concat("Debug Information:\t",
               binary-to-ascii(16,8, ":", substring(hardware,1,6)),
               "\t",
               binary-to-ascii(10,8, "-", option dhcp-parameter-request-list),
               "\t",
               pick-first-value(option vendor-class-identifier,"no-identifier")
               )
               );
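The log() statement above writes one tab-separated line per request (on Red Hat based systems this typically ends up in /var/log/messages). A minimal Python sketch for pulling the fields back out is below; the function and field names are hypothetical, and it assumes the tabs survive into the log file. Some syslog daemons escape a tab as "#011", so the sketch accepts that form too.

```python
from typing import List, NamedTuple, Optional


class DhcpDebug(NamedTuple):
    mac: str                      # as printed by binary-to-ascii (no leading zeros)
    requested_options: List[int]  # decoded dhcp-parameter-request-list
    vendor_class: Optional[str]   # None when dhcpd logged "no-identifier"


def parse_debug_line(line: str) -> DhcpDebug:
    """Parse one syslog line produced by the log() statement above."""
    marker = "Debug Information:"
    start = line.find(marker)
    if start < 0:
        raise ValueError("not a dhcpd debug line")
    # Accept both literal tabs and the "#011" escape some syslog daemons use.
    body = line[start + len(marker):].replace("#011", "\t").strip()
    mac, params, vendor = body.split("\t")
    options = [int(p) for p in params.split("-") if p]
    return DhcpDebug(mac, options, None if vendor == "no-identifier" else vendor)
```

A vendor_class of None for your node's MAC address is exactly the "no vendor class identifier" case described earlier, which is why matching on the MAC address was the fallback.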

Python 3.0 Released

December 4th, 2008

Read more here.

Python is one of those things that I am always excited about. It may have been the underdog compared to Perl at some point in the past, but Python is now a ubiquitous language working behind the scenes in almost every kind of application.

You can find Python at work everywhere from web servers to simple desktop applications, and everything else you can think of.

MPI + Python holds a particular interest for me.

Some active projects:

MPI for Python

pyMPI

pyPar

HPC Systems at SuperComputing 08 (SC08)

December 4th, 2008

We were at booth number 1726.

On display was:

HiPerStation 8000 with 2X NVIDIA Tesla C1060

Here is a brief video of our exhibit at SC08. The demo shows a couple of codes from the NVIDIA CUDA SDK and an instance of NAMD ported to CUDA.

Unable to update SUSE Linux 10

December 3rd, 2008

If you get this error message when trying to update a SUSE 10 based system using the Novell Customer Center Configuration menu:

Execute curl command failed with '60':
curl: (60) SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify
failed

the easiest & fastest way to fix it is to

Check the system time!

Yes, simple as that. Make sure your system time is correct and your update should proceed smoothly.
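Why does the clock matter? While verifying the server certificate, the client checks that the current time lies between the certificate's notBefore and notAfter dates. A clock that is far off (for example, reset to 1970) puts "now" outside that window, and verification fails with an error like the one above. The Python sketch below shows just that check; the function name and the certificate dates in the examples are hypothetical.

```python
import ssl
import time
from typing import Optional


def clock_within_validity(not_before: str, not_after: str,
                          now: Optional[float] = None) -> bool:
    """True if `now` (default: this machine's clock) falls inside the
    certificate's validity window. Dates use OpenSSL's
    "Mon DD HH:MM:SS YYYY GMT" format, e.g. "Jan  1 00:00:00 2008 GMT"."""
    if now is None:
        now = time.time()
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end
```

With a certificate valid from January 2008 to January 2010, a correct December 2008 clock passes, while a clock reset to 1970 (now=0) fails, which is exactly the "certificate verify failed" case.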

There goes almost a day wasted trying to figure this out.

As you can tell, I am not that good with SUSE, or am I? :)

RedHat/AMD trumps VMware/Intel on live VM migration technology

December 3rd, 2008

In this YouTube video, RedHat & AMD demonstrate live migration of a virtual machine from an Intel based server to an AMD based server.

Ever since Intel bought a stake in VMware, VMware products have noticeably featured enhancements for Intel processors. One of the coolest things to come out of the VMware/Intel alliance was the capability to move virtual machines from one generation of Intel processors to another. This capability is marketed as “VMware Enhanced VMotion” and “Intel VT FlexMigration”. FlexMigration was a much needed feature given how incompatible successive generations of Intel processors are with one another. With Intel as one of its biggest investors, VMware may be reluctant to add enhancements that work better with AMD’s products. A related post is here.

With the new demonstration, AMD might see a way to co-exist in data centers that are exclusive to Intel. As power, rack space and heating costs go up, virtualization is efficiently consolidating hardware for a good number of today’s applications. FlexMigration lets customers invest in (almost) the entire Intel product line without worrying about incompatibilities when using VMware VMotion. However, the same technology will keep customers from investing in AMD technology, because they would be unable to migrate workloads between Intel and AMD servers without downtime.

Regardless of business decisions, this capability, once commercialized, will put the choice back in customers’ hands.

Good stuff.

Installing Fedora Core 9 and Cell SDK 3.1 on Cell Blade

December 3rd, 2008

We recently had a customer request a Cell Blade system to be integrated into their InfiniBand cluster. Since they were looking at having only one node, we suggested the 1U Dual Cell Based System. I am going to explain here the process of installing Fedora Core 9 on this system. This should also apply to other RedHat based distributions.

If you are considering purchasing the 1U Dual Cell Based System from Mercury Systems, please note that lead times are humongous: for the system we purchased, the lead time was about 16 weeks. Another important point is that this system comes with just two Cell processors and memory on board. Nothing else: no hard disk, no PCI slots. On-board video is present but is not supported by the system. If you are going to use any add-on cards, you will have to order the PCI expansion slots along with the system. To use disk storage, you will have to order a SAS disk and the PCI riser card with the system as well. This is something we overlooked; hopefully this note will help someone else when purchasing from Mercury Systems.

Turning on your system: The system cannot be accessed through the regular KVM connections. The provided serial cable has to be connected to a standalone PC, and a utility like HyperTerminal or minicom has to be used to access the system console.

  • Start HyperTerminal or minicom and open the serial port connection.
  • Switch on the system.
  • You will see a lot of text go by. Press “s” a number of times to enter the firmware prompt. The system boots from the network by default.
  • Once the firmware prompt appears, you can choose which device to boot from.
  • For example, “boot net” boots from the network.
  • Two hotkeys F1 and F2 are available for entering the management system (BIOS)

System Installation: The Cell system (Mercury Systems 1U Dual Cell Blade Based System or IBM QS22) cannot boot from a disk; it can boot only from the network. This is a big inconvenience, because neither FC9 nor RHEL 5.2 supports NFS based (nfsroot) installs. This becomes a sort of chicken & egg problem: the Cell system can boot only from the network, but the OS does not support an NFS root install. YellowDog Linux 6.1 from Terrasoft (now Fixstars) advertises fast NFS root install support, and there is a nice installation tutorial available for YDL here. The guide does not mention that NFS root install is available only in the commercial version. After many wasted hours trying to do an NFS root install with YDL, I gave up on it.

IBM Support has a nice page on how to install Fedora / RedHat based distributions on QS21 / QS22 using a USB disk.
Using the IBM Support page and a USB disk, I was able to finally get the system running. Here is the procedure for Fedora Core 9 PPC:

  • You will need either a TFTP / DHCP server or a USB DVD-ROM drive for the install. Instructions on setting up a TFTP / DHCP server can be found here.
  • Copy /images/netboot/ppc64.img to the TFTP root directory. This is the kernel the system will boot when using the TFTP/DHCP setup. If you are using a DVD drive, just boot from the DVD. Make sure to check the boot order: by default, network is the first boot device. You can force booting from the firmware prompt (pressing “s” while the system is booting) using the command “boot
  • Get a nice USB hard disk. According to the IBM Support page, only the IBM 80 GB USB and Lenovo 120 GB USB disks are supported. I am using a Western Digital 320 GB USB disk (My Book). I did face some issues with it, though not serious ones; the workaround is described below.
  • At the firmware prompt, use “boot net vnc” to boot the system over the network.
  • Answer the installer prompts until the GUI starts.
  • Now use a VNC client to connect to the installer at the IP address provided by the installer.
  • When using a large USB disk (80 GB+), the installer will exit abnormally immediately after you click “Next” on the GUI welcome screen. If you do want to use a large disk, the workaround is to disconnect the USB disk before clicking “Next” on the welcome screen, then reconnect it as soon as the next screen shows up.
  • Complete the install as you would any other RedHat/CentOS/Fedora Core install. A nice guide is available here.
  • When the installer finishes, do not click “Reboot”.
  • Now go back to the serial console and use the following commands:
    • umount /mnt/sysimage/sys
    • umount /mnt/sysimage/proc
    • chroot /mnt/sysimage
    • source /etc/profile
    • mount /sys
    • mount /proc
    • Disable SELinux: open /etc/selinux/config and change “SELINUX=enforcing” to “SELINUX=disabled”.
    • Make sure your network card is set to use DHCP before going forward. If you have set up a static IP, temporarily change the configuration to use DHCP. This can be done by moving the configuration file: mv /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth0.bak
    • Generate a new zImage that wraps the kernel and a network-capable initial ramdisk, so that it can be booted from the network.
      • /sbin/mkinitrd --with=tg3 --with=nfs --net-dev=eth0 /boot/initrd-2.6.25-14.fc9-net.ppc64.img 2.6.25-14.fc9.ppc64
      • At this time, if you had static IP and moved the configuration file, move the file back: mv /etc/sysconfig/network-scripts/ifcfg-eth0.bak /etc/sysconfig/network-scripts/ifcfg-eth0
      • wrapper -i /boot/initrd-2.6.25-14.fc9-net.ppc64.img -o zImage.initrd-2.6.25-14.fc9-net.ppc64.img /boot/vmlinuz-2.6.25-14.fc9.ppc64
    • Now copy the generated zImage to the TFTP root directory using scp or by copying it to a USB disk.
    • Exit the chroot environment:
      • umount /sys
      • umount /proc
      • exit
  • Now go back to the installer GUI and click on “Reboot”

This concludes the installation. Make sure you copy the generated zImage to the TFTP root directory so that this image is provided to the system when it boots after the installation.

Post Install Configuration:
Boot the system with the new zImage. The system will boot using the attached USB disk. You will be able to look at the boot process from the serial console. Now login as root.

  • The first step is to install a Cell BE optimized kernel.
  • Download the kernel from BSC site: wget http://www.bsc.es/projects/deepcomputing/linuxoncell/cellsimulator/sdk3.1/kernel-2.6.25.14-108.20080910bsc.ppc64.rpm
  • Install the kernel: rpm -ivh --force kernel-2.6.25.14-108.20080910bsc.ppc64.rpm
  • Add “--nodeps” to the command above if it does not install the kernel successfully.
  • Now generate a new zImage as per the above instructions using the newly installed initrd and vmlinuz (2.6.25.14-108.20080910bsc)
  • Copy this zImage to the TFTP root directory, overwriting the old zImage generated with the FC 9 kernel (2.6.25-14.fc9).
  • Reboot into the new kernel.
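Regenerating the zImage for a new kernel means repeating the mkinitrd and wrapper commands with a different release string. The Python sketch below only builds those two command lines from a kernel release; it does not run them. It assumes the kernel image is /boot/vmlinuz-<release>, as installed by the kernel package, and it reuses the tg3/nfs modules, eth0 and the "-net" naming from the installation steps. The helper names are hypothetical, and the exact release string of the BSC kernel should be checked with ls /boot before use.

```python
import shlex
from typing import List, Sequence


def net_tag(release: str) -> str:
    """Name used for the network initrd: "-net" goes before the
    ".ppc64" suffix when the release string has one."""
    suffix = ".ppc64"
    if release.endswith(suffix):
        return release[: -len(suffix)] + "-net" + suffix
    return release + "-net"


def zimage_commands(release: str, netdev: str = "eth0",
                    modules: Sequence[str] = ("tg3", "nfs")) -> List[str]:
    """Return the mkinitrd and wrapper command lines for one kernel release."""
    tag = net_tag(release)
    initrd = f"/boot/initrd-{tag}.img"
    mkinitrd = ["/sbin/mkinitrd", *(f"--with={m}" for m in modules),
                f"--net-dev={netdev}", initrd, release]
    wrapper = ["wrapper", "-i", initrd, "-o", f"zImage.initrd-{tag}.img",
               f"/boot/vmlinuz-{release}"]
    return [shlex.join(mkinitrd), shlex.join(wrapper)]


if __name__ == "__main__":
    # Verify the release string first, e.g. with: ls /boot/vmlinuz-*
    for cmd in zimage_commands("2.6.25.14-108.20080910bsc"):
        print(cmd)
```

For the stock FC9 release "2.6.25-14.fc9.ppc64", the function reproduces the mkinitrd and wrapper commands from the installation steps.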

SDK Installation & Executing Demo code:
SDK installation is pretty straightforward.

  • Download the SDK v3.1 from IBM.
  • Instructions on SDK installation are available here from IBM. The only thing to look out for is that tcl must be installed before the SDK installer: yum install tcl and then install the SDK installer: rpm -ivh cell-install-3.1.0-0.0.noarch.rpm
  • Important Note: Follow the instructions on the IBM site to add exclude directives to YUM to prevent YUM from overwriting packages optimized for Cell BE.
  • Compiling demo code also is simple. Use the provided make files.
  • Before executing any demo codes, it is advisable to configure and mount a hugeTLBFS file system.
  • To maximize the performance, large data sets should be allocated from the Huge-TLBfs. This filesystem provides a mechanism for allocating 16MB memory pages. To check the size and number of available pages, examine /proc/meminfo. If Huge-TLBFS is configured and available, /proc/meminfo will have entries as follows:
  • HugePages_Total:    24
    HugePages_Free:     24
    HugePages_Rsvd:      0
    HugePages_Surp:      0
    Hugepagesize:    16384 kB

  • If your system has not been configured with a hugetlbfs, perform the following:
    mkdir -p /huge
    mount -t hugetlbfs nodev /huge
    echo <N> > /proc/sys/vm/nr_hugepages
    where <N> is the number of huge pages you want allocated to the hugetlbfs.
  •  If you experience difficulty configuring adequate huge pages, memory may be fragmented and a reboot may be required.
  • This sequence can also be added to a startup initialization script, like /etc/rc.d/rc.sysinit, so the hugeTLB filesystem is configured during system boot.
  • A test run of Matrix Multiplication code at /opt/cell/sdk/src/demos/matrix_mul is as follows:
  • [root@cellbe matrix_mul]# ./matrix_mul -i 3 -m 192 -s 8 -v 64 -n -o 4 -p
    Initializing Arrays … done
    Running test … done
    Verifying 64 entries … PASSED
    Performance Statistics:
    number of SPEs     = 8
    execution time     = 0.00 seconds
    computation rate   = 91.66 GFlops/sec
    data transfer rate = 6.70 GBytes/sec
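The /proc/meminfo check and the nr_hugepages sizing step above can be scripted. The Python sketch below is a minimal example, assuming the 16 MB (16384 kB) huge page size shown in the sample output; the function names are illustrative and not part of the Cell SDK. With the sample above, 24 pages × 16 MB gives 384 MB of huge-page memory, and a 1 GB data set would need 64 pages.

```python
from pathlib import Path
from typing import Dict


def parse_hugepages(meminfo_text: str) -> Dict[str, int]:
    """Pick the HugePages_* counters and Hugepagesize (kB) out of /proc/meminfo text."""
    info: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        if sep and (key.startswith("HugePages_") or key == "Hugepagesize"):
            info[key] = int(rest.split()[0])
    return info


def pages_needed(dataset_bytes: int, page_kb: int) -> int:
    """Huge pages required to hold a data set of the given size, rounded up."""
    page_bytes = page_kb * 1024
    return (dataset_bytes + page_bytes - 1) // page_bytes


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        hp = parse_hugepages(meminfo.read_text())
        if "Hugepagesize" in hp:
            free = hp.get("HugePages_Free", 0)
            print(f"{free} free huge pages ({free * hp['Hugepagesize'] // 1024} MB)")
```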

AMD announces 45nm Opteron (Shanghai) availability

November 13th, 2008

AMD today announced general availability of the next generation 45nm AMD Opteron Quad core processors. The official press release is available here.

Major improvements in the new Opteron architecture (code named Shanghai) in this release are as follows:

  • 45nm Manufacturing Process
  • Larger Cache
  • Support for DDR2 800 MHz
  • Upcoming enhancements for HyperTransport 3.0 (HT3)
  • Other micro-architecture enhancements like
    • AMD SmartFetch
    • AMD CoolCore
  • Maintains platform compatibility leading to better return on investment.

What benefits can you expect from the new processor?

Energy Efficiency without sacrificing Performance: The new generation of AMD Opteron processors uses AMD’s latest 45nm manufacturing process, which allows higher core frequencies than the previous generation of quad core processors without dissipating too much heat. As per AMD’s announcement, the new generation processors deliver 35% more performance while drawing up to 35% less power. Overall, AMD Opteron processors combined with support for DDR2 memory offer platform level energy efficiency and 100% x86 compatibility.

Improved Application Performance: The latest generation processors feature two major enhancements affecting application performance: DDR2 800 MHz support and a larger cache. The latest AMD Opteron (Shanghai) improves on the previous generation of AMD Opteron processors (Barcelona) with support for 800 MHz DDR2 memory. This memory offers more bandwidth than the memory supported by the previous generation of processors and much better energy efficiency than Fully Buffered DIMM (FB-DIMM) technology. A 200% increase in L3 cache, from 2 MB to 6 MB, benefits a number of applications across verticals, like databases, virtualization, Java applications, scientific applications, media applications and more. A faster memory bus combined with a larger cache, without complicated prefetching and snooping algorithms, improves overall application efficiency.
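To put the DDR2-800 figure in context, peak memory bandwidth is simply transfers per second × bytes per transfer × channels. The Python sketch below assumes a 64-bit (8-byte) DIMM channel and two channels per socket, matching the dual-channel Opteron integrated memory controller; these are theoretical peaks, and sustained bandwidth is lower. The function name is illustrative.

```python
def ddr_peak_gbs(mega_transfers: float, bus_bytes: int = 8, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes/s):
    transfers per second x bytes per transfer x channels."""
    return mega_transfers * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for speed in (667, 800):
        print(f"DDR2-{speed}: {ddr_peak_gbs(speed):.1f} GB/s per channel, "
              f"{ddr_peak_gbs(speed, channels=2):.1f} GB/s per dual-channel socket")
```

Under these assumptions, DDR2-800 gives 6.4 GB/s per channel and 12.8 GB/s per socket, against roughly 10.7 GB/s per socket for DDR2-667.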

HyperTransport 3.0 (HT3) Support: AMD Opteron processors provide unparalleled scalability and aggregate memory bandwidth by employing the AMD DirectConnect architecture with HyperTransport. Previous generations offered a 1 GHz HyperTransport link between processors. Next generation enhancements planned for Q2 2009 include support for coherent HyperTransport 3.0 (cHT3), offering up to 17.6 GB/s of bandwidth for inter-processor communication. cHT3 will enhance platform scalability for systems with 4 or more AMD Opteron processors and will improve application performance on DP platforms.
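The 17.6 GB/s figure can be reproduced with simple arithmetic, assuming it refers to a 16-bit link clocked at 2.2 GHz (4.4 GT/s, since HyperTransport transfers data on both clock edges) and counts both directions. The same arithmetic gives 8 GB/s for the previous 1 GHz link. A short Python sketch, with an illustrative function name:

```python
def ht_link_gbs(clock_ghz: float, width_bits: int = 16,
                both_directions: bool = True) -> float:
    """Peak HyperTransport link bandwidth in GB/s.

    Data moves on both clock edges (2 transfers per cycle); a 16-bit
    link carries 2 bytes per transfer in each direction."""
    per_direction = clock_ghz * 2 * (width_bits / 8)
    return per_direction * (2 if both_directions else 1)


if __name__ == "__main__":
    print(f"1.0 GHz HT link:  {ht_link_gbs(1.0):.1f} GB/s")  # previous generation
    print(f"2.2 GHz HT3 link: {ht_link_gbs(2.2):.1f} GB/s")  # the 17.6 GB/s figure
```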

Micro-architecture Enhancements: The next generation 45nm AMD Opteron processor (Shanghai) also features enhancements to numerous other micro-architectural features. Some are listed below:

AMD SmartFetch: This technology allows cores to enter a “halt” state when they become idle. In the “halt” state, a core consumes very little power, which improves power efficiency. This technology does not affect application performance, thus offering better power efficiency with no performance penalty.

AMD CoolCore: This technology allows selected sections of the processor to be powered down: when a particular section is not being used, it is powered down to improve power efficiency. This technology does not affect application performance, thus offering better power efficiency with no performance penalty.

Enhanced Virtualization Performance: The next generation 45nm AMD Opteron processor (code named Shanghai) offers unsurpassed enhancements in virtualization performance. Combined with architectural enhancements like 45nm manufacturing, a larger cache, higher frequencies, higher memory bandwidth and cHT3 support, the new processor delivers a faster “world switch” time, enhancing virtual machine efficiency. AMD’s innovative AMD-V technology, featuring Rapid Virtualization Indexing, reduces the overhead associated with software virtualization. L3 cache index disable provides improved data integrity as well.

With the 45 nm AMD Opteron quad core processor (Shanghai), AMD continues to build on its platform strengths while addressing certain drawbacks of the Barcelona processors. With a BIOS upgrade, AMD Opteron (Shanghai) processors can be used in any system that supports Barcelona processors. Customers can take advantage of the new processors with a simple in-socket upgrade, without the costs of a total hardware replacement. Application software will immediately benefit from the performance enhancements that come with 45nm AMD Opteron Shanghai processors.

HPC Systems, Inc., a platinum partner of AMD, now supports 45nm AMD Opteron Shanghai processors across its product line. Systems featuring the latest generation of AMD Opteron processors are immediately available.

Read the press release here.

Formatting large volumes with ext3

November 7th, 2008

In Red Hat Enterprise Linux 5.1, the maximum ext3 file system size was increased from 8 TB to 16 TB. However, creating and formatting a volume larger than 2 TB is not straightforward.

We regularly ship large volumes to customers, and we recommend that customers use XFS for large volumes for performance and size reasons. However, some customers want only ext3 because they are familiar with the file system.

Before you can format such a volume, you must be able to create a partition larger than 2 TB, and fdisk cannot do this.

You will need to use GNU Parted (parted) with a GPT disk label to create partitions larger than 2 TB. Details on how to use parted can be found here and here.
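Why fdisk stops at 2 TB: an MBR (DOS) partition table records each partition's start and length as 32-bit sector counts, so with 512-byte sectors no partition can exceed just under 2 TiB. GPT uses 64-bit sector addresses, which is why parted is used with a GPT label. A quick check in Python (the constant and helper names are illustrative):

```python
SECTOR_BYTES = 512
# An MBR partition entry stores its length as a 32-bit sector count.
MBR_MAX_BYTES = (2**32 - 1) * SECTOR_BYTES  # just under 2 TiB


def needs_gpt(volume_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if one partition spanning the volume cannot be described in an MBR table."""
    return volume_bytes > (2**32 - 1) * sector_bytes


if __name__ == "__main__":
    ten_tb = 10 * 10**12  # a 10 TB RAID volume
    print(f"MBR limit: {MBR_MAX_BYTES / 2**40:.2f} TiB; "
          f"10 TB volume needs GPT: {needs_gpt(ten_tb)}")
```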

Here is a simple example of using parted. We assume we are working on /dev/sdb, a 10 TB volume from a RAID controller.

$> parted /dev/sdb

GNU Parted 1.8.9
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)

(parted) mklabel gpt
(parted) mkpart primary ext3 0 100%
(parted) print
(parted) quit

A straightforward mkfs command on a volume larger than 8 TB (with 4 KB blocks) will yield the following error:

mkfs.ext3: Filesystem too large.  No more than 2**31-1 blocks
(8TB using a blocksize of 4k) are currently supported.

A simple workaround is to force mkfs to format the device in spite of the size:

mkfs.ext3 -F -b 4096 /dev/<block device>

mkfs.ext3 -F -b 4096 /dev/<path to logical volume> if you are using LVM

In order to use the above command, you need e2fsprogs 1.39 or above. The command also sets the block size to 4 KB.

You could also use -m 0 to set the reserved-blocks percentage to zero (the default is 5%).
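The limits quoted in this post follow directly from the block size. The Python sketch below assumes 4 KiB blocks and mke2fs's default 5% reserved-blocks percentage; the constant and function names are illustrative.

```python
BLOCK = 4096  # bytes, as set with -b 4096

# Limit checked by mkfs.ext3 unless forced with -F: 2**31 - 1 blocks.
MKFS_CHECK_LIMIT = (2**31 - 1) * BLOCK  # just under 8 TiB
# 32-bit block numbers cap ext3 at 2**32 blocks: the 16 TB figure.
EXT3_MAX = 2**32 * BLOCK                # 16 TiB


def reserved_bytes(fs_bytes: int, percent: int = 5) -> int:
    """Space held back for root by the reserved-blocks percentage (-m)."""
    return fs_bytes * percent // 100


if __name__ == "__main__":
    fs = 10 * 2**40  # a 10 TiB file system
    print(f"mkfs check limit: {MKFS_CHECK_LIMIT / 2**40:.4f} TiB")
    print(f"ext3 maximum:     {EXT3_MAX // 2**40} TiB")
    print(f"default reserve:  {reserved_bytes(fs) // 2**30} GiB; with -m 0: {reserved_bytes(fs, 0)}")
```

On a 10 TiB volume, the default 5% reserve is 512 GiB, which is why -m 0 (or a small value) is worth considering for large data volumes.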

Note that ext3 is not recommended for large volumes. XFS is better suited for that purpose.

Further reading:

RedHat Knowledgebase Article

Knowplace

Unixgods

AMD Opteron claims the top 3 spots in 16 core virtualization performance benchmark VMmark

August 27th, 2008

AMD (NYSE: AMD) today announced it has achieved the top spot on the VMware® VMmark virtualization benchmark for x86 servers with the Quad-Core AMD Opteron processor-based HP ProLiant DL585 G5. AMD now holds the top three spots on the 16-core VMmark benchmark. This latest result is further proof that Quad-Core AMD Opteron processors provide a high-performance virtualization solution that allows data center managers to make large-scale virtualization deployments and do so at an attractive price point.

Read more here

