PrimePower, SPARC, and the Advanced Product Line
Mike Scott
In June 2004 Sun Microsystems and Fujitsu ended months of speculation
by announcing a partnership and a consolidated roadmap for their
products, in an effort to cut the research, development, and
manufacturing costs of their next-generation product line.
It is currently anticipated that Sun will introduce the UltraSPARC
IV+ processor in late 2005, to be followed by "APL" (Advanced Product
Line), the first line of machines developed in partnership with
Fujitsu. Interestingly, the agreement currently covers only the
SPARC-based servers; Sun and Fujitsu will continue to develop and
market their own x86-based servers independently.
Over the past 18 months, I have been in the privileged position
of working with many of the current Sun and Fujitsu products.
This article introduces the current PrimePower range to those who
are not already familiar with it and considers the future of the
APL product line.
PrimePower
As a seasoned bigot of Sun Microsystems servers, I was deeply
suspicious when we started to receive the Fujitsu-Siemens-branded
hardware. However, my initial skepticism has proved unfounded. The
equipment continues to be installed to this day, and it is clear
that the product range is well engineered and dependable.
The machines are built with rugged metal casings, rather like
the NEBS-compliant Netra servers, and finished in a smart grey
paint job. As the servers are all locked away in the data center,
aesthetics are not a primary concern, but a server with a "solid"
feel does inspire confidence in the product. A small sector of the
market may be disappointed that the PrimePower range is not
NEBS-certified, but this is of no concern to the vast majority
of customers.
SPARC64
While Sun has a strategic alliance with Texas Instruments to fabricate
its processors, Fujitsu manufactures its own SPARC64 processor.
Any compatibility worries are addressed by SPARC V9 level 2 compliance,
as certified by SPARC International -- the SPARC64 will run any SPARC
Solaris application that runs on the equivalent Sun hardware.
The performance of the Fujitsu CPU is a significant improvement
over the processors currently available for Sun hardware. A bonus
prize is also awarded to Fujitsu for shipping a 1.89-GHz processor,
while Sun's fastest runs at 1.2 GHz (at the time of writing).
The SPEC CPU2000 speed benchmarks in Figure 1 measure single-processor
performance, which puts Sun's dual-core UltraSPARC IV at a disadvantage,
since the benchmark does little to exercise its second core. However,
even when multicore and multiprocessor configurations are considered,
Fujitsu still appears to hold a marginal advantage. Figure 2 compares
the SPEC CPU2000 rate benchmarks of a multiprocessor PrimePower 900
and a multiprocessor E4900.
Having spent my formative years as a sys admin working with IBM
servers, I've always felt that a display panel on the front of a
machine is a worthwhile addition. I was pleased to see that Fujitsu
has provided exactly this on its midrange and enterprise servers.
As the machine boots, status messages from the POST (power-on
self-test) appear on the two-line LCD, giving a continuously updated
report of the component being tested.
How many times have we all seen unlabeled (or mislabeled) equipment
in an unfamiliar data center? When the system finishes booting, the
LCD display usefully shows the hostname of the system, thus giving
a valuable verification of system identity. Additionally, by using
the two control buttons beside the LCD panel, a menu system can
be navigated that allows the operator to health-check the hardware
(e.g., AC input and temperature) and force a reset or even a crash
dump (similar to dropping to OBP and issuing the sync command).
PrimePower 2500
The top-of-the-range PP2500 is the notional equivalent of Sun's
F15k/E25k servers. The system cabinet holds up to 16 system boards,
each containing up to 8 of Fujitsu's SPARC64 processors -- giving
a maximum of 128 processors. The machine can be divided into domains,
much like the dynamic system domains that Sun has offered since
the early days of the E10k.
What is different here is that a system board can be logically
divided into two, giving 32 "logical" system boards that can be
distributed among a maximum of 15 domains. This means that a single
physical system board can belong to two domains. This isn't a
problem -- it just takes a little more thought and care when
designing your system and performing any dynamic reconfiguration
work.
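To illustrate the bookkeeping involved, here is a minimal Python
sketch that models the 32 logical boards and flags any physical
board whose two halves end up in different domains. The numbering
scheme (boards 0 to 15, halves "a" and "b") and the function names
are hypothetical, chosen for illustration; they are not Fujitsu's
own naming or tools.

```python
from collections import defaultdict

# Hypothetical numbering, for illustration only: physical boards 0-15,
# each split into halves "a" and "b", giving 32 logical boards.
PHYSICAL_BOARDS = 16
MAX_DOMAINS = 15


def logical_boards():
    """Return every logical board as a (physical_board, half) tuple."""
    return [(board, half) for board in range(PHYSICAL_BOARDS) for half in ("a", "b")]


def validate(plan):
    """Check a {domain: [logical boards]} plan against the limits in the text."""
    if len(plan) > MAX_DOMAINS:
        raise ValueError(f"{len(plan)} domains exceeds the maximum of {MAX_DOMAINS}")
    valid = set(logical_boards())
    owner = {}
    for domain, boards in plan.items():
        for lb in boards:
            if lb not in valid:
                raise ValueError(f"{lb} is not a valid logical board")
            if lb in owner:
                raise ValueError(f"{lb} is assigned to both {owner[lb]} and {domain}")
            owner[lb] = domain


def shared_physical_boards(plan):
    """Map each physical board whose halves sit in different domains to
    those domains; these boards need extra care during DR work."""
    domains_on = defaultdict(set)
    for domain, boards in plan.items():
        for board, _half in boards:
            domains_on[board].add(domain)
    return {b: sorted(d) for b, d in domains_on.items() if len(d) > 1}


if __name__ == "__main__":
    plan = {"prod": [(0, "a"), (0, "b"), (1, "a")], "test": [(1, "b"), (2, "a")]}
    validate(plan)
    print(shared_physical_boards(plan))  # {1: ['prod', 'test']}
```

Here physical board 1 is split between "prod" and "test", so any
dynamic reconfiguration touching it affects both domains.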
We have been managing two of these beasts for approximately a
year now, and thus far they have proved very dependable and
computationally very quick.
The PP2500 is accompanied by a Systems Management Console (SMC),
which provides management and monitoring facilities. Like the E10k's
System Service Processor (SSP), the SMC is a standalone workstation
running specialized software for managing and monitoring the PP2500.
Unlike the E10k, however, the PP2500 depends on the SMC neither
to boot nor for DR (dynamic reconfiguration).
One peculiarity of the PP2500 (and its predecessor, the PP2000)
is the lack of a flexible console connection like that of its Sun
equivalent. Consider the E10k, where console services are provided
from the SSP over Ethernet to the control board, which then
communicates with the boot processor of the relevant domain via
the backplane. This is a disappointing omission from the Fujitsu
range at present -- there is a console connection, of course, but
it is implemented as clumsy, thick RS-232 serial cables connected
to the lowest-numbered system board in each domain.
On the face of it, this doesn't appear to be a big problem, but
one of the benefits of dynamic system domains is the ability to
reconfigure boards, CPUs, and memory without having to visit a
potentially remote or difficult-to-access site. Many organizations
running servers of this size also have strict "secure access"
policies governing entry to data centers, and in that situation,
reducing the need for physical access to the server can only be
a good thing. Yet on the PP2500, reconfiguring or creating a domain
could require a console cable to be moved on the server itself.
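The cabling consequence can be expressed in a few lines. This Python
sketch assumes only what the text states, that each domain's console
cable attaches to its lowest-numbered system board; the function
names are hypothetical. A reconfiguration that changes a domain's
lowest-numbered board means someone has to visit the machine, and
creating a new domain always means cabling its lowest board.

```python
def console_board(boards):
    """Return the board carrying the domain's console cable, assuming
    (per the text) it is the lowest-numbered board in the domain."""
    if not boards:
        raise ValueError("a domain needs at least one system board")
    return min(boards)


def cable_move_needed(before, after):
    """True if a reconfiguration changes which board hosts the console,
    i.e. someone must physically move the serial cable."""
    return console_board(before) != console_board(after)


if __name__ == "__main__":
    # Adding board 5 to a domain on boards 2 and 3: console stays on board 2.
    print(cable_move_needed({2, 3}, {2, 3, 5}))  # False
    # Removing board 2 moves the console to board 3.
    print(cable_move_needed({2, 3}, {3}))        # True
    # Adding board 0 makes it the new lowest-numbered board.
    print(cable_move_needed({2, 3}, {0, 2, 3}))  # True
```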
That aside, remote console services are provided by an arbitration
daemon running on the SMC, which controls access to the Console
Connection Unit (CCU). The CCU is essentially just a network terminal
server, connecting the domains' serial cables to the network. In
comparison with the feature-rich "netcon" of the E10k, this setup
seems somewhat Heath Robinson (i.e., madly concocted).
The essentials are taken care of: only one session at a time can
have read-write access, and there can be multiple simultaneous
read-only sessions (unfortunately, a session cannot be switched
between RW and RO). Read-write sessions can be forcibly terminated
for those situations when your colleague has disappeared off to
lunch with the console locked.
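The session rules above amount to a small arbitration protocol.
The following Python sketch models them with a hypothetical
ConsoleArbiter class; it illustrates the behavior described and is
not the SMC daemon's actual implementation. Mirroring the limitation
in the text, there is deliberately no way to convert a session
between read-write and read-only.

```python
class ConsoleBusy(Exception):
    """Raised when a read-write session is requested while one is active."""


class ConsoleArbiter:
    """Hypothetical model of the console rules described in the text."""

    def __init__(self):
        self.rw_owner = None      # at most one read-write session
        self.ro_sessions = set()  # any number of read-only sessions

    def open(self, user, mode):
        if mode == "rw":
            if self.rw_owner is not None:
                raise ConsoleBusy(f"console is locked read-write by {self.rw_owner}")
            self.rw_owner = user
        elif mode == "ro":
            self.ro_sessions.add(user)
        else:
            raise ValueError("mode must be 'rw' or 'ro'")

    def close(self, user):
        """End all of this user's sessions."""
        if self.rw_owner == user:
            self.rw_owner = None
        self.ro_sessions.discard(user)

    def force_terminate_rw(self):
        """Break the read-write lock; return the evicted user (or None)."""
        evicted, self.rw_owner = self.rw_owner, None
        return evicted


if __name__ == "__main__":
    console = ConsoleArbiter()
    console.open("alice", "rw")
    console.open("bob", "ro")
    console.open("carol", "ro")
    try:
        console.open("dave", "rw")
    except ConsoleBusy as err:
        print(err)                       # console is locked read-write by alice
    print(console.force_terminate_rw())  # alice
    console.open("dave", "rw")
    print(console.rw_owner)              # dave
```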
Software
The entire PrimePower range runs Solaris, just like any other
Sun/SPARC product. If you examine a standard Solaris 8 media kit,
you'll notice that the label describes it as "FOR SUN COMPUTER SYSTEMS"
(I transcribe this from a media kit that is conveniently parked
on my desk at the moment). This label is the only externally noticeable
difference between the Sun and Fujitsu media kits. Solaris 9 08/03
is a step in the right direction: with this release, Sun includes
the OEM code needed to support PrimePower.
Disappointingly, although the Sun media kit includes OEM support,
it is still necessary to use Fujitsu-qualified patches, downloaded
from Fujitsu's Web site, rather than patches from the usual source,
SunSolve.
When you are using the operating system, there are very few clues
that you're not on a Sun machine. One is the output of uname -i:
Fujitsu hardware is identified by the prefix "FJSV", followed by
a code identifying the model of the machine. Here is the output
from a PP2500 domain:
# uname -i
FJSV,GPUZC-L
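Scripts that must behave differently on PrimePower can key off this
prefix. The Python sketch below splits a platform string into a
vendor prefix and a model code; the function names are hypothetical,
and the "SUNW" prefix in the usage example is the one Sun hardware
reports. Running uname -i is only meaningful on Solaris, so the
parsing is kept separate from the command itself.

```python
import subprocess


def parse_platform(platform):
    """Split a platform string such as 'FJSV,GPUZC-L' into (vendor, model)."""
    vendor, sep, model = platform.strip().partition(",")
    if not sep:
        raise ValueError(f"unexpected platform string: {platform!r}")
    return vendor, model


def is_fujitsu(platform):
    """True if the platform string carries the Fujitsu 'FJSV' prefix."""
    vendor, _model = parse_platform(platform)
    return vendor == "FJSV"


def current_platform():
    """Return the output of `uname -i` (meaningful on Solaris only)."""
    result = subprocess.run(["uname", "-i"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(parse_platform("FJSV,GPUZC-L"))     # ('FJSV', 'GPUZC-L')
    print(is_fujitsu("FJSV,GPUZC-L"))         # True
    print(is_fujitsu("SUNW,Sun-Fire-15000"))  # False
```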
Fujitsu also bundles a plethora of extra packages covering functions
as diverse as crash dump analysis, performance monitoring, and
management (see Figure 3). Included in the bundle is a very
flashy-looking, Java-based Web management GUI. I, however, am not
a particular fan of graphical interfaces and found it intolerably
slow. Additionally, I hear on the grapevine that Sun is bundling
"Webmin" with Solaris 10, which sounds like a much more comprehensive
solution for those who crave the ability to reboot the wrong machine
with a slip of the mouse.
Resource Management
Sun has for a number of years touted its "Solaris Resource Manager"
(SRM) product. SRM allows finer-grained management of resources
within the operating system; for example, it allows an operator
to assign the maximum percentage of CPU or memory that an application
or user can consume. Since its initial introduction, SRM has been
integrated into Solaris 9 and is now part of the standard install.
In fact, SRM is based on technology developed by a company called
Aurema -- Sun established a relationship with Aurema and licensed
its "ShareII" technology for integration into the operating system.
Fujitsu appears to have taken this one step further -- it has
maintained a relationship with Aurema and distributes the latest
version of Aurema's ARMTech software with the PrimePower servers.
Fundamentally, ARMTech is a dynamic resource manager, whereas
SRM is a static tool. ARMTech introduces the concept of a Resource
Consumer, which may be a Solaris user, a group of users, or an
application (or any hierarchy combining these).
An Application Resource Consumer is a defined executable, which
may be qualified by the command-line switches or environment
variables specified when it is invoked. This makes it easy for
ARMTech to manage resources for, say, multiple Oracle instances
in a single Solaris Operating Environment. ARMTech allows for
resource reservation (a guaranteed minimum amount of a resource),
hard and soft caps (limits on the amount of a resource consumed),
and sharing (a relative allocation of a resource that depends on
which resource consumers are active).
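To show how qualifying an executable by its invocation lets one
binary map to several consumers, here is a Python sketch. The
AppConsumer class, its fields, the Oracle binary path, and the use
of ORACLE_SID to tell instances apart are all illustrative
assumptions; this is not ARMTech's configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class AppConsumer:
    """Hypothetical Application Resource Consumer definition."""
    name: str
    executable: str
    required_args: tuple = ()
    required_env: dict = field(default_factory=dict)

    def matches(self, argv, env):
        """True if a process's argv and environment fit this definition."""
        if not argv or argv[0] != self.executable:
            return False
        if not all(arg in argv[1:] for arg in self.required_args):
            return False
        return all(env.get(key) == value for key, value in self.required_env.items())


def classify(argv, env, consumers):
    """Return the name of the first matching consumer, or None."""
    for consumer in consumers:
        if consumer.matches(argv, env):
            return consumer.name
    return None


if __name__ == "__main__":
    oracle = "/u01/app/oracle/bin/oracle"  # hypothetical install path
    consumers = [
        AppConsumer("oracle-prod", oracle, required_env={"ORACLE_SID": "PROD"}),
        AppConsumer("oracle-test", oracle, required_env={"ORACLE_SID": "TEST"}),
    ]
    print(classify([oracle], {"ORACLE_SID": "TEST"}, consumers))  # oracle-test
    print(classify([oracle], {"ORACLE_SID": "DEV"}, consumers))   # None
```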
The value of ARMTech is that these settings can be changed dynamically,
without restarting either ARMTech or the Solaris operating
system.
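The interplay of reservations, caps, and shares can be sketched as
a simple proportional-share allocation. In this Python illustration,
each active consumer first receives its reservation; the remaining
capacity is then divided in proportion to shares, and any amount
that would push a consumer past its hard cap is redistributed among
the others. Inactive consumers receive nothing, so the same shares
yield different allocations as consumers come and go. This is a
generic technique chosen for illustration, not ARMTech's actual
algorithm; soft caps are not modeled, and the Policy class and
allocate function are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical per-consumer settings."""
    reserve: float = 0.0   # guaranteed minimum
    shares: float = 1.0    # relative weight for the remainder
    cap: float = math.inf  # hard cap (soft caps are not modeled)


def allocate(capacity, policies, active):
    """Split `capacity` among the consumers named in `active`."""
    live = {name: policies[name] for name in active}
    alloc = {name: p.reserve for name, p in live.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")
    # Consumers that can still absorb more of the resource.
    open_set = {n for n, p in live.items() if alloc[n] < p.cap and p.shares > 0}
    while remaining > 0 and open_set:
        total_shares = sum(live[n].shares for n in open_set)
        grants = {n: remaining * live[n].shares / total_shares for n in open_set}
        remaining = 0.0
        for name, grant in grants.items():
            room = live[name].cap - alloc[name]
            if grant >= room:
                # Clamp at the hard cap and hand the excess back to the pool.
                alloc[name] = live[name].cap
                remaining += grant - room
                open_set.discard(name)
            else:
                alloc[name] += grant
    return alloc


if __name__ == "__main__":
    policies = {"A": Policy(reserve=10), "B": Policy(shares=3, cap=30), "C": Policy()}
    # A gets 40, B is capped at 30, C gets 30.
    print(allocate(100, policies, {"A", "B", "C"}))
    # With B idle, A gets 55 and C gets 45.
    print(allocate(100, policies, {"A", "C"}))
```

Because the allocation is simply recomputed from the current
settings, a changed policy takes effect on the next call with
nothing to restart, which loosely mirrors the dynamic behavior
described above.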
Teleservice
Teleservice is available as an optional component of the SMC.
It allows a modem to be connected to the SMC workstation so that
the SMC can dial home when a failure is reported. This has the
potential to be an invaluable service for organizations that run
mission-critical servers but don't have the luxury of 24x7 operations
staff.
Teleservice also provides one neat piece of functionality: the
ability to enable dial-in access for remote vendor support. The
obvious security problems associated with such a configuration
have been addressed by making the dial-in account password expire
30 minutes after it has been set.
When a Fujitsu engineer requires access, he telephones the customer
and asks for the password to be reset. The sys admin does this
with the "teleadm" menu interface, which handles the details of
generating and setting a valid password that can then be communicated
to the engineer. However, it appears that token-card authentication
systems, such as RSA SecurID, are not yet supported for extra security.
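The expiring dial-in password can be sketched as follows. This
Python illustration uses the standard secrets and hmac modules; the
DialInPassword class is hypothetical and is not how teleadm is
implemented. The clock is injectable so the 30-minute window can be
tested without waiting, and the sketch governs only logging in, not
sessions that are already established.

```python
import hmac
import secrets
import time

EXPIRY_SECONDS = 30 * 60  # the 30-minute window described in the text


class DialInPassword:
    """Hypothetical one-time dial-in password with a fixed lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._password = None
        self._set_at = None

    def reset(self):
        """Generate a fresh password to read out to the engineer."""
        self._password = secrets.token_urlsafe(9)  # 12 URL-safe characters
        self._set_at = self._clock()
        return self._password

    def check(self, attempt):
        """Accept the password only within 30 minutes of the last reset."""
        if self._password is None:
            return False
        if self._clock() - self._set_at >= EXPIRY_SECONDS:
            self._password = None  # expired: another reset is required
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(attempt.encode(), self._password.encode())


if __name__ == "__main__":
    dialin = DialInPassword()
    password = dialin.reset()      # read this out to the engineer
    print(dialin.check(password))  # True (well within 30 minutes)
```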
Of course, because this is a single-modem setup, Teleservice is
unable to dial home to report a problem for as long as the engineer
is connected (see Figure 4).
APL
By mid-2006, APL will replace the existing Sun Fire and PrimePower
product lines. According to information from Fujitsu, the design
of the low-end hardware (workstations and entry-level servers) is
to be entirely provided by Sun, using Sun's forthcoming Niagara
processor. The mid-range servers are to couple Fujitsu processors
(the multicore SPARC64 VI, due in early 2006) with Sun system boards.
The engineering of the enterprise-class server line will be provided
entirely by Fujitsu (also using the SPARC64) -- suggesting the high-end
servers will be the next generation of the current PP2500.
When the servers are launched, it is currently anticipated that
each product will carry the brand of the vendor selling it, regardless
of the facility where it was manufactured. Fujitsu and Sun will
each receive revenue based on the engineering that each partner
has contributed to the product. Brian Sutphin, Senior VP of Corporate
Development at Sun, says: "If a system is sold by Sun that contains
components developed by Fujitsu, there's going to be revenue to
Fujitsu on that sale". We hear that APL support can be provided
by either company. In fact, even today, Fujitsu is authorized to
sell and support the entire Sun product range.
Production Facilities
Sun currently has two manufacturing facilities -- one here in
Scotland and one in Oregon (alas, the Newark plant has ceased
manufacturing in recent times). Current speculation about the
future of APL suggests that manufacturing will be shared among
Sun's existing facilities and a Fujitsu plant near Tokyo.
It would make sense for each facility to be able to meet the
production needs of its own region (Tokyo for APAC, Linlithgow
for EMEA, and Hillsboro for the Americas); however, recent press
releases suggest that the high-end APL products (presumably the
successor to the PP2500) will be manufactured exclusively in
Fujitsu's Tokyo plant.
This looks like a single point of failure. Sun was in a similar
position when the E10k was produced exclusively in Oregon -- an
issue that Sun overcame by building a $57m extension to its Scottish
production facility and ramping up production of the F15k range
at that site as well. It is possible that single-site production
is intended to shorten the learning curve before production is
rolled out to other manufacturing sites; however, it remains to
be seen how Fujitsu will address disaster recovery for this site.
Conclusion
Overall, I believe that the partnership will bring together the
very best of these two organizations. Fujitsu has some innovative
engineering to help keep SPARC/Solaris competitive, and Sun has
a long and successful history of developing and supporting the Solaris
environment. The current PrimePower range has proved to us to be
a solid and dependable line of systems with exceptional price/performance.
References
Standard Performance Evaluation Corporation -- http://www.spec.org/
Developers of ARMTech software -- http://www.aurema.com/
Mike Scott is the director of Hindsight IT Ltd, a small Solaris
consultancy based in Central Scotland. He has been working in the
North East and the central belt for the past 10 years, specializing
in systems management with a keen interest in security and performance
management. He can be contacted at: sysadmin@hindsight.it.