From: S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa)
Newsgroups: comp.sys.ibm.pc.hardware,comp.sys.intel
Subject: Compatibility issues Cyrix 486DLC/Intel 486SX w/ regard to NeXTstep OS
Message-ID: <1j699uINN8qg@iraul1.ira.uka.de>
Date: 15 Jan 1993 12:05:18 GMT
Organization: University of Karlsruhe, FRG
Lines: 550

Compatibility issues Cyrix Cx486SLC/DLC as compared to the Intel 80486SX

There has been quite a bit of discussion here recently about
compatibility issues involving the Cyrix Cx486SLC and Cx486DLC
processors, in particular about the fact that the NextStep operating
system doesn't run on the Cyrix processors for some reason. During the
course of this discussion, we have heard *a lot of opinions* (e.g.
"Intel sucks", "Cyrix sucks") but only *few facts*. So I thought it
might be a good idea to throw in a bit of the latter. I'll try to give
the facts as accurately as possible, drawing from personal experience
and Intel's and Cyrix' literature on the 80486DX/SX and 486DLC/SLC. If
you think you have found erroneous information, feel free to contact me:

S_JUFFA@IRAVCL.IRA.UKA.DE (Norbert Juffa)

NOTE: I have no affiliation whatsoever with either Intel or Cyrix!

The Cyrix 486DLC is a replacement chip for the Intel/AMD 80386DX. The
Cyrix 486SLC is a replacement chip for the Intel/AMD 386SX. While the
internals of the Cyrix 486SLC/DLC are roughly equivalent to those in the
Intel 80486SX, the bus interface of these chips is identical to that of
the Intel 80386DX and 80386SX CPUs, respectively, to allow easy
replacement of the Intel CPUs by the Cyrix chips. This also means that
the Cx486SLC, as a replacement for the Intel/AMD 80386SX, can only
address 16 MB of memory.

The 486SLC/DLC CPUs have a register set that is identical to that found
on the Intel 80486SX. However, there are a few subtle differences in the
meaning of certain bits in some system registers (e.g. cache test
registers). These are covered in more detail below.

The instruction sets of the Intel 486SX and the Cyrix Cx486SLC/DLC are
identical. The execution times of specific instructions differ between
the two chips, but the overall execution speed (measured in CPI = clocks
per instruction) seems to be about the same.

On both the Intel 80486SX and the Cyrix 486SLC/DLC, there is *no*
on-chip FPU (floating point unit). To add floating point capabilities to
a 486SX based system, one would install a 487 'coprocessor', which is
basically a 486DX with a slightly different pin-out, or replace the
486SX with an OverDrive processor, a clock-doubled 486DX with the 486SX
pinout. With the 486SLC/DLC, one buys a 387 compatible coprocessor to
add floating-point capabilities. It is recommended to get a Cyrix
coprocessor for this purpose, since these are the fastest 387 compatible
coprocessors available. Also, Cyrix sells kits consisting of a
486SLC/DLC and a coprocessor that have a favourable value for money
ratio. The floating-point performance of a Cyrix 486DLC + Cyrix 83D87
combination is about 50% of that of an Intel 486DX running at the same
frequency.

The Cyrix 486SLC/DLC have a RISC-like execution unit with a flexible
five stage pipeline, just as the 80486SX has. Unlike the Intel 80486,
which has an 8 kB, 4-way associative cache on chip, the Cx486SLC/DLC
only have a 1 kB, 2-way associative cache (the cache on the Cyrix chips
can also be configured to be of the direct mapped type).

The 486DLC provides up to 80% more integer performance than a 386DX at
the same clock frequency, with the average performance gain being about
35%. With the 1 kB on-chip cache enabled, the 486DLC provides about 75%
of the integer performance of a 486SX at the same clock frequency. With
the cache disabled, the 486DLC provides about 65% of the integer
performance of a 486SX.

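To put the percentages quoted above side by side, here is a small
back-of-the-envelope sketch. It only rearranges the figures given in the
text; the variable names are made up for illustration, and the implied
386DX figure rests on the assumption (not stated in the text) that the
35% average gain was measured with the 486DLC cache enabled.

```python
# Relative integer performance at equal clock frequency, as quoted above.
DLC_VS_SX_CACHE_ON = 0.75    # 486DLC, 1 kB cache enabled, relative to 486SX
DLC_VS_SX_CACHE_OFF = 0.65   # 486DLC, cache disabled, relative to 486SX
DLC_VS_386_AVG = 1.35        # 486DLC relative to 386DX (average gain 35%)

# Speedup obtained on the 486DLC by enabling its on-chip cache.
cache_speedup = DLC_VS_SX_CACHE_ON / DLC_VS_SX_CACHE_OFF

# Implied 386DX performance relative to a 486SX.
# ASSUMPTION: the 35% average gain refers to a 486DLC with cache enabled.
i386_vs_sx = DLC_VS_SX_CACHE_ON / DLC_VS_386_AVG

print(f"486DLC cache on vs. off: {cache_speedup:.2f}x")
print(f"386DX relative to 486SX: {i386_vs_sx:.2f}")
```

Since the quoted figures are rounded averages that may come from
different benchmarks, the results are rough indications only.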
The lower performance of the Cyrix 486DLC as compared to the Intel
80486SX is mostly due to the slow 386DX bus interface the 486DLC uses,
which is up to 2 times slower than the 486 bus interface. Some
additional performance penalty is imposed by the smaller cache on the
486SLC/DLC, which provides significantly lower hit rates than the 8 kB
cache of the 80486SX.

I have personally used the Cyrix 486DLC with my 33.3/40 MHz 386
motherboard, which uses the Forex chip set. I have also used the Intel
RapidCAD and the C&T 38600DX with this board. These are also replacement
chips for the 386DX. Replacing the 386DX is very easy: Just pull out the
AMD/Intel 386DX, then plug in the replacement chip (here: the Cyrix
486DLC). I haven't had *major* problems with any of the available
replacement chips. The problems encountered using the Cyrix 486DLC were:

1) When a Cyrix EMC87, Cyrix 83D87 (chips produced prior to November
   1991), or IIT 3C87 coprocessor is used with the 486DLC, the computer
   locks up completely at times, especially when running protected mode
   multitasked operating systems, such as Windows 3.1 in enhanced mode.
   This is caused by problems with the FSAVE and FRSTOR instructions
   when using these coprocessors. Cyrix tells me that this problem only
   occurs with first generation 486DLCs (read: sample chips like the one
   I have) and that the bug is fixed on the chips that are now available
   to OEMs and end users.

2) When using the DBOS 1.0 DOS-extender delivered with the Salford
   FTN/386 Fortran compiler, the executable of the DODUC benchmark
   produced by that compiler aborts with a general protection fault. The
   DODUC executable runs fine with the DBOS 1.0 DOS-extender on the
   Intel 386DX, C&T 38600DX, Intel RapidCAD, and Intel 80486DX. I have
   informed Cyrix of the problem.

As for the problems with NextStep on the 486DLC, I have no idea what
causes them.
I can think of the following possibilities:

1) NextStep has been tailored extremely closely to the 486 programming
   model, not allowing for even slight changes in the architecture (e.g.
   smaller cache), so that the subtle changes needed to adapt the
   different HW of the Cyrix 486SLC/DLC to the 486 programming model
   cannot be accommodated.

2) NextStep includes code that only runs because it uses officially
   undocumented features of the 80486 that have not been disclosed by
   Intel to other vendors.

3) NextStep includes code that only runs correctly on the 80486 by
   accident. E.g. it could mask the contents of a system register and
   erroneously include a bit that is undefined as per Intel's
   documentation. This undefined bit could then be '1' on the 80486 and
   '0' on the 486SLC/DLC, for example, thus leading to corruption of the
   system further down the execution path.

4) For correct execution, NextStep relies on the timing of certain
   instructions that execute slower or faster on the Cx486SLC/DLC than
   they do on the Intel 80486SX (a chip that reportedly runs NextStep).

5) NEXT Corporation used an early and possibly buggy sample chip to do
   their compatibility testing.

6) There is a bug in the Cyrix 486SLC/DLC that only crops up if
   protected mode system level programs are used, similar to the problem
   I encountered with the DBOS 1.0 DOS-extender that is described above.
   However, it is interesting to note that several 32-bit operating
   systems have been successfully tested on the 486SLC/DLC (see below).


Summary of Intel 486SX / Cyrix 486SLC/DLC implementation details

Intel 486SX

bus interface:
    supports burst mode memory accesses, with the first memory access
    taking two clock cycles and subsequent accesses taking only one
    clock cycle.

prefetch queue:
    32 bytes

on-chip cache:
    8 KByte unified (code and data) write-through cache. The cache is
    4-way set-associative, with 128 sets consisting of four cache lines
    each. Every cache line consists of 16 bytes. Four write buffers.
    Hit rate: ~95%
    Invalidation of cache lines: entire cache line

execution unit:
    RISC-like execution unit with five stage pipeline. Barrel shifter.
    Conditional jump taken/not taken: 3/1 clock cycles.
    Instructions that can be executed in 1 clock cycle if the
    destination is a register and the source is either a register or an
    immediate value:
    ADC, ADD, AND, BSWAP, CMP, DEC, INC, MOV, NEG, NOT, OR, POP, PUSH,
    SBB, SUB, TEST, XOR

Cyrix 486DLC

bus interface:
    Cx486SLC/DLC uses the same bus interface as the Intel 386DX/386SX.
    The highest speed at which memory is accessed is two clock cycles
    per memory access; there is *no* burst mode. Seven additional
    signals have been assigned to pins that are not connected on the
    386DX/386SX. After power-on or reset, these pins are also
    electrically disabled on the Cx486SLC/DLC and must be specifically
    enabled by software. Signals added are used for cache management
    (KEN#, FLUSH#, RPLSET and RPLVAL#), power management (SUSP#,
    SUSPA#), and A20 control (A20M#). Each signal can be
    enabled/disabled independently of the enable/disable status of the
    other signals.

instruction set:
    complete Intel 486SX instruction set, including *all* 486 specific
    instructions: WBINVD (write back and invalidate data cache), XADD
    (exchange and add), CMPXCHG (compare and exchange), BSWAP (Byte
    Swap), INVLPG (Invalidate TLB entry), INVD (Invalidate Data Cache)

prefetch queue:
    16 bytes

on-chip cache:
    1 KByte unified (code and data) write-through cache. The cache is
    2-way set-associative, with 128 sets consisting of two cache lines
    each. Every cache line consists of 4 bytes. Two write buffers.
    Hit rate: ~65%
    Invalidation of cache lines: single bytes in cache line
    The cache is disabled after power-on or reset for compatibility
    reasons and must be enabled by software. Under DOS, you can use a
    program provided by Cyrix for this purpose. As far as I know, there
    are no programs available yet for OS/2 and Unix that enable the
    cache.

execution unit:
    RISC-like execution unit with five stage pipeline. Barrel shifter.
    16x16 bit hardware multiplier (16x16 bit multiply: 3 cycles, 32x32
    bit multiply: 7 cycles, AAD: 4 cycles).
    Conditional jump taken/not taken: 6/1 clock cycles.
    Instructions that can be executed in 1 clock cycle if the
    destination is a register and the source is either a register or an
    immediate value:
    ADC, ADD, AND, CDQ, CLC, CLD, CMC, CMP, CWD, DEC, INC, MOV, MOVSX,
    NEG, NOT, OR, SBB, SHLD, SHRD, STC, STD, SUB, TEST, XOR


Summary of known compatibility issues

The following is an extract from the Cx486SLC and Cx486DLC Compatibility
Report, Cyrix Corporation 1992, Order No. 94074-00, with some additional
information added by me that has been taken from the Cyrix Cx486SLC
Microprocessor Data Sheet, Cyrix Corporation 1991, Order No. 94073-00,
the i486 Microprocessor Hardware Reference Manual, Intel Corporation,
Order No. 240552-001, and the i486 Microprocessor Programmer's Reference
Manual, Order No. 240486-001.


SUBSTANTIVE DIFFERENCES - (SOFTWARE)

SS-1

Description

The TR4 cache test register holds the cache tag address, valid bits and
LRU bits for the current cache test operation. The TR5 cache test
register defines the cache line, cache set and control bits for the
cache test operation. Since the cache size and organization differ
between the Cx486SLC/DLC and the 80486, TR4 and TR5 have similar but not
identical bit definitions on the Cx486SLC/DLC and the 80486.

Analysis

Cache test and diagnostic software - if written to explicitly depend on
the cache size and organization of the 80486 - may produce unexpected
results when run on a Cx486SLC/DLC. The results of such programs
typically have no effect on operating systems or applications software.
For proper test or diagnosis of the Cx486SLC/DLC cache, software should
be used which is specifically written to comprehend the Cx486SLC/DLC.

80486SX

     31                                      11 10  9      7 6     3 2      0
    +------------------------------------------+---+--------+-------+--------+
TR4 |                   Tag                    | V |  LRU   | Valid | Unused |
    +------------------------------------------+---+--------+-------+--------+

V       This is the valid bit for the particular cache line which was
        accessed. On a cache lookup, it is a copy of one of the bits
        reported in bits 3..6, which are the valid bits for all four
        cache lines in the selected set. On a cache write, it becomes
        the new valid bit for the particular cache line selected within
        the selected set.

LRU     On a cache lookup, these are the three LRU bits of the set which
        was accessed. On a cache write, these bits are ignored; the LRU
        bits in the cache are updated by the pseudo-LRU cache
        replacement algorithm. LRU bit 0 (TR4 bit 7) indicates which
        group of two cache lines in the set contains the cache line that
        has been least recently used. The bit is clear when the least
        recently used line is either line 0 or line 1, and is set when
        the least recently used line in the set is either line 2 or
        line 3. LRU bit 1 (TR4 bit 8) and LRU bit 2 (TR4 bit 9) indicate
        which of the two lines in the group of lines selected by LRU
        bit 0 is the least recently used, where LRU bit 1 indicates
        either line 0 (bit=0) or line 1 (bit=1) and LRU bit 2 indicates
        either line 2 (bit=0) or line 3 (bit=1) has been least recently
        used. A real LRU replacement algorithm would have to use 5 bits.

Valid   On a cache lookup, these are the four Valid bits of the set
        which was accessed, where each bit corresponds to one of the
        four cache lines in the set.

486SLC/DLC

     31                                             9 8   7   6     3 2     0
    +------------------------------------------------+-+-----+-------+-------+
TR4 |                      Tag                       |U| LRU | Valid | 0 0 0 |
    +------------------------------------------------+-+-----+-------+-------+

U       bit 8 is unused.

LRU     On a cache lookup, this is the LRU bit associated with the cache
        set. On a cache write, this bit is ignored.
        Bit=0 means line 0 in the selected set has been least recently
        used, bit=1 means line 1 in the selected set has been least
        recently used.

Valid   On a cache lookup, these are the four valid bits for the
        particular cache line accessed (one bit per byte in the cache
        line). On a cache write these are the valid bits written into
        the line.

80486SX

     31                                  11 10               4 3     2 1    0
    +--------------------------------------+------------------+-------+------+
TR5 |                Unused                |    Set Select    | Entry | Ctrl |
    +--------------------------------------+------------------+-------+------+

Set Select  Selects one of the 128 sets of the cache.

Entry       Selects one of the four cache lines within the selected set.

Ctrl        00  write to cache fill buffer, or read from cache read
                buffer
            01  perform cache write
            10  perform cache read
            11  flush cache (mark all entries invalid)

486SLC/DLC

     31                                  11 10               4 3   2   1    0
    +--------------------------------------+------------------+-+-----+------+
TR5 |                Unused                |    Set Select    |U| Ent | Ctrl |
    +--------------------------------------+------------------+-+-----+------+

Set Select  Selects one of the 128 sets of the cache.

U           bit 3 is unused.

Entry       Selects one of the two cache lines within the selected set.

Ctrl        00  ignored
            01  perform cache write
            10  perform cache read
            11  flush cache (mark all entries invalid)

SS-2

Description

The 80486 NW (not write-through) bit in CR0 disables 80486 write-through
capability. If the cache disable bit is on, a write occurs to a
cache-hit location, and NW is a 1, then the 80486 does not perform an
external write bus cycle. This bit is not available on the Cx486SLC/DLC
and is fixed at zero.

Analysis

The NW bit on the 80486 allows for a capability of self-contained
processing once a program has been loaded into the cache and the cache
disabled. Programs that use this feature will work on the Cx486SLC/DLC,
with writes happening on external write bus cycles.

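Coming back to the TR4/TR5 layouts listed under SS-1, the following is a
minimal sketch of how the same raw register word decodes differently on
the two chips. It is plain bit arithmetic in Python based only on the
field positions given in this post; it does not access real hardware,
and all function names are hypothetical.

```python
def bits(value, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_tr4_80486(tr4):
    # 80486 layout: Tag 31..11, V 10, LRU 9..7, Valid 6..3, bits 2..0 unused.
    return {"tag": bits(tr4, 31, 11), "v": bits(tr4, 10, 10),
            "lru": bits(tr4, 9, 7), "valid": bits(tr4, 6, 3)}

def decode_tr4_cx486(tr4):
    # Cx486SLC/DLC layout: Tag 31..9, bit 8 unused, LRU 7, Valid 6..3.
    return {"tag": bits(tr4, 31, 9), "lru": bits(tr4, 7, 7),
            "valid": bits(tr4, 6, 3)}

def decode_tr5(tr5, cyrix):
    # Set Select and Ctrl sit in the same place on both chips. Entry is
    # bits 3..2 on the 80486 (four lines per set) but only bit 2 on the
    # Cx486SLC/DLC (two lines per set, bit 3 unused).
    entry = bits(tr5, 2, 2) if cyrix else bits(tr5, 3, 2)
    return {"set": bits(tr5, 10, 4), "entry": entry, "ctrl": bits(tr5, 1, 0)}

def lru_victim_80486(lru):
    """Least recently used line (0..3) per the LRU bit description above."""
    b0 = lru & 1          # TR4 bit 7: 0 -> line 0 or 1, 1 -> line 2 or 3
    b1 = (lru >> 1) & 1   # TR4 bit 8: 0 -> line 0, 1 -> line 1
    b2 = (lru >> 2) & 1   # TR4 bit 9: 0 -> line 2, 1 -> line 3
    return b1 if b0 == 0 else 2 + b2

# Made-up example word: tag 0x1ABCD, V=1, LRU=101b, all four Valid bits set.
word = (0x1ABCD << 11) | (1 << 10) | (0b101 << 7) | (0b1111 << 3)
print(decode_tr4_80486(word))
print(decode_tr4_cx486(word))
print(lru_victim_80486(decode_tr4_80486(word)["lru"]))
```

Decoding the example word with the Cyrix layout yields a different tag
and a one-bit LRU value, which illustrates why diagnostics hard-wired to
the 80486 layout can report unexpected results on the Cx486SLC/DLC.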
SS-3

Description

On systems with hardware FPUs, whose FPU ERROR signal is routed to the
CPU ERROR signal (NE bit set on the 80486DX), a floating point error is
normally acknowledged by the CPU upon execution of the next floating
point instruction. If the next floating point instruction is a load
single or load double precision that would have generated a General
Protection (GP) fault, it is possible for the Cx486SLC/DLC to
acknowledge the GP fault before the coprocessor error fault. The 80486
acknowledges the coprocessor error first.

Analysis

This condition (FPU ERROR connected to CPU ERROR) does not occur in PC
compatible designs.


INFORMATIONAL DIFFERENCES - (SOFTWARE)

IS-1

Description

Certain 80486 flag bits in the flags register are documented by Intel as
undefined after execution of certain instructions. Testing at Cyrix has
shown that the final states of these flag bits are in fact
unpredictable. The Cx486SLC/DLC leaves the flag bit values unmodified
after execution of the same instructions.

Analysis

Since the flag bits are documented by Intel to be undefined after
certain operations, software cannot reliably use the resulting flag bit
values.

IS-2

Description

Early revision 80486SX CPUs have a programmable Numeric Exception
control bit in control register CR0 (bit 5). This bit was intended to
control whether numeric exceptions are handled internally (NE=1) or
driven externally on a discrete CPU pin (NE=0). On these 80486SXs, the
NE bit can be set to a one even though numeric exceptions cannot be
handled internally due to the fact that no coprocessor exists. Reading
the NE bit on the Cx486SLC/DLC always returns a zero, indicating that
numeric exceptions are always handled externally.

Analysis

Since the Cx486SLC/DLC does not have an on-board floating point unit,
the coprocessor interface (including numeric exception signaling)
operates in a fashion compatible with the 80386. The Cx486SLC/DLC and
80386 use an external coprocessor which generates the numeric exception
and always return zero when the NE bit is read.

IS-3

Description

When trying to reference CR1 in protected mode while not at the highest
privilege level (level 0), the 80486 generates an Invalid Opcode fault,
whereas the Cx486SLC/DLC generates a General Protection (GP) fault.

Analysis

The Cx486SLC/DLC and 80486 do not define the bits in the CR1 register.
Since there are no valid bits in the CR1 register, any exception taken,
whether it is a GP fault or Invalid Opcode fault, will signal that an
invalid operation has taken place.

IS-4

Description

When using the Translation Lookaside Buffer (TLB) test registers, the
undefined bits in TR7 may differ between the 80486 and the Cx486SLC/DLC
when a look-up miss (TR7 bit 4 is clear) occurs. This includes the REP
field (bits 2-3).

Analysis

The majority of the bits in TR7 are documented by Intel to be undefined
after a TLB look-up miss. Therefore, software programs cannot reliably
use the resulting values of these undefined bits.

IS-5

Description

Cx486SLC/DLC reads and writes to Debug Register 4 (DR4) and Debug
Register 5 (DR5) result in accesses to Debug Register 6 (DR6) and Debug
Register 7 (DR7), respectively. Accessing DR4 and DR5 on the 80486
produces an Invalid Opcode fault.

Analysis

DR4 and DR5 are documented as undefined by Intel on the 80486. Since the
results are undefined, software programs cannot reliably use the
register results.

IS-6

Description

Writing duplicate TLB tags using the TLB test registers generates
different results on the Cx486SLC/DLC than on the 80486 when the
duplicate address is looked up. The result of writing duplicate TLB tags
is documented as undefined by Intel.

Analysis

Writing duplicate TLB tags using the TLB test registers is an
unsupported operation. The Cx486SLC/DLC and 80486 return undefined
results when looking up the resulting address. Since the results are
undefined, software programs cannot reliably use the register results.

IS-7

Description

The 80486 imposes a performance penalty in order to report debug faults
precisely. The Cx486SLC/DLC reports debug faults precisely without a
performance penalty (except for a repeated MOVS instruction).

Analysis

The Cx486SLC/DLC provides superior debugging capability.

IS-8

Description

The 80486 writes zeroes to the destination register when executing a
Bit-Scan Forward (BSF) instruction if all zeroes are found in the
specified bit map. The Cx486SLC/DLC leaves the destination register
unchanged under this condition.

Analysis

The value in the destination register of a BSF instruction is specified
by Intel to be undefined when a one bit is not found in the source
operand. Since the results are undefined, software programs cannot
reliably use the register results.

IS-9

Description

Memory versions of the instructions ADC, ADD, AND, DEC, INC, MOVS, NEG,
NOT, OR, RCL, ROL, ROR, SAL, SAR, SBB, SUB, SHL, SHLD, SHR, SHRD, XCHG,
and XOR read the destination memory, operate on it, and write it back to
memory. The Cx486SLC/DLC checks the writability of the destination
before performing these instructions. On non-writable locations, the
Cx486SLC/DLC faults before starting the instruction. The 80486 performs
the read, sets the read location accessed bit, and modifies the flags
before faulting.

Analysis

By checking the writability prior to execution of the instruction (at no
performance penalty), the Cx486SLC/DLC avoids unnecessary operations.
Leaving the accessed bit and flag contents in their original state is
preferred if the instruction is restarted.

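The difference described in IS-9 can be pictured with a toy model. This
is only a sketch of the behaviour as stated in the report, not an
emulator: `Location`, `Fault`, `inc_mem` and `run` are hypothetical
names, the flags register is reduced to ZF, and a memory INC stands in
for the whole group of read-modify-write instructions.

```python
from dataclasses import dataclass

class Fault(Exception):
    """Stand-in for the fault raised on a non-writable destination."""

@dataclass
class Location:
    value: int
    writable: bool
    accessed: bool = False

def inc_mem(loc, flags, cpu):
    """Model a 32-bit memory INC on either chip, following IS-9."""
    if cpu not in ("80486", "cx486"):
        raise ValueError(cpu)
    if cpu == "cx486" and not loc.writable:
        # Cx486SLC/DLC: writability is checked first, nothing is touched.
        raise Fault("fault before the instruction starts")
    result = (loc.value + 1) & 0xFFFFFFFF
    loc.accessed = True               # the read marks the location accessed
    flags["ZF"] = int(result == 0)    # flags are updated from the result
    if not loc.writable:
        # 80486: the fault arrives after the read and the flag update.
        raise Fault("fault after read and flag update")
    loc.value = result

def run(cpu):
    """Execute the INC on a read-only location, report the leftover state."""
    loc = Location(value=0xFFFFFFFF, writable=False)
    flags = {"ZF": 0}
    try:
        inc_mem(loc, flags, cpu)
    except Fault:
        pass
    return loc.accessed, flags["ZF"], loc.value

for cpu in ("80486", "cx486"):
    print(cpu, run(cpu))
```

In this model both chips leave the memory value untouched and both
fault, but only the 80486 variant has already changed the accessed bit
and ZF by the time the fault handler runs.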
IS-10

Description

In the case above, if the read location is also not present, the 80486
will attempt the read, take a page fault, reload the page, restart the
instruction, and then take a GP fault. The Cx486SLC/DLC will take a GP
fault.

Analysis

The 80486 wastes time loading the requested page before taking the
required GP fault. The GP fault is eventually detected by both the 80486
and the Cx486SLC/DLC.

IS-11

Description

If a locked instruction accesses a memory page marked as not present,
the 80486 reports in the error code that the access type was a write
while the Cx486SLC/DLC reports that the access type was a read.

Analysis

Since the page is not present in either case (read or write), the same
page fault is taken by both the Cx486SLC/DLC and the 80486.

IS-12

Description

When alignment checking is enabled and an ENTER instruction that
misaligns the stack is executed, the 80486 generates an alignment check
fault even though the misaligned stack has not been accessed. The
Cx486SLC/DLC generates the alignment check fault only when the
misaligned stack is accessed.

Analysis

The Cx486SLC/DLC correctly generates an alignment check fault only when
a misaligned stack is accessed. The 80486 unnecessarily takes the fault
in the case described.

IS-13

Description

When executing a REP LOOPE (repeated loop while equal) instruction, the
80486 does not perform the "if equal" function of the instruction. The
Cx486SLC/DLC does perform the "if equal" check under the same
circumstances.

Analysis

The 80486 execution should be considered incorrect. The Cx486SLC/DLC
correctly executes this instruction sequence.

IS-14

Description

The 80486 incorrectly asserts the LOCK# pin while entering the illegal
instruction exception handler when using the LOCK prefix on instructions
other than those allowed (only BTS, BTR, BTC, XCHG, INC, DEC, NOT, NEG,
ADD, ADC, SUB, SBB, AND, OR, XOR are allowed). The Cx486SLC/DLC
correctly does not assert LOCK# in this case.

Analysis

When using the 80486 in a multi-processor environment, the bus may be
locked unnecessarily, causing performance degradation.


Operating systems/operating environments tested with the Cx486SLC/DLC:

Digital Research:  Concurrent DOS 386 5.0, DR-DOS 6.0
Ergo:              OS/386
IBM:               IBM DOS 3.3, IBM DOS 4.0, OS/2 2.0, OS/2 SE 1.3
IGC:               VM/386 2.01
Interactive:       Interactive Unix 3.2
Mark Williams:     Coherent 3.1, Coherent 3.2
Microsoft:         MS-DOS 3.3, MS-DOS 4.01, MS-DOS 5.0, Windows 3.0,
                   Windows 3.1
Pharlap:           DOS-Extender 286, DOS-Extender 386
Quarterdeck:       Desqview 386 2.32
Rational:          DOS/4G
SCO:               SCO Open Desktop, SCO Unix, SCO Xenix 2.3.2c
Symantec:          Norton Desktop for Windows 1.0
UHC:               Developers Environment, Network Module, X11R4/Motif
                   Windowing Module, UNIX Release 4.0 Ver. 3.6