Ravi Kumar Arimilli - Austin TX, US Robert Alan Cargnoni - Austin TX, US Hugh Shen - Austin TX, US Derek Edward Williams - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/12
US Classification:
711133, 710 54, 711122, 711145, 712225
Abstract:
A method and processor system that substantially enhances the store gathering capabilities of a store queue entry to enable gathering of a maximum number of proximate-in-time store operations before the entry is selected for dispatch. A counter is provided for each entry to track a time since a last gather to the entry. When a new gather does not occur before the counter reaches a threshold saturation point, the entry is signaled ready for dispatch. By defining an optimum threshold saturation point before the counter expires, sufficient time is provided for the entry to gather a proximate-in-time store operation. The entry may be deemed eligible for selection when certain conditions occur, including the entry becoming full, issuance of a barrier operation, and saturation of the counter. The use of the counter increases the ability of a store queue entry to complete gathering of enough store operations to update an entire cache line before that entry is dispatched to an RC machine.
System And Method Of Re-Ordering Store Operations Within A Processor
Guy Lynn Guthrie - Austin TX, US Hugh Shen - Austin TX, US William John Starke - Round Rock TX, US Derek Edward Williams - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00
US Classification:
711158, 710 40
Abstract:
A system and method for re-ordering store operations from a processor core to a store queue. When a store queue receives a new processor-issued store operation from the processor core, a store queue controller allocates a new entry in the store queue. In response to allocating the new entry in the store queue, the store queue controller determines whether or not the new entry is dependent on at least one other valid entry in the store queue. In response to determining the new entry is dependent on at least one other valid entry in the store queue, the store queue controller inhibits requesting of the new entry to the RC dispatch logic until each valid entry on which the new entry is dependent has been successfully dispatched to an RC machine by the RC dispatch logic.
System And Method For Completing Updates To Entire Cache Lines With Address-Only Bus Operations
Ravi Kumar Arimilli - Austin TX, US Guy Lynn Guthrie - Austin TX, US Hugh Shen - Austin TX, US Derek Edward Williams - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/12
US Classification:
711118, 711122, 711133, 711141, 710310
Abstract:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.
Method For Priority Scheduling And Priority Dispatching Of Store Conditional Operations In A Store Queue
Guy Lynn Guthrie - Austin TX, US Hugh Shen - Austin TX, US Derek Edward Williams - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00
US Classification:
711158, 711117, 711118, 711128, 711151
Abstract:
A method, system, and processor chip design for reducing the latency between completing a LARX operation and receiving the associated STCX operation to complete the update to the cache line. Each entry of the store queue of the issuing processor is provided an additional tracking bit (priority bit). The priority bit is set whenever a STCX operation is placed within the entry. During selection of an entry for dispatch by the arbitration logic, the arbitration logic scans the value of the priority bits of each eligible entry. An entry with the priority bit set is given priority in the selection process within architectural rules. That entry is then selected for dispatch as early as is possible within the established rules.
Processor, Method, And Data Processing System Employing A Variable Store Gather Window
Hugh Shen - Austin TX, US Jeffrey Adam Stuecheli - Austin TX, US Derek Edward Williams - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/08
US Classification:
711154, 711125
Abstract:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
Guy L. Guthrie - Austin TX, US Jeffrey W. Kellington - Austin TX, US Kevin F. Reick - Round Rock TX, US Hugh Shen - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 12/00
US Classification:
711144, 711154, 711122
Abstract:
An efficient system for bootstrap loading scans cache lines into a cache store queue during a scan phase, and then transmits the cache lines from the cache store queue to a cache memory array during a functional phase. Scan circuitry stores a given cache line in a set of latches associated with one of a plurality of cache entries in the cache store queue, and passes the cache line from the latch set to the associated cache entry. The cache lines may be scanned from test software that is external to the computer system. Read/claim dispatch logic dispatches store instructions for the cache entries to read/claim machines which write the cache lines to the cache memory array without obtaining write permission, after the read/claim machines evaluate a mode bit which indicates that cache entries in the cache store queue are scanned cache lines. In the illustrative embodiment the cache memory is an L2 cache.
Reducing Number Of Rejected Snoop Requests By Extending Time To Respond To Snoop Request
A cache, system and method for reducing the number of rejected snoop requests. A “stall/reorder unit” in a cache receives a snoop request from an interconnect. The snoop request is entered in the first available latch of the stall/reorder unit unless the stall/reorder unit is full in which case the new snoop request is transmitted to a second unit configured to transmit a request to retry resending the new snoop request. Snoop requests have a higher priority than requests from processors and snoop requests are selected by the arbitration mechanism over processor requests unless the arbitration mechanism requests otherwise (“stall request”) to the stall/reorder unit. By snoop requests having a higher priority than processor requests, the number of snoop requests rejected is reduced. By having the arbitration mechanism issue a stall request, the processor will not be starved.
Reducing Number Of Rejected Snoop Requests By Extending Time To Respond To Snoop Request
A cache, system and method for reducing the number of rejected snoop requests. An incoming snoop request is entered in the first available latch in a pipeline of latches in a stall/reorder unit if the stall/reorder unit is not full. The entered snoop request is dispatched to a selector upon entering a bottom latch in the pipeline. The stall/reorder unit is not informed as to whether the dispatched snoop request is accepted by an arbitration mechanism for several clock cycles after the dispatch occurred. A copy of the dispatched snoop request is stored in a top latch in an overrun pipeline of latches in the first unit upon dispatching the snoop request. By maintaining information about the snoop request, the snoop request may be dispatched again to the selector in case the dispatched snoop request was rejected thereby increasing the chance that the snoop request will ultimately be accepted.
The University of Texas at Austin 1996 - 2000
Bachelors, Bachelor of Science, Electrical Engineering, Electronics Engineering
Skills:
Soa Unix Solution Architecture Enterprise Architecture Integration System Architecture Websphere Application Server Software Development Solaris Aix Requirements Analysis Ibm Aix Unix Shell Scripting Perl Debugging Computer Architecture Linux C C++ Software Engineering Vhdl Hardware Architecture Hardware