Monday, October 22, 2007

lazy tlb

Processors cannot synchronize their own TLB cache automatically because it is the kernel, and not the hardware, that decides when a mapping between a linear and a physical address is no longer valid.
Linux 2.6 offers several TLB flush methods that should be applied appropriately, depending on the type of page table change (see Table 2-11).
Table 2-11. Architecture-independent TLB-invalidating methods
Method name
Description
Typically used when
flush_tlb_all
Flushes all TLB entries (including those that refer to global pages, that is, pages whose Global flag is set)
Changing the kernel page table entries
flush_tlb_kernel_range
Flushes all TLB entries in a given range of linear addresses (including those that refer to global pages)
Changing a range of kernel page table entries
flush_tlb
Flushes all TLB entries of the non-global pages owned by the current process
Performing a process switch
flush_tlb_mm
Flushes all TLB entries of the non-global pages owned by a given process
Forking a new process
flush_tlb_range
Flushes the TLB entries corresponding to a linear address interval of a given process
Releasing a linear address interval of a process
flush_tlb_pgtables
Flushes the TLB entries of a given contiguous subset of page tables of a given process
Releasing some page tables of a process
flush_tlb_page
Flushes the TLB of a single Page Table entry of a given process
Processing a Page Fault
Despite the rich set of TLB methods offered by the generic Linux kernel, every microprocessor usually offers a far more restricted set of TLB-invalidating assembly language instructions. In this respect, one of the more flexible hardware platforms is Sun's UltraSPARC. In contrast, Intel microprocessors offers only two TLB-invalidating techniques:
All Pentium models automatically flush the TLB entries relative to non-global pages when a value is loaded into the cr3 register.
In Pentium Pro and later models, the invlpg assembly language instruction invalidates a single TLB entry mapping a given linear address.
Table 2-12 lists the Linux macros that exploit such hardware techniques; these macros are the basic ingredients to implement the architecture-independent methods listed in Table 2-11.
Table 2-12. TLB-invalidating macros for the Intel Pentium Pro and later processors
Macro name
Description
Used by
_ _flush_tlb( )
Rewrites cr3 register back into itself
flush_tlb,
flush_tlb_mm,flush_tlb_range
_ _flush_tlb_global( )
Disables global pages by clearing the PGE flag of cr4, rewrites cr3 register back into itself, and sets again the PGE flag
flush_tlb_all,flush_tlb_kernel_range
_ _flush_tlb_single(addr)
Executes invlpg assembly language instruction with parameter addr
flush_tlb_page
Notice that the flush_tlb_pgtables method is missing from Table 2-12: in the 80 x 86 architecture nothing has to be done when a page table is unlinked from its parent table, thus the function implementing this method is empty.
The architecture-independent TLB-invalidating methods are extended quite simply to multiprocessor systems. The function running on a CPU sends an Interprocessor Interrupt (see "Interprocessor Interrupt Handling" in Chapter 4) to the other CPUs that forces them to execute the proper TLB-invalidating function.
As a general rule, any process switch implies changing the set of active page tables. Local TLB entries relative to the old page tables must be flushed; this is done automatically when the kernel writes the address of the new Page Global Directory into the cr3 control register. The kernel succeeds, however, in avoiding TLB flushes in the following cases:
When performing a process switch between two regular processes that use the same set of page tables (see the section "The schedule( ) Function" in Chapter 7).
When performing a process switch between a regular process and a kernel thread. In fact, we'll see in the section "Memory Descriptor of Kernel Threads" in Chapter 9, that kernel threads do not have their own set of page tables; rather, they use the set of page tables owned by the regular process that was scheduled last for execution on the CPU.
Besides process switches, there are other cases in which the kernel needs to flush some entries in a TLB. For instance, when the kernel assigns a page frame to a User Mode process and stores its physical address into a Page Table entry, it must flush any local TLB entry that refers to the corresponding linear address. On multiprocessor systems, the kernel also must flush the same TLB entry on the CPUs that are using the same set of page tables, if any.
To avoid useless TLB flushing in multiprocessor systems, the kernel uses a technique called lazy TLB mode . The basic idea is the following: if several CPUs are using the same page tables and a TLB entry must be flushed on all of them, then TLB flushing may, in some cases, be delayed on CPUs running kernel threads.
In fact, each kernel thread does not have its own set of page tables; rather, it makes use of the set of page tables belonging to a regular process. However, there is no need to invalidate a TLB entry that refers to a User Mode linear address, because no kernel thread accesses the User Mode address space.[*]
[*] By the way, the flush_tlb_all method does not use the lazy TLB mode mechanism; it is usually invoked whenever the kernel modifies a Page Table entry relative to the Kernel Mode address space.
When some CPUs start running a kernel thread, the kernel sets it into lazy TLB mode. When requests are issued to clear some TLB entries, each CPU in lazy TLB mode does not flush the corresponding entries; however, the CPU remembers that its current process is running on a set of page tables whose TLB entries for the User Mode addresses are invalid. As soon as the CPU in lazy TLB mode switches to a regular process with a different set of page tables, the hardware automatically flushes the TLB entries, and the kernel sets the CPU back in non-lazy TLB mode. However, if a CPU in lazy TLB mode switches to a regular process that owns the same set of page tables used by the previously running kernel thread, then any deferred TLB invalidation must be effectively applied by the kernel. This "lazy" invalidation is effectively achieved by flushing all non-global TLB entries of the CPU.
Some extra data structures are needed to implement the lazy TLB mode. The cpu_tlbstate variable is a static array of NR_CPUS structures (the default value for this macro is 32; it denotes the maximum number of CPUs in the system) consisting of an active_mm field pointing to the memory descriptor of the current process (see Chapter 9) and a state flag that can assume only two values: TLBSTATE_OK (non-lazy TLB mode) or TLBSTATE_LAZY (lazy TLB mode). Furthermore, each memory descriptor includes a cpu_vm_mask field that stores the indices of the CPUs that should receive Interprocessor Interrupts related to TLB flushing. This field is meaningful only when the memory descriptor belongs to a process currently in execution.
When a CPU starts executing a kernel thread, the kernel sets the state field of its cpu_tlbstate element to TLBSTATE_LAZY; moreover, the cpu_vm_mask field of the active memory descriptor stores the indices of all CPUs in the system, including the one that is entering in lazy TLB mode. When another CPU wants to invalidate the TLB entries of all CPUs relative to a given set of page tables, it delivers an Interprocessor Interrupt to all CPUs whose indices are included in the cpu_vm_mask field of the corresponding memory descriptor.
When a CPU receives an Interprocessor Interrupt related to TLB flushing and verifies that it affects the set of page tables of its current process, it checks whether the state field of its cpu_tlbstate element is equal to TLBSTATE_LAZY. In this case, the kernel refuses to invalidate the TLB entries and removes the CPU index from the cpu_vm_mask field of the memory descriptor. This has two consequences:
As long as the CPU remains in lazy TLB mode, it will not receive other Interprocessor Interrupts related to TLB flushing.
If the CPU switches to another process that is using the same set of page tables as the kernel thread that is being replaced, the kernel invokes _ _flush_tlb( ) to invalidate all non-global TLB entries of the CPU.

No comments: