c++ - algoritmo - lru cache python

Diseño de caché LRU (7)

¿El caché es una estructura de datos que admite el valor de recuperación por clave como la tabla hash? LRU significa que la memoria caché tiene cierta limitación de tamaño que necesitamos eliminar las entradas menos utilizadas periódicamente.

Si implementa con lista enlazada + hashtable de punteros, ¿cómo puede hacer O (1) recuperación de valor por clave?

Implementaría LRU caché con una tabla hash que el valor de cada entrada sea value + punteros a prev / next entry.

En cuanto al acceso de subprocesamiento múltiple, preferiría el bloqueo lector-escritor (idealmente implementado por bloqueo de giro ya que la contención suele ser rápida) para monitorear.

La memoria caché utilizada menos recientemente (LRU) es descartar primero los elementos usados menos recientemente. ¿Cómo se diseña e implementa una clase de caché de este tipo? Los requisitos de diseño son los siguientes:

1) encuentra el artículo lo más rápido que podamos

2) Una vez que una memoria caché falla y una memoria caché está llena, debemos reemplazar el elemento utilizado menos recientemente lo más rápido posible.

¿Cómo analizar e implementar esta pregunta en términos de diseño de patrones y algoritmos?

Aquí está mi implementación para un caché LRU simple y básico.

//LRU Cache #include <cassert> #include <list> template <typename K, typename V > class LRUCache { // Key access history, most recent at back typedef std::list<K> List; // Key to value and key history iterator typedef unordered_map< K, std::pair< V, typename std::list<K>::iterator > > Cache; typedef V (*Fn)(const K&); public: LRUCache( size_t aCapacity, Fn aFn ) : mFn( aFn ) , mCapacity( aCapacity ) {} //get value for key aKey V operator()( const K& aKey ) { typename Cache::iterator it = mCache.find( aKey ); if( it == mCache.end() ) //cache-miss: did not find the key { V v = mFn( aKey ); insert( aKey, v ); return v; } // cache-hit // Update access record by moving accessed key to back of the list mList.splice( mList.end(), mList, (it)->second.second ); // return the retrieved value return (it)->second.first; } private: // insert a new key-value pair in the cache void insert( const K& aKey, V aValue ) { //method should be called only when cache-miss happens assert( mCache.find( aKey ) == mCache.end() ); // make space if necessary if( mList.size() == mCapacity ) { evict(); } // record k as most-recently-used key typename std::list<K>::iterator it = mList.insert( mList.end(), aKey ); // create key-value entry, linked to the usage record mCache.insert( std::make_pair( aKey, std::make_pair( aValue, it ) ) ); } //Purge the least-recently used element in the cache void evict() { assert( !mList.empty() ); // identify least-recently-used key const typename Cache::iterator it = mCache.find( mList.front() ); //erase both elements to completely purge record mCache.erase( it ); mList.pop_front(); } private: List mList; Cache mCache; Fn mFn; size_t mCapacity; };

Esta es mi implementación sencilla de ejemplo de C ++ para el caché LRU, con la combinación de hash (unordered_map) y una lista. Los elementos en la lista tienen una clave para acceder al mapa, y los elementos en el mapa tienen un iterador de lista para acceder a la lista.

#include <list> #include <unordered_map> #include <assert.h> using namespace std; template <class KEY_T, class VAL_T> class LRUCache{ private: list< pair<KEY_T,VAL_T> > item_list; unordered_map<KEY_T, decltype(item_list.begin()) > item_map; size_t cache_size; private: void clean(void){ while(item_map.size()>cache_size){ auto last_it = item_list.end(); last_it --; item_map.erase(last_it->first); item_list.pop_back(); } }; public: LRUCache(int cache_size_):cache_size(cache_size_){ ; }; void put(const KEY_T &key, const VAL_T &val){ auto it = item_map.find(key); if(it != item_map.end()){ item_list.erase(it->second); item_map.erase(it); } item_list.push_front(make_pair(key,val)); item_map.insert(make_pair(key, item_list.begin())); clean(); }; bool exist(const KEY_T &key){ return (item_map.count(key)>0); }; VAL_T get(const KEY_T &key){ assert(exist(key)); auto it = item_map.find(key); item_list.splice(item_list.begin(), item_list, it->second); return it->second->second; }; };

Implementé un caché LRU seguro para subprocesos hace dos años.

LRU se implementa típicamente con HashMap y LinkedList. Puedes googlear los detalles de implementación. Hay muchos recursos al respecto (Wikipedia también tiene una buena explicación).

Para que sea seguro para la ejecución de subprocesos, necesita poner el bloqueo cada vez que modifique el estado de la LRU.

Pegaré mi código de C ++ aquí para su referencia.

Aquí está la implementación.

/*** A template thread-safe LRU container. Typically LRU cache is implemented using a doubly linked list and a hash map. Doubly Linked List is used to store list of pages with most recently used page at the start of the list. So, as more pages are added to the list, least recently used pages are moved to the end of the list with page at tail being the least recently used page in the list. Additionally, this LRU provides time-to-live feature. Each entry has an expiration datetime. ***/ #ifndef LRU_CACHE_H #define LRU_CACHE_H #include <iostream> #include <list> #include <boost/unordered_map.hpp> #include <boost/shared_ptr.hpp> #include <boost/make_shared.hpp> #include <boost/date_time/posix_time/posix_time.hpp> #include <boost/thread/mutex.hpp> template <typename KeyType, typename ValueType> class LRUCache { private: typedef boost::posix_time::ptime DateTime; // Cache-entry struct ListItem { ListItem(const KeyType &key, const ValueType &value, const DateTime &expiration_datetime) : m_key(key), m_value(value), m_expiration_datetime(expiration_datetime){} KeyType m_key; ValueType m_value; DateTime m_expiration_datetime; }; typedef boost::shared_ptr<ListItem> ListItemPtr; typedef std::list<ListItemPtr> LruList; typedef typename std::list<ListItemPtr>::iterator LruListPos; typedef boost::unordered_map<KeyType, LruListPos> LruMapper; // A mutext to ensuare thread-safety. boost::mutex m_cache_mutex; // Maximum number of entries. std::size_t m_capacity; // Stores cache-entries from latest to oldest. LruList m_list; // Mapper for key to list-position. LruMapper m_mapper; // Default time-to-live being add to entry every time we touch it. unsigned long m_ttl_in_seconds; /*** Note : This is a helper function whose function call need to be wrapped within a lock. It returns true/false whether key exists and not expires. Delete the expired entry if necessary. ***/ bool containsKeyHelper(const KeyType &key) { bool has_key(m_mapper.count(key) != 0); if (has_key) { LruListPos pos = m_mapper[key]; ListItemPtr & cur_item_ptr = *pos; // Remove the entry if key expires if (isDateTimeExpired(cur_item_ptr->m_expiration_datetime)) { has_key = false; m_list.erase(pos); m_mapper.erase(key); } } return has_key; } /*** Locate an item in list by key, and move it at the front of the list, which means make it the latest item. Note : This is a helper function whose function call need to be wrapped within a lock. ***/ void makeEntryTheLatest(const KeyType &key) { if (m_mapper.count(key)) { // Add original item at the front of the list, // and update <Key, ListPosition> mapper. LruListPos original_list_position = m_mapper[key]; const ListItemPtr & cur_item_ptr = *original_list_position; m_list.push_front(cur_item_ptr); m_mapper[key] = m_list.begin(); // Don''t forget to update its expiration datetime. m_list.front()->m_expiration_datetime = getExpirationDatetime(m_list.front()->m_expiration_datetime); // Erase the item at original position. m_list.erase(original_list_position); } } public: /*** Cache should have capacity to limit its memory usage. We also add time-to-live for each cache entry to expire the stale information. By default, ttl is one hour. ***/ LRUCache(std::size_t capacity, unsigned long ttl_in_seconds = 3600) : m_capacity(capacity), m_ttl_in_seconds(ttl_in_seconds) {} /*** Return now + time-to-live ***/ DateTime getExpirationDatetime(const DateTime &now) { static const boost::posix_time::seconds ttl(m_ttl_in_seconds); return now + ttl; } /*** If input datetime is older than current datetime, then it is expired. ***/ bool isDateTimeExpired(const DateTime &date_time) { return date_time < boost::posix_time::second_clock::local_time(); } /*** Return the number of entries in this cache. ***/ std::size_t size() { boost::mutex::scoped_lock lock(m_cache_mutex); return m_mapper.size(); } /*** Get value by key. Return true/false whether key exists. If key exists, input paramter value will get updated. ***/ bool get(const KeyType &key, ValueType &value) { boost::mutex::scoped_lock lock(m_cache_mutex); if (!containsKeyHelper(key)) { return false; } else { // Make the entry the latest and update its TTL. makeEntryTheLatest(key); // Then get its value. value = m_list.front()->m_value; return true; } } /*** Add <key, value> pair if no such key exists. Otherwise, just update the value of old key. ***/ void put(const KeyType &key, const ValueType &value) { boost::mutex::scoped_lock lock(m_cache_mutex); if (containsKeyHelper(key)) { // Make the entry the latest and update its TTL. makeEntryTheLatest(key); // Now we only need to update its value. m_list.front()->m_value = value; } else { // Key exists and is not expired. if (m_list.size() == m_capacity) { KeyType delete_key = m_list.back()->m_key; m_list.pop_back(); m_mapper.erase(delete_key); } DateTime now = boost::posix_time::second_clock::local_time(); m_list.push_front(boost::make_shared<ListItem>(key, value, getExpirationDatetime(now))); m_mapper[key] = m_list.begin(); } } }; #endif

Aquí están las pruebas unitarias.

#include "cxx_unit.h" #include "lru_cache.h" struct LruCacheTest : public FDS::CxxUnit::TestFixture<LruCacheTest>{ CXXUNIT_TEST_SUITE(); CXXUNIT_TEST(LruCacheTest, testContainsKey); CXXUNIT_TEST(LruCacheTest, testGet); CXXUNIT_TEST(LruCacheTest, testPut); CXXUNIT_TEST_SUITE_END(); void testContainsKey(); void testGet(); void testPut(); }; void LruCacheTest::testContainsKey() { LRUCache<int,std::string> cache(3); cache.put(1,"1"); // 1 cache.put(2,"2"); // 2,1 cache.put(3,"3"); // 3,2,1 cache.put(4,"4"); // 4,3,2 std::string value_holder(""); CXXUNIT_ASSERT(cache.get(1, value_holder) == false); // 4,3,2 CXXUNIT_ASSERT(value_holder == ""); CXXUNIT_ASSERT(cache.get(2, value_holder) == true); // 2,4,3 CXXUNIT_ASSERT(value_holder == "2"); cache.put(5,"5"); // 5, 2, 4 CXXUNIT_ASSERT(cache.get(3, value_holder) == false); // 5, 2, 4 CXXUNIT_ASSERT(value_holder == "2"); // value_holder is still "2" CXXUNIT_ASSERT(cache.get(4, value_holder) == true); // 4, 5, 2 CXXUNIT_ASSERT(value_holder == "4"); cache.put(2,"II"); // {2, "II"}, 4, 5 CXXUNIT_ASSERT(cache.get(2, value_holder) == true); // 2, 4, 5 CXXUNIT_ASSERT(value_holder == "II"); // Cache-entries : {2, "II"}, {4, "4"}, {5, "5"} CXXUNIT_ASSERT(cache.size() == 3); CXXUNIT_ASSERT(cache.get(2, value_holder) == true); CXXUNIT_ASSERT(cache.get(4, value_holder) == true); CXXUNIT_ASSERT(cache.get(5, value_holder) == true); } void LruCacheTest::testGet() { LRUCache<int,std::string> cache(3); cache.put(1,"1"); // 1 cache.put(2,"2"); // 2,1 cache.put(3,"3"); // 3,2,1 cache.put(4,"4"); // 4,3,2 std::string value_holder(""); CXXUNIT_ASSERT(cache.get(1, value_holder) == false); // 4,3,2 CXXUNIT_ASSERT(value_holder == ""); CXXUNIT_ASSERT(cache.get(2, value_holder) == true); // 2,4,3 CXXUNIT_ASSERT(value_holder == "2"); cache.put(5,"5"); // 5,2,4 CXXUNIT_ASSERT(cache.get(5, value_holder) == true); // 5,2,4 CXXUNIT_ASSERT(value_holder == "5"); CXXUNIT_ASSERT(cache.get(4, value_holder) == true); // 4, 5, 2 CXXUNIT_ASSERT(value_holder == "4"); cache.put(2,"II"); CXXUNIT_ASSERT(cache.get(2, value_holder) == true); // {2 : "II"}, 4, 5 CXXUNIT_ASSERT(value_holder == "II"); // Cache-entries : {2, "II"}, {4, "4"}, {5, "5"} CXXUNIT_ASSERT(cache.size() == 3); CXXUNIT_ASSERT(cache.get(2, value_holder) == true); CXXUNIT_ASSERT(cache.get(4, value_holder) == true); CXXUNIT_ASSERT(cache.get(5, value_holder) == true); } void LruCacheTest::testPut() { LRUCache<int,std::string> cache(3); cache.put(1,"1"); // 1 cache.put(2,"2"); // 2,1 cache.put(3,"3"); // 3,2,1 cache.put(4,"4"); // 4,3,2 cache.put(5,"5"); // 5,4,3 std::string value_holder(""); CXXUNIT_ASSERT(cache.get(2, value_holder) == false); // 5,4,3 CXXUNIT_ASSERT(value_holder == ""); CXXUNIT_ASSERT(cache.get(4, value_holder) == true); // 4,5,3 CXXUNIT_ASSERT(value_holder == "4"); cache.put(2,"II"); CXXUNIT_ASSERT(cache.get(2, value_holder) == true); // II,4,5 CXXUNIT_ASSERT(value_holder == "II"); // Cache-entries : {2, "II"}, {4, "4"}, {5, "5"} CXXUNIT_ASSERT(cache.size() == 3); CXXUNIT_ASSERT(cache.get(2, value_holder) == true); CXXUNIT_ASSERT(cache.get(4, value_holder) == true); CXXUNIT_ASSERT(cache.get(5, value_holder) == true); } CXXUNIT_REGISTER_TEST(LruCacheTest);

Tengo una implementación de LRU here . La interfaz sigue a std :: map, por lo que no debería ser tan difícil de usar. Además, puede proporcionar un controlador de respaldo personalizado, que se utiliza si los datos se invalidan en el caché.

sweet::Cache<std::string,std::vector<int>, 48> c1; c1.insert("key1", std::vector<int>()); c1.insert("key2", std::vector<int>()); assert(c1.contains("key1"));

Una lista vinculada + tabla hash de punteros a los nodos de lista enlazados es la forma habitual de implementar cachés LRU. Esto da O (1) operaciones (asumiendo un hash decente). Ventaja de esto (siendo O (1)): puede hacer una versión multiproceso simplemente bloqueando toda la estructura. No tiene que preocuparse por el bloqueo granular, etc.

En pocas palabras, la forma en que funciona:

En un acceso de un valor, mueve el nodo correspondiente en la lista vinculada al encabezado.

Cuando necesite eliminar un valor de la memoria caché, lo eliminará del extremo posterior.

Cuando agrega un valor a la memoria caché, simplemente lo coloca al principio de la lista vinculada.

Gracias a doublep, aquí está el sitio con una implementación en C ++: Plantillas de contenedores varios .

LRU Page Replacement Technique:

Cuando se hace referencia a una página, la página requerida puede estar en la caché.

If in the cache : tenemos que ponerlo al frente de la cola de caché.

If NOT in the cache lo traemos a la memoria caché. En palabras simples, agregamos una nueva página al frente de la cola del caché. Si el caché está lleno, es decir, todos los marcos están llenos, eliminamos una página de la parte posterior de la cola del caché y agregamos la página nueva al frente de la cola del caché.

# Cache Size csize = int(input()) # Sequence of pages pages = list(map(int,input().split())) # Take a cache list cache=[] # Keep track of number of elements in cache n=0 # Count Page Fault fault=0 for page in pages: # If page exists in cache if page in cache: # Move the page to front as it is most recent page # First remove from cache and then append at front cache.remove(page) cache.append(page) else: # Cache is full if(n==csize): # Remove the least recent page cache.pop(0) else: # Increment element count in cache n=n+1 # Page not exist in cache => Page Fault fault += 1 cache.append(page) print("Page Fault:",fault)

De entrada y salida

Input: 3 1 2 3 4 1 2 5 1 2 3 4 5 Output: Page Fault: 10