usar - Seleccione N elementos aleatorios de una Lista<T> en C#

numeros aleatorios que no se repitan c# (25)

¡Seleccionar N elementos aleatorios de un grupo no debería tener nada que ver con el orden ! La aleatoriedad tiene que ver con la imprevisibilidad y no con la combinación de posiciones en un grupo. Todas las respuestas que se refieren a algún tipo de orden serán menos eficientes que las que no lo son. Dado que la eficiencia es la clave aquí, publicaré algo que no cambia demasiado el orden de los elementos.

1) Si necesita valores aleatorios verdaderos , lo que significa que no hay restricciones sobre qué elementos elegir (es decir, si se puede volver a seleccionar el elemento una vez elegido):

public static List<T> GetTrueRandom<T>(this IList<T> source, int count, bool throwArgumentOutOfRangeException = true) { if (throwArgumentOutOfRangeException && count > source.Count) throw new ArgumentOutOfRangeException(); var randoms = new List<T>(count); randoms.AddRandomly(source, count); return randoms; }

Si desactiva el indicador de excepción, puede elegir elementos aleatorios varias veces.

Si tiene {1, 2, 3, 4}, puede dar {1, 4, 4}, {1, 4, 3} etc. para 3 elementos o incluso {1, 4, 3, 2, 4} para 5 artículos!

Esto debería ser bastante rápido, ya que no tiene nada que controlar.

2) Si necesita miembros individuales del grupo sin repetición, entonces confiaría en un diccionario (como muchos ya lo han señalado).

public static List<T> GetDistinctRandom<T>(this IList<T> source, int count) { if (count > source.Count) throw new ArgumentOutOfRangeException(); if (count == source.Count) return new List<T>(source); var sourceDict = source.ToIndexedDictionary(); if (count > source.Count / 2) { while (sourceDict.Count > count) sourceDict.Remove(source.GetRandomIndex()); return sourceDict.Select(kvp => kvp.Value).ToList(); } var randomDict = new Dictionary<int, T>(count); while (randomDict.Count < count) { int key = source.GetRandomIndex(); if (!randomDict.ContainsKey(key)) randomDict.Add(key, sourceDict[key]); } return randomDict.Select(kvp => kvp.Value).ToList(); }

El código es un poco más largo que otros enfoques de diccionario aquí porque no solo estoy agregando, sino también quitando de la lista, por lo que son dos bucles. Puede ver aquí que no he reordenado nada cuando el count llega a ser igual a source.Count . Eso es porque creo que la aleatoriedad debería estar en el conjunto devuelto como un todo . Quiero decir, si quieres 5 elementos aleatorios de 1, 2, 3, 4, 5 , no debería importar si es 1, 3, 4, 2, 5 o 1, 2, 3, 4, 5 , pero si necesitas 4 ítems del mismo conjunto, luego debe rendir de manera impredecible en 1, 2, 3, 4 , 1, 3, 5, 2 , 2, 3, 5, 4 etc. En segundo lugar, cuando el recuento de los ítems aleatorios a devolver es más de la mitad del grupo original, entonces es más fácil eliminar la source.Count - count elementos del grupo que agregar elementos de count . Por razones de rendimiento, utilicé source lugar de sourceDict para obtener el índice aleatorio en el método remove.

Entonces, si tiene {1, 2, 3, 4}, esto puede terminar en {1, 2, 3}, {3, 4, 1}, etc. para 3 elementos.

3) Si necesita valores aleatorios verdaderamente distintos de su grupo teniendo en cuenta los duplicados en el grupo original, entonces puede usar el mismo enfoque que el anterior, pero un HashSet será más ligero que un diccionario.

public static List<T> GetTrueDistinctRandom<T>(this IList<T> source, int count, bool throwArgumentOutOfRangeException = true) { if (count > source.Count) throw new ArgumentOutOfRangeException(); var set = new HashSet<T>(source); if (throwArgumentOutOfRangeException && count > set.Count) throw new ArgumentOutOfRangeException(); List<T> list = hash.ToList(); if (count >= set.Count) return list; if (count > set.Count / 2) { while (set.Count > count) set.Remove(list.GetRandom()); return set.ToList(); } var randoms = new HashSet<T>(); randoms.AddRandomly(list, count); return randoms.ToList(); }

La variable randoms se convierte en un HashSet para evitar que se agreguen duplicados en los casos más raros en los que Random.Next puede arrojar el mismo valor, especialmente cuando la lista de entrada es pequeña.

Entonces {1, 2, 2, 4} => 3 elementos aleatorios => {1, 2, 4} y nunca {1, 2, 2}
{1, 2, 2, 4} => 4 elementos aleatorios => ¡excepción! o {1, 2, 4} dependiendo del conjunto de banderas.

Algunos de los métodos de extensión que he usado:

static Random rnd = new Random(); public static int GetRandomIndex<T>(this ICollection<T> source) { return rnd.Next(source.Count); } public static T GetRandom<T>(this IList<T> source) { return source[source.GetRandomIndex()]; } static void AddRandomly<T>(this ICollection<T> toCol, IList<T> fromList, int count) { while (toCol.Count < count) toCol.Add(fromList.GetRandom()); } public static Dictionary<int, T> ToIndexedDictionary<T>(this IEnumerable<T> lst) { return lst.ToIndexedDictionary(t => t); } public static Dictionary<int, T> ToIndexedDictionary<S, T>(this IEnumerable<S> lst, Func<S, T> valueSelector) { int index = -1; return lst.ToDictionary(t => ++index, valueSelector); }

Si todo se trata del rendimiento con decenas de miles de elementos en la lista que deben iterarse 10000 veces, entonces es posible que desee tener una clase aleatoria más rápida que System.Random , pero no creo que eso sea un gran problema teniendo en cuenta lo último probablemente nunca es un cuello de botella, es bastante rápido ...

Editar: si necesita volver a organizar el orden de los artículos devueltos, entonces no hay nada que pueda superar el enfoque de Fisher-Yates de dhakim : corto, dulce y simple.

Necesito un algoritmo rápido para seleccionar 5 elementos aleatorios de una lista genérica. Por ejemplo, me gustaría obtener 5 elementos aleatorios de una List<string> .

Acabo de toparme con este problema, y un poco más de búsqueda en Google me llevó al problema de barajar aleatoriamente una lista: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle

Para mezclar aleatoriamente su lista (en su lugar), haga esto:

Para barajar un conjunto de n elementos (índices 0..n-1):

for i from n − 1 downto 1 do j ← random integer with 0 ≤ j ≤ i exchange a[j] and a[i]

Si solo necesita los primeros 5 elementos, en lugar de ejecutar i desde n-1 hasta 1, solo debe ejecutarlo en n-5 (es decir, n-5)

Digamos que necesitas k artículos,

Esto se convierte en:

for (i = n − 1; i >= n-k; i--) { j = random integer with 0 ≤ j ≤ i exchange a[j] and a[i] }

Cada elemento que se selecciona se intercambia hacia el final de la matriz, por lo que los k elementos seleccionados son los últimos k elementos de la matriz.

Esto lleva tiempo O (k), donde k es la cantidad de elementos seleccionados aleatoriamente que necesita.

Además, si no desea modificar su lista inicial, puede anotar todos sus swaps en una lista temporal, revertir esa lista y aplicarlos nuevamente, realizando así el conjunto inverso de swaps y devolviéndole su lista inicial sin cambiar el tiempo de ejecución O (k).

Finalmente, para el stickler real, si (n == k), debe detenerse en 1, no en nk, ya que el entero elegido al azar siempre será 0.

Aquí tiene una implementación basada en Fisher-Yates Shuffle cuya complejidad de algoritmo es O (n) donde n es el subconjunto o tamaño de muestra, en lugar del tamaño de la lista, como señaló John Shedletsky.

public static IEnumerable<T> GetRandomSample<T>(this IList<T> list, int sampleSize) { if (list == null) throw new ArgumentNullException("list"); if (sampleSize > list.Count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize"); var indices = new Dictionary<int, int>(); int index; var rnd = new Random(); for (int i = 0; i < sampleSize; i++) { int j = rnd.Next(i, list.Count); if (!indices.TryGetValue(j, out index)) index = j; yield return list[index]; if (!indices.TryGetValue(i, out index)) index = i; indices[j] = index; } }

Basado en la respuesta de Kyle, aquí está mi implementación de c #.

/// <summary> /// Picks random selection of available game ID''s /// </summary> private static List<int> GetRandomGameIDs(int count) { var gameIDs = (int[])HttpContext.Current.Application["NonDeletedArcadeGameIDs"]; var totalGameIDs = gameIDs.Count(); if (count > totalGameIDs) count = totalGameIDs; var rnd = new Random(); var leftToPick = count; var itemsLeft = totalGameIDs; var arrPickIndex = 0; var returnIDs = new List<int>(); while (leftToPick > 0) { if (rnd.Next(0, itemsLeft) < leftToPick) { returnIDs .Add(gameIDs[arrPickIndex]); leftToPick--; } arrPickIndex++; itemsLeft--; } return returnIDs ; }

Combiné varias de las respuestas anteriores para crear un método de extensión evaluado Lazily. Mis pruebas mostraron que el enfoque de Kyle (Orden (N)) es muchas veces más lento que el uso de un conjunto por drzaus para proponer los índices aleatorios para elegir (Orden (K)). El primero realiza muchas más llamadas al generador de números aleatorios, además itera más veces sobre los elementos.

Los objetivos de mi implementación fueron:

1) No se da cuenta de la lista completa si se le da un IEnumerable que no es un IList. Si me dan una secuencia de un trillón de elementos, no me quiero quedar sin memoria. Use el enfoque de Kyle para una solución en línea.

2) Si puedo decir que es un IList, use el enfoque drzaus, con un giro. Si K es más de la mitad de N, corro el riesgo de agredirme ya que elijo muchos índices aleatorios una y otra vez y tengo que omitirlos. Por lo tanto, compongo una lista de los índices para NO mantener.

3) Garantizo que los artículos serán devueltos en el mismo orden en que se encontraron. El algoritmo de Kyle no requirió ninguna alteración. El algoritmo de drzaus requiere que no emita elementos en el orden en que se eligen los índices aleatorios. Reúno todos los índices en un SortedSet, luego emito los ítems en orden de índice ordenado.

4) Si K es grande en comparación con N e I invierte el sentido del conjunto, entonces enumero todos los elementos y pruebo si el índice no está en el conjunto. Esto significa que pierdo el tiempo de ejecución de la orden (K), pero como K está cerca de N en estos casos, no pierdo mucho.

Aquí está el código:

/// <summary> /// Takes k elements from the next n elements at random, preserving their order. /// /// If there are fewer than n elements in items, this may return fewer than k elements. /// </summary> /// <typeparam name="TElem">Type of element in the items collection.</typeparam> /// <param name="items">Items to be randomly selected.</param> /// <param name="k">Number of items to pick.</param> /// <param name="n">Total number of items to choose from. /// If the items collection contains more than this number, the extra members will be skipped. /// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned.</param> /// <returns>Enumerable over the retained items. /// /// See http://.com/questions/48087/select-a-random-n-elements-from-listt-in-c-sharp for the commentary. /// </returns> public static IEnumerable<TElem> TakeRandom<TElem>(this IEnumerable<TElem> items, int k, int n) { var r = new FastRandom(); var itemsList = items as IList<TElem>; if (k >= n || (itemsList != null && k >= itemsList.Count)) foreach (var item in items) yield return item; else { // If we have a list, we can infer more information and choose a better algorithm. // When using an IList, this is about 7 times faster (on one benchmark)! if (itemsList != null && k < n/2) { // Since we have a List, we can use an algorithm suitable for Lists. // If there are fewer than n elements, reduce n. n = Math.Min(n, itemsList.Count); // This algorithm picks K index-values randomly and directly chooses those items to be selected. // If k is more than half of n, then we will spend a fair amount of time thrashing, picking // indices that we have already picked and having to try again. var invertSet = k >= n/2; var positions = invertSet ? (ISet<int>) new HashSet<int>() : (ISet<int>) new SortedSet<int>(); var numbersNeeded = invertSet ? n - k : k; while (numbersNeeded > 0) if (positions.Add(r.Next(0, n))) numbersNeeded--; if (invertSet) { // positions contains all the indices of elements to Skip. for (var itemIndex = 0; itemIndex < n; itemIndex++) { if (!positions.Contains(itemIndex)) yield return itemsList[itemIndex]; } } else { // positions contains all the indices of elements to Take. foreach (var itemIndex in positions) yield return itemsList[itemIndex]; } } else { // Since we do not have a list, we will use an online algorithm. // This permits is to skip the rest as soon as we have enough items. var found = 0; var scanned = 0; foreach (var item in items) { var rand = r.Next(0,n-scanned); if (rand < k - found) { yield return item; found++; } scanned++; if (found >= k || scanned >= n) break; } } } }

Utilizo un generador de números aleatorios especializado, pero puede usar C # ''s Random si lo desea. ( FastRandom fue escrito por Colin Green y es parte de SharpNEAT. Tiene un período de 2 ^ 128-1 que es mejor que muchos RNG).

Estas son las pruebas unitarias:

[TestClass] public class TakeRandomTests { /// <summary> /// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability. /// </summary> [TestMethod] public void TakeRandom_Array_Uniformity() { const int numTrials = 2000000; const int expectedCount = numTrials/20; var timesChosen = new int[100]; var century = new int[100]; for (var i = 0; i < century.Length; i++) century[i] = i; for (var trial = 0; trial < numTrials; trial++) { foreach (var i in century.TakeRandom(5, 100)) timesChosen[i]++; } var avg = timesChosen.Average(); var max = timesChosen.Max(); var min = timesChosen.Min(); var allowedDifference = expectedCount/100; AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average"); //AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min"); //AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max"); var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference); Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange)); } /// <summary> /// Ensure that when randomly choosing items from an IEnumerable that is not an IList, /// all items are chosen with roughly equal probability. /// </summary> [TestMethod] public void TakeRandom_IEnumerable_Uniformity() { const int numTrials = 2000000; const int expectedCount = numTrials / 20; var timesChosen = new int[100]; for (var trial = 0; trial < numTrials; trial++) { foreach (var i in Range(0,100).TakeRandom(5, 100)) timesChosen[i]++; } var avg = timesChosen.Average(); var max = timesChosen.Max(); var min = timesChosen.Min(); var allowedDifference = expectedCount / 100; var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference); Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange)); } private IEnumerable<int> Range(int low, int count) { for (var i = low; i < low + count; i++) yield return i; } private static void AssertBetween(int x, int low, int high, String message) { Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message)); Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message)); } private static void AssertBetween(double x, double low, double high, String message) { Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message)); Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message)); } }

Creo que la respuesta seleccionada es correcta y muy dulce. Sin embargo, lo implementé de manera diferente, ya que también quería el resultado en orden aleatorio.

static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>( IEnumerable<SomeType> someTypes, int maxCount) { Random random = new Random(DateTime.Now.Millisecond); Dictionary<double, SomeType> randomSortTable = new Dictionary<double,SomeType>(); foreach(SomeType someType in someTypes) randomSortTable[random.NextDouble()] = someType; return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value); }

De Dragons in the Algorithm , una interpretación en C #:

int k = 10; // items to select var items = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }); var selected = new List<int>(); var needed = k; var available = items.Count; var rand = new Random(); while (selected.Count < k) { if( rand.NextDouble() < needed / available ) { selected.Add(items[available-1]) needed--; } available--; }

Este algoritmo seleccionará indicios únicos de la lista de elementos.

Estaba pensando en un comentario de @JohnShedletsky sobre la respuesta aceptada con respecto a (paráfrasis):

deberías poder hacer esto en O (subconjunto.Length), en lugar de O (originalList.Length)

Básicamente, debería poder generar índices aleatorios de subset y luego extraerlos de la lista original.

El método

public static class EnumerableExtensions { public static Random randomizer = new Random(); // you''d ideally be able to replace this with whatever makes you comfortable public static IEnumerable<T> GetRandom<T>(this IEnumerable<T> list, int numItems) { return (list as T[] ?? list.ToArray()).GetRandom(numItems); // because ReSharper whined about duplicate enumeration... /* items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--; */ } // just because the parentheses were getting confusing public static IEnumerable<T> GetRandom<T>(this T[] list, int numItems) { var items = new HashSet<T>(); // don''t want to add the same item twice; otherwise use a list while (numItems > 0 ) // if we successfully added it, move on if( items.Add(list[randomizer.Next(list.Length)]) ) numItems--; return items; } // and because it''s really fun; note -- you may get repetition public static IEnumerable<T> PluckRandomly<T>(this IEnumerable<T> list) { while( true ) yield return list.ElementAt(randomizer.Next(list.Count())); } }

Si quisiera ser aún más eficiente, probablemente use un HashSet de los índices , no los elementos de la lista real (en caso de que tenga tipos complejos o comparaciones costosas);

La prueba de unidad

Y para asegurarnos de que no tengamos colisiones, etc.

[TestClass] public class RandomizingTests : UnitTestBase { [TestMethod] public void GetRandomFromList() { this.testGetRandomFromList((list, num) => list.GetRandom(num)); } [TestMethod] public void PluckRandomly() { this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct:false); } private void testGetRandomFromList(Func<IEnumerable<int>, int, IEnumerable<int>> methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) { var items = Enumerable.Range(0, 100); IEnumerable<int> randomItems = null; while( repetitions-- > 0 ) { randomItems = methodToGetRandomItems(items, numToTake); Assert.AreEqual(numToTake, randomItems.Count(), "Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions); if(requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(), "Collisions (non-unique values) found, failed at {0} repetition--", repetitions); Assert.IsTrue(randomItems.All(o => items.Contains(o)), "Some unknown values found; failed at {0} repetition--", repetitions); } } }

Este es en realidad un problema más difícil de lo que parece, principalmente porque muchas soluciones matemáticamente correctas no le permitirán alcanzar todas las posibilidades (más sobre esto a continuación).

En primer lugar, aquí hay algunos generadores de números correctos, si tiene un número aleatorio, fáciles de implementar:

(0) La respuesta de Kyle, que es O (n).

(1) Genere una lista de n pares [(0, rand), (1, rand), (2, rand), ...], ordénelos por la segunda coordenada, y use la primera k (para usted, k = 5) índices para obtener su subconjunto aleatorio. Creo que esto es fácil de implementar, aunque es el tiempo O (n log n).

(2) Inicie una lista vacía s = [] que crecerá para ser los índices de k elementos aleatorios. Elija un número r en {0, 1, 2, ..., n-1} al azar, r = rand% n, y añádalo a s. Luego toma r = rand% (n-1) y pega en s; agregue r elementos # menos que en s para evitar colisiones. Luego tome r = rand% (n-2), y haga lo mismo, etc. hasta que tenga k elementos distintos en s. Este tiene el peor tiempo de ejecución O (k ^ 2). Entonces, para k << n, esto puede ser más rápido. Si sigue ordenando y rastreando los intervalos contiguos que tiene, puede implementarlo en O (k log k), pero es más trabajo.

@Kyle - tienes razón, en segundo lugar, estoy de acuerdo con tu respuesta. Lo leí al principio apresuradamente y pensé equivocadamente que me indicabas que debía elegir secuencialmente cada elemento con una probabilidad fija k / n, lo que hubiera sido incorrecto, pero tu enfoque adaptativo parece correcto para mí. Lo siento por eso.

Ok, y ahora para el pateador: asintóticamente (para k fijo, n creciendo), ¡hay n ^ k / k! opciones de k subconjunto de elementos de n elementos [esta es una aproximación de (n elegir k)]. Si n es grande, yk no es muy pequeño, entonces estos números son enormes. La mejor longitud de ciclo que se puede esperar en cualquier generador de números aleatorios estándar de 32 bits es 2 ^ 32 = 256 ^ 4. Entonces, si tenemos una lista de 1000 elementos y queremos elegir 5 al azar, no hay forma de que un generador de números aleatorios estándar alcance todas las posibilidades. Sin embargo, siempre y cuando esté bien con una opción que funcione bien para conjuntos más pequeños, y siempre "se vea" al azar, entonces estos algoritmos deberían estar bien.

Adición : Después de escribir esto, me di cuenta de que es complicado implementar la idea (2) correctamente, por lo que quería aclarar esta respuesta. Para obtener el tiempo O (k log k), necesita una estructura similar a una matriz que admita búsquedas e inserciones O (log m); un árbol binario equilibrado puede hacer esto. Usando tal estructura para construir una matriz llamada s, aquí hay algún pseudopython:

# Returns a container s with k distinct random numbers from {0, 1, ..., n-1} def ChooseRandomSubset(n, k): for i in range(k): r = UniformRandom(0, n-i) # May be 0, must be < n-i q = s.FirstIndexSuchThat( s[q] - q > r ) # This is the search. s.InsertInOrder(q ? r + q : r + len(s)) # Inserts right before q. return s

Sugiero analizar algunos casos de muestra para ver cómo esto implementa de manera eficiente la explicación en inglés anterior.

Este método puede ser equivalente al de Kyle.

Supongamos que su lista es de tamaño ny quiere k elementos.

Random rand = new Random(); for(int i = 0; k>0; ++i) { int r = rand.Next(0, n-i); if(r<k) { //include element i k--; } }

Funciona de maravilla :)

-Alex Gilbert

Iterar a través de y para cada elemento hacer la probabilidad de selección = (número necesario) / (número dejado)

Entonces, si tuviera 40 elementos, el primero tendría una probabilidad de 5/40 de ser seleccionado. Si es así, el próximo tiene una posibilidad de 4/39, de lo contrario, tiene una probabilidad de 5/39. Cuando llegues al final, tendrás tus 5 elementos y, a menudo, los tendrás todos antes de eso.

La solución simple que uso (probablemente no es buena para listas grandes): copie la lista en lista temporal, luego en el bucle seleccione aleatoriamente el elemento de la lista temporal y colóquelo en la lista de elementos seleccionados mientras lo elimina de la lista temporal (para que no pueda reselected).

Ejemplo:

List<Object> temp = OriginalList.ToList(); List<Object> selectedItems = new List<Object>(); Random rnd = new Random(); Object o; int i = 0; while (i < NumberOfSelectedItems) { o = temp[rnd.Next(temp.Count)]; selectedItems.Add(o); temp.Remove(o); i++; }

Puede usar esto pero el orden ocurrirá en el lado del cliente

.AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5);

Usando linq:

YourList.OrderBy(x => rnd.Next()).Take(5)

Here''s my approach (full text here http://krkadev.blogspot.com/2010/08/random-numbers-without-repetition.html ).

It should run in O(K) instead of O(N), where K is the number of wanted elements and N is the size of the list to choose from:

public <T> List<T> take(List<T> source, int k) { int n = source.size(); if (k > n) { throw new IllegalStateException( "Can not take " + k + " elements from a list with " + n + " elements"); } List<T> result = new ArrayList<T>(k); Map<Integer,Integer> used = new HashMap<Integer,Integer>(); int metric = 0; for (int i = 0; i < k; i++) { int off = random.nextInt(n - i); while (true) { metric++; Integer redirect = used.put(off, n - i - 1); if (redirect == null) { break; } off = redirect; } result.add(source.get(off)); } assert metric <= 2*k; return result; }

I recently did this on my project using an idea similar to Tyler''s point 1 .
I was loading a bunch of questions and selecting five at random. Sorting was achieved using an IComparer .
aAll questions were loaded in the a QuestionSorter list, which was then sorted using the List''s Sort function and the first k elements where selected.

private class QuestionSorter : IComparable<QuestionSorter> { public double SortingKey { get; set; } public Question QuestionObject { get; set; } public QuestionSorter(Question q) { this.SortingKey = RandomNumberGenerator.RandomDouble; this.QuestionObject = q; } public int CompareTo(QuestionSorter other) { if (this.SortingKey < other.SortingKey) { return -1; } else if (this.SortingKey > other.SortingKey) { return 1; } else { return 0; } } }

Uso:

List<QuestionSorter> unsortedQuestions = new List<QuestionSorter>(); // add the questions here unsortedQuestions.Sort(unsortedQuestions as IComparer<QuestionSorter>); // select the first k elements

I would use an extension method.

public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> elements, int countToTake) { var random = new Random(); var internalList = elements.ToList(); var selected = new List<T>(); for (var i = 0; i < countToTake; ++i) { var next = random.Next(0, internalList.Count - selected.Count); selected.Add(internalList[next]); internalList[next] = internalList[internalList.Count - selected.Count]; } return selected; }

It is a lot harder than one would think. See the great Article "Shuffling" from Jeff.

I did write a very short article on that subject including C# code:
Return random subset of N elements of a given array

This is the best I could come up with on a first cut:

public List<String> getRandomItemsFromList(int returnCount, List<String> list) { List<String> returnList = new List<String>(); Dictionary<int, int> randoms = new Dictionary<int, int>(); while (randoms.Count != returnCount) { //generate new random between one and total list count int randomInt = new Random().Next(list.Count); // store this in dictionary to ensure uniqueness try { randoms.Add(randomInt, randomInt); } catch (ArgumentException aex) { Console.Write(aex.Message); } //we can assume this element exists in the dictonary already //check for randoms length and then iterate through the original list //adding items we select via random to the return list if (randoms.Count == returnCount) { foreach (int key in randoms.Keys) returnList.Add(list[randoms[key]]); break; //break out of _while_ loop } } return returnList; }

Using a list of randoms within a range of 1 - total list count and then simply pulling those items in the list seemed to be the best way, but using the Dictionary to ensure uniqueness is something I''m still mulling over.

Also note I used a string list, replace as needed.

This isn''t as elegant or efficient as the accepted solution, but it''s quick to write up. First, permute the array randomly, then select the first K elements. In python,

import numpy N = 20 K = 5 idx = np.arange(N) numpy.random.shuffle(idx) print idx[:K]

Using LINQ with large lists (when costly to touch each element) AND if you can live with the possibility of duplicates:

new int[5].Select(o => (int)(rnd.NextDouble() * maxIndex)).Select(i => YourIEnum.ElementAt(i))

For my use i had a list of 100.000 elements, and because of them being pulled from a DB I about halfed (or better) the time compared to a rnd on the whole list.

Having a large list will reduce the odds greatly for duplicates.

When N is very large, the normal method that randomly shuffles the N numbers and selects, say, first k numbers, can be prohibitive because of space complexity. The following algorithm requires only O(k) for both time and space complexities.

http://arxiv.org/abs/1512.00501

def random_selection_indices(num_samples, N): modified_entries = {} seq = [] for n in xrange(num_samples): i = N - n - 1 j = random.randrange(i) # swap a[j] and a[i] a_j = modified_entries[j] if j in modified_entries else j a_i = modified_entries[i] if i in modified_entries else i if a_i != j: modified_entries[j] = a_i elif j in modified_entries: # no need to store the modified value if it is the same as index modified_entries.pop(j) if a_j != i: modified_entries[i] = a_j elif i in modified_entries: # no need to store the modified value if it is the same as index modified_entries.pop(i) seq.append(a_j) return seq

why not something like this:

Dim ar As New ArrayList Dim numToGet As Integer = 5 ''hard code just to test ar.Add("12") ar.Add("11") ar.Add("10") ar.Add("15") ar.Add("16") ar.Add("17") Dim randomListOfProductIds As New ArrayList Dim toAdd As String = "" For i = 0 To numToGet - 1 toAdd = ar(CInt((ar.Count - 1) * Rnd())) randomListOfProductIds.Add(toAdd) ''remove from id list ar.Remove(toAdd) Next ''sorry i''m lazy and have to write vb at work :( and didn''t feel like converting to c#

public static IEnumerable<T> GetRandom<T>(this IList<T> list, int count, Random random) { // Probably you should throw exception if count > list.Count count = Math.Min(list.Count, count); var selectedIndices = new SortedSet<int>(); // Random upper bound int randomMax = list.Count - 1; while (selectedIndices.Count < count) { int randomIndex = random.Next(0, randomMax); // skip over already selected indeces foreach (var selectedIndex in selectedIndices) if (selectedIndex <= randomIndex) ++randomIndex; else break; yield return list[randomIndex]; selectedIndices.Add(randomIndex); --randomMax; } }

Memory: ~count
Complexity: O(count ² )

public static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount) { return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList(); }