c++ arrays sorting lexicographic

ordenar matriz de enteros lexicográficamente C++



arrays sorting (12)

Quiero ordenar una gran cantidad de enteros (por ejemplo, 1 millón de elementos) lexicográficamente.

Ejemplo:

input [] = { 100, 21 , 22 , 99 , 1 , 927 } sorted[] = { 1 , 100, 21 , 22 , 927, 99 }

Lo he hecho utilizando el método más simple posible:

  • convertir todos los números en cadenas (muy costoso porque tomará mucha memoria)
  • usa std:sort con strcmp como función de comparación
  • convertir de nuevo las cadenas a números enteros

¿Hay un método mejor que este?


Aquí hay otro algoritmo que realiza algunos de los cálculos antes de ordenar. Parece ser bastante rápido, a pesar de la copia adicional (ver comparisons ).

Nota:

  • solo soporta enteros positivos
  • solo admite enteros <= std::numeric_limits<int>::max()/10

NB puedes optimizar count_digits y my_pow10 ; por ejemplo, vea Tres consejos de optimización para C ++ de Andrei Alexandrescu y ¿ Alguna manera más rápida que pow () para calcular una potencia entera de 10 en C ++?

Ayudantes:

#include <random> #include <vector> #include <utility> #include <cmath> #include <iostream> #include <algorithm> #include <limits> #include <iterator> // non-optimized version int count_digits(int p) // returns `0` for `p == 0` { int res = 0; for(; p != 0; ++res) { p /= 10; } return res; } // non-optimized version int my_pow10(unsigned exp) { int res = 1; for(; exp != 0; --exp) { res *= 10; } return res; }

Algoritmo (nota - no en el lugar):

// helper to provide integers with the same number of digits template<class T, class U> std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits) { auto const digits = count_digits(p); // append zeros so that `l` has `maxDigits` digits auto const l = static_cast<T>( p * my_pow10(maxDigits-digits) ); return {l, p}; } template<class RaIt> using pair_vec = std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type, typename std::iterator_traits<RaIt>::value_type>>; template<class RaIt> pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end) { if(p_beg == p_end) return {}; auto max = *std::max_element(p_beg, p_end); auto maxDigits = count_digits(max); pair_vec<RaIt> result; result.reserve( std::distance(p_beg, p_end) ); for(auto i = p_beg; i != p_end; ++i) result.push_back( lexicographic_pair_helper(*i, maxDigits) ); using value_type = typename pair_vec<RaIt>::value_type; std::sort(begin(result), end(result), [](value_type const& l, value_type const& r) { if(l.first < r.first) return true; if(l.first > r.first) return false; return l.second < r.second; } ); return result; }

Ejemplo de uso:

int main() { std::vector<int> input = { 100, 21 , 22 , 99 , 1 , 927 }; // generate some numbers /*{ constexpr int number_of_elements = 1E6; std::random_device rd; std::mt19937 gen( rd() ); std::uniform_int_distribution<> dist(0, std::numeric_limits<int>::max()/10); for(int i = 0; i < number_of_elements; ++i) input.push_back( dist(gen) ); }*/ std::cout << "unsorted: "; for(auto const& e : input) std::cout << e << ", "; std::cout << "/n/n"; auto sorted = lexicographic_sort(begin(input), end(input)); std::cout << "sorted: "; for(auto const& e : sorted) std::cout << e.second << ", "; std::cout << "/n/n"; }


Aquí hay una wiki de la comunidad para comparar las soluciones. Tomé el código de Nim y lo hice fácilmente extensible. Siéntase libre de agregar sus soluciones y salidas.

La muestra ejecuta una computadora lenta antigua (3 GB de RAM, Core2Duo U9400) con g ++ 4.9 @ -O3 -march=native :

number of elements: 1e+03 size of integer type: 4 reference solution: Lightness Races in Orbit solution "dyp": duration: 0 ms and 301 microseconds comparison to reference solution: exact match solution "Nim": duration: 2 ms and 160 microseconds comparison to reference solution: exact match solution "nyarlathotep": duration: 8 ms and 126 microseconds comparison to reference solution: exact match solution "notbad": duration: 1 ms and 102 microseconds comparison to reference solution: exact match solution "Eric Postpischil": duration: 2 ms and 550 microseconds comparison to reference solution: exact match solution "Lightness Races in Orbit": duration: 17 ms and 469 microseconds comparison to reference solution: exact match solution "pts": duration: 1 ms and 92 microseconds comparison to reference solution: exact match ========================================================== number of elements: 1e+04 size of integer type: 4 reference solution: Lightness Races in Orbit solution "nyarlathotep": duration: 109 ms and 712 microseconds comparison to reference solution: exact match solution "Lightness Races in Orbit": duration: 272 ms and 819 microseconds comparison to reference solution: exact match solution "dyp": duration: 1 ms and 748 microseconds comparison to reference solution: exact match solution "notbad": duration: 16 ms and 115 microseconds comparison to reference solution: exact match solution "pts": duration: 15 ms and 10 microseconds comparison to reference solution: exact match solution "Eric Postpischil": duration: 33 ms and 301 microseconds comparison to reference solution: exact match solution "Nim": duration: 17 ms and 83 microseconds comparison to reference solution: exact match ========================================================== number of elements: 1e+05 size of integer type: 4 reference solution: Lightness Races in Orbit solution "Nim": duration: 217 ms and 4 microseconds comparison to reference solution: exact match solution "pts": duration: 199 ms and 505 microseconds comparison to reference solution: exact match solution "dyp": duration: 20 ms and 330 microseconds comparison to reference solution: exact match solution "Eric Postpischil": duration: 415 ms and 477 microseconds comparison to reference solution: exact match solution "Lightness Races in Orbit": duration: 3955 ms and 58 microseconds comparison to reference solution: exact match solution "notbad": duration: 215 ms and 259 microseconds comparison to reference solution: exact match solution "nyarlathotep": duration: 1341 ms and 46 microseconds comparison to reference solution: mismatch found ========================================================== number of elements: 1e+06 size of integer type: 4 reference solution: Lightness Races in Orbit solution "Lightness Races in Orbit": duration: 52861 ms and 314 microseconds comparison to reference solution: exact match solution "Eric Postpischil": duration: 4757 ms and 608 microseconds comparison to reference solution: exact match solution "nyarlathotep": duration: 15654 ms and 195 microseconds comparison to reference solution: mismatch found solution "dyp": duration: 233 ms and 779 microseconds comparison to reference solution: exact match solution "pts": duration: 2181 ms and 634 microseconds comparison to reference solution: exact match solution "Nim": duration: 2539 ms and 9 microseconds comparison to reference solution: exact match solution "notbad": duration: 2675 ms and 362 microseconds comparison to reference solution: exact match ========================================================== number of elements: 1e+07 size of integer type: 4 reference solution: Lightness Races in Orbit solution "notbad": duration: 33425 ms and 423 microseconds comparison to reference solution: exact match solution "pts": duration: 26000 ms and 398 microseconds comparison to reference solution: exact match solution "Eric Postpischil": duration: 56206 ms and 359 microseconds comparison to reference solution: exact match solution "Lightness Races in Orbit": duration: 658540 ms and 342 microseconds comparison to reference solution: exact match solution "nyarlathotep": duration: 187064 ms and 518 microseconds comparison to reference solution: mismatch found solution "Nim": duration: 30519 ms and 227 microseconds comparison to reference solution: exact match solution "dyp": duration: 2624 ms and 644 microseconds comparison to reference solution: exact match

Los algoritmos deben ser estructuras con plantillas de operador de llamada de función que admitan la interfaz:

template<class RaIt> operator()(RaIt begin, RaIt end);

Se proporciona una copia de los datos de entrada como un parámetro, se espera que el algoritmo proporcione el resultado en el mismo rango (por ejemplo, clasificación in situ).

#include <iostream> #include <vector> #include <algorithm> #include <iterator> #include <random> #include <vector> #include <utility> #include <cmath> #include <cassert> #include <chrono> #include <cstring> #include <climits> #include <functional> #include <cstdlib> #include <iomanip> using duration_t = decltype( std::chrono::high_resolution_clock::now() - std::chrono::high_resolution_clock::now()); template<class T> struct result_t { std::vector<T> numbers; duration_t duration; char const* name; }; template<class RaIt, class F> result_t<typename std::iterator_traits<RaIt>::value_type> apply_algorithm(RaIt p_beg, RaIt p_end, F f, char const* name) { using value_type = typename std::iterator_traits<RaIt>::value_type; std::vector<value_type> inplace(p_beg, p_end); auto start = std::chrono::high_resolution_clock::now(); f(begin(inplace), end(inplace)); auto end = std::chrono::high_resolution_clock::now(); auto duration = end - start; return {std::move(inplace), duration, name}; } // non-optimized version int count_digits(int p) // returns `0` for `p == 0` { int res = 0; for(; p != 0; ++res) { p /= 10; } return res; } // non-optimized version int my_pow10(unsigned exp) { int res = 1; for(; exp != 0; --exp) { res *= 10; } return res; } // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! // paste algorithms here // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! int main(int argc, char** argv) { using integer_t = int; constexpr integer_t dist_min = 0; constexpr integer_t dist_max = std::numeric_limits<integer_t>::max()/10; constexpr std::size_t default_number_of_elements = 1E6; const std::size_t number_of_elements = argc>1 ? std::atoll(argv[1]) : default_number_of_elements; std::cout << "number of elements: "; std::cout << std::scientific << std::setprecision(0); std::cout << (double)number_of_elements << "/n"; std::cout << /*std::defaultfloat <<*/ std::setprecision(6); std::cout.unsetf(std::ios_base::floatfield); std::cout << "size of integer type: " << sizeof(integer_t) << "/n/n"; std::vector<integer_t> input; { input.reserve(number_of_elements); std::random_device rd; std::mt19937 gen( rd() ); std::uniform_int_distribution<> dist(dist_min, dist_max); for(std::size_t i = 0; i < number_of_elements; ++i) input.push_back( dist(gen) ); } auto b = begin(input); auto e = end(input); using res_t = result_t<integer_t>; std::vector< std::function<res_t()> > algorithms; #define MAKE_BINDER(B, E, ALGO, NAME) / std::bind( &apply_algorithm<decltype(B),decltype(ALGO)>, / B,E,ALGO,NAME ) constexpr auto lightness_name = "Lightness Races in Orbit"; algorithms.push_back( MAKE_BINDER(b, e, lightness(), lightness_name) ); algorithms.push_back( MAKE_BINDER(b, e, dyp(), "dyp") ); algorithms.push_back( MAKE_BINDER(b, e, nim(), "Nim") ); algorithms.push_back( MAKE_BINDER(b, e, pts(), "pts") ); algorithms.push_back( MAKE_BINDER(b, e, epost(), "Eric Postpischil") ); algorithms.push_back( MAKE_BINDER(b, e, nyar(), "nyarlathotep") ); algorithms.push_back( MAKE_BINDER(b, e, notbad(), "notbad") ); { std::srand( std::random_device()() ); std::random_shuffle(begin(algorithms), end(algorithms)); } std::vector< result_t<integer_t> > res; for(auto& algo : algorithms) res.push_back( algo() ); auto reference_solution = *std::find_if(begin(res), end(res), [](result_t<integer_t> const& p) { return 0 == std::strcmp(lightness_name, p.name); }); std::cout << "reference solution: "<<reference_solution.name<<"/n/n"; for(auto const& e : res) { std::cout << "solution /""<<e.name<<"/":/n"; auto ms = std::chrono::duration_cast<std::chrono::microseconds>(e.duration); std::cout << "/tduration: "<<ms.count()/1000<<" ms and " <<ms.count()%1000<<" microseconds/n"; std::cout << "/tcomparison to reference solution: "; if(e.numbers.size() != reference_solution.numbers.size()) { std::cout << "ouput count mismatch/n"; break; } auto mismatch = std::mismatch(begin(e.numbers), end(e.numbers), begin(reference_solution.numbers)).first; if(end(e.numbers) == mismatch) { std::cout << "exact match/n"; }else { std::cout << "mismatch found/n"; } } }

Algoritmos actuales; tenga en cuenta que reemplacé los contadores de dígitos y pow-of-10 con la función global, por lo que todos nos beneficiamos si alguien optimiza.

struct lightness { template<class RaIt> void operator()(RaIt b, RaIt e) { using T = typename std::iterator_traits<RaIt>::value_type; /** * Sorts the array lexicographically. * * The trick is that we have to compare digits left-to-right * (considering typical Latin decimal notation) and that each of * two numbers to compare may have a different number of digits. * * This is very efficient in storage space, but inefficient in * execution time; an approach that pre-visits each element and * stores a translated representation will at least double your * storage requirements (possibly a problem with large inputs) * but require only a single translation of each element. */ std::sort( b, e, [](T lhs, T rhs) -> bool { // Returns true if lhs < rhs // Returns false otherwise const auto BASE = 10; const bool LHS_FIRST = true; const bool RHS_FIRST = false; const bool EQUAL = false; // There''s no point in doing anything at all // if both inputs are the same; strict-weak // ordering requires that we return `false` // in this case. if (lhs == rhs) { return EQUAL; } // Compensate for sign if (lhs < 0 && rhs < 0) { // When both are negative, sign on its own yields // no clear ordering between the two arguments. // // Remove the sign and continue as for positive // numbers. lhs *= -1; rhs *= -1; } else if (lhs < 0) { // When the LHS is negative but the RHS is not, // consider the LHS "first" always as we wish to // prioritise the leading ''-''. return LHS_FIRST; } else if (rhs < 0) { // When the RHS is negative but the LHS is not, // consider the RHS "first" always as we wish to // prioritise the leading ''-''. return RHS_FIRST; } // Counting the number of digits in both the LHS and RHS // arguments is *almost* trivial. const auto lhs_digits = ( lhs == 0 ? 1 : std::ceil(std::log(lhs+1)/std::log(BASE)) ); const auto rhs_digits = ( rhs == 0 ? 1 : std::ceil(std::log(rhs+1)/std::log(BASE)) ); // Now we loop through the positions, left-to-right, // calculating the digit at these positions for each // input, and comparing them numerically. The // lexicographic nature of the sorting comes from the // fact that we are doing this per-digit comparison // rather than considering the input value as a whole. const auto max_pos = std::max(lhs_digits, rhs_digits); for (auto pos = 0; pos < max_pos; pos++) { if (lhs_digits - pos == 0) { // Ran out of digits on the LHS; // prioritise the shorter input return LHS_FIRST; } else if (rhs_digits - pos == 0) { // Ran out of digits on the RHS; // prioritise the shorter input return RHS_FIRST; } else { const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE; const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE; if (lhs_x < rhs_x) return LHS_FIRST; else if (rhs_x < lhs_x) return RHS_FIRST; } } // If we reached the end and everything still // matches up, then something probably went wrong // as I''d have expected to catch this in the tests // for equality. assert("Unknown case encountered"); // dyp: suppress warning and throw throw "up"; } ); } }; namespace ndyp { // helper to provide integers with the same number of digits template<class T, class U> std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits) { auto const digits = count_digits(p); // append zeros so that `l` has `maxDigits` digits auto const l = static_cast<T>( p * my_pow10(maxDigits-digits) ); return {l, p}; } template<class RaIt> using pair_vec = std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type, typename std::iterator_traits<RaIt>::value_type>>; template<class RaIt> pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end) { if(p_beg == p_end) return pair_vec<RaIt>{}; auto max = *std::max_element(p_beg, p_end); auto maxDigits = count_digits(max); pair_vec<RaIt> result; result.reserve( std::distance(p_beg, p_end) ); for(auto i = p_beg; i != p_end; ++i) result.push_back( lexicographic_pair_helper(*i, maxDigits) ); using value_type = typename pair_vec<RaIt>::value_type; std::sort(begin(result), end(result), [](value_type const& l, value_type const& r) { if(l.first < r.first) return true; if(l.first > r.first) return false; return l.second < r.second; } ); return result; } } struct dyp { template<class RaIt> void operator()(RaIt b, RaIt e) { auto pairvec = ndyp::lexicographic_sort(b, e); std::transform(begin(pairvec), end(pairvec), b, [](typename decltype(pairvec)::value_type const& e) { return e.second; }); } }; namespace nnim { bool comp(int l, int r) { int lv[10] = {}; // probably possible to get this from numeric_limits int rv[10] = {}; int lc = 10; // ditto int rc = 10; while (l || r) { if (l) { auto t = l / 10; lv[--lc] = l - (t * 10); l = t; } if (r) { auto t = r / 10; rv[--rc] = r - (t * 10); r = t; } } while (lc < 10 && rc < 10) { if (lv[lc] == rv[rc]) { lc++; rc++; } else return lv[lc] < rv[rc]; } return lc > rc; } } struct nim { template<class RaIt> void operator()(RaIt b, RaIt e) { std::sort(b, e, nnim::comp); } }; struct pts { template<class T> static bool lex_less(T a, T b) { unsigned la = 1, lb = 1; for (T t = a; t > 9; t /= 10) ++la; for (T t = b; t > 9; t /= 10) ++lb; const bool ll = la < lb; while (la > lb) { b *= 10; ++lb; } while (lb > la) { a *= 10; ++la; } return a == b ? ll : a < b; } template<class RaIt> void operator()(RaIt b, RaIt e) { std::sort(b, e, lex_less<typename std::iterator_traits<RaIt>::value_type>); } }; struct epost { static bool compare(int x, int y) { static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1)); double lx = log10(x); double ly = log10(y); double fx = lx - floor(lx); // Get the mantissa of lx. double fy = ly - floor(ly); // Get the mantissa of ly. return fabs(fx - fy) < limit ? lx < ly : fx < fy; } template<class RaIt> void operator()(RaIt b, RaIt e) { std::sort(b, e, compare); } }; struct nyar { static bool lexiSmaller(int i1, int i2) { int digits1 = count_digits(i1); int digits2 = count_digits(i2); double val1 = i1/pow(10.0, digits1-1); double val2 = i2/pow(10.0, digits2-1); while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2) { digits1--; digits2--; val1 = (val1 - (int)val1)*10; val2 = (val2 - (int)val2)*10; } if (digits1 > 0 && digits2 > 0) { return (int)val1 < (int)val2; } return (digits2 > 0); } template<class RaIt> void operator()(RaIt b, RaIt e) { std::sort(b, e, lexiSmaller); } }; struct notbad { static int up_10pow(int n) { int ans = 1; while (ans < n) ans *= 10; return ans; } static bool compare(int v1, int v2) { int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2); while ( ceil1 != 0 && ceil2 != 0) { if (v1 / ceil1 < v2 / ceil2) return true; else if (v1 / ceil1 > v2 / ceil2) return false; ceil1 /= 10; ceil2 /= 10; } if (v1 < v2) return true; return false; } template<class RaIt> void operator()(RaIt b, RaIt e) { std::sort(b, e, compare); } };


Creo que las siguientes funciones funcionan como una función de comparación de ordenación para enteros positivos siempre que el tipo de entero utilizado sea sustancialmente más estrecho que el tipo double (por ejemplo, 32 bits int y 64 bits double ) y la rutina log10 utilizada devuelve resultados exactamente correctos para las potencias exactas de 10 (lo que hace una buena implementación):

static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1)); double lx = log10(x); double ly = log10(y); double fx = lx - floor(lx); // Get the mantissa of lx. double fy = ly - floor(ly); // Get the mantissa of ly. return fabs(fx - fy) < limit ? lx < ly : fx < fy;

Funciona comparando las mantisas de los logaritmos. Las mantisas son las partes fraccionarias del logaritmo e indican el valor de los dígitos significativos de un número sin la magnitud (por ejemplo, los logaritmos de 31, 3.1 y 310 tienen exactamente la misma mantisa).

El propósito de fabs(fx - fy) < limit es permitir errores en la toma del logaritmo, que se producen tanto porque las implementaciones de log10 son imperfectas como porque el formato de punto flotante provoca algún error. (Las partes enteras de los logaritmos de 31 y 310 utilizan diferentes números de bits, por lo que quedan diferentes números de bits para el significado, por lo que terminan redondeando a valores ligeramente diferentes). Siempre que el tipo entero sea sustancialmente más estrecho que el tipo double , el limit calculado será mucho mayor que el error en log10 . Por lo tanto, el fabs(fx - fy) < limit prueba fabs(fx - fy) < limit esencialmente nos dice si dos mantisas calculadas serían iguales si se calcularan exactamente.

Si las mantisas difieren, indican el orden lexicográfico, por lo que devolvemos fx < fy . Si son iguales, entonces la parte entera del logaritmo nos dice el orden, por lo que devolvemos lx < ly .

Es simple probar si log10 devuelve resultados correctos para cada potencia de diez, ya que hay muy pocos de ellos. Si no lo hace, los ajustes se pueden hacer fácilmente: Insertar if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0; if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0; . Esto permite que cuando log10 devuelva algo como 4.99999 ... cuando debería haber devuelto 5.

Este método tiene la ventaja de no usar bucles o división (lo cual consume mucho tiempo en muchos procesadores).


La tarea parece un ajuste natural para una variante MSD de Radix Sort con relleno ( http://en.wikipedia.org/wiki/Radix_sort ).

Depende de la cantidad de código que quieras darle. El código simple que muestran los demás es O (log n) complejidad, mientras que una clasificación de radix totalmente optimizada sería O (kn).


Para proporcionar cualquier orden de clasificación personalizado, puede proporcionar un comparador a std::sort . En este caso, va a ser algo complejo, usando logaritmos para inspeccionar dígitos individuales de su número en la base 10.

Aquí hay un ejemplo: los comentarios en línea describen lo que está pasando.

#include <iostream> #include <algorithm> #include <cmath> #include <cassert> int main() { int input[] { 100, 21, 22, 99, 1, 927, -50, -24, -160 }; /** * Sorts the array lexicographically. * * The trick is that we have to compare digits left-to-right * (considering typical Latin decimal notation) and that each of * two numbers to compare may have a different number of digits. * * This is very efficient in storage space, but inefficient in * execution time; an approach that pre-visits each element and * stores a translated representation will at least double your * storage requirements (possibly a problem with large inputs) * but require only a single translation of each element. */ std::sort( std::begin(input), std::end(input), [](int lhs, int rhs) -> bool { // Returns true if lhs < rhs // Returns false otherwise const auto BASE = 10; const bool LHS_FIRST = true; const bool RHS_FIRST = false; const bool EQUAL = false; // There''s no point in doing anything at all // if both inputs are the same; strict-weak // ordering requires that we return `false` // in this case. if (lhs == rhs) { return EQUAL; } // Compensate for sign if (lhs < 0 && rhs < 0) { // When both are negative, sign on its own yields // no clear ordering between the two arguments. // // Remove the sign and continue as for positive // numbers. lhs *= -1; rhs *= -1; } else if (lhs < 0) { // When the LHS is negative but the RHS is not, // consider the LHS "first" always as we wish to // prioritise the leading ''-''. return LHS_FIRST; } else if (rhs < 0) { // When the RHS is negative but the LHS is not, // consider the RHS "first" always as we wish to // prioritise the leading ''-''. return RHS_FIRST; } // Counting the number of digits in both the LHS and RHS // arguments is *almost* trivial. const auto lhs_digits = ( lhs == 0 ? 1 : std::ceil(std::log(lhs+1)/std::log(BASE)) ); const auto rhs_digits = ( rhs == 0 ? 1 : std::ceil(std::log(rhs+1)/std::log(BASE)) ); // Now we loop through the positions, left-to-right, // calculating the digit at these positions for each // input, and comparing them numerically. The // lexicographic nature of the sorting comes from the // fact that we are doing this per-digit comparison // rather than considering the input value as a whole. const auto max_pos = std::max(lhs_digits, rhs_digits); for (auto pos = 0; pos < max_pos; pos++) { if (lhs_digits - pos == 0) { // Ran out of digits on the LHS; // prioritise the shorter input return LHS_FIRST; } else if (rhs_digits - pos == 0) { // Ran out of digits on the RHS; // prioritise the shorter input return RHS_FIRST; } else { const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE; const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE; if (lhs_x < rhs_x) return LHS_FIRST; else if (rhs_x < lhs_x) return RHS_FIRST; } } // If we reached the end and everything still // matches up, then something probably went wrong // as I''d have expected to catch this in the tests // for equality. assert("Unknown case encountered"); } ); std::cout << ''{''; for (auto x : input) std::cout << x << ", "; std::cout << ''}''; // Output: {-160, -24, -50, 1, 100, 21, 22, 927, 99, } }

Demo

Hay formas más rápidas de calcular la cantidad de dígitos en un número , pero lo anterior lo ayudará a comenzar.


Puede intentar usar el operador% para darle acceso a cada dígito individual; por ejemplo, el 121% 100 le dará el primer dígito y lo marcará de esa manera, pero tendrá que encontrar una manera de evitar el hecho de que tienen diferentes tamaños.

Entonces encuentra el valor máximo en la matriz. No sé si hay una función para esto incorporada que podrías probar.

int Max (int* pdata,int size) { int temp_max =0 ; for (int i =0 ; i < size ; i++) { if (*(pdata+i) > temp_max) { temp_max = *(pdata+i); } } return temp_max; }

Esta función devolverá el número de dígitos en el número

int Digit_checker(int n) { int num_digits = 1; while (true) { if ((n % 10) == n) return num_digits; num_digits++; n = n/10; } return num_digits; }

Deje que el número de dígitos en max igual a n. Una vez que tenga este, abra un bucle for en el formato de for (int i = 1; i <n; i ++)

luego puede ir a través de su y usar "data [i]% (10 ^ (ni))" para obtener acceso al primer dígito, luego ordenar eso y luego en la próxima iteración obtendrá acceso al segundo dígito. Aunque no sé cómo los clasificarás.

No funcionará con números negativos y tendrá que sortear los datos [i]% (10 ^ (ni)) que se devuelven a sí mismos para números con menos dígitos que el máximo


Una solución compacta si todos sus números no son negativos y son lo suficientemente pequeños para que multiplicarlos por 10 no provoque un desbordamiento:

template<class T> bool lex_less(T a, T b) { unsigned la = 1, lb = 1; for (T t = a; t > 9; t /= 10) ++la; for (T t = b; t > 9; t /= 10) ++lb; const bool ll = la < lb; while (la > lb) { b *= 10; ++lb; } while (lb > la) { a *= 10; ++la; } return a == b ? ll : a < b; }

Ejecutalo de esta manera:

#include <iostream> #include <algorithm> int main(int, char **) { unsigned short input[] = { 100, 21 , 22 , 99 , 1 , 927 }; unsigned input_size = sizeof(input) / sizeof(input[0]); std::sort(input, input + input_size, lex_less<unsigned short>); for (unsigned i = 0; i < input_size; ++i) { std::cout << '' '' << input[i]; } std::cout << std::endl; return 0; }


Use std::sort() con una función de comparación adecuada. Esto reduce los requisitos de memoria.

La función de comparación puede usar n % 10 , n / 10 % 10 , n / 100 % 10 etc. para acceder a los dígitos individuales (para enteros positivos; los enteros negativos funcionan de manera un poco diferente).


Basado en la respuesta de @Oswald, a continuación hay un código que hace lo mismo.

#include <iostream> #include <vector> #include <algorithm> using namespace std; bool compare(string a, string b){ // Check each digit int i = 0, j = 0; while(i < a.size() && j < b.size()){ // If different digits if(a[i] - ''0'' != b[j] - ''0'') return (a[i] - ''0'' < b[j] - ''0''); i++, j++; } // Different sizes return (a.size() < b.size()); } int main(){ vector<string> array = {"1","2","3","4","5","6","7","8","9","10","11","12"}; sort(array.begin(), array.end(), compare); for(auto value : array) cout << value << " "; return 0; }

Entrada: 1 2 3 4 5 6 7 8 9 10 11 12

Salida: 1 10 11 12 2 3 4 5 6 7 8 9


Si bien algunas otras respuestas aquí (Lightness''s, notbad''s) ya muestran un código bastante bueno, creo que puedo agregar una solución que podría ser más eficaz (ya que no requiere división ni potencia en cada ciclo; pero requiere aritmética de punto flotante, que nuevamente podría hacerlo lento, y posiblemente inexacto para grandes números):

#include <algorithm> #include <iostream> #include <assert.h> // method taken from http://.com/a/1489873/671366 template <class T> int numDigits(T number) { int digits = 0; if (number < 0) digits = 1; // remove this line if ''-'' counts as a digit while (number) { number /= 10; digits++; } return digits; } bool lexiSmaller(int i1, int i2) { int digits1 = numDigits(i1); int digits2 = numDigits(i2); double val1 = i1/pow(10.0, digits1-1); double val2 = i2/pow(10.0, digits2-1); while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2) { digits1--; digits2--; val1 = (val1 - (int)val1)*10; val2 = (val2 - (int)val2)*10; } if (digits1 > 0 && digits2 > 0) { return (int)val1 < (int)val2; } return (digits2 > 0); } int main(int argc, char* argv[]) { // just testing whether the comparison function works as expected: assert (lexiSmaller(1, 100)); assert (!lexiSmaller(100, 1)); assert (lexiSmaller(100, 22)); assert (!lexiSmaller(22, 100)); assert (lexiSmaller(927, 99)); assert (!lexiSmaller(99, 927)); assert (lexiSmaller(1, 927)); assert (!lexiSmaller(927, 1)); assert (lexiSmaller(21, 22)); assert (!lexiSmaller(22, 21)); assert (lexiSmaller(22, 99)); assert (!lexiSmaller(99, 22)); // use the comparison function for the actual sorting: int input[] = { 100 , 21 , 22 , 99 , 1 ,927 }; std::sort(&input[0], &input[5], lexiSmaller); std::cout << "sorted: "; for (int i=0; i<6; ++i) { std::cout << input[i]; if (i<5) { std::cout << ", "; } } std::cout << std::endl; return 0; }

Aunque debo admitir que todavía no he probado el rendimiento.


Aquí está la solución tonta que no utiliza ningún truco de punto flotante. Es más o menos lo mismo que la comparación de cadenas, pero no usa una cadena por voz, tampoco maneja números negativos, para agregar una sección en la parte superior ...

bool comp(int l, int r) { int lv[10] = {}; // probably possible to get this from numeric_limits int rv[10] = {}; int lc = 10; // ditto int rc = 10; while (l || r) { if (l) { auto t = l / 10; lv[--lc] = l - (t * 10); l = t; } if (r) { auto t = r / 10; rv[--rc] = r - (t * 10); r = t; } } while (lc < 10 && rc < 10) { if (lv[lc] == rv[rc]) { lc++; rc++; } else return lv[lc] < rv[rc]; } return lc > rc; }

Es rápido, y estoy seguro de que es posible hacerlo aún más rápido, pero funciona y es lo suficientemente tonto como para entender ...

EDIT: Comí para volcar un montón de código, pero aquí hay una comparación de todas las soluciones hasta ahora ...

#include <iostream> #include <vector> #include <algorithm> #include <iterator> #include <random> #include <vector> #include <utility> #include <cmath> #include <cassert> #include <chrono> std::pair<int, int> lexicographic_pair_helper(int p, int maxDigits) { int digits = std::log10(p); int l = p*std::pow(10, maxDigits-digits); return {l, p}; } bool l_comp(int l, int r) { int lv[10] = {}; // probably possible to get this from numeric_limits int rv[10] = {}; int lc = 10; // ditto int rc = 10; while (l || r) { if (l) { auto t = l / 10; lv[--lc] = l - (t * 10); l = t; } if (r) { auto t = r / 10; rv[--rc] = r - (t * 10); r = t; } } while (lc < 10 && rc < 10) { if (lv[lc] == rv[rc]) { lc++; rc++; } else return lv[lc] < rv[rc]; } return lc > rc; } int up_10pow(int n) { int ans = 1; while (ans < n) ans *= 10; return ans; } bool l_comp2(int v1, int v2) { int n1 = up_10pow(v1), n2 = up_10pow(v2); while ( v1 != 0 && v2 != 0) { if (v1 / n1 < v2 / n2) return true; else if (v1 / n1 > v2 / n2) return false; v1 /= 10; v2 /= 10; n1 /= 10; n2 /= 10; } if (v1 == 0 && v2 != 0) return true; return false; } int main() { std::vector<int> numbers; { constexpr int number_of_elements = 1E6; std::random_device rd; std::mt19937 gen( rd() ); std::uniform_int_distribution<> dist; for(int i = 0; i < number_of_elements; ++i) numbers.push_back( dist(gen) ); } std::vector<int> lo(numbers); std::vector<int> dyp(numbers); std::vector<int> nim(numbers); std::vector<int> nb(numbers); std::cout << "starting..." << std::endl; { auto start = std::chrono::high_resolution_clock::now(); /** * Sorts the array lexicographically. * * The trick is that we have to compare digits left-to-right * (considering typical Latin decimal notation) and that each of * two numbers to compare may have a different number of digits. * * This probably isn''t very efficient, so I wouldn''t do it on * "millions" of numbers. But, it works... */ std::sort( std::begin(lo), std::end(lo), [](int lhs, int rhs) -> bool { // Returns true if lhs < rhs // Returns false otherwise const auto BASE = 10; const bool LHS_FIRST = true; const bool RHS_FIRST = false; const bool EQUAL = false; // There''s no point in doing anything at all // if both inputs are the same; strict-weak // ordering requires that we return `false` // in this case. if (lhs == rhs) { return EQUAL; } // Compensate for sign if (lhs < 0 && rhs < 0) { // When both are negative, sign on its own yields // no clear ordering between the two arguments. // // Remove the sign and continue as for positive // numbers. lhs *= -1; rhs *= -1; } else if (lhs < 0) { // When the LHS is negative but the RHS is not, // consider the LHS "first" always as we wish to // prioritise the leading ''-''. return LHS_FIRST; } else if (rhs < 0) { // When the RHS is negative but the LHS is not, // consider the RHS "first" always as we wish to // prioritise the leading ''-''. return RHS_FIRST; } // Counting the number of digits in both the LHS and RHS // arguments is *almost* trivial. const auto lhs_digits = ( lhs == 0 ? 1 : std::ceil(std::log(lhs+1)/std::log(BASE)) ); const auto rhs_digits = ( rhs == 0 ? 1 : std::ceil(std::log(rhs+1)/std::log(BASE)) ); // Now we loop through the positions, left-to-right, // calculating the digit at these positions for each // input, and comparing them numerically. The // lexicographic nature of the sorting comes from the // fact that we are doing this per-digit comparison // rather than considering the input value as a whole. const auto max_pos = std::max(lhs_digits, rhs_digits); for (auto pos = 0; pos < max_pos; pos++) { if (lhs_digits - pos == 0) { // Ran out of digits on the LHS; // prioritise the shorter input return LHS_FIRST; } else if (rhs_digits - pos == 0) { // Ran out of digits on the RHS; // prioritise the shorter input return RHS_FIRST; } else { const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE; const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE; if (lhs_x < rhs_x) return LHS_FIRST; else if (rhs_x < lhs_x) return RHS_FIRST; } } // If we reached the end and everything still // matches up, then something probably went wrong // as I''d have expected to catch this in the tests // for equality. assert("Unknown case encountered"); } ); auto end = std::chrono::high_resolution_clock::now(); auto elapsed = end - start; std::cout << "Lightness: " << elapsed.count() << ''/n''; } { auto start = std::chrono::high_resolution_clock::now(); auto max = *std::max_element(begin(dyp), end(dyp)); int maxDigits = std::log10(max); std::vector<std::pair<int,int>> temp; temp.reserve(dyp.size()); for(auto const& e : dyp) temp.push_back( lexicographic_pair_helper(e, maxDigits) ); std::sort(begin(temp), end(temp), [](std::pair<int, int> const& l, std::pair<int, int> const& r) { if(l.first < r.first) return true; if(l.first > r.first) return false; return l.second < r.second; }); auto end = std::chrono::high_resolution_clock::now(); auto elapsed = end - start; std::cout << "Dyp: " << elapsed.count() << ''/n''; } { auto start = std::chrono::high_resolution_clock::now(); std::sort (nim.begin(), nim.end(), l_comp); auto end = std::chrono::high_resolution_clock::now(); auto elapsed = end - start; std::cout << "Nim: " << elapsed.count() << ''/n''; } // { // auto start = std::chrono::high_resolution_clock::now(); // std::sort (nb.begin(), nb.end(), l_comp2); // auto end = std::chrono::high_resolution_clock::now(); // auto elapsed = end - start; // std::cout << "notbad: " << elapsed.count() << ''/n''; // } std::cout << (nim == lo) << std::endl; std::cout << (nim == dyp) << std::endl; std::cout << (lo == dyp) << std::endl; // std::cout << (lo == nb) << std::endl; }


Sobrecargar el operador <para comparar dos enteros lexicográficamente. Para cada entero, encuentra el 10 ^ k más pequeño, que no es menor que el entero dado. Que comparar los dígitos uno por uno.

class CmpIntLex { int up_10pow(int n) { int ans = 1; while (ans < n) ans *= 10; return ans; } public: bool operator ()(int v1, int v2) { int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2); while ( ceil1 != 0 && ceil2 != 0) { if (v1 / ceil1 < v2 / ceil2) return true; else if (v1 / ceil1 > v2 / ceil2) return false; ceil1 /= 10; ceil2 /= 10; } if (v1 < v2) return true; return false; } int main() { vector<int> vi = {12,45,12134,85}; sort(vi.begin(), vi.end(), CmpIntLex()); }