OSQP CUDA version: demo takes much longer than expected

Hello all,
I've run into a problem with OSQP cuda-1.0: osqp_demo.c takes 0.58 s, which is far more than I expected. (The demo in the master branch, with a similar problem dimension, takes about 3e-5 s.) The problem is small, and the CUDA version of OSQP is supposed to reduce computation time. I profiled the CUDA calls with nvprof, and cudaFree accounted for 66% of the time. For reference, my computer has a GTX 1660 and CUDA 11.

Can anyone help me figure out this problem? Is there any setting I should configure before running the demo? The demo output is below:

problem:  variables n = 2, constraints m = 3
          nnz(P) + nnz(A) = 7
settings: linear system solver = cuda pcg,
          eps_abs = 1.0e-03, eps_rel = 1.0e-03,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 4000
          check_termination: on (interval 5),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1  -2.4617e-01   1.76e+00   9.87e-01   1.00e-01   5.74e-01s
  30   1.8769e+00   1.02e-03   4.72e-04   4.15e-01   5.82e-01s
plsh   1.8800e+00   0.00e+00   2.38e-07   --------   5.84e-01s

status:               solved
solution polish:      successful
number of iterations: 30
optimal objective:    1.8800
run time:             5.84e-01s
optimal rho estimate: 1.19e+00