Hello all,

I've run into a problem with OSQP cuda-1.0: osqp_demo.c takes 0.58 s, far longer than I expected. (The demo on the master branch, with a similar problem dimension, takes about 3e-5 s.) The problem is small, and the CUDA version of OSQP is said to reduce computation time. I profiled the run with nvprof, and cudaFree accounted for 66% of the time. For reference, my computer has a GTX 1660 and CUDA 11.

Can anyone help me figure this out? Is there a setting I should configure before running the demo?

```
problem: variables n = 2, constraints m = 3
nnz(P) + nnz(A) = 7
settings: linear system solver = cuda pcg,
eps_abs = 1.0e-03, eps_rel = 1.0e-03,
eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
rho = 1.00e-01 (adaptive),
sigma = 1.00e-06, alpha = 1.60, max_iter = 4000
check_termination: on (interval 5),
scaling: on, scaled_termination: off
warm start: on, polish: on, time_limit: off
iter   objective    pri res    dua res    rho        time
   1  -2.4617e-01   1.76e+00   9.87e-01   1.00e-01   5.74e-01s
  30   1.8769e+00   1.02e-03   4.72e-04   4.15e-01   5.82e-01s
plsh   1.8800e+00   0.00e+00   2.38e-07   --------   5.84e-01s
status: solved
solution polish: successful
number of iterations: 30
optimal objective: 1.8800
run time: 5.84e-01s
optimal rho estimate: 1.19e+00
```