Fundamentals of Computer Vision, Spring 2020
Project Assignment 2
Forward (3D Point to 2D Point) and Inverse (2D Point to 3D Ray) Camera Projection
Due Date: Sunday, April 19, 2020 11:59pm EST
 
1 Motivation
The goal of this project is to implement forward (3D point to 2D point) and inverse (2D point to 3D ray) camera projection, and to perform triangulation from two cameras to do 3D reconstruction from pairs of matching 2D image points. This project will involve understanding relationships between 2D image coordinates and 3D world coordinates and the chain of transformations that make up the pinhole camera model that was discussed in class. Your specific tasks will be to project 3D coordinates (sets of 3D joint locations on a human body, measured by motion capture equipment) into image pixel coordinates that you can overlay on top of an image, to then convert those 2D points back into 3D viewing rays, and then triangulate the viewing rays of two camera views to recover the original 3D coordinates you started with (or values close to those coordinates).
You will be provided:
• 3D point data for each of 12 body joints for a set of motion capture frames recorded of a subject performing a Taiji exercise. The 12 joints represent the shoulders, elbows, wrists, hips, knees, and ankles. Each joint will be provided for a time series that is ~30,000 frames long, representing a 5-minute performance recorded at 100 frames per second in a 3D motion capture lab.
• Camera calibration parameters (intrinsic and extrinsic) for two video cameras that were also recording the performance. Each set of camera parameters contains all information needed to project 3D joint data into pixel coordinates in one of the two camera views.
• An mp4 movie file containing the video frames recorded by each of the two video cameras. The video was recorded at 50 frames per second.
While this project appears to be a simple task at first, you will discover that practical applications have hurdles to overcome. Specifically, in each frame of data there are 12 joints, with ~30,000 frames of data to be projected into 2 separate camera coordinate systems. That is over ~700,000 joint projections into camera views and ~350,000 reconstructions back into world coordinates! Furthermore, you will need to have a very clear understanding of the pinhole camera model that we covered in class, to be able to write functions to correctly project from 3D to 2D and back again.
The specific project outcomes include:
• Experience in Matlab programming
• Understanding intrinsic and extrinsic camera parameters
• Projection of 3D data into 2D image coordinates
• Reconstruction of 3D locations by triangulation from two camera views
• Measurement of 3D reconstruction error
• Practical understanding of epipolar geometry
 
2 The Basic Operations
The following steps will be essential to the successful completion of the project:
1. Input and parsing of the mocap dataset. Read in and properly interpret the 3D joint data.
2. Input and parsing of camera parameters. Read in each set of camera parameters and interpret it with respect to our mathematical camera projection model.
3. Use the camera parameters to project 3D joints into pixel locations in each of the two image coordinate systems.
4. Reconstruct the 3D location of each joint in the world coordinate system from the projected 2D joints you produced in Step 3, using two-camera triangulation.
5. Compute the Euclidean (L²) distance between all joint pairs, that is, a per-joint, per-frame L² distance between the original 3D joints and the reconstructed 3D joints, giving a quantitative measure of reconstruction error (a sketch is given below).
2.1 Reading the 3D joint data
The motion capture data is in file Subject4-Session3-Take4_mocapJoints.mat. Once you load it in, you have a 21614x12x4 array of numbers. The first dimension is frame number, the second is joint number, and the last is joint coordinates + confidence score for each joint. Specifically, the following snippet of code will extract x, y, z locations for the joints in a specific mocap frame.
mocapFnum = 1000; %mocap frame number 1000
x = mocapJoints(mocapFnum,:,1); %array of 12 X coordinates
y = mocapJoints(mocapFnum,:,2); % Y coordinates
z = mocapJoints(mocapFnum,:,3); % Z coordinates
conf = mocapJoints(mocapFnum,:,4); %confidence values

Each joint has a binary "confidence" associated with it. Joints that are not defined in a frame have a confidence of 0. Feel free to ignore any frames that don't have all confidences equal to 1.
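For instance, one possible way to find the usable frames, assuming mocapJoints has been loaded as the 21614x12x4 array described above (sketch only):

conf = mocapJoints(:,:,4);              % frames x joints confidence flags
validFrames = find(all(conf == 1, 2));  % frames where all 12 joints are defined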
There are 12 joints, in this order:
1 Right shoulder
2 Right elbow
3 Right wrist
4 Left shoulder
5 Left elbow
6 Left wrist
7 Right hip
8 Right knee
9 Right ankle
10 Left hip
11 Left knee
12 Left ankle

2.2 Reading camera parameters
There are two cameras, called "vue2" and "vue4", and two files specifying their calibration parameters: vue2CalibInfo.mat and vue4Calibinfo.mat. Each of these contains a structure with intrinsic, extrinsic, and nonlinear distortion parameters for each camera. Here are the values of the fields after reading in one of the structures:
vue2 =
struct with fields:
foclen: 1557.8
orientation: [-0.27777 0.7085 -0.61454 -0.20789]
position: [-4450.1 5557.9 1949.1]
prinpoint: [976.04 562.82]
radial: [1.4936e-07 4.3841e-14]
aspectratio: 1
skew: 0
Pmat: [3×4 double]
Rmat: [3×3 double]
Kmat: [3×3 double]
 
Part of your job will be figuring out what those fields mean with respect to the pinhole camera model parameters we discussed in class lectures. Which are the internal parameters? Which are the external parameters? Which internal parameters combine to form the matrix Kmat? Which external parameters combine to form the matrix Pmat? Hint: the field "orientation" is a unit quaternion describing the camera orientation, which is also represented by the 3x3 matrix Rmat. What is the location of the camera? Verify that the location of the camera and the rotation Rmat of the camera combine in the expected way (expected as per one of the slides in our class lectures on camera parameters) to yield the appropriate entries in Pmat.
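As one possible sanity check, the sketch below rebuilds the projection matrix from the other fields, assuming the lecture-slide form Pmat = Kmat * [Rmat | -Rmat*C], where C is the camera position in world coordinates and the struct vue2 has already been loaded as shown above. If your interpretation of the fields is correct, the rebuilt matrix should (nearly) match Pmat.

% Sketch: verify Pmat against Kmat, Rmat, and the camera position
C = vue2.position(:);                        % 3x1 camera center in world coords
Rt = [vue2.Rmat, -vue2.Rmat * C];            % 3x4 extrinsic matrix [R | -RC]
PmatCheck = vue2.Kmat * Rt;                  % rebuilt 3x4 projection matrix
disp(max(abs(PmatCheck(:) - vue2.Pmat(:)))); % should be (close to) zero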
2.3 Projecting 3D points into 2D pixel locations
Ignoring the nonlinear distortion parameters in the "radial" field for now, write a function from scratch that takes either a single 3D point or an array of 3D points and projects it (or them) into 2D pixel coordinates. You will want to refer to our lecture notes for the transformation chain that maps 3D world coordinates into 2D pixel coordinates.
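A minimal sketch of such a function is shown below. The name and interface are illustrative; it assumes X is an Nx3 array of world points and cam is one of the calibration structs (vue2 or vue4), and it uses Pmat directly (equivalently, you could apply Kmat, Rmat, and the camera position step by step as in the lecture's transformation chain).

function pts2D = project3Dto2D(X, cam)
    % X: Nx3 world points; cam: calibration struct with field Pmat (sketch only)
    N = size(X, 1);
    Xh = [X, ones(N, 1)]';              % 4xN homogeneous world points
    xh = cam.Pmat * Xh;                 % 3xN homogeneous image points
    pts2D = (xh(1:2,:) ./ xh(3,:))';    % divide by third coordinate, return Nx2
end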
For verification, it will be helpful to visualize your projected 2D joints by overlaying them as points on the 2D video frame corresponding to the motion capture frame. Two video files are given to you: Subject4-Session3-24form-Full-Take4-Vue2.mp4 is the video from camera vue2, and Subject4-Session3-24form-Full-Take4-Vue4.mp4 is the video from camera vue4. To get a video frame out of the mp4 file we can use VideoReader in Matlab. It is nonintuitive to use, so to help out, here is a snippet of code that can read the video frame from vue2 corresponding to the motion capture frame number mocapFnum.
%initialization of VideoReader for the vue video.
%YOU ONLY NEED TO DO THIS ONCE AT THE BEGINNING
filenamevue2mp4 = 'Subject4-Session3-24form-Full-Take4-Vue2.mp4';
vue2video = VideoReader(filenamevue2mp4);

%now we can read in the video for any mocap frame mocapFnum.
%the (50/100) factor is here to account for the difference in frame
%rates between video (50 fps) and mocap (100 fps).

vue2video.CurrentTime = (mocapFnum-1)*(50/100)/vue2video.FrameRate;
vid2Frame = readFrame(vue2video);

The result is a 1088x1920x3 unsigned 8-bit integer color image that can be displayed by image(vid2Frame).
If all went well with your projection of 3D to 2D, you should be able to plot the x and y coordinates of your 2D points onto the image, and they should appear to be in roughly the correct places. IMPORTANT NOTE: since we ignore nonlinear distortion for now, it might be the case that your projected points look shifted off from the correct image locations. That is OK. However, if the body points are grossly incorrect (the body is much larger or smaller, or forms a really weird shape that doesn't look like the arms and legs of the person in the image), then something is likely wrong in your projection code.
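For example, one possible way to do the overlay, assuming pts2D is the 12x2 array of pixel coordinates produced by the projection sketch above for the same mocap frame:

image(vid2Frame);                                      % show the video frame
hold on;
plot(pts2D(:,1), pts2D(:,2), 'g.', 'MarkerSize', 15);  % x = column, y = row
hold off;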
2.4 Triangulation back into a set of 3D scene points
As a result of the above step, for a given mocap frame you now have two sets of corresponding 2D pixel locations, in the two camera views. Perform triangulation on each of the 12 pairs of 2D points to estimate a recovered 3D point position. As per our class lecture on triangulation, this will be done, for a corresponding pair of 2D points, by converting each into a viewing ray, represented by the camera center and a unit vector pointing along the ray that passes through the 2D point in the image and out into the 3D scene. You will then compute the 3D point location that is closest to both rays (because they might not exactly intersect). Go back and refer to our lecture on triangulation to see how to do the computation.
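As one possible realization (a sketch, not necessarily the formulation used in lecture), the function below converts each pixel to a world-space viewing ray via Rmat' * inv(Kmat) * [px; py; 1] and returns the midpoint of the closest points on the two rays. The name and interface are illustrative.

function X = triangulateMidpoint(p1, cam1, p2, cam2)
    % p1, p2: 1x2 pixel coordinates of the same joint in cameras cam1, cam2
    C1 = cam1.position(:);  C2 = cam2.position(:);       % camera centers
    u1 = cam1.Rmat' * (cam1.Kmat \ [p1(:); 1]);  u1 = u1 / norm(u1);
    u2 = cam2.Rmat' * (cam2.Kmat \ [p2(:); 1]);  u2 = u2 / norm(u2);
    w0 = C1 - C2;                                        % vector between centers
    b  = dot(u1, u2);
    d1 = dot(u1, w0);  d2 = dot(u2, w0);
    t1 = (b*d2 - d1) / (1 - b^2);                        % distance along ray 1
    t2 = (d2 - b*d1) / (1 - b^2);                        % distance along ray 2
    X  = 0.5 * ((C1 + t1*u1) + (C2 + t2*u2));            % midpoint of closest points
end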
 
