CDS data parsing

Discussion and exchanges on the use of MATLAB in spectroscopy
Stephane Charbonnel
Posts: 70
Joined: Thu Sep 29, 2011 10:20 pm
Location: FR, 49 ou 44

Post by Stephane Charbonnel »

Hello,

Here is a small piece of code to explore parsing CDS data.
You can use "regexp" to extract information about an object from its CDS HTML page. To understand the patterns, you need to look at the source of the HTML page.
You must test whether each result is empty, because not every field is given for every object (try "VV_Cep" and "HD256128"; I don't know why this is the case for the latter star).

Stephane

Code:

Name = 'VV_Cep';  % Name from FITS 1D - ISIS; works also with 'VV Cep' for example.

% Query URL: the parameters follow the '?'
URL = sprintf('http://simbad.u-strasbg.fr/simbad/sim-basic?Ident=%s&submit=SIMBAD+search',Name);
Str = urlread(URL);

% coordinates :
Type = 'ICRS';
pattern = sprintf('%s.{1,150}TT.{1,5}(?<RA>\\d{2}\\s\\d{2}\\s\\d{2}\\.\\d*)\\s(?<DEC>[+-]\\d{2}\\s\\d{2}\\s\\d{2}\\.\\d*).*<A HREF',Type);
Coord = regexp(Str,pattern,'names');

% Spectral type :
Type = 'Spectral type:';
pattern = sprintf('%s.{1,150}TT.{1,5}(?<SpectralType>\\S{2,20}\\s\\S).*<SPAN',Type);
SpectralType = regexp(Str,pattern,'names');

%MagV :
Type = 'Fluxes';
pattern = sprintf('%s.{1,1200}TT>.V......(?<MagV>\\d{1,2}\\.\\d{2}).*<SPAN',Type); 
MagV = regexp(Str,pattern,'names');

%Vr
Type = 'Radial velocity';
pattern = sprintf('%s.{1,150}km/s..(?<Vr>[+-]\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);  
Radial_Velocity = regexp(Str,pattern,'names');

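% PRINT (each result is empty when the field is missing from the page):
if ~isempty(Coord)
    disp(Coord.RA);
    disp(Coord.DEC);
end

if ~isempty(MagV)
    disp(MagV.MagV);
end

if ~isempty(SpectralType)
    disp(SpectralType.SpectralType);
end

if ~isempty(Radial_Velocity)
    disp(Radial_Velocity.Vr);
end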
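
The named-token technique used above can be tried offline before pointing it at the live page. The following is a minimal sketch: the HTML fragment is made up to imitate the page layout and is not real CDS output, and the patterns are written directly (without sprintf), so the backslashes are not doubled.

```matlab
% Made-up fragment imitating the layout of a result page (not real CDS output).
Str = ['ICRS coord. <TT> 22 56 25.99 +63 37 20.2 </TT> <A HREF="x">ref</A> ' ...
       'Spectral type: <TT> M2epIa-Iab+B8:eV C </TT> <SPAN>'];

% Named tokens (?<name>...) come back as fields of a struct.
pattern = 'ICRS.{1,150}TT.{1,5}(?<RA>\d{2}\s\d{2}\s\d{2}\.\d*)\s(?<DEC>[+-]\d{2}\s\d{2}\s\d{2}\.\d*)';
Coord = regexp(Str, pattern, 'names');

% A field that is absent from the page gives an empty result, so test before use.
Vr = regexp(Str, 'Radial velocity.{1,150}km/s..(?<Vr>[+-]?\d+\.\d+)', 'names');

if ~isempty(Coord)
    fprintf('RA  = %s\nDEC = %s\n', Coord.RA, Coord.DEC);
end
if isempty(Vr)
    disp('No radial velocity on this page');
end
```

Testing with isempty before reading a field avoids an error when the object has no such entry.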

Re: CDS data parsing

Post by Stephane Charbonnel »

Hello,

A big new version, now as a function, written with the help of Olivier Thizy. We have also made these corrections:
* Coordinates are now formatted.
* Fixed the reading of the spectral type.
* The sign of the V magnitude can be '-' ;-) (only for a small number of stars).
* Values with only one digit after the decimal point (the decimal comma, for the French) are accepted.
* CDS is sometimes very slow, so weboptions.Timeout is set to 10 (seconds).
* Radial velocity > 0 is handled (as for 'Gj436', a star with an exoplanet). It still fails for an RV with no digit after the dot (as for 'BF_Cyg').
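
Several of the points above (optional sign, one or more decimals, no decimals at all) can be covered by a single number pattern. This is only a sketch on made-up sample strings, not the pattern used in the function; the name numPattern is hypothetical.

```matlab
% One pattern for a number with an optional sign and an optional fractional part.
numPattern = '[+-]?\d+(\.\d+)?';

% Made-up sample values: negative magnitude, one decimal, no decimal, explicit '+'.
samples = {'-1.46', '0.03', '9.8', '-20', '+13.9'};

values = zeros(1, numel(samples));
for k = 1:numel(samples)
    tok = regexp(samples{k}, numPattern, 'match', 'once');
    values(k) = str2double(tok);
    fprintf('%-6s -> %g\n', samples{k}, values(k));
end
```

With the sign and the fractional part both optional, there is no need for a first pass that only looks at the sign character.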


Regards
Stephane

Code:

function CDSdata = CDS_Infos(Name)
%
% CDS_Infos.m
%
% Query CDS (Centre de Données astronomiques de Strasbourg) for an object and
% return several pieces of information parsed from the result page.
%
% v1.1 / 20160905 (c) Olivier Thizy, Stéphane Charbonnel
% Based on an idea/routine from Eran O. Ofek:
%    URL : http://weizmann.ac.il/home/eofek/matlab/
%
%

% Name = 'VV_Cep';  % Name from FITS 1D - ISIS; works also with 'VV Cep' for example.

% Query URL: the parameters follow the '?'
URL = sprintf('http://simbad.u-strasbg.fr/simbad/sim-basic?Ident=%s&submit=SIMBAD+search',Name);
options = weboptions('Timeout',10);
Str = webread(URL,options);  % pass the options, otherwise the timeout is ignored

% RA/Dec coordinates
Type = 'ICRS';
pattern = sprintf('%s.{1,150}TT.{1,5}(?<RAh>\\d{2})\\s(?<RAm>\\d{2})\\s(?<RAs>\\d{2}\\.\\d*)\\s(?<DECd>[+-]\\d{2})\\s(?<DECm>\\d{2})\\s(?<DECs>\\d{2}\\.\\d*).*<A HREF',Type);
Coord = regexp(Str,pattern,'names');
% CDSdata.Coord = Coord;
CDSdata.RA = [Coord.RAh ' ' Coord.RAm ' ' Coord.RAs];
CDSdata.RArad = (str2double(Coord.RAh)+(str2double(Coord.RAm)+str2double(Coord.RAs)/60)/60) * pi()/12;
CDSdata.Dec = [Coord.DECd ' ' Coord.DECm ' ' Coord.DECs];
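DecSign = 1 - 2*(Coord.DECd(1) == '-');  % sign applies to the whole angle (also covers '-00')
CDSdata.DecRad = DecSign * (abs(str2double(Coord.DECd))+(str2double(Coord.DECm)+str2double(Coord.DECs)/60)/60) * pi()/180;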


% Spectral type :
Type = 'Spectral type:';
pattern = sprintf('%s.{1,150}TT..(?<SpectralType>\\S{1,20}\\s\\S).*<SPAN',Type);
SpecT = regexp(Str,pattern,'names');
if isempty(SpecT)
    CDSdata.SpectralType = ''; % Spectral Type unknown
else
    CDSdata.SpectralType = SpecT.SpectralType;
end


% V Magnitude
Type = 'Fluxes';
pattern = sprintf('%s.{1,1200}TT>.V......(?<SigneMagV>\\S).*<SPAN',Type);  
SigneMag_V = regexp(Str,pattern,'names');
if SigneMag_V.SigneMagV == '-'
    pattern = sprintf('%s.{1,1200}TT>.V......(?<MagV>[+-]\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
    Mag_V = regexp(Str,pattern,'names');
else
    pattern = sprintf('%s.{1,1200}TT>.V......(?<MagV>\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
    Mag_V = regexp(Str,pattern,'names'); 
end
CDSdata.MagV = Mag_V.MagV;


% Radial Velocity
Type = 'Radial velocity';
pattern = sprintf('%s.{1,150}km/s..(?<SignRV>\\S).*<SPAN',Type);  
SignRadial_V = regexp(Str,pattern,'names');
if isempty(SignRadial_V)
    CDSdata.RV = ''; % RV unknown
else
    if SignRadial_V.SignRV == '-'    % if RV is < 0
        pattern = sprintf('%s.{1,150}km/s..(?<RV>[+-]\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Radial_V = regexp(Str,pattern,'names');
    else
        pattern = sprintf('%s.{1,150}km/s..(?<RV>\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Radial_V = regexp(Str,pattern,'names');
    end        
    CDSdata.RV = Radial_V.RV;
end

disp(CDSdata);

% That's all folks!
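
A worked sketch of the sexagesimal-to-radian conversion, using made-up input strings in place of the captured tokens. It takes the sign of the declination from the text and applies it to the whole angle, which matters for southern objects: the minutes and seconds must push a negative declination further from zero, not back towards it.

```matlab
% Sexagesimal strings as captured by the named tokens (illustrative values).
RAh  = '16';  RAm  = '19'; RAs  = '55.07';
DECd = '-15'; DECm = '38'; DECs = '24.9';

% Right ascension: 24 h = 2*pi rad, so 1 h = pi/12 rad.
RArad = (str2double(RAh) + (str2double(RAm) + str2double(RAs)/60)/60) * pi/12;

% Declination: take the sign from the text (this also covers '-00')
% and apply it after summing the magnitudes of degrees, minutes and seconds.
sgn = 1 - 2*(DECd(1) == '-');
DecRad = sgn * (abs(str2double(DECd)) + (str2double(DECm) + str2double(DECs)/60)/60) * pi/180;

fprintf('RA  = %.6f rad\nDec = %.6f rad\n', RArad, DecRad);
```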
Olivier Thizy
Posts: 370
Joined: Sat Sep 24, 2011 10:52 am
Location: in the french Alps...

Re: CDS data parsing

Post by Olivier Thizy »

Version 1.2, with the following changes:

-High radial velocities, as for quasars, are taken into account
-Checks for some errors (when the object is not found, or for "sn 2016fnr", which seems unknown to CDS)

Code:

function CDSdata = CDS_Infos(Name)
%
% CDS_Infos.m
%
% Query CDS (Centre de Données astronomiques de Strasbourg) for an object and
% return several pieces of information parsed from the result page.
%
% v1.2 / 20160905 (c) Olivier Thizy, Stéphane Charbonnel
% Based on an idea/routine from Eran O. Ofek:
%    URL : http://weizmann.ac.il/home/eofek/matlab/
%
%

% Name = 'VV_Cep';  % Name from FITS 1D - ISIS; works also with 'VV Cep' for example.

% Query URL: the parameters follow the '?'
URL = sprintf('http://simbad.u-strasbg.fr/simbad/sim-basic?Ident=%s&submit=SIMBAD+search',Name);
options = weboptions('Timeout',10);
Str = webread(URL,options);  % pass the options, otherwise the timeout is ignored

% Check if object is found on CDS
if isempty(strfind(Str,'====Sorry, no entry could be found====')) && isempty(strfind(Str,'***Stopped after 10 entries****'))
    CDSdata.Valid = true;
else
    CDSdata.Valid = false;
    return;
end

% RA/Dec coordinates
Type = 'ICRS';
pattern = sprintf('%s.{1,150}TT.{1,5}(?<RAh>\\d{2})\\s(?<RAm>\\d{2})\\s(?<RAs>\\d{2}\\.\\d*)\\s(?<DECd>[+-]\\d{2})\\s(?<DECm>\\d{2})\\s(?<DECs>\\d{2}\\.\\d*).*<A HREF',Type);
Coord = regexp(Str,pattern,'names');
% CDSdata.Coord = Coord;
CDSdata.RA = [Coord.RAh ' ' Coord.RAm ' ' Coord.RAs];
CDSdata.RArad = (str2double(Coord.RAh)+(str2double(Coord.RAm)+str2double(Coord.RAs)/60)/60) * pi()/12;
CDSdata.Dec = [Coord.DECd ' ' Coord.DECm ' ' Coord.DECs];
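DecSign = 1 - 2*(Coord.DECd(1) == '-');  % sign applies to the whole angle (also covers '-00')
CDSdata.DecRad = DecSign * (abs(str2double(Coord.DECd))+(str2double(Coord.DECm)+str2double(Coord.DECs)/60)/60) * pi()/180;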


% Spectral type :
Type = 'Spectral type:';
pattern = sprintf('%s.{1,150}TT..(?<SpectralType>\\S{1,20}\\s\\S).*<SPAN',Type);
SpecT = regexp(Str,pattern,'names');
if isempty(SpecT)
    CDSdata.SpectralType = ''; % Spectral Type unknown
else
    CDSdata.SpectralType = SpecT.SpectralType;
end


% V Magnitude
Type = 'Fluxes';
pattern = sprintf('%s.{1,1200}TT>.V......(?<SigneMagV>\\S).*<SPAN',Type);
SigneMag_V = regexp(Str,pattern,'names');
if SigneMag_V.SigneMagV == '-'
    pattern = sprintf('%s.{1,1200}TT>.V......(?<MagV>[+-]\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
    Mag_V = regexp(Str,pattern,'names');
else
    pattern = sprintf('%s.{1,1200}TT>.V......(?<MagV>\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
    Mag_V = regexp(Str,pattern,'names');
end
CDSdata.MagV = Mag_V.MagV;


% Radial Velocity
Type = 'Radial velocity';
pattern = sprintf('%s.{1,150}km/s..(?<SignRV>\\S).*<SPAN',Type);
SignRadial_V = regexp(Str,pattern,'names');
if isempty(SignRadial_V)
    CDSdata.RV = ''; % RV unknown
else
    % Known limitation of v1.2: the integer alternative is tried first,
    % so only the integer part of RV is captured.
    if SignRadial_V.SignRV == '-'    % if RV is < 0
        pattern = sprintf('%s.{1,150}km/s..(?<RV>[+-](\\d{1,10})|(\\d{1,2}\\.\\d{1,2})).*<SPAN',Type);
        Radial_V = regexp(Str,pattern,'names');
    else
        pattern = sprintf('%s.{1,150}km/s..(?<RV>(\\d{1,10})|(\\d{1,2}\\.\\d{1,2})).*<SPAN',Type);
        Radial_V = regexp(Str,pattern,'names');
    end
    CDSdata.RV = Radial_V.RV;
end

% disp(CDSdata);

% That's all folks!
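
The object-not-found check in the function above can be exercised without a network connection. In this sketch the two marker strings are the ones used in the function, while isFound and the two sample pages are hypothetical stand-ins.

```matlab
% Marker strings as used in the function; the pages below are made-up stand-ins.
markers = {'====Sorry, no entry could be found====', '***Stopped after 10 entries****'};

% True when none of the markers occurs in the page text.
isFound = @(page) ~any(cellfun(@(m) ~isempty(strfind(page, m)), markers));

okPage  = 'ICRS coord. <TT> 18 36 56.33 +38 47 01.2 </TT>';
badPage = 'Identifier not found ====Sorry, no entry could be found====';

fprintf('okPage found:  %d\n', isFound(okPage));
fprintf('badPage found: %d\n', isFound(badPage));
```

Keeping the markers in one list makes it easy to add another message if CDS turns out to report a failure in a different way.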

Here is a small test script; more cases can be added if needed:

Code:

t=CDS_Infos('VV Cep')
t=CDS_Infos('vega')
t=CDS_Infos('hd256128')
t=CDS_Infos('Sco X-1')
t=CDS_Infos('ls5036')
t=CDS_Infos('V1686 Cyg')
t=CDS_Infos('WR 140')
t=CDS_Infos('sn2016fnr')
t=CDS_Infos('sn 2016fnr')
t=CDS_Infos('PG1718+481')

Re: CDS data parsing

Post by Olivier Thizy »

OK, here is v1.4, which seems to work fine now.

-Added MagB and MagR
-RV corrected (we were only taking the integer part in the previous version... still learning patterns in the regexp function!)
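
The integer-part problem comes from the order of the alternatives: the regexp engine tries them left to right and keeps the first one that matches. A small sketch on a made-up value shows the difference (intFirst and decFirst are illustrative names only).

```matlab
txt = '-20.30';   % made-up radial velocity text

% Integer alternative first: it matches '20', and the engine never tries the decimal form.
intFirst = regexp(txt, '[+-]?(\d{1,10}|\d{1,10}\.\d{1,10})', 'match', 'once');

% Decimal alternative first: the full number is captured, integers still match via the fallback.
decFirst = regexp(txt, '[+-]?(\d{1,10}\.\d{1,10}|\d{1,10})', 'match', 'once');

fprintf('integer first: %s\n', intFirst);
fprintf('decimal first: %s\n', decFirst);
```

Putting the longer, more specific alternative first is the usual remedy.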

Code:

function CDSdata = CDS_Infos(Name)
%
% CDS_Infos.m
%
% Query CDS (Centre de Données astronomiques de Strasbourg) for an object and
% return several pieces of information parsed from the result page.
%
% v1.0 / 20160904 - first published
% v1.4 / 20160906 - operational, RV bug corrected.
% Based on an idea/routine from Eran O. Ofek:
%    URL : http://weizmann.ac.il/home/eofek/matlab/
%
% Output: CDSdata structure with:
%   Valid: true (1) if object is found; false (0) otherwise
%   RA : Right Ascension in standard text format
%   RARad: RA in radian (double)
%   Dec: Declination in standard text format
%   DecRad: Dec in radian (double)
%   SpectralType: Spectral Type of the object
%   MagB: B magnitude (double)
%   MagV: V magnitude (double)
%   MagR: R magnitude (double)
%   RV: Radial Velocity (double) in km/s
%
% Tested with MatLab201a
% (c) Olivier Thizy, Stéphane Charbonnel
%


% Name = 'VV_Cep';  % Name from FITS 1D - ISIS; works also with 'VV Cep' for example.

% Query URL: the parameters follow the '?'
URL = sprintf('http://simbad.u-strasbg.fr/simbad/sim-basic?Ident=%s&submit=SIMBAD+search',Name);
options = weboptions('Timeout',10);
Str = webread(URL,options);  % pass the options, otherwise the timeout is ignored


% Check if object is found on CDS
if isempty(strfind(Str,'====Sorry, no entry could be found====')) && isempty(strfind(Str,'***Stopped after 10 entries****'))
    CDSdata.Valid = true;
else
    CDSdata.Valid = false;
    return;
end


% Basic Data % DOESN'T WORK YET...!!!
%Type = 'id="basic_data"';
%pattern = sprintf('%s.{1,20}<B>.{1,20}(?<BDStdName>.{1,20}<B>.{1,20}--.{1,20}(?<BDType>.{1,200))\\s{1,10}</FONT',Type);
%BasicData = regexp(Str,pattern,'names');
%CDSdata.CommonName = BasicData.BDStdName;
%CDSdata.Type = BasicData.BDType;


% RA/Dec coordinates
Type = 'ICRS';
pattern = sprintf('%s.{1,150}TT.{1,5}(?<RAh>\\d{2})\\s(?<RAm>\\d{2})\\s(?<RAs>\\d{2}\\.\\d*)\\s(?<DECd>[+-]\\d{2})\\s(?<DECm>\\d{2})\\s(?<DECs>\\d{2}\\.\\d*).*<A HREF',Type);
Coord = regexp(Str,pattern,'names');
% CDSdata.Coord = Coord;
CDSdata.RA = [Coord.RAh ' ' Coord.RAm ' ' Coord.RAs];
CDSdata.RARad = (str2double(Coord.RAh)+(str2double(Coord.RAm)+str2double(Coord.RAs)/60)/60) * pi()/12;
CDSdata.Dec = [Coord.DECd ' ' Coord.DECm ' ' Coord.DECs];
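DecSign = 1 - 2*(Coord.DECd(1) == '-');  % sign applies to the whole angle (also covers '-00')
CDSdata.DecRad = DecSign * (abs(str2double(Coord.DECd))+(str2double(Coord.DECm)+str2double(Coord.DECs)/60)/60) * pi()/180;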


% Spectral type :
Type = 'Spectral type:';
pattern = sprintf('%s.{1,150}TT..(?<SpectralType>\\S{1,20}\\s\\S).*<SPAN',Type);
SpecT = regexp(Str,pattern,'names');
if isempty(SpecT)
    CDSdata.SpectralType = ''; % Spectral Type unknown
else
    CDSdata.SpectralType = SpecT.SpectralType;
end


% B Magnitude
Type = 'Fluxes';
pattern = sprintf('%s.{1,1200}TT>.B......(?<SigneMagB>\\S).*<SPAN',Type);
SigneMag_B = regexp(Str,pattern,'names');
if isempty(SigneMag_B)
    CDSdata.MagB = 0; % B mag unknown (should we use -99 ?)
else
    if SigneMag_B.SigneMagB == '-'
        pattern = sprintf('%s.{1,1200}TT>.B......(?<MagB>[+-]\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Mag_B = regexp(Str,pattern,'names');
    else
        pattern = sprintf('%s.{1,1200}TT>.B......(?<MagB>\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Mag_B = regexp(Str,pattern,'names');
    end
    CDSdata.MagB = str2double(Mag_B.MagB);
end


% V Magnitude
Type = 'Fluxes';
pattern = sprintf('%s.{1,1200}TT>.V......(?<SigneMagV>\\S).*<SPAN',Type);
SigneMag_V = regexp(Str,pattern,'names');
if isempty(SigneMag_V)
    CDSdata.MagV = 0; % V mag unknown (should we use -99 ?)
else
    if SigneMag_V.SigneMagV == '-'
        pattern = sprintf('%s.{1,1200}TT>.V......(?<MagV>[+-]\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Mag_V = regexp(Str,pattern,'names');
    else
        pattern = sprintf('%s.{1,1200}TT>.V......(?<MagV>\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Mag_V = regexp(Str,pattern,'names');
    end
    CDSdata.MagV = str2double(Mag_V.MagV);
end


% R Magnitude
Type = 'Fluxes';
pattern = sprintf('%s.{1,1600}TT>.R......(?<SigneMagR>\\S).*<SPAN',Type);
SigneMag_R = regexp(Str,pattern,'names');
if isempty(SigneMag_R)
    CDSdata.MagR = 0; % R mag unknown (should we use -99 ?)
else
    if SigneMag_R.SigneMagR == '-'
        pattern = sprintf('%s.{1,1600}TT>.R......(?<MagR>[+-]\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Mag_R = regexp(Str,pattern,'names');
    else
        pattern = sprintf('%s.{1,1600}TT>.R......(?<MagR>\\d{1,2}\\.\\d{1,2}).*<SPAN',Type);
        Mag_R = regexp(Str,pattern,'names');
    end
    CDSdata.MagR = str2double(Mag_R.MagR);
end


% Radial Velocity
Type = 'Radial velocity';
pattern = sprintf('%s.{1,150}km/s..(?<SignRV>\\S).*<SPAN',Type);
SignRadial_V = regexp(Str,pattern,'names');
if isempty(SignRadial_V)
    CDSdata.RV = 0; % RV unknown (should we use another value?)
else
    if SignRadial_V.SignRV == '-'    % if RV is < 0
        pattern = sprintf('%s.{1,150}km/s..(?<RV>[+-](\\d{1,10}\\.\\d{1,10}|\\d{1,10})).*<SPAN',Type);
        Radial_V = regexp(Str,pattern,'names');
    else
        pattern = sprintf('%s.{1,150}km/s..(?<RV>(\\d{1,10}\\.\\d{1,10}|\\d{1,10})).*<SPAN',Type);
        Radial_V = regexp(Str,pattern,'names');
    end
    CDSdata.RV = str2double(Radial_V.RV);
end

% That's all folks!
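
On the question raised in the comments about which value to use for an unknown magnitude: one option, sketched here as a suggestion and not as part of the function, is NaN, since 0 is a legitimate magnitude and NaN can be tested with isnan. The fragment below is made up (B and V listed, R absent) and the simplified pattern is an assumption about the layout, not the pattern used above.

```matlab
% Made-up fragment: B and V fluxes are listed, R is not.
Str = 'Fluxes (2): <TT> B      0.03 [~] </TT> <TT> V      -1.46 [~] </TT> <SPAN>';

bands = {'B', 'V', 'R'};
mag = nan(1, numel(bands));   % NaN marks "not listed"

for k = 1:numel(bands)
    % One capturing group: optional sign, digits, optional fractional part.
    tok = regexp(Str, ['<TT> ' bands{k} '\s+([+-]?\d+(?:\.\d+)?)'], 'tokens', 'once');
    if ~isempty(tok)
        mag(k) = str2double(tok{1});
    end
end

for k = 1:numel(bands)
    fprintf('Mag%s = %g\n', bands{k}, mag(k));
end
```

A caller can then write `if isnan(CDSdata.MagR)` instead of comparing against a sentinel such as 0 or -99.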

And, as before, the list of targets I use for testing:

Code:

t=CDS_Infos('VV Cep')
t=CDS_Infos('vega')
t=CDS_Infos('hd256128')
t=CDS_Infos('Sco X-1')
t=CDS_Infos('ls5036')
t=CDS_Infos('V1686 Cyg')
t=CDS_Infos('WR 140')
t=CDS_Infos('sn2016fnr')
t=CDS_Infos('sn 2016fnr')
t=CDS_Infos('PG1718+481')
t=CDS_Infos('SAO104807')
t=CDS_Infos('SAO112958')
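
A list like the one above can also be run in a loop, so that one failing object does not stop the others. This is a sketch only: lookup is a hypothetical stub standing in for CDS_Infos so that the example runs offline, and its behaviour (everything valid except 'sn 2016fnr') is invented for the illustration.

```matlab
names = {'VV Cep', 'vega', 'sn 2016fnr'};

% Stub standing in for CDS_Infos (hypothetical behaviour);
% replace it with @CDS_Infos to query the real service.
lookup = @(name) struct('Valid', ~strcmp(name, 'sn 2016fnr'));

valid = false(1, numel(names));
for k = 1:numel(names)
    try
        result = lookup(names{k});
        valid(k) = result.Valid;
    catch err
        % A network failure or parsing error on one object should not stop the run.
        fprintf('%-12s error: %s\n', names{k}, err.message);
    end
end

for k = 1:numel(names)
    fprintf('%-12s Valid = %d\n', names{k}, valid(k));
end
```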