GSOC Work Report

All PRs – https://github.com/tardis-sn/tardis/pulls/vg3095

Worked on the following three projects –

HDF Data Storage Capabilities

• Implemented an HDFWriterMixin class, which provides the following –

  1. A method to aggregate selected properties of a class
  2. A method to write these selected properties to an HDF file
  3. A to_hdf method that combines both of these

• It determines how these properties are stored uniformly, whether they are scalars, 1D, or 2D objects (see the sketch after this list).
• Implemented unit tests for it using mock setups.
• It can easily be extended and inherited, eliminating the need to implement HDF functionality separately in each class.
• Wrote documentation for it.
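
A minimal sketch of that storage convention (my own illustration for this report; write_property and the exact keys are hypothetical, not the TARDIS implementation): scalars go into a 'scalars' Series, while 1D and 2D values become a Series and a DataFrame respectively.

import numpy as np
import pandas as pd

def write_property(store, path, name, value):
    # Hypothetical helper: store a property uniformly depending on its shape.
    if np.isscalar(value):
        # A scalar is written into a '<path>/scalars' Series
        # (the mixin pools all scalars into one such Series).
        store.put('{}/scalars'.format(path), pd.Series({name: value}))
    elif np.ndim(value) == 1:
        store.put('{}/{}'.format(path, name), pd.Series(value))
    else:
        store.put('{}/{}'.format(path, name), pd.DataFrame(value))

with pd.HDFStore('convention_demo.hdf', 'w') as store:
    write_property(store, 'demo', 'time_explosion', 13.0)
    write_property(store, 'demo', 'property1', np.array([1.0, 2.0]))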

Related PRs

PR#744 – HDFWriter class + Unit Tests
PR#747 – Update Model and Density classes to use HDFWriter + Unit Tests
PR#748 – Update Runner and Spectrum classes to use HDFWriter + Unit Tests
PR#749 – Change name of HDFWriter to HDFWriterMixin
PR#752 – PlasmaWriterMixin + Unit Tests
PR#753 – HDFWriter Documentation
PR#768 – Simulation HDF and deprecated to_hdf cleanup
PR#769 – [DOC] Updated to_hdf notebook

Isotope handling within TARDIS

• Extended the current TARDIS configuration system to support parsing of isotopic abundances.
• Used the PyNE library to decay isotopic abundances and then merge them into the regular elemental abundance dataframe.
• Wrote a console script to convert CMFGEN files into TARDIS format.
• Wrote unit tests and documentation for the same.

Related PRs

PR#756 – Isotope Abundances class
PR#757 – Decay and merge isotopic abundance dataframe
PR#762 – Isotope uniform config option
PR#764 – Isotope stratified file support
PR#767 – Docs for Isotope config
PR#771 – Add cmfgen2tardis script
PR#772 – CMFGEN density parser

Testing Framework for multiple reference data

• Worked on adding the ability to test the Plasma module of TARDIS against generated reference data.
• This allows the Plasma module to be tested and makes it easy to add new reference data.

Related PRs

PR#775 – Transition from atomic-dataset config option to tardis-refdata
PR#779 – Save coverage report when tests use tardis-refdata
PR#774 – Replace Comparison values in Plasma Unit Tests with reference HDF file
PR#781 – Omit coverage of test related files
PR#782 – Plasma write_to_tex/dot unit tests
PR#785 – Update Running tests doc

Others

PR#712 – Colorize Logger

Week 10-11

My last objective for GSOC is to reach 100% coverage of the Plasma tests, which I think will wrap up this final week. The idea is to parametrize all settings as fixtures and then have a ‘config’ fixture which combines them. Pytest then tests every possible combination of these settings.

This is done by parametrizing a config fixture whose argument is a dictionary of ‘overwrites’: we take a basic configuration and then overwrite everything given in the dict, as sketched below.
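
A minimal sketch of the pattern, with made-up setting names (the base config and the overwrite keys here are illustrative, not TARDIS’s actual settings or API):

import copy
import pytest

BASE_CONFIG = {'plasma': {'ionization': 'lte', 'excitation': 'lte'}}

def apply_overwrites(config, overwrites):
    # Recursively overwrite entries of a nested dict.
    for key, value in overwrites.items():
        if isinstance(value, dict):
            apply_overwrites(config.setdefault(key, {}), value)
        else:
            config[key] = value

@pytest.fixture(params=[
    {},                                       # the basic configuration
    {'plasma': {'ionization': 'nebular'}},    # a single overwrite
    {'plasma': {'ionization': 'nebular',
                'excitation': 'dilute-lte'}}, # a combination
])
def config(request):
    config = copy.deepcopy(BASE_CONFIG)
    apply_overwrites(config, request.param)
    return config

def test_plasma_config(config):
    # Every parametrized combination runs through this test.
    assert config['plasma']['ionization'] in ('lte', 'nebular')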

I think the only option left to cover is when helium_treatment is numerical_nlte, because it requires some special files that I don’t have. Apart from that, I think I have covered all Plasma config options.

Week 8-9

The second evaluation is now complete, and my work related to CMFGEN files is also done.

PR#771 is about converting CMFGEN files into TARDIS format. I wrote a script which takes just two arguments, the input file and the output path. It was merged into the decay branch.

PR#772 is about reading this new format (a file converted using the above script) into TARDIS. It supports new quantities such as electron_densities and temperature, and it bypasses the calculation of these quantities since they are already present in the file. It has also been approved.

Now the next task is to replace the hardcoded comparison values in the Plasma unit tests with a reference HDF file, so that when we update the Plasma tests we don’t have to change the values by hand.

Week 7-8

In these two weeks, I worked on including isotopic abundances within the TARDIS framework. The idea is to maintain two kinds of abundances: elemental abundances and isotopic abundances. Whenever model.abundance is called, the isotopic abundance dataframe is decayed according to model.time_explosion and then merged with the elemental abundance dataframe. The decay happens using the PyNE module in Python.
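
As a rough illustration of the decay step (a simplified analytic stand-in for what PyNE does, ignoring the further decay of the daughter; the half-life value, shells, and numbers are made up for this sketch):

import numpy as np
import pandas as pd

# Illustrative isotopic abundance dataframe: one row per
# (atomic_number, mass_number) pair, one column per shell.
isotope_abundance = pd.DataFrame(
    [[0.5, 0.6]],
    index=pd.MultiIndex.from_tuples([(28, 56)],
                                    names=['atomic_number', 'mass_number']),
    columns=[0, 1])

NI56_HALF_LIFE = 6.075 * 86400.0  # seconds (assumed value)
time_explosion = 10 * 86400.0     # e.g. 10 days after explosion

remaining = np.exp(-np.log(2) * time_explosion / NI56_HALF_LIFE)
decayed = isotope_abundance * remaining  # what is left of Ni56

# The decayed fraction shows up as the daughter isotope Co56 (Z=27).
co56 = pd.DataFrame([isotope_abundance.loc[(28, 56)] * (1 - remaining)],
                    index=pd.MultiIndex.from_tuples(
                        [(27, 56)], names=['atomic_number', 'mass_number']))
decayed = pd.concat([decayed, co56])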

The second part was how to integrate this into the TARDIS configuration system. TARDIS currently supports a ‘uniform’ abundance and a stratified ‘file’ abundance. For the uniform case, I simply extended the parser to support isotopes. For providing isotopes by file, we discussed a new CSV-based format and named it ‘tardis_model’. In this format, the header row contains element and isotope symbols, and below each column are its abundances for each shell.
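
A made-up illustration of that layout (the symbols and numbers here are just examples with two shells, not a real model):

Si    Fe    Ni56
0.6   0.2   0.2
0.5   0.3   0.2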

I also started working on CMFGEN files. We want to convert these files into TARDIS format, so that they can be used within the TARDIS framework.

Week 5-6

I passed the first evaluation, and this would not have been possible without the help of my mentor @Wolfgang and @yeganer. I have to say, the detailed feedback from my mentor has been very helpful to me moving forward. The code reviews by team members also help me get a better grasp of how to write code correctly, which I think I was never taught in class, or anywhere else.
This week I was working on isotope decay in TARDIS.
I don’t have an astronomical background, so it is sometimes difficult for me to figure out what the next step should be. But @yeganer and @Wolfgang always come to my rescue. Whether my doubt is small or big, they answer it patiently.
I had trouble using many Pandas operations on DataFrames. At first, I did not have clarity on when to use join(), merge(), concat(), append(), or add(); they all looked similar to me. The objective was to decay the isotope abundance MultiIndex dataframe and then merge it into the normal abundance dataframe. The isotope abundance dataframe has atomic number and mass number as its index, while the abundance dataframe has atomic number as its index.
I made a PR for it, but the code I wrote for the merging was unnecessarily complex. After a suggestion by my mentor I changed it, and the final set of Pandas operations was very short and sweet (see the sketch below). And I was left wondering what on earth I had been doing for a whole day when I could not think of this.
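
A sketch of that final merge (my reconstruction with made-up numbers, not the exact TARDIS code): collapse the isotopic MultiIndex over mass_number, then add the result to the elemental abundances.

import pandas as pd

# Elemental abundances: indexed by atomic_number, one column per shell.
abundance = pd.DataFrame({0: [0.2], 1: [0.3]},
                         index=pd.Index([14], name='atomic_number'))

# Decayed isotopic abundances: indexed by (atomic_number, mass_number).
isotope_abundance = pd.DataFrame(
    {0: [0.3, 0.5], 1: [0.3, 0.4]},
    index=pd.MultiIndex.from_tuples([(27, 56), (28, 56)],
                                    names=['atomic_number', 'mass_number']))

# Sum isotopes of the same element, then add to the elemental abundances.
merged = abundance.add(isotope_abundance.groupby(level='atomic_number').sum(),
                       fill_value=0)
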
My first month’s work was mainly about making an HDFWriter class. Here is a short tutorial on how it works.

Example Usage of HDFWriter

If the properties of a class need to be saved in an HDF file, the class should inherit from HDFWriterMixin, as demonstrated below.

hdf_properties (list) : Contains the names of all the properties that need to be saved.

hdf_name (str) : Specifies the default name of the group under which the properties will be saved.

In [1]:
from tardis.io.util import HDFWriterMixin

class ExampleClass(HDFWriterMixin):
    hdf_properties = ['property1', 'property2']
    hdf_name = 'mock_setup'
    def __init__(self, property1, property2):
        self.property1 = property1
        self.property2 = property2
        
In [2]:
import numpy as np
import pandas as pd

#Instantiating Object
property1 = np.array([4.0e14, 2, 2e14, 27.5])
property2 = pd.DataFrame({'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
                        'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
obj = ExampleClass(property1, property2)

You can now save the properties using the to_hdf method.

Parameters

file_path : Path where the HDF file will be saved

path : Path inside the HDF store to store the elements

name : Name of the group inside the HDF store under which the properties will be saved.

If name is not specified, the value of the hdf_name attribute is used.

If hdf_name is also not defined, the class name is converted to snake case and used instead.

For example, if name is not passed as an argument and hdf_name is not defined for the ExampleClass above, the properties will be saved under the example_class group.

In [3]:
obj.to_hdf(file_path='test.hdf', path='test')
#obj.to_hdf(file_path='test.hdf', path='test', name='hdf')

You can now read the HDF file using pd.HDFStore or pd.read_hdf.

In [4]:
#Read HDF file
with pd.HDFStore('test.hdf','r') as data:
    print(data)
    #print(data['/test/mock_setup/property1'])
<class 'pandas.io.pytables.HDFStore'>
File path: test.hdf
/test/mock_setup/property1            series       (shape->[4])  
/test/mock_setup/property2            frame        (shape->[4,2])

Saving nested class objects.

Just extend the hdf_properties list to include that class object.

In [5]:
class NestedExampleClass(HDFWriterMixin):
    hdf_properties = ['property1', 'nested_object']
    def __init__(self, property1, nested_obj):
        self.property1 = property1
        self.nested_object = nested_obj
In [6]:
obj2 = NestedExampleClass(property1, obj)
In [7]:
obj2.to_hdf(file_path='nested_test.hdf')
In [8]:
#Read HDF file
with pd.HDFStore('nested_test.hdf','r') as data:
    print(data)
<class 'pandas.io.pytables.HDFStore'>
File path: nested_test.hdf
/nested_example_class/nested_object/property1            series       (shape->[4])  
/nested_example_class/nested_object/property2            frame        (shape->[4,2])
/nested_example_class/property1                          series       (shape->[4])  

Modified Usage

In the BasePlasma class, the way the properties of an object are collected is different: it does not use the hdf_properties attribute.

That’s why PlasmaWriterMixin (which extends HDFWriterMixin) changes how the properties of the BasePlasma class are collected, by overriding the get_properties function.

Here is a quick demonstration of how the behaviour of the default get_properties function inside HDFWriterMixin can be changed, by subclassing it to create a new mixin.

In [9]:
class ModifiedWriterMixin(HDFWriterMixin):
    def get_properties(self):
        #Change here how the properties are collected from the class
        data = {name: getattr(self, name) for name in self.outputs}
        return data

A demo class using this modified mixin.

In [10]:
class DemoClass(ModifiedWriterMixin):
    outputs = ['property1']
    hdf_name = 'demo'
    def __init__(self, property1):
        self.property1 = property1
In [11]:
obj3 = DemoClass('random_string')
obj3.to_hdf('demo_class.hdf')
with pd.HDFStore('demo_class.hdf','r') as data:
    print(data)
<class 'pandas.io.pytables.HDFStore'>
File path: demo_class.hdf
/demo/scalars            series       (shape->[1])

Week 4

This week, two of my PRs (PR#744 and PR#747) got merged. One was the HDFWriterMixin class, and the other updated the Model class to use HDFWriterMixin, along with unit tests.

I also created two new PRs this week. One is about PlasmaWriterMixin, which inherits from the current HDFWriterMixin and changes how hdf_properties are collected for the Plasma class.

The other updated the MonteCarloRunner and Spectrum classes to use the current HDFWriterMixin, with unit tests.

I hope both PRs will be merged by next week.

Week 3

This week started with splitting my PR into four parts, as it was getting too big to monitor. Then I designed generic from_hdf/to_hdf methods under an HDFReaderWriter class, so that there is no need to implement them in each class (except classes like BasePlasma, which are too complex). I also wrote unit tests for every sub-module (Model, HomologousDensity, Runner, Spectrum) and for the HDFReaderWriter class.

On Wednesday, my mentor pointed out that what we were doing could basically be done with the pickle library. So he said to move the focus away from the from_hdf methods and concentrate on making an HDFWriter class. I then made a PR for this, along with its unit tests. There was a slight issue with the regex I used to convert from CamelCase to snake_case, so I wrote a unit test for that as well.
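
For reference, the usual two-step regex recipe for this conversion looks like the following (a generic sketch, not necessarily the exact regex used in TARDIS):

import re

def convert_to_snake_case(name):
    # Put an underscore before a capitalized word preceded by any character,
    # then before any capital that follows a lowercase letter or digit.
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

print(convert_to_snake_case('ExampleClass'))    # example_class
print(convert_to_snake_case('HDFWriterMixin'))  # hdf_writer_mixin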

Then @yeganer suggested that we could make an HDFReader class to access HDF files using dot notation, so that for example model/scalars/time_explosion could be accessed as model.scalars.time_explosion. I experimented for a day with the Pandas function calls related to HDF, and wrote three or four unnecessarily complex versions before arriving at shorter, simpler code for it.
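
A minimal sketch of that idea (my own illustration, not the code from the PR): wrap the store path in an object and resolve attribute access against the store lazily.

import pandas as pd

class HDFReader(object):
    def __init__(self, fname, path=''):
        self._fname = fname
        self._path = path

    def __getattr__(self, name):
        path = self._path + '/' + name
        with pd.HDFStore(self._fname, 'r') as store:
            try:
                # Leaf node: return the stored Series/DataFrame.
                return store[path]
            except (KeyError, TypeError):
                pass
        # Otherwise treat it as a group and descend one level deeper.
        return HDFReader(self._fname, path)

# e.g. HDFReader('test.hdf').test.mock_setup.property1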