Cryptosynchrotron

2013/07/29

Scopes refactoring

Usually August is a good time to think about refactoring. During the normal operation of the year there is never enough time to look back. "I want it all, and I want it now" is a classic sentence from clients and bosses (many times followed by "I want it fixed, but don't touch anything").

It is also a good time because too many managers don't care about refactoring. They don't really understand what it means and don't give it its real value. For them, the maintenance of code is often a sort of magic: they don't know how it is done. Furthermore, they don't care, and only think about doing it fast and cheap.

In the case of the scopes, we are currently using the scope waveforms in a Filling Pattern calculation based on the Fast Current Transformer signal. This was the fpFCT project. In this case, we went far away from the original design of the access to the scopes. That original design was already improved long ago, and it looks like now is the time to remake it based on the newer requirements.

The current bottleneck is the Visa middleman. The agent that shows the user the interface of an instrument like a scope doesn't know about protocols. This agent knows the language to talk to the instrument, but not the channel used for the bidirectional communication with it.

From Max IV, there is a proposal to talk to the instrument directly via a socket. This removes one of the actors from the scene, and will surely improve the access.
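A minimal sketch of what talking directly via a socket could look like, assuming the scope accepts newline-terminated text commands over TCP. The function names, the port 5025 and the use of the `*IDN?` identification query are illustrative assumptions, not taken from the Max IV branch.

```python
import socket


def open_instrument(host, port=5025, timeout=2.0):
    """Open a TCP connection to the instrument (port 5025 is an assumption)."""
    return socket.create_connection((host, port), timeout=timeout)


def scpi_query(sock, command, terminator=b"\n", bufsize=4096):
    """Send one text command and return the reply without its terminator."""
    sock.sendall(command.encode("ascii") + terminator)
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # connection closed by the instrument
            break
        chunks.append(data)
        if data.endswith(terminator):  # a complete reply has arrived
            break
    return b"".join(chunks).rstrip(terminator).decode("ascii")


# Hypothetical usage (the host name is made up):
#   scope = open_instrument("scope01.example.org")
#   print(scpi_query(scope, "*IDN?"))
#   scope.close()
```

With this approach the agent owns the channel itself, so there is no middleman between the attribute read and the instrument.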

The second proposal would be helped by a new feature requested in Tango: telling the device whether someone is listening (or not) to the events on an attribute. That is, avoid requesting information from the instrument if no one is listening.

The current idea of the refactoring is to use proposal one and, until the mentioned feature is available, configure by commands which attributes are updated by an isolated thread, optimizing the scope access as an approximation to proposal two.
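The idea can be sketched as a small poller whose set of active attributes is changed by commands. This is only an illustration under assumptions: the class name, the method names and the `read_attribute` callable are hypothetical, and no Tango API is used here.

```python
import threading


class AttributePoller:
    """Reads only the attributes that have been enabled by a command."""

    def __init__(self, read_attribute, period=0.5):
        self._read_attribute = read_attribute  # callable: name -> value
        self._period = period                  # seconds between polls
        self._enabled = set()
        self._values = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def enable(self, name):
        """Command: start updating this attribute."""
        with self._lock:
            self._enabled.add(name)

    def disable(self, name):
        """Command: stop updating this attribute and drop its last value."""
        with self._lock:
            self._enabled.discard(name)
            self._values.pop(name, None)

    def poll_once(self):
        """Read every enabled attribute once from the instrument."""
        with self._lock:
            names = sorted(self._enabled)
        for name in names:
            value = self._read_attribute(name)
            with self._lock:
                if name in self._enabled:  # it may have been disabled meanwhile
                    self._values[name] = value

    def value(self, name):
        """Return the last value read, or None if nothing was read."""
        with self._lock:
            return self._values.get(name)

    def _run(self):
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._period):
            self.poll_once()

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Once the requested feature exists, the calls to `enable` and `disable` could be driven by the presence of listeners instead of by explicit commands.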

Update 20130806: Max IV has contributed their nice idea in a branch.

2013/07/09

Some more numbers from Alba, last 3 months

Some time ago, I wrote about the numbers around this facility. I gave a summary of the operation calendar in the update at the end of that post. But this calendar has been changed by the management (unilaterally) after the problems we faced from Easter until June.

I didn't post anything about those almost 3 months of unexpected shutdown, because what happened looked surreal to me.

After the one-week Easter shutdown (April 2nd), it was not possible to start up the machine. Some of the data collected in the last days before it shows that on March 27th the flow in certain places of the water cooling system went down for an unknown reason.

OK, at the beginning all of us thought this would be a short problem, easy to fix in a commissioned and well-known system. But that wasn't the case. Day after day, the system did not recover. Shifts of beam time were cancelled. During April this was done very shyly: beam time was cancelled week by week (but the working shifts were not). The first week many people were purging the water cooling system in shifts. Popularly, we started calling Alba the purgatory (as a joke word combination).

Well, days passed and the origin of the problem was not found. In May the beam time of the whole month was cancelled, one complete run. Even so, the work shifts weren't. The workers on shift got the notification of the shift cancellation half a week in advance, on average (but the spread means that some people were notified that the night shift was cancelled when they arrived at the facility).

Finally, this past June a (meta)stable working point of the cooling system was found. Many things have been changed and, from the information given to all the workers, it looks like what happened is not fully understood, nor is it known which of the things done has made the system run.

This was the end of the water cooling system issue, but the beginning of some other collateral problems. One of them is that some dirt was found in the water circuit. As far as I knew, this was a circuit of deionized pure water, which does not look compatible with what was found in there.

Another problem was the poor material quality of some hoses. During one of the affected weekends, as it has been explained to the workers, one PLC that manages the cooling system hung in a weird way (no new values were measured, but the last reading was sent with a new timestamp as if it had really been read). Because much equipment was under test, especially magnets, the water temperature rose and some hoses vulcanized. Water leaks started, the circuit got half empty, the pumps cavitated, and on Monday morning the problem showed up as burned pumps.

After 2 months trying to figure out what was happening, this last one was a slap, undermining morale. But it wasn't the last issue. Believe it or not, a TLD placed in an insertion device (ID) with a standard stick, to measure the dose at some encoders, fell into the cavity of the ID, and when the gap was closed it dented the cavity: enough to hit the beam, so the section had to be changed.

As of today, the synchrotron is up again and giving beam time for experiments. But a new calendar has magically appeared:

What is special about this calendar? Well, it marks the unexpected shutdown as "warm up", not as "off" as a shutdown is. Well, yes, it can be thought of like that. Many people have worked very, very hard during this time, and none of us was anything like "off". (I'm not saying that "off" means no work, because many maintenance tasks wait for those "off" periods).

One of (the many) issues of this calendar is that it tries to recover the lost time by telling the workers "we have to cope with this unexpected problem". But it forgets the efforts already made, and asks for renewed efforts while presenting the problem as unpredictable. It wasn't an accident, it was a lack of knowledge! We still don't know what caused the problem, nor what fixed it (or looks like it fixed it). There is no proof that this will not happen again.

But how has the calendar changed?

The expected 4992h of machine up time (57% of the year) are rescheduled to 4592h (52%, converting the 2.5 months of shutdown into only 5% of the year). From the experimentation point of view, the beam time was scheduled to be available for 3600h (41.10%) and now would be 3312h (37.80%), a reduction of only 3.30%.

But what has been lost in this unexpected shutdown is 1008h of the scheduled beam time (1176h of the machine up time).

If we lost beam time worth 11.50% of the year, how is it possible that at the end of the year this drops to only 3.30%? Easy: stealing the rest time. Now some of the "off" days and what was originally scheduled as "warm up" are beam or machine time.
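The percentages above can be checked with a few lines. This is just the arithmetic on the hours quoted in this post, taking a year of 365 days (8760h); the variable names and the final "recovered" figure are my own derivation from those hours.

```python
HOURS_PER_YEAR = 365 * 24  # 8760h


def share_of_year(hours):
    """Return the given number of hours as a percentage of the year."""
    return 100.0 * hours / HOURS_PER_YEAR


uptime_before, uptime_after = 4992, 4592  # machine up time, in hours
beam_before, beam_after = 3600, 3312      # beam time for users, in hours
beam_lost = 1008                          # beam time lost in the shutdown

net_reduction = beam_before - beam_after  # what the new calendar admits
recovered = beam_lost - net_reduction     # taken from "off" and "warm up"

for label, hours in [
    ("up time, original", uptime_before),
    ("up time, rescheduled", uptime_after),
    ("beam time, original", beam_before),
    ("beam time, rescheduled", beam_after),
    ("beam time lost", beam_lost),
    ("net beam time reduction", net_reduction),
    ("beam time recovered", recovered),
]:
    print(f"{label:<26}{hours:>5}h {share_of_year(hours):6.2f}%")
```

The difference between the 1008h lost and the net reduction in the new calendar is the beam time that has to come out of the former rest periods.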

It's easy to realize that the 6 extra weekends that are now planned to be worked in shifts will not be worked by those who rescheduled the calendar...

Despite all of those things, this morning, planned to have beam for users, the "solved" problem peeped out again:

No beam...

Universal (Hexa)Decimal Classification

Since I started my research on elliptic curves, I have read many papers, books and other similar material to acquire knowledge. But there is an issue when the number of these things gets too big. How can I organize the information from those papers (or books and so on) in order to get back to it and refresh my memory?

Very long ago, I took a course on bibliographic search and cataloguing. In this course I learned the way libraries in general organize their books: the UDC (Universal Decimal Classification). I've been looking at how I can reuse this to classify the documents I think are relevant for my research.

I have set up my own main classes and I have also tried to leave at least one of them vacant for further updates, that is, to extend the fields of research. Despite that intention, as of today it's full again. This means that I'll probably have to do my second refactoring of this structure, because I already had this problem once in the past.

There is another way to modify my classification to have more vacancies: thinking in another base. As far as I understood, the UDC was restricted to 0 to 9 to have a single-character field. Then why can't a computer scientist think of a Universal Hexadecimal Classification (UHC)?

For the record, my current main classes and their categories:

0.Mathematics

010.Number Theory
020.Abstract Algebra
030.Information Theory
040.Logic
050.Elliptic curves

1.Public Key

101.Finite fields cryptography
102.Elliptic curves cryptography
103.Hyperelliptic curves cryptography
110.Cryptanalysis
111.Finite fields cryptanalysis
112.Elliptic curves cryptanalysis
113.Hyperelliptic curves cryptanalysis
120.Isogenies
121.Isogeny volcanoes and stars
122.Isogenies cryptography

2.Symmetric cryptology

210.Rijndael

3.Hash functions

310.SHA
320.Elliptic and hyperelliptic curves

4.Stream cyphers
5.Secret sharing
6.Homomorphic encryption
7. vacant
8. vacant
9. vacant
A. vacant
B. vacant
C. vacant
D.Computation

D10.Distributed systems
D20.Parallel computing

E.Hardware

E10.Embedded systems
E20.Smartcards
E30.RFID

F.Standards

F01.International
F10.European
F20.North American
F30.Russian

Because of the uncategorised classes, I think this is full, and they are the first place to start merging classes and dividing them into categories. The hexadecimal solution, even if I have already applied it, is not fixed forever. Maybe some day certain categories will become subcategories. Currently this classification contains 409 documents (papers, books, and so on). My BibTeX files aren't classified this way, and they should be.
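As a small illustration of the hexadecimal idea, the first character of each code can be read as a base-16 digit, which gives 16 main classes instead of 10. The function names below are hypothetical; the codes are the ones listed above.

```python
def main_class(code):
    """Return the main class of a code such as 'D10' as an integer 0..15."""
    return int(code[0], 16)


def vacant_classes(used_codes):
    """List the hexadecimal main classes not used by any of the codes."""
    used = {main_class(code) for code in used_codes}
    return [format(digit, "X") for digit in range(16) if digit not in used]


# Main classes currently in use, taken from the list in this post.
in_use = ["0", "1", "2", "3", "4", "5", "6", "D", "E", "F"]

print(vacant_classes(in_use))  # the classes still free for new fields
print(main_class("D10"))       # the numeric main class of a category code
```

Moving to a bigger base only postpones the problem: once the 16 main classes are taken, the structure needs merging again.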

What this system does not cover is how to archive the notes that I write in the margins of the papers I read and file here, or the marks I set labelling pages in books, and so on. But that is an issue this tool is not useful for.

How can I have a search tool over this structure and inside those documents, including the posted comments?

(more) frustrated @ work

Alba has been kind of broken since Easter, when we came back from that shutdown at the end of March. Water cooling problems: the water flow was unstable; sponges were found in the "clean water" cooling pipes, together with other dirt; a weekend instrument hang caused a temperature increase, and bad-quality hoses (below the specs) vulcanized and caused a disaster in one of the two main power supplies of the booster; a TLD was wrongly placed in a weird space and, when an Insertion Device was moved, it bent the beam pipe, blocking the beam orbit and requiring us to vent and replace a section. It's like we stepped in shit.

This is one of the issues in this accelerator complex, but there are others that may interact.

Panic reached the management at the beginning of this year, when 12 people announced their new positions in other institutes. If you take into account that we are 145 workers, you can realize the magnitude of the problem. Of the people that said goodbye, 8 were from the computing division (around 50 people) and, of those, 4 were from the controls section (of 14). This panic didn't take into account the 1 or 2 people per month that have announced they were leaving during the past two years.

Management announced, 3 months later, the creation of a commission for work behaviour improvements. Now, in July, this commission held its first meeting last week.

As I mentioned in the previous post, another reason for personal sadness: my bosses didn't care about a worker who had been invited to give a talk at the university.

I haven't written here since then. I didn't find any reason to write about software design here.

But the latest reason to go one step further in frustration came last week.

Starting 2 months ago, in the Tango meeting, there was a request to the Tango community to embed security inside the Tango implementation. For the last 4 years I've been trying to start a PhD in cryptology (but I have been too busy due to the work at Alba), and I've been getting closer and closer to the field of RFIDs and smartcards. During the presentation where this was requested, I thought that many of the schemes to secure RFID could also be applied to the communication between the agents in a distributed system like Tango.

I thought it could be interesting for Alba and the Tango community to take advantage of the fact that one of the workers has a hobby in cryptology and security. In these terms I talked to my boss.

For that, I needed 3 weeks to get to talk to my boss (this is already one problem). During this time I was able, outside working hours, to explain this idea to the research group at the university (they are in another city, 150 km away). I found great acceptance of "Securing the Tango control system"; even more, I saw enthusiasm about the idea of a PhD based on an application of all the cryptological work done by the whole group (many things about public key, especially elliptic curves, symmetric cyphers, stream ciphers, secret sharing, homomorphic encryption and fields like that).

At work, the response wasn't that good. The answer was like: "do whatever you like in your free time, but this must not affect your current duties at all. It's not possible to dedicate any of your working hours to such a thing". Clearly said: if I do a PhD, it doesn't mean anything to Alba at all. Ooh!! This is a very clear way to motivate a worker. For my colleagues it was also fun to hear "if you do it, others would ask to do the same"; what's the problem with workers' training?

When I talked to my boss, I said to him many times, and I have to point out here, that my duties have increased every time a colleague left. Last Friday there was a presentation about Alba's control system, and when the subsystems of the machine were listed, half of the elements on that list were under my responsibility (and not all my duties were there).

When I went to talk about this with my boss, I went there offering my free time to work on an industrial PhD. My proposal was not to stop doing my job and disappear into full-time dedication to this; my proposal was mostly my free time, while being realistic that the brain thinks at any time. As a collaboration between the university and the industry, everyone involved has to put something in, especially the PhD candidate.

Well done: with this deal, the simplest solution is that Alba will not appear at all in my PhD. Or, if it appears, it will be to mention explicitly that they haven't contributed at all; even worse, Alba's position was opposition to doing work like that.

2013/04/30

untitled

Yesterday I gave a talk at the college where I studied my Bachelor's and Master's degrees and where I am now starting the PhD. What struck me most was the lack of interest from my bosses in this informative talk. After telling them about the invitation received from the university, the answer didn't go further than "oh, that's nice".

But whatever. I took this opportunity to present to the students how an engineer can work in a place that is often not on the main path of jobs one has in mind.

Starting with how I got this job, and explaining that 6 months before starting to work here I didn't know what a synchrotron was, I explained the heterogeneous jobs I had in the past. I put the accent on English, because I had been told that many students don't consider it necessary to have a good English level. Well, it is not about having an absolutely perfect level, but about starting with a sufficient communicative level (and with practice the rest will come).

I explained to them, with some YouTube videos, what is not a synchrotron. Why start the other way around? The idea behind this was to list other kinds of scientific facilities where an engineer has job opportunities, and which sometimes are not in the main target because they are not known.

Two basic ideas can summarize my talk: internationalization and searching for jobs in more unexpected fields. And an important thing in the job search is to apply to those offers. Without this step it's impossible to get them!

2013/02/27

Some numbers from Alba

Yesterday I placed some numbers in the post, and I think there are other important numbers about this facility.

From the operation calendar, some numbers can be extracted. The synchrotron will be running 5184h (60% of the year time).

Well, this is rather the number of hours it can be running, because 27% of this time (1416h, 16% of the year) is for machine studies and accelerator tests. This means that during this time it is not necessary to have beam in the storage ring.

For the users, it is planned that we will deliver 3768 beam hours. This represents 72% of the available running time (or 43% of the year time).

The rest of the year is dedicated to maintenance and improvements (984h, 11% of the total) and warm-up (2432h, 27.7%). There are special time slots dedicated to the Personal Safety System (64h, 0.7%) and the CSN (Spanish acronym of the Nuclear Safety Council).

This gives a total amount of 8760h (365 days).
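The budget can be verified with a short script. The hours are the ones given in this post; the 96h for the CSN are taken from the summary table in the update at the end of the post, and the dictionary layout is only my own way to arrange them. The shares printed here are exact, so they can differ slightly from the rounded figures in the text.

```python
HOURS_PER_YEAR = 365 * 24  # 8760h

# Top-level split of the year, in hours.
calendar = {
    "running": 5184,
    "off (maintenance)": 984,
    "warm-up": 2432,
    "PSS": 64,
    "CSN": 96,
}

# The running time is itself split between machine and beamlines.
running_split = {"machine": 1416, "beamlines": 3768}

total = sum(calendar.values())
print(f"total: {total}h")

for name, hours in calendar.items():
    print(f"{name:<18}{hours:>5}h {100.0 * hours / HOURS_PER_YEAR:5.1f}% of the year")

for name, hours in running_split.items():
    of_running = 100.0 * hours / calendar["running"]
    of_year = 100.0 * hours / HOURS_PER_YEAR
    print(f"{name:<18}{hours:>5}h {of_running:5.1f}% of running, {of_year:5.1f}% of the year")
```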

More numbers?

The storage ring has a perimeter of about 270m, and is designed for an energy of 3GeV and an accumulated current of 400mA. In this circumference there are:

32 bending magnets that generate a magnetic field of up to 1.42T each and a magnetic gradient (G) of 5.5T/m.
112 quadrupoles
120 sextupoles with B=1.12T

Those sextupoles have additional coils to apply a corrector dipole field.

The booster ring has a perimeter of about 250m, and is the accelerating stage that takes the electron beam from 100MeV to the 3GeV that has to be delivered to the storage ring. In this circumference there are:

40 bending magnets with B=0.89T and G=2.2T/m
60 quadrupoles
16 sextupoles
72 correctors

In total we have more than 700 magnets:

More numbers? There are many more fields where numbers can be written here. This may start a series of articles.

Update 20130228: Summary:

Off	984h	11%
Warm-up	2432h	27.7%
Beamtime	5184h	60%	Machine	1416h	27% (16% of the year)
			Beamlines	3768h	72% (43% of the year)
CSN	96h	1%
PSS	64h	0.7%
Total	8760h

2013/02/26

Why some of our cameras die

Of the 54 cameras that we have inside the tunnel, there is one location that is registering more failures than the others. Years ago, to find the cameras in their places (and the power source for each of them), I drew a map by hand:

You have to bear in mind that, following the current information in our cabling database, we have 18767 cables (with a total length of 168km, on average 11.33m), with 6683 pieces of equipment of 728 different types, located in 374 racks.

But back to the reason for this post: there is one CCD camera location where the hardware dies too often. The technical service of the manufacturer, Basler, answered us something like "what the hell did you do to those cameras!?". They had never seen this level of damage before.

It is at the top of the map image, but a detail will be useful:

The circles indicate the locations of cameras. In red, the "sr16/di/fs-01" that dies too often; in blue, other cameras. The nearest one is the "bt/di/fsotr-03", which is not failing (at least not for now). A picture of the place:

In this picture, the camera is the black box (with white letters saying Scout) at the top, with a grey network cable on top of it.

It's a very important camera because of the long list of uses it has, especially because when a beam injected into the storage ring is seen here it means that one turn has been completed (no obstacles in the path).

But why does it die? We know that we are placing this equipment in the tunnel and that it can suffer malfunctions due to the radiation, but why is this the only place where these cameras die? It looks like we have found the reason.

The first hypothesis was synchrotron light from the transferline. This camera is almost aligned with the second bending magnet of the booster transferline. But it is not convincing us.

Basically this camera is 40cm above the beam pipe. In radians from the aperture, that is about the same as the blue circle at the exit of this bending magnet, where a synchrotron radiation camera is placed, and that one is not failing.

A second hypothesis is neutron scattering. But this is not easy either, because the last camera of the booster transferline is also there. Its blue dot is nearby. The only difference is that the one in the transferline is below the beam pipe, and perhaps those particles have more obstacles in between. Maybe yes, maybe not.

Let's look at the storage ring injection in more detail. Remember the drawing with the coloured circles: there were two boxes beside the red dot, then a bigger box, followed by two more boxes of the first size. The four with the same size are kicker magnets, and the bigger one represents the septum magnet.

Those are pulsed magnets: when there is no injection they don't act, but when there is a beam injection the kickers divert the beam trajectory in four squares and the septum places the incoming (blue) and the stored (red) beams very, very close:

A very schematic view of what is happening in the septum could be (yes, it could have been drawn by a kid):

The green line between both beams is a metallic shielding that keeps the magnetic field of the septum on the incoming booster beam (blue), so that the stored beam (red) does not see this field.

The current hypothesis is that we may be hitting this shielding, scattering back "something" (probably neutrons) that finds the camera in its path. The last camera of the booster transferline is not in the same plane as this shielding.